Thanks I will investigate the below thanks Joern. Can someone send me some 
pointers
to the Doc Cat API that I can find? Thanks.




On 6/29/17, 10:18 AM, "Joern Kottmann" <[email protected]> wrote:

    For 2. I would like to suggest that we implement doccat format support
    to train on that data.
    
    3. it would be best so think about how we want to test the doccat
    component, today we don't have any tests which use lots of data to
    evaluate it.
    Probably the sentitment data could solve this for us and a train and
    evaluate test could be included in the eval tests.
    
    +1 to revert and then do these steps after the 1.8.1 release.
    
    I can apply my PR myself if nobody objects.
    
    Jörn
    
    On Thu, Jun 29, 2017 at 7:10 PM, Chris Mattmann <[email protected]> wrote:
    > Hi Rodrigo,
    >
    > This is very useful feedback that I wish we would have had a long time 
ago.
    >
    > I will look into it and see if I can reproduce the CLI error. I did a 
full build and mvn
    > install (which I though would run tests?) before commiting and as I 
posted in JIRA
    > the tests passed for me? So I will have to look into that.
    >
    > That said, given your feedback that SentimentME and the Sentiment 
Component
    > doesn’t offer much over Document Classifier I agree with you, but wasn’t 
super
    > familiar with the Document Classifier API. That said, if we can get the 
same functionality
    > by just using Document Classifier why don’t we:
    >
    > 1. Remove the SentimentME and associated code (except for the unit tests)
    > 2. Use the sample datasets from NetFlix & Stanford Treebank sentiment and
    > build models using Document Classifier API.
    > 3. Rename and keep the unit tests that test against Netflix and Stanford 
tree bank.
    >
    > That way we get basic sentiment analysis (that is working for us 
internally at JPL decently),
    > for Apache OpenNLP, and then if we want to build something better than a 
Document
    > Classification approach to sentiment we can do so.
    >
    > Thoughts?
    >
    > Thanks for your useful feedback. If everyone agrees this is a plan I can 
back out the code
    > using Joern’s revert, and then try and execute 1-3 above in a branch 
first. Thanks.
    >
    > Cheers,
    > Chris
    >
    >
    >
    > On 6/29/17, 10:03 AM, "Rodrigo Agerri" <[email protected]> wrote:
    >
    >     Hi Chris,
    >
    >     I have been interested in the new sentiment component for a while,
    >     although truth to be told, I did not follow that closely. I have today
    >     looked at it and test it with some of the corpora you have mentioned.
    >     In order to do that, I checkout master to work with from this commit
    >     onwards
    >
    >     
https://github.com/apache/opennlp/commit/56321aab51a470cd2004b76fb1f5330881b943c1
    >
    >     1. I tried to run it from the CLI. The Sentiment component did not
    >     appear to be available.
    >     2. I added the SentimentTrainer and Evaluator to the cmdline.CLI (no
    >     SentimentTool is implemented to tag with a trained model).
    >     3. After that, the CLI tests did not pass. So, the CLI is currently
    >     non functional, unless I did something wrong, always possible, of
    >     course. See if you can reproduce that error.
    >
    >     I therefore did the tests via API. I implemented a little test for
    >     training, evaluating and tagging here:
    >
    >     https://github.com/ixa-ehu/ixa-pipe-doc/tree/test
    >
    >     I run the training on the large movies review from Stanford for binary
    >     polarity classification
    >
    >     http://ai.stanford.edu/~amaas/data/sentiment/
    >
    >     and on the two little samples multiclass files added in resources and
    >     mentioned in the previous email, using the first one for training and
    >     the second one for testing (maxent 100 iterations, cutoff 5).
    >
    >     2. Stanford results: 0.84264
    >     3. sample multiclass: 0.73
    >
    >     Given that this is a standard document classification task, I decided
    >     to train the doccat component from the CLI:
    >
    >     1. Stanford results: 0.84264 (BOW features by default).
    >     2. sample multiclass: 0.73
    >
    >     I then looked at the code of the sentiment component and saw that it
    >     is basically a document classifier working with bag of words features.
    >     No added functionality. So, my conclusions are:
    >
    >     1. The CLI needs to be fixed.
    >     2. The Sentiment component, as it is, provides the same functionality
    >     as the document classifier.
    >
    >     I would therefore reconsider this commit until those two issues are
    >     addressed. Just my opinion.
    >
    >     Best regards,
    >
    >     Rodrigo
    >
    >     On Thu, Jun 29, 2017 at 5:30 PM, Chris Mattmann <[email protected]> 
wrote:
    >     >
    >     > Hey Joern,
    >     >
    >     > Sure, you can find the model data links here, along with our 
evaluation of them.
    >     >
    >     > http://irds.usc.edu/SentimentAnalysisParser/datasets.html
    >     >
    >     > There are other evaluations here:
    >     >
    >     > http://irds.usc.edu/SentimentAnalysisParser/models.html
    >     >
    >     > The HT provider review I cannot contribute at this time and I 
question its broad
    >     > applicability since it’s related to human trafficking. In addition 
we are still working
    >     > on publishing our analysis & evaluation of it which is why I 
removed it from the
    >     > commit.
    >     >
    >     > Cheers,
    >     > Chris
    >     >
    >     >
    >     >
    >     >
    >     >
    >     > On 6/29/17, 7:36 AM, "Joern Kottmann" <[email protected]> wrote:
    >     >
    >     >     Which data sets did you use to evaluate this?
    >     >     I was looking for a bit more than a sample file to train it.
    >     >
    >     >     I noticed that you checked in stanford and netflix models.
    >     >
    >     >     The stanford data set is probably this one:
    >     >     http://ai.stanford.edu/~amaas/data/sentiment/
    >     >
    >     >     Do you have a link for the netflix data?
    >     >
    >     >     Jörn
    >     >
    >     >     On Thu, Jun 29, 2017 at 4:00 PM, Chris Mattmann 
<[email protected]> wrote:
    >     >     > Absolutely you can find it here:
    >     >     >
    >     >     > 
opennlp-tools/src/test/resources/opennlp/tools/sentiment/sample_train_categ 
(for categorical /multi-class)
    >     >     > 
opennlp-tools/src/test/resources/opennlp/tools/sentiment/sample_train_categ2 
(for categorical/multi-class)
    >     >     >
    >     >     > We can also do similar files where instead of multi-class, we 
just use pos/neg as the label.
    >     >     >
    >     >     > Cheers,
    >     >     > Chris
    >     >     >
    >     >     >
    >     >     >
    >     >     >
    >     >     >
    >     >     > On 6/29/17, 2:35 AM, "Joern Kottmann" <[email protected]> 
wrote:
    >     >     >
    >     >     >     Hello Chris,
    >     >     >
    >     >     >     could you please point me to files I can use to train the 
sentiment
    >     >     >     component? I am currently looking again through the code 
and would
    >     >     >     like to train it myself.
    >     >     >
    >     >     >     Jörn
    >     >     >
    >     >     >     On Tue, Jun 27, 2017 at 4:59 PM, Dan Russ 
<[email protected]> wrote:
    >     >     >     > Hi All,
    >     >     >     >    First, let me take a share of blame for the comment 
Chris mentioned.  I believe I said something like the pull request was X 
revision behind and Y revisions ahead.  It was not meant to be rude, it was 
meant to say it is hard to review code when it is so different from the current 
code base. I am very excited that sentiment analysis is going to be added to 
OpenNLP, but I have not had time to play with it. If I were to say “great job” 
before I have add a chance to look at it, it would be flattery not honest 
praise.
    >     >     >     >
    >     >     >     >   Let’s clean up the merge.  I agree with Chris that 
scalability and perfection should not be our initial goal.  Let’s get 
something, and we can decide how to optimize later (even if it require a 
complete rewrite).  Perfection is the enemy of the good.
    >     >     >     >
    >     >     >     >   Finally, because of Chris’ comments it is hard to 
thank Ana and Chris without sounding insincere.  But I’ll try, thank you Chris 
and Ana.  I hope we can get beyond this and that Chris and Ana will continue to 
improve the performance of the sentiment analysis tool and happily remain part 
of the OpenNLP family.  It is also a good time to toss a big thank you to all 
of the committers, users, and PMC member.  I use OpenNLP almost everyday.  Your 
work is extremely valuable to me.
    >     >     >     >
    >     >     >     > Thank you,
    >     >     >     > Daniel
    >     >     >     >
    >     >     >     >> On Jun 27, 2017, at 10:25 AM, Chris Mattmann 
<[email protected]> wrote:
    >     >     >     >>
    >     >     >     >> Hi everyone,
    >     >     >     >>
    >     >     >     >> I spoke with Joern in Slack. Some of his concerns are:
    >     >     >     >>
    >     >     >     >> 1. This was done with a Merge commit and apparently 
they squash and rebase.
    >     >     >     >> [would be helpful to see some pointer on this for 
documentation, thus far I
    >     >     >     >> haven’t found any]
    >     >     >     >> 2. Apparently we literally need to ask others for +1 
votes and record them
    >     >     >     >> before committing? I thought since Ana and I are 
committers aren were +1,
    >     >     >     >> and since Joern had been providing feedback (the last 
of which was to add
    >     >     >     >> tests, which we did) that he would be +1 as well (I 
guess he is not, and I guess
    >     >     >     >> formally we need to do a +1 vote even still)
    >     >     >     >> 3. There was concern about scalability of the code.
    >     >     >     >> 4. There are thoughts that the code was not perfect 
yet (even though it works
    >     >     >     >> fine in the MEMEX project for Ana and I)
    >     >     >     >>
    >     >     >     >> So, Joern has opened up a revert PR.
    >     >     >     >>
    >     >     >     >> I suppose I should state I find this process extremely 
heavyweight and unwelcoming.
    >     >     >     >> To me, there should be a modicum of trust for 
committers, but I feel like even as a
    >     >     >     >> committer, I am operating as a “contributor” to the 
project. Committer means that
    >     >     >     >> there is trust to modify the source code base. Of the 
issues above, the only one I see
    >     >     >     >> as a moderate snafu was #1, and frankly if there are 
some instructions that show me
    >     >     >     >> how to do squashing and rebasing *first* I will try to 
do that in the future since I am
    >     >     >     >> not a GIt expert.
    >     >     >     >>
    >     >     >     >> That said, I must state I feel pretty put off by 
Apache OpenNLP. This originated as a GSoC
    >     >     >     >> effort, and we have worked pretty consistently on this 
over the last year. We used a
    >     >     >     >> separate GitHub project to get started, kept Joern 
involved as another mentor, even
    >     >     >     >> provided access and commit writes to that GitHub 
repository for a long time, so this
    >     >     >     >> code was developed in the open. Joern even created a 
branch in ApacheOpenNLP in the code and I suppose
    >     >     >     >> I should have gone and worked on that branch first 
since master is apparently so
    >     >     >     >> pristine that even an Apache veteran like me can’t get 
something in to it without
    >     >     >     >> making a whole bunch of (what are IMO minor issues, 
and what are IMO heavyweight
    >     >     >     >> “community” issues).
    >     >     >     >>
    >     >     >     >> I am concerned from a community point of view that the 
first comment wasn’t “Great
    >     >     >     >> job Chris, you got Sentiment Analysis into Apache, 
*but* I have these concerns 1-4 above”.
    >     >     >     >> It was “The PR was merged wrong in ways 1-4 and I’m 
going to revert it.”
    >     >     >     >>
    >     >     >     >> That’s pretty off-putting to someone who is semi-new 
like me and like Ana.
    >     >     >     >>
    >     >     >     >> Anyways, go ahead and revert it. Sorry to have caused 
any issues.
    >     >     >     >>
    >     >     >     >> Chris
    >     >     >     >>
    >     >     >     >>
    >     >     >     >>
    >     >     >     >> On 6/27/17, 7:06 AM, "Chris Mattmann" 
<[email protected]> wrote:
    >     >     >     >>
    >     >     >     >>    Hi Joern,
    >     >     >     >>
    >     >     >     >>    I’m confused. Why did you revert my commit?
    >     >     >     >>
    >     >     >     >>    Every one of those check points you put on the PR 
was checked?
    >     >     >     >>    We have been discussing this for months, you have 
seen the
    >     >     >     >>    code for months, Ana and I have worked diligently 
on the code
    >     >     >     >>    in plain view of everyone.
    >     >     >     >>
    >     >     >     >>    Please explain.
    >     >     >     >>
    >     >     >     >>    Chris
    >     >     >     >>
    >     >     >     >>
    >     >     >     >>
    >     >     >     >>
    >     >     >     >>    On 6/27/17, 1:23 AM, "kottmann" 
<[email protected]> wrote:
    >     >     >     >>
    >     >     >     >>        GitHub user kottmann opened a pull request:
    >     >     >     >>
    >     >     >     >>            https://github.com/apache/opennlp/pull/238
    >     >     >     >>
    >     >     >     >>            Revert merging of sentiment work, no 
consent to merge it
    >     >     >     >>
    >     >     >     >>            Thank you for contributing to Apache 
OpenNLP.
    >     >     >     >>
    >     >     >     >>            In order to streamline the review of the 
contribution we ask you
    >     >     >     >>            to ensure the following steps have been 
taken:
    >     >     >     >>
    >     >     >     >>            ### For all changes:
    >     >     >     >>            - [ ] Is there a JIRA ticket associated 
with this PR? Is it referenced
    >     >     >     >>                 in the commit message?
    >     >     >     >>
    >     >     >     >>            - [ ] Does your PR title start with 
OPENNLP-XXXX where XXXX is the JIRA number you are trying to resolve? Pay 
particular attention to the hyphen "-" character.
    >     >     >     >>
    >     >     >     >>            - [ ] Has your PR been rebased against the 
latest commit within the target branch (typically master)?
    >     >     >     >>
    >     >     >     >>            - [ ] Is your initial contribution a 
single, squashed commit?
    >     >     >     >>
    >     >     >     >>            ### For code changes:
    >     >     >     >>            - [ ] Have you ensured that the full suite 
of tests is executed via mvn clean install at the root opennlp folder?
    >     >     >     >>            - [ ] Have you written or updated unit 
tests to verify your changes?
    >     >     >     >>            - [ ] If adding new dependencies to the 
code, are these dependencies licensed in a way that is compatible for inclusion 
under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
    >     >     >     >>            - [ ] If applicable, have you updated the 
LICENSE file, including the main LICENSE file in opennlp folder?
    >     >     >     >>            - [ ] If applicable, have you updated the 
NOTICE file, including the main NOTICE file found in opennlp folder?
    >     >     >     >>
    >     >     >     >>            ### For documentation related changes:
    >     >     >     >>            - [ ] Have you ensured that format looks 
appropriate for the output in which it is rendered?
    >     >     >     >>
    >     >     >     >>            ### Note:
    >     >     >     >>            Please ensure that once the PR is 
submitted, you check travis-ci for build issues and submit an update to your PR 
as soon as possible.
    >     >     >     >>
    >     >     >     >>
    >     >     >     >>        You can merge this pull request into a Git 
repository by running:
    >     >     >     >>
    >     >     >     >>            $ git pull 
https://github.com/kottmann/opennlp revert_sentiment
    >     >     >     >>
    >     >     >     >>        Alternatively you can review and apply these 
changes as the patch at:
    >     >     >     >>
    >     >     >     >>            
https://github.com/apache/opennlp/pull/238.patch
    >     >     >     >>
    >     >     >     >>        To close this pull request, make a commit to 
your master/trunk branch
    >     >     >     >>        with (at least) the following in the commit 
message:
    >     >     >     >>
    >     >     >     >>            This closes #238
    >     >     >     >>
    >     >     >     >>        ----
    >     >     >     >>        commit 123222eb34724bae793e9d6d22e202c0aee0aa45
    >     >     >     >>        Author: Jörn Kottmann <[email protected]>
    >     >     >     >>        Date:   2017-06-27T08:19:19Z
    >     >     >     >>
    >     >     >     >>            Revert merging of sentiment work, no 
consent to merge it
    >     >     >     >>
    >     >     >     >>        ----
    >     >     >     >>
    >     >     >     >>
    >     >     >     >>        ---
    >     >     >     >>        If your project is set up for it, you can reply 
to this email and have your
    >     >     >     >>        reply appear on GitHub as well. If your project 
does not have this feature
    >     >     >     >>        enabled and wishes so, or if the feature is 
enabled but not working, please
    >     >     >     >>        contact infrastructure at 
[email protected] or file a JIRA ticket
    >     >     >     >>        with INFRA.
    >     >     >     >>        ---
    >     >     >     >>
    >     >     >     >>
    >     >     >     >>
    >     >     >     >>
    >     >     >     >>
    >     >     >     >>
    >     >     >     >
    >     >     >
    >     >     >
    >     >     >
    >     >
    >     >
    >     >
    >
    >
    >
    


Reply via email to