On Thu, Jun 29, 2017 at 1:10 PM, Chris Mattmann <[email protected]> wrote:

> Hi Rodrigo,
>
> This is very useful feedback that I wish we would have had a long time ago.
>
> I will look into it and see if I can reproduce the CLI error. I did a full
> build and mvn
> install (which I though would run tests?) before commiting and as I posted
> in JIRA
> the tests passed for me? So I will have to look into that.
>
> That said, given your feedback that SentimentME and the Sentiment Component
> doesn’t offer much over Document Classifier I agree with you, but wasn’t
> super
> familiar with the Document Classifier API. That said, if we can get the
> same functionality
> by just using Document Classifier why don’t we:
>
> 1. Remove the SentimentME and associated code (except for the unit tests)
> 2. Use the sample datasets from NetFlix & Stanford Treebank sentiment and
> build models using Document Classifier API.
> 3. Rename and keep the unit tests that test against Netflix and Stanford
> tree bank.
>

+1 for the above 3 steps. Let's go ahead with Step 1 today - that way we
can start planning on cutting a 1.8.1 release candidate this weekend.



>
> That way we get basic sentiment analysis (that is working for us
> internally at JPL decently),
> for Apache OpenNLP, and then if we want to build something better than a
> Document
> Classification approach to sentiment we can do so.
>
> Thoughts?
>
> Thanks for your useful feedback. If everyone agrees this is a plan I can
> back out the code
> using Joern’s revert, and then try and execute 1-3 above in a branch
> first. Thanks.
>
> Cheers,
> Chris
>
>
>
> On 6/29/17, 10:03 AM, "Rodrigo Agerri" <[email protected]> wrote:
>
>     Hi Chris,
>
>     I have been interested in the new sentiment component for a while,
>     although truth to be told, I did not follow that closely. I have today
>     looked at it and test it with some of the corpora you have mentioned.
>     In order to do that, I checkout master to work with from this commit
>     onwards
>
>     https://github.com/apache/opennlp/commit/
> 56321aab51a470cd2004b76fb1f5330881b943c1
>
>     1. I tried to run it from the CLI. The Sentiment component did not
>     appear to be available.
>     2. I added the SentimentTrainer and Evaluator to the cmdline.CLI (no
>     SentimentTool is implemented to tag with a trained model).
>     3. After that, the CLI tests did not pass. So, the CLI is currently
>     non functional, unless I did something wrong, always possible, of
>     course. See if you can reproduce that error.
>
>     I therefore did the tests via API. I implemented a little test for
>     training, evaluating and tagging here:
>
>     https://github.com/ixa-ehu/ixa-pipe-doc/tree/test
>
>     I run the training on the large movies review from Stanford for binary
>     polarity classification
>
>     http://ai.stanford.edu/~amaas/data/sentiment/
>
>     and on the two little samples multiclass files added in resources and
>     mentioned in the previous email, using the first one for training and
>     the second one for testing (maxent 100 iterations, cutoff 5).
>
>     2. Stanford results: 0.84264
>     3. sample multiclass: 0.73
>
>     Given that this is a standard document classification task, I decided
>     to train the doccat component from the CLI:
>
>     1. Stanford results: 0.84264 (BOW features by default).
>     2. sample multiclass: 0.73
>
>     I then looked at the code of the sentiment component and saw that it
>     is basically a document classifier working with bag of words features.
>     No added functionality. So, my conclusions are:
>
>     1. The CLI needs to be fixed.
>     2. The Sentiment component, as it is, provides the same functionality
>     as the document classifier.
>
>     I would therefore reconsider this commit until those two issues are
>     addressed. Just my opinion.
>
>     Best regards,
>
>     Rodrigo
>
>     On Thu, Jun 29, 2017 at 5:30 PM, Chris Mattmann <[email protected]>
> wrote:
>     >
>     > Hey Joern,
>     >
>     > Sure, you can find the model data links here, along with our
> evaluation of them.
>     >
>     > http://irds.usc.edu/SentimentAnalysisParser/datasets.html
>     >
>     > There are other evaluations here:
>     >
>     > http://irds.usc.edu/SentimentAnalysisParser/models.html
>     >
>     > The HT provider review I cannot contribute at this time and I
> question its broad
>     > applicability since it’s related to human trafficking. In addition
> we are still working
>     > on publishing our analysis & evaluation of it which is why I removed
> it from the
>     > commit.
>     >
>     > Cheers,
>     > Chris
>     >
>     >
>     >
>     >
>     >
>     > On 6/29/17, 7:36 AM, "Joern Kottmann" <[email protected]> wrote:
>     >
>     >     Which data sets did you use to evaluate this?
>     >     I was looking for a bit more than a sample file to train it.
>     >
>     >     I noticed that you checked in stanford and netflix models.
>     >
>     >     The stanford data set is probably this one:
>     >     http://ai.stanford.edu/~amaas/data/sentiment/
>     >
>     >     Do you have a link for the netflix data?
>     >
>     >     Jörn
>     >
>     >     On Thu, Jun 29, 2017 at 4:00 PM, Chris Mattmann <
> [email protected]> wrote:
>     >     > Absolutely you can find it here:
>     >     >
>     >     > 
> opennlp-tools/src/test/resources/opennlp/tools/sentiment/sample_train_categ
> (for categorical /multi-class)
>     >     > 
> opennlp-tools/src/test/resources/opennlp/tools/sentiment/sample_train_categ2
> (for categorical/multi-class)
>     >     >
>     >     > We can also do similar files where instead of multi-class, we
> just use pos/neg as the label.
>     >     >
>     >     > Cheers,
>     >     > Chris
>     >     >
>     >     >
>     >     >
>     >     >
>     >     >
>     >     > On 6/29/17, 2:35 AM, "Joern Kottmann" <[email protected]>
> wrote:
>     >     >
>     >     >     Hello Chris,
>     >     >
>     >     >     could you please point me to files I can use to train the
> sentiment
>     >     >     component? I am currently looking again through the code
> and would
>     >     >     like to train it myself.
>     >     >
>     >     >     Jörn
>     >     >
>     >     >     On Tue, Jun 27, 2017 at 4:59 PM, Dan Russ <
> [email protected]> wrote:
>     >     >     > Hi All,
>     >     >     >    First, let me take a share of blame for the comment
> Chris mentioned.  I believe I said something like the pull request was X
> revision behind and Y revisions ahead.  It was not meant to be rude, it was
> meant to say it is hard to review code when it is so different from the
> current code base. I am very excited that sentiment analysis is going to be
> added to OpenNLP, but I have not had time to play with it. If I were to say
> “great job” before I have add a chance to look at it, it would be flattery
> not honest praise.
>     >     >     >
>     >     >     >   Let’s clean up the merge.  I agree with Chris that
> scalability and perfection should not be our initial goal.  Let’s get
> something, and we can decide how to optimize later (even if it require a
> complete rewrite).  Perfection is the enemy of the good.
>     >     >     >
>     >     >     >   Finally, because of Chris’ comments it is hard to
> thank Ana and Chris without sounding insincere.  But I’ll try, thank you
> Chris and Ana.  I hope we can get beyond this and that Chris and Ana will
> continue to improve the performance of the sentiment analysis tool and
> happily remain part of the OpenNLP family.  It is also a good time to toss
> a big thank you to all of the committers, users, and PMC member.  I use
> OpenNLP almost everyday.  Your work is extremely valuable to me.
>     >     >     >
>     >     >     > Thank you,
>     >     >     > Daniel
>     >     >     >
>     >     >     >> On Jun 27, 2017, at 10:25 AM, Chris Mattmann <
> [email protected]> wrote:
>     >     >     >>
>     >     >     >> Hi everyone,
>     >     >     >>
>     >     >     >> I spoke with Joern in Slack. Some of his concerns are:
>     >     >     >>
>     >     >     >> 1. This was done with a Merge commit and apparently
> they squash and rebase.
>     >     >     >> [would be helpful to see some pointer on this for
> documentation, thus far I
>     >     >     >> haven’t found any]
>     >     >     >> 2. Apparently we literally need to ask others for +1
> votes and record them
>     >     >     >> before committing? I thought since Ana and I are
> committers aren were +1,
>     >     >     >> and since Joern had been providing feedback (the last
> of which was to add
>     >     >     >> tests, which we did) that he would be +1 as well (I
> guess he is not, and I guess
>     >     >     >> formally we need to do a +1 vote even still)
>     >     >     >> 3. There was concern about scalability of the code.
>     >     >     >> 4. There are thoughts that the code was not perfect yet
> (even though it works
>     >     >     >> fine in the MEMEX project for Ana and I)
>     >     >     >>
>     >     >     >> So, Joern has opened up a revert PR.
>     >     >     >>
>     >     >     >> I suppose I should state I find this process extremely
> heavyweight and unwelcoming.
>     >     >     >> To me, there should be a modicum of trust for
> committers, but I feel like even as a
>     >     >     >> committer, I am operating as a “contributor” to the
> project. Committer means that
>     >     >     >> there is trust to modify the source code base. Of the
> issues above, the only one I see
>     >     >     >> as a moderate snafu was #1, and frankly if there are
> some instructions that show me
>     >     >     >> how to do squashing and rebasing *first* I will try to
> do that in the future since I am
>     >     >     >> not a GIt expert.
>     >     >     >>
>     >     >     >> That said, I must state I feel pretty put off by Apache
> OpenNLP. This originated as a GSoC
>     >     >     >> effort, and we have worked pretty consistently on this
> over the last year. We used a
>     >     >     >> separate GitHub project to get started, kept Joern
> involved as another mentor, even
>     >     >     >> provided access and commit writes to that GitHub
> repository for a long time, so this
>     >     >     >> code was developed in the open. Joern even created a
> branch in ApacheOpenNLP in the code and I suppose
>     >     >     >> I should have gone and worked on that branch first
> since master is apparently so
>     >     >     >> pristine that even an Apache veteran like me can’t get
> something in to it without
>     >     >     >> making a whole bunch of (what are IMO minor issues, and
> what are IMO heavyweight
>     >     >     >> “community” issues).
>     >     >     >>
>     >     >     >> I am concerned from a community point of view that the
> first comment wasn’t “Great
>     >     >     >> job Chris, you got Sentiment Analysis into Apache,
> *but* I have these concerns 1-4 above”.
>     >     >     >> It was “The PR was merged wrong in ways 1-4 and I’m
> going to revert it.”
>     >     >     >>
>     >     >     >> That’s pretty off-putting to someone who is semi-new
> like me and like Ana.
>     >     >     >>
>     >     >     >> Anyways, go ahead and revert it. Sorry to have caused
> any issues.
>     >     >     >>
>     >     >     >> Chris
>     >     >     >>
>     >     >     >>
>     >     >     >>
>     >     >     >> On 6/27/17, 7:06 AM, "Chris Mattmann" <
> [email protected]> wrote:
>     >     >     >>
>     >     >     >>    Hi Joern,
>     >     >     >>
>     >     >     >>    I’m confused. Why did you revert my commit?
>     >     >     >>
>     >     >     >>    Every one of those check points you put on the PR
> was checked?
>     >     >     >>    We have been discussing this for months, you have
> seen the
>     >     >     >>    code for months, Ana and I have worked diligently on
> the code
>     >     >     >>    in plain view of everyone.
>     >     >     >>
>     >     >     >>    Please explain.
>     >     >     >>
>     >     >     >>    Chris
>     >     >     >>
>     >     >     >>
>     >     >     >>
>     >     >     >>
>     >     >     >>    On 6/27/17, 1:23 AM, "kottmann" <[email protected]>
> wrote:
>     >     >     >>
>     >     >     >>        GitHub user kottmann opened a pull request:
>     >     >     >>
>     >     >     >>            https://github.com/apache/opennlp/pull/238
>     >     >     >>
>     >     >     >>            Revert merging of sentiment work, no consent
> to merge it
>     >     >     >>
>     >     >     >>            Thank you for contributing to Apache OpenNLP.
>     >     >     >>
>     >     >     >>            In order to streamline the review of the
> contribution we ask you
>     >     >     >>            to ensure the following steps have been
> taken:
>     >     >     >>
>     >     >     >>            ### For all changes:
>     >     >     >>            - [ ] Is there a JIRA ticket associated with
> this PR? Is it referenced
>     >     >     >>                 in the commit message?
>     >     >     >>
>     >     >     >>            - [ ] Does your PR title start with
> OPENNLP-XXXX where XXXX is the JIRA number you are trying to resolve? Pay
> particular attention to the hyphen "-" character.
>     >     >     >>
>     >     >     >>            - [ ] Has your PR been rebased against the
> latest commit within the target branch (typically master)?
>     >     >     >>
>     >     >     >>            - [ ] Is your initial contribution a single,
> squashed commit?
>     >     >     >>
>     >     >     >>            ### For code changes:
>     >     >     >>            - [ ] Have you ensured that the full suite
> of tests is executed via mvn clean install at the root opennlp folder?
>     >     >     >>            - [ ] Have you written or updated unit tests
> to verify your changes?
>     >     >     >>            - [ ] If adding new dependencies to the
> code, are these dependencies licensed in a way that is compatible for
> inclusion under [ASF 2.0](http://www.apache.org/
> legal/resolved.html#category-a)?
>     >     >     >>            - [ ] If applicable, have you updated the
> LICENSE file, including the main LICENSE file in opennlp folder?
>     >     >     >>            - [ ] If applicable, have you updated the
> NOTICE file, including the main NOTICE file found in opennlp folder?
>     >     >     >>
>     >     >     >>            ### For documentation related changes:
>     >     >     >>            - [ ] Have you ensured that format looks
> appropriate for the output in which it is rendered?
>     >     >     >>
>     >     >     >>            ### Note:
>     >     >     >>            Please ensure that once the PR is submitted,
> you check travis-ci for build issues and submit an update to your PR as
> soon as possible.
>     >     >     >>
>     >     >     >>
>     >     >     >>        You can merge this pull request into a Git
> repository by running:
>     >     >     >>
>     >     >     >>            $ git pull https://github.com/kottmann/
> opennlp revert_sentiment
>     >     >     >>
>     >     >     >>        Alternatively you can review and apply these
> changes as the patch at:
>     >     >     >>
>     >     >     >>            https://github.com/apache/
> opennlp/pull/238.patch
>     >     >     >>
>     >     >     >>        To close this pull request, make a commit to
> your master/trunk branch
>     >     >     >>        with (at least) the following in the commit
> message:
>     >     >     >>
>     >     >     >>            This closes #238
>     >     >     >>
>     >     >     >>        ----
>     >     >     >>        commit 123222eb34724bae793e9d6d22e202c0aee0aa45
>     >     >     >>        Author: Jörn Kottmann <[email protected]>
>     >     >     >>        Date:   2017-06-27T08:19:19Z
>     >     >     >>
>     >     >     >>            Revert merging of sentiment work, no consent
> to merge it
>     >     >     >>
>     >     >     >>        ----
>     >     >     >>
>     >     >     >>
>     >     >     >>        ---
>     >     >     >>        If your project is set up for it, you can reply
> to this email and have your
>     >     >     >>        reply appear on GitHub as well. If your project
> does not have this feature
>     >     >     >>        enabled and wishes so, or if the feature is
> enabled but not working, please
>     >     >     >>        contact infrastructure at
> [email protected] or file a JIRA ticket
>     >     >     >>        with INFRA.
>     >     >     >>        ---
>     >     >     >>
>     >     >     >>
>     >     >     >>
>     >     >     >>
>     >     >     >>
>     >     >     >>
>     >     >     >
>     >     >
>     >     >
>     >     >
>     >
>     >
>     >
>
>
>
>

Reply via email to