On Thu, Jun 29, 2017 at 1:10 PM, Chris Mattmann <[email protected]> wrote:
> Hi Rodrigo, > > This is very useful feedback that I wish we would have had a long time ago. > > I will look into it and see if I can reproduce the CLI error. I did a full > build and mvn > install (which I though would run tests?) before commiting and as I posted > in JIRA > the tests passed for me? So I will have to look into that. > > That said, given your feedback that SentimentME and the Sentiment Component > doesn’t offer much over Document Classifier I agree with you, but wasn’t > super > familiar with the Document Classifier API. That said, if we can get the > same functionality > by just using Document Classifier why don’t we: > > 1. Remove the SentimentME and associated code (except for the unit tests) > 2. Use the sample datasets from NetFlix & Stanford Treebank sentiment and > build models using Document Classifier API. > 3. Rename and keep the unit tests that test against Netflix and Stanford > tree bank. > +1 for the above 3 steps. Let's go ahead with Step 1 today - that way we can start planning on cutting a 1.8.1 release candidate this weekend. > > That way we get basic sentiment analysis (that is working for us > internally at JPL decently), > for Apache OpenNLP, and then if we want to build something better than a > Document > Classification approach to sentiment we can do so. > > Thoughts? > > Thanks for your useful feedback. If everyone agrees this is a plan I can > back out the code > using Joern’s revert, and then try and execute 1-3 above in a branch > first. Thanks. > > Cheers, > Chris > > > > On 6/29/17, 10:03 AM, "Rodrigo Agerri" <[email protected]> wrote: > > Hi Chris, > > I have been interested in the new sentiment component for a while, > although truth to be told, I did not follow that closely. I have today > looked at it and test it with some of the corpora you have mentioned. > In order to do that, I checkout master to work with from this commit > onwards > > https://github.com/apache/opennlp/commit/ > 56321aab51a470cd2004b76fb1f5330881b943c1 > > 1. I tried to run it from the CLI. The Sentiment component did not > appear to be available. > 2. I added the SentimentTrainer and Evaluator to the cmdline.CLI (no > SentimentTool is implemented to tag with a trained model). > 3. After that, the CLI tests did not pass. So, the CLI is currently > non functional, unless I did something wrong, always possible, of > course. See if you can reproduce that error. > > I therefore did the tests via API. I implemented a little test for > training, evaluating and tagging here: > > https://github.com/ixa-ehu/ixa-pipe-doc/tree/test > > I run the training on the large movies review from Stanford for binary > polarity classification > > http://ai.stanford.edu/~amaas/data/sentiment/ > > and on the two little samples multiclass files added in resources and > mentioned in the previous email, using the first one for training and > the second one for testing (maxent 100 iterations, cutoff 5). > > 2. Stanford results: 0.84264 > 3. sample multiclass: 0.73 > > Given that this is a standard document classification task, I decided > to train the doccat component from the CLI: > > 1. Stanford results: 0.84264 (BOW features by default). > 2. sample multiclass: 0.73 > > I then looked at the code of the sentiment component and saw that it > is basically a document classifier working with bag of words features. > No added functionality. So, my conclusions are: > > 1. The CLI needs to be fixed. > 2. The Sentiment component, as it is, provides the same functionality > as the document classifier. > > I would therefore reconsider this commit until those two issues are > addressed. Just my opinion. > > Best regards, > > Rodrigo > > On Thu, Jun 29, 2017 at 5:30 PM, Chris Mattmann <[email protected]> > wrote: > > > > Hey Joern, > > > > Sure, you can find the model data links here, along with our > evaluation of them. > > > > http://irds.usc.edu/SentimentAnalysisParser/datasets.html > > > > There are other evaluations here: > > > > http://irds.usc.edu/SentimentAnalysisParser/models.html > > > > The HT provider review I cannot contribute at this time and I > question its broad > > applicability since it’s related to human trafficking. In addition > we are still working > > on publishing our analysis & evaluation of it which is why I removed > it from the > > commit. > > > > Cheers, > > Chris > > > > > > > > > > > > On 6/29/17, 7:36 AM, "Joern Kottmann" <[email protected]> wrote: > > > > Which data sets did you use to evaluate this? > > I was looking for a bit more than a sample file to train it. > > > > I noticed that you checked in stanford and netflix models. > > > > The stanford data set is probably this one: > > http://ai.stanford.edu/~amaas/data/sentiment/ > > > > Do you have a link for the netflix data? > > > > Jörn > > > > On Thu, Jun 29, 2017 at 4:00 PM, Chris Mattmann < > [email protected]> wrote: > > > Absolutely you can find it here: > > > > > > > opennlp-tools/src/test/resources/opennlp/tools/sentiment/sample_train_categ > (for categorical /multi-class) > > > > opennlp-tools/src/test/resources/opennlp/tools/sentiment/sample_train_categ2 > (for categorical/multi-class) > > > > > > We can also do similar files where instead of multi-class, we > just use pos/neg as the label. > > > > > > Cheers, > > > Chris > > > > > > > > > > > > > > > > > > On 6/29/17, 2:35 AM, "Joern Kottmann" <[email protected]> > wrote: > > > > > > Hello Chris, > > > > > > could you please point me to files I can use to train the > sentiment > > > component? I am currently looking again through the code > and would > > > like to train it myself. > > > > > > Jörn > > > > > > On Tue, Jun 27, 2017 at 4:59 PM, Dan Russ < > [email protected]> wrote: > > > > Hi All, > > > > First, let me take a share of blame for the comment > Chris mentioned. I believe I said something like the pull request was X > revision behind and Y revisions ahead. It was not meant to be rude, it was > meant to say it is hard to review code when it is so different from the > current code base. I am very excited that sentiment analysis is going to be > added to OpenNLP, but I have not had time to play with it. If I were to say > “great job” before I have add a chance to look at it, it would be flattery > not honest praise. > > > > > > > > Let’s clean up the merge. I agree with Chris that > scalability and perfection should not be our initial goal. Let’s get > something, and we can decide how to optimize later (even if it require a > complete rewrite). Perfection is the enemy of the good. > > > > > > > > Finally, because of Chris’ comments it is hard to > thank Ana and Chris without sounding insincere. But I’ll try, thank you > Chris and Ana. I hope we can get beyond this and that Chris and Ana will > continue to improve the performance of the sentiment analysis tool and > happily remain part of the OpenNLP family. It is also a good time to toss > a big thank you to all of the committers, users, and PMC member. I use > OpenNLP almost everyday. Your work is extremely valuable to me. > > > > > > > > Thank you, > > > > Daniel > > > > > > > >> On Jun 27, 2017, at 10:25 AM, Chris Mattmann < > [email protected]> wrote: > > > >> > > > >> Hi everyone, > > > >> > > > >> I spoke with Joern in Slack. Some of his concerns are: > > > >> > > > >> 1. This was done with a Merge commit and apparently > they squash and rebase. > > > >> [would be helpful to see some pointer on this for > documentation, thus far I > > > >> haven’t found any] > > > >> 2. Apparently we literally need to ask others for +1 > votes and record them > > > >> before committing? I thought since Ana and I are > committers aren were +1, > > > >> and since Joern had been providing feedback (the last > of which was to add > > > >> tests, which we did) that he would be +1 as well (I > guess he is not, and I guess > > > >> formally we need to do a +1 vote even still) > > > >> 3. There was concern about scalability of the code. > > > >> 4. There are thoughts that the code was not perfect yet > (even though it works > > > >> fine in the MEMEX project for Ana and I) > > > >> > > > >> So, Joern has opened up a revert PR. > > > >> > > > >> I suppose I should state I find this process extremely > heavyweight and unwelcoming. > > > >> To me, there should be a modicum of trust for > committers, but I feel like even as a > > > >> committer, I am operating as a “contributor” to the > project. Committer means that > > > >> there is trust to modify the source code base. Of the > issues above, the only one I see > > > >> as a moderate snafu was #1, and frankly if there are > some instructions that show me > > > >> how to do squashing and rebasing *first* I will try to > do that in the future since I am > > > >> not a GIt expert. > > > >> > > > >> That said, I must state I feel pretty put off by Apache > OpenNLP. This originated as a GSoC > > > >> effort, and we have worked pretty consistently on this > over the last year. We used a > > > >> separate GitHub project to get started, kept Joern > involved as another mentor, even > > > >> provided access and commit writes to that GitHub > repository for a long time, so this > > > >> code was developed in the open. Joern even created a > branch in ApacheOpenNLP in the code and I suppose > > > >> I should have gone and worked on that branch first > since master is apparently so > > > >> pristine that even an Apache veteran like me can’t get > something in to it without > > > >> making a whole bunch of (what are IMO minor issues, and > what are IMO heavyweight > > > >> “community” issues). > > > >> > > > >> I am concerned from a community point of view that the > first comment wasn’t “Great > > > >> job Chris, you got Sentiment Analysis into Apache, > *but* I have these concerns 1-4 above”. > > > >> It was “The PR was merged wrong in ways 1-4 and I’m > going to revert it.” > > > >> > > > >> That’s pretty off-putting to someone who is semi-new > like me and like Ana. > > > >> > > > >> Anyways, go ahead and revert it. Sorry to have caused > any issues. > > > >> > > > >> Chris > > > >> > > > >> > > > >> > > > >> On 6/27/17, 7:06 AM, "Chris Mattmann" < > [email protected]> wrote: > > > >> > > > >> Hi Joern, > > > >> > > > >> I’m confused. Why did you revert my commit? > > > >> > > > >> Every one of those check points you put on the PR > was checked? > > > >> We have been discussing this for months, you have > seen the > > > >> code for months, Ana and I have worked diligently on > the code > > > >> in plain view of everyone. > > > >> > > > >> Please explain. > > > >> > > > >> Chris > > > >> > > > >> > > > >> > > > >> > > > >> On 6/27/17, 1:23 AM, "kottmann" <[email protected]> > wrote: > > > >> > > > >> GitHub user kottmann opened a pull request: > > > >> > > > >> https://github.com/apache/opennlp/pull/238 > > > >> > > > >> Revert merging of sentiment work, no consent > to merge it > > > >> > > > >> Thank you for contributing to Apache OpenNLP. > > > >> > > > >> In order to streamline the review of the > contribution we ask you > > > >> to ensure the following steps have been > taken: > > > >> > > > >> ### For all changes: > > > >> - [ ] Is there a JIRA ticket associated with > this PR? Is it referenced > > > >> in the commit message? > > > >> > > > >> - [ ] Does your PR title start with > OPENNLP-XXXX where XXXX is the JIRA number you are trying to resolve? Pay > particular attention to the hyphen "-" character. > > > >> > > > >> - [ ] Has your PR been rebased against the > latest commit within the target branch (typically master)? > > > >> > > > >> - [ ] Is your initial contribution a single, > squashed commit? > > > >> > > > >> ### For code changes: > > > >> - [ ] Have you ensured that the full suite > of tests is executed via mvn clean install at the root opennlp folder? > > > >> - [ ] Have you written or updated unit tests > to verify your changes? > > > >> - [ ] If adding new dependencies to the > code, are these dependencies licensed in a way that is compatible for > inclusion under [ASF 2.0](http://www.apache.org/ > legal/resolved.html#category-a)? > > > >> - [ ] If applicable, have you updated the > LICENSE file, including the main LICENSE file in opennlp folder? > > > >> - [ ] If applicable, have you updated the > NOTICE file, including the main NOTICE file found in opennlp folder? > > > >> > > > >> ### For documentation related changes: > > > >> - [ ] Have you ensured that format looks > appropriate for the output in which it is rendered? > > > >> > > > >> ### Note: > > > >> Please ensure that once the PR is submitted, > you check travis-ci for build issues and submit an update to your PR as > soon as possible. > > > >> > > > >> > > > >> You can merge this pull request into a Git > repository by running: > > > >> > > > >> $ git pull https://github.com/kottmann/ > opennlp revert_sentiment > > > >> > > > >> Alternatively you can review and apply these > changes as the patch at: > > > >> > > > >> https://github.com/apache/ > opennlp/pull/238.patch > > > >> > > > >> To close this pull request, make a commit to > your master/trunk branch > > > >> with (at least) the following in the commit > message: > > > >> > > > >> This closes #238 > > > >> > > > >> ---- > > > >> commit 123222eb34724bae793e9d6d22e202c0aee0aa45 > > > >> Author: Jörn Kottmann <[email protected]> > > > >> Date: 2017-06-27T08:19:19Z > > > >> > > > >> Revert merging of sentiment work, no consent > to merge it > > > >> > > > >> ---- > > > >> > > > >> > > > >> --- > > > >> If your project is set up for it, you can reply > to this email and have your > > > >> reply appear on GitHub as well. If your project > does not have this feature > > > >> enabled and wishes so, or if the feature is > enabled but not working, please > > > >> contact infrastructure at > [email protected] or file a JIRA ticket > > > >> with INFRA. > > > >> --- > > > >> > > > >> > > > >> > > > >> > > > >> > > > >> > > > > > > > > > > > > > > > > > > > > > > >
