Re: Sentiment Analysis Parser updates

Mattmann, Chris A (3980) Tue, 28 Jun 2016 14:28:33 -0700

Thanks William, this is a great idea. I will discuss it with 
Anastasija tomorrow.



Cheers,
Chris


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: [email protected]
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Director, Information Retrieval and Data Science Group (IRDS)
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
WWW: http://irds.usc.edu/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

On 6/28/16, 12:01 PM, "William Colen" <[email protected]> wrote:

>Hi,
>
>I tried your code. Very good work so far! Congratulations.
>
>Is the examples/result file corrupted? It has only one line.
>
>Do you plan to implement a simple CLI to use it interactively from the
>command line, similar to
>
>bin/opennlp Doccat
>bin/opennlp TokenNameFinder
>
>?
>
>Also, do you plan to add evaluation tools by extending
>AbstractEvaluatorTool and AbstractCrossValidatorTool, as well as the
>listener EvaluationErrorPrinter? I find these tools very useful when
>developing new models and features; maybe you would find them useful as well.
>
>You could also check the DoccatFineGrainedReportListener as a starting point
>for creating a confusion matrix (I think it would be easy because the Doccat
>data structures are similar to yours).
>
>The result would look like the following (this is from a 300-entry
>Portuguese corpus I am building from Facebook messages):
>
>
>=== Evaluation summary ===
>  Number of documents:    298
>    Min sentence size:      1
>    Max sentence size:    463
>Average sentence size:  18,01
>     Categories count:      4
>             Accuracy: 61,41%
>
>=== Detailed Accuracy By Tag ===
>
>-------------------------------------------------------------------------
>|      Tag | Errors |  Count |   % Err | Precision | Recall | F-Measure |
>-------------------------------------------------------------------------
>|  neutral |     46 |     56 | 0,821   | 0,588     | 0,179  | 0,274     |
>| positive |     46 |     70 | 0,657   | 0,48      | 0,343  | 0,4       |
>| negative |     18 |    167 | 0,108   | 0,651     | 0,892  | 0,753     |
>|     spam |      5 |      5 | 1       | 0         | 0      | 0         |
>-------------------------------------------------------------------------
>
>=== Confusion matrix ===
>
>
>    a     b     c     d | Accuracy | <-- classified as
> <149>   13     4     1 |   89,22% |   a = negative
>   42   <24>    3     1 |   34,29% |   b = positive
>   35    11   <10>    . |   17,86% |   c = neutral
>    3     2     .    <.>|   0%     |   d = spam
>
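The per-tag precision, recall, and F-measure in the report above can be derived from the confusion matrix alone. The following is a minimal sketch in plain Python, not OpenNLP code: the function names `per_tag_metrics` and `accuracy` are hypothetical, and it assumes that rows are the gold labels and columns are the predicted labels, which matches the per-row accuracy shown in the matrix.

```python
# Confusion matrix copied from the report above.
# Assumption: rows are gold labels, columns are predicted labels.
labels = ["negative", "positive", "neutral", "spam"]
matrix = [
    [149, 13, 4, 1],   # gold = negative
    [42, 24, 3, 1],    # gold = positive
    [35, 11, 10, 0],   # gold = neutral
    [3, 2, 0, 0],      # gold = spam
]


def per_tag_metrics(matrix, labels):
    """Return {label: (precision, recall, f_measure)} for each tag."""
    metrics = {}
    for i, label in enumerate(labels):
        true_positives = matrix[i][i]
        gold_count = sum(matrix[i])                      # row total
        predicted_count = sum(row[i] for row in matrix)  # column total
        precision = true_positives / predicted_count if predicted_count else 0.0
        recall = true_positives / gold_count if gold_count else 0.0
        denominator = precision + recall
        f_measure = 2 * precision * recall / denominator if denominator else 0.0
        metrics[label] = (precision, recall, f_measure)
    return metrics


def accuracy(matrix):
    """Fraction of documents on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


if __name__ == "__main__":
    print(f"Accuracy: {accuracy(matrix):.2%}")
    for label, (p, r, f) in per_tag_metrics(matrix, labels).items():
        print(f"{label:>8}: precision={p:.3f} recall={r:.3f} f-measure={f:.3f}")
```

With the matrix above this gives an accuracy of 61.41% and, for the negative tag, precision 0.651, recall 0.892, and F-measure 0.753, in line with the summary table.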
>
>
>
>Regards,
>William
>
>2016-06-23 2:11 GMT-03:00 Mattmann, Chris A (3980) <
>[email protected]>:
>
>> Thank you Jason!
>>
>> On 6/22/16, 8:41 PM, "Jason Baldridge" <[email protected]> wrote:
>>
>> >Anastasija,
>> >
>> >There might be a few appropriate sentiment datasets listed in my homework
>> >on Twitter sentiment analysis:
>> >
>> >https://github.com/utcompling/applied-nlp/wiki/Homework5
>> >
>> >There may also be some useful data sets in the Crowdflower Open Data
>> >collection:
>> >
>> >https://www.crowdflower.com/data-for-everyone/
>> >
>> >Hope this helps!
>> >
>> >-Jason
>> >
>> >On Wed, 22 Jun 2016 at 15:59 Anastasija Mensikova <
>> >[email protected]> wrote:
>> >
>> >> Hi everyone,
>> >>
>> >> Some updates on our Sentiment Analysis Parser work.
>> >>
>> >> As you might have noticed, I recently enhanced our website (the GH
>> >> page), polished it, and made it more user-friendly. My next step will
>> >> be sending a pull request to Tika. However, my main goal until the end
>> >> of Google Summer of Code is to enhance the parser so that it works
>> >> categorically (in other words, the sentiment determined won't be just
>> >> positive or negative; it will have a few categories). This means that
>> >> my next step is to look for a categorical open data set (which I will
>> >> hopefully do by the end of the weekend at the latest) and, of course,
>> >> enhance my model and training. After that I will look into how the
>> >> confidence levels can be increased.
>> >>
>> >> Have a great day/night!
>> >>
>> >> Thank you,
>> >> Anastasija Mensikova.
>> >>
>>
