This sounds like a fantastic idea.

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Director, Information Retrieval and Data Science Group (IRDS)
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
WWW: http://irds.usc.edu/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

On 6/21/16, 12:13 AM, "Anthony Beylerian" <anthonybeyler...@hotmail.com> wrote:

>+1 
>
>Maybe we could publish the results of the evaluator tests for each component 
>on a webpage and update them with every release.
>This is, of course, provided there are reasonable data sets for testing each 
>component.
>What do you think?
>
>Anthony
>
>> From: mondher.bouaz...@gmail.com
>> Date: Tue, 21 Jun 2016 15:59:47 +0900
>> Subject: Re: Performances of OpenNLP tools
>> To: dev@opennlp.apache.org
>> 
>> Hi,
>> 
>> Thank you for your replies.
>> 
>> Jeffrey, please accept my apologies once more for sending the email twice.
>> 
>> I also think it would be great to have such studies on the performance of
>> OpenNLP.
>> 
>> I have been looking for this information and have checked many places,
>> including, obviously, Google Scholar, and I haven't found any serious studies
>> or reliable results. Most of the existing ones report the performance of
>> outdated releases of OpenNLP, and focus more on execution time, CPU/RAM
>> consumption, etc.
>> 
>> I think such a comparison would help not only to evaluate the overall
>> accuracy, but also to highlight the issues with the existing models (as a
>> matter of fact, the existing models fail to recognize many of the hashtags
>> in tweets: the tokenizer splits them into the "#" symbol and a word that the
>> PoS tagger then also fails to tag correctly).
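[Editor's note: to illustrate the hashtag behavior described above, here is a minimal, self-contained sketch in plain Java of how a character-class tokenizer splits a token whenever the character class changes. This is not the actual OpenNLP API (OpenNLP's SimpleTokenizer applies a similar heuristic); the class and method names are illustrative only.]

```java
import java.util.ArrayList;
import java.util.List;

public class HashtagSplitDemo {

    // Assign each character to a rough class: letters, digits,
    // whitespace, or "everything else" (punctuation/symbols like '#').
    static int charClass(char c) {
        if (Character.isLetter(c)) return 0;
        if (Character.isDigit(c)) return 1;
        if (Character.isWhitespace(c)) return 2;
        return 3;
    }

    // Emit a new token whenever the character class changes,
    // dropping whitespace-only runs. Under this rule "#opennlp"
    // necessarily becomes two tokens: "#" and "opennlp".
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int start = 0;
        for (int i = 1; i <= text.length(); i++) {
            if (i == text.length()
                    || charClass(text.charAt(i)) != charClass(text.charAt(start))) {
                String tok = text.substring(start, i);
                if (!tok.trim().isEmpty()) {
                    tokens.add(tok);
                }
                start = i;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints: [Check, out, #, opennlp, today]
        System.out.println(tokenize("Check out #opennlp today"));
    }
}
```

A downstream PoS tagger then sees "#" and "opennlp" as two separate tokens, neither of which it was trained to handle as a hashtag.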
>> 
>> Therefore, building Twitter-based models would also be useful, since much
>> of the work in academia and industry is focusing on Twitter data.
>> 
>> Best regards,
>> 
>> Mondher
>> 
>> 
>> 
>> On Tue, Jun 21, 2016 at 12:45 AM, Jason Baldridge <jasonbaldri...@gmail.com>
>> wrote:
>> 
>> > It would be fantastic to have these numbers. This is an example of
>> > something that would be a great contribution by someone trying to
>> > contribute to open source and who is maybe just getting into machine
>> > learning and natural language processing.
>> >
>> > For Twitter-ish text, it'd be great to look at models trained and evaluated
>> > on the Tweet NLP resources:
>> >
>> > http://www.cs.cmu.edu/~ark/TweetNLP/
>> >
>> > It would also be worth comparing against how their models performed. In
>> > addition, it's worth looking at spaCy (a Python NLP library) for further
>> > comparisons.
>> >
>> > https://spacy.io/
>> >
>> > -Jason
>> >
>> > On Mon, 20 Jun 2016 at 10:41 Jeffrey Zemerick <jzemer...@apache.org>
>> > wrote:
>> >
>> > > I saw the same question on the users list on June 17. At least I thought
>> > > it was the same question -- sorry if it wasn't.
>> > >
>> > > On Mon, Jun 20, 2016 at 11:37 AM, Mattmann, Chris A (3980) <
>> > > chris.a.mattm...@jpl.nasa.gov> wrote:
>> > >
>> > > > Well, hold on. He sent that mail (as of the time of this mail) 4
>> > > > mins previously. Maybe some folks need some time to reply ^_^
>> > > >
>> > > > On 6/20/16, 8:23 AM, "Jeffrey Zemerick" <jzemer...@apache.org> wrote:
>> > > >
>> > > > >Hi Mondher,
>> > > > >
>> > > > >Since you didn't get any replies, I'm guessing no one is aware of any
>> > > > >resources related to what you need. Google Scholar is a good place to
>> > > > >look for papers referencing OpenNLP and its methods (in case you
>> > > > >haven't searched it already).
>> > > > >
>> > > > >Jeff
>> > > > >
>> > > > >On Mon, Jun 20, 2016 at 11:19 AM, Mondher Bouazizi <
>> > > > >mondher.bouaz...@gmail.com> wrote:
>> > > > >
>> > > > >> Hi,
>> > > > >>
>> > > > >> Apologies if you received multiple copies of this email. I sent it
>> > > > >> to the users list a while ago, and haven't had an answer yet.
>> > > > >>
>> > > > >> I have been looking for a while to find any relevant work that has
>> > > > >> tested the OpenNLP tools (in particular the Lemmatizer, Tokenizer
>> > > > >> and PoS Tagger) on short and noisy texts such as Twitter data, etc.,
>> > > > >> and/or compared them to other libraries.
>> > > > >>
>> > > > >> By performance, I mean accuracy/precision, rather than execution
>> > > > >> time, etc.
>> > > > >>
>> > > > >> If anyone can refer me to a paper or other work done in this
>> > > > >> context, that would be of great help.
>> > > > >>
>> > > > >> Thank you very much.
>> > > > >>
>> > > > >> Mondher
>> > > > >>
>> > > >
>> > >
>> >