It would be fantastic to have these numbers. This would be a great project for someone who wants to contribute to open source and is just getting into machine learning and natural language processing.
For Twitter-like text, it'd be great to look at models trained and evaluated on the Tweet NLP resources (http://www.cs.cmu.edu/~ark/TweetNLP/) and to compare against how their models performed. It's also worth looking at spaCy (a Python NLP library) for further comparisons: https://spacy.io/

-Jason

On Mon, 20 Jun 2016 at 10:41 Jeffrey Zemerick <jzemer...@apache.org> wrote:

> I saw the same question on the users list on June 17. At least I thought it
> was the same question -- sorry if it wasn't.
>
> On Mon, Jun 20, 2016 at 11:37 AM, Mattmann, Chris A (3980) <chris.a.mattm...@jpl.nasa.gov> wrote:
>
> > Well, hold on. He sent that mail (as of the time of this mail) 4
> > minutes earlier. Maybe some folks need some time to reply ^_^
> >
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > Chris Mattmann, Ph.D.
> > Chief Architect
> > Instrument Software and Science Data Systems Section (398)
> > NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> > Office: 168-519, Mailstop: 168-527
> > Email: chris.a.mattm...@nasa.gov
> > WWW: http://sunset.usc.edu/~mattmann/
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > Director, Information Retrieval and Data Science Group (IRDS)
> > Adjunct Associate Professor, Computer Science Department
> > University of Southern California, Los Angeles, CA 90089 USA
> > WWW: http://irds.usc.edu/
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >
> > On 6/20/16, 8:23 AM, "Jeffrey Zemerick" <jzemer...@apache.org> wrote:
> >
> > >Hi Mondher,
> > >
> > >Since you didn't get any replies, I'm guessing no one is aware of any
> > >resources related to what you need. Google Scholar is a good place to look
> > >for papers referencing OpenNLP and its methods (in case you haven't
> > >searched it already).
> > >
> > >Jeff
> > >
> > >On Mon, Jun 20, 2016 at 11:19 AM, Mondher Bouazizi <mondher.bouaz...@gmail.com> wrote:
> > >
> > >> Hi,
> > >>
> > >> Apologies if you received multiple copies of this email. I sent it to the
> > >> users list a while ago and haven't had an answer yet.
> > >>
> > >> I have been looking for a while for any relevant work that has tested the
> > >> OpenNLP tools (in particular the Lemmatizer, Tokenizer and PoS-Tagger) on
> > >> short and noisy texts such as Twitter data, and/or compared them with
> > >> other libraries.
> > >>
> > >> By performance, I mean accuracy/precision rather than execution time.
> > >>
> > >> If anyone can refer me to a paper or work done in this context, that
> > >> would be of great help.
> > >>
> > >> Thank you very much.
> > >>
> > >> Mondher
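For anyone picking this up, the accuracy/precision numbers asked about in this thread could be computed along the lines below. This is a minimal sketch, assuming gold-standard tokens and tags for each tweet have already been loaded (for example, from annotated Twitter data like that linked from the Tweet NLP page) and that the outputs of the OpenNLP Tokenizer/PoS-Tagger (or spaCy) are passed in as plain lists of strings. Loading data and running the taggers is not shown, and the helper names (`token_spans`, `tokenizer_prf`, `pos_accuracy`) are hypothetical, not part of any of those libraries.

```python
from typing import List, Sequence, Set, Tuple

# (start, end) character offsets into the original text, end exclusive
Span = Tuple[int, int]


def token_spans(text: str, tokens: Sequence[str]) -> List[Span]:
    """Map each token back to its character span in `text`.

    Assumes tokens appear in order and are exact substrings of the text,
    i.e. the tokenizer does not normalize or rewrite its output.
    """
    spans: List[Span] = []
    pos = 0
    for tok in tokens:
        start = text.find(tok, pos)
        if start < 0:
            raise ValueError(f"token {tok!r} not found after offset {pos}")
        end = start + len(tok)
        spans.append((start, end))
        pos = end
    return spans


def tokenizer_prf(text: str, gold: Sequence[str],
                  pred: Sequence[str]) -> Tuple[float, float, float]:
    """Precision, recall and F1 of predicted tokens, matched by exact span."""
    gold_set: Set[Span] = set(token_spans(text, gold))
    pred_set: Set[Span] = set(token_spans(text, pred))
    correct = len(gold_set & pred_set)
    p = correct / len(pred_set) if pred_set else 0.0
    r = correct / len(gold_set) if gold_set else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def pos_accuracy(gold_tags: Sequence[str], pred_tags: Sequence[str]) -> float:
    """Token-level tagging accuracy, assuming both sequences use gold tokens."""
    if len(gold_tags) != len(pred_tags):
        raise ValueError("tag sequences must be aligned token-for-token")
    if not gold_tags:
        return 0.0
    hits = sum(g == p for g, p in zip(gold_tags, pred_tags))
    return hits / len(gold_tags)


if __name__ == "__main__":
    # Hypothetical noisy tweet with a hand-made gold tokenization and a
    # tokenizer output that over-splits mentions, punctuation and emoticons.
    text = "lol @bob u coming 2nite?? :)"
    gold = ["lol", "@bob", "u", "coming", "2nite", "??", ":)"]
    pred = ["lol", "@", "bob", "u", "coming", "2nite", "?", "?", ":", ")"]
    p, r, f = tokenizer_prf(text, gold, pred)
    print(f"P={p:.2f} R={r:.2f} F1={f:.2f}")  # P=0.40 R=0.57 F1=0.47

    # Hypothetical tags: one of three tokens mis-tagged.
    print(f"{pos_accuracy(['N', 'V', 'N'], ['N', 'N', 'N']):.2f}")  # 0.67
```

Matching tokens by character span lets two tokenizers with different boundary decisions be scored against the same gold data, while scoring the tagger on gold tokens (one common convention) keeps tokenization errors from being counted twice as tagging errors.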