It would be fantastic to have these numbers. Producing them would be a
great contribution for someone who wants to get involved in open source
and is maybe just getting into machine learning and natural language
processing.

For Twitter-ish text, it'd be great to look at models trained and evaluated
on the Tweet NLP resources:

http://www.cs.cmu.edu/~ark/TweetNLP/

It would also be good to compare against how their models performed. It's
also worth looking at spaCy (a Python NLP library) for further comparisons:

https://spacy.io/

-Jason

On Mon, 20 Jun 2016 at 10:41 Jeffrey Zemerick <jzemer...@apache.org> wrote:

> I saw the same question on the users list on June 17. At least I thought it
> was the same question -- sorry if it wasn't.
>
> On Mon, Jun 20, 2016 at 11:37 AM, Mattmann, Chris A (3980) <
> chris.a.mattm...@jpl.nasa.gov> wrote:
>
> > Well, hold on. He sent that mail (as of the time of this mail) 4
> > mins previously. Maybe some folks need some time to reply ^_^
> >
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > Chris Mattmann, Ph.D.
> > Chief Architect
> > Instrument Software and Science Data Systems Section (398)
> > NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> > Office: 168-519, Mailstop: 168-527
> > Email: chris.a.mattm...@nasa.gov
> > WWW:  http://sunset.usc.edu/~mattmann/
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > Director, Information Retrieval and Data Science Group (IRDS)
> > Adjunct Associate Professor, Computer Science Department
> > University of Southern California, Los Angeles, CA 90089 USA
> > WWW: http://irds.usc.edu/
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >
> > On 6/20/16, 8:23 AM, "Jeffrey Zemerick" <jzemer...@apache.org> wrote:
> >
> > >Hi Mondher,
> > >
> > >Since you didn't get any replies I'm guessing no one is aware of any
> > >resources related to what you need. Google Scholar is a good place to
> > >look for papers referencing OpenNLP and its methods (in case you haven't
> > >searched it already).
> > >
> > >Jeff
> > >
> > >On Mon, Jun 20, 2016 at 11:19 AM, Mondher Bouazizi <
> > >mondher.bouaz...@gmail.com> wrote:
> > >
> > >> Hi,
> > >>
> > >> Apologies if you received multiple copies of this email. I sent it to
> > >> the users list a while ago, and haven't had an answer yet.
> > >>
> > >> I have been looking for a while if there is any relevant work that
> > >> performed tests on the OpenNLP tools (in particular the Lemmatizer,
> > >> Tokenizer and PoS-Tagger) when used with short and noisy texts such as
> > >> Twitter data, etc., and/or compared it to other libraries.
> > >>
> > >> By performances, I mean accuracy/precision, rather than time of
> > >> execution, etc.
> > >>
> > >> If anyone can refer me to a paper or a work done in this context, that
> > >> would be of great help.
> > >>
> > >> Thank you very much.
> > >>
> > >> Mondher
> > >>
> >
>
