Re: Performances of OpenNLP tools

Mondher Bouazizi Tue, 21 Jun 2016 00:01:19 -0700

Hi,

Thank you for your replies.


Jeffrey, please accept my apologies once more for sending you the email twice.

I also think it would be great to have such studies on the performance of
OpenNLP.

I have been looking for this information and checked in many places,
including, of course, Google Scholar, and I haven't found any serious studies
or reliable results. Most of the existing ones report the performance of
outdated releases of OpenNLP, and focus more on execution time, CPU/RAM
consumption, etc.

I think such a comparison would help not only to evaluate the overall
accuracy, but also to highlight issues with the existing models. For
example, the existing models fail to handle many of the hashtags in tweets:
the tokenizer splits them into the "#" symbol and a word, which the PoS
tagger then also fails to recognize.
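
To make the hashtag issue concrete, here is a minimal Python sketch. It does
not use OpenNLP: `naive_tokenize` is a hypothetical stand-in that only mimics
the splitting behaviour described above, and `tweet_tokenize` is an assumed
alternative pattern that keeps hashtags and mentions as single tokens.

```python
import re


def naive_tokenize(text):
    """Stand-in tokenizer (NOT OpenNLP): splits every symbol from words."""
    return re.findall(r"\w+|[^\w\s]", text)


def tweet_tokenize(text):
    """Hypothetical alternative: keeps '#word' and '@word' together."""
    return re.findall(r"[#@]\w+|\w+|[^\w\s]", text)


tweet = "Loving the new #OpenNLP release!"

print(naive_tokenize(tweet))
# ['Loving', 'the', 'new', '#', 'OpenNLP', 'release', '!']

print(tweet_tokenize(tweet))
# ['Loving', 'the', 'new', '#OpenNLP', 'release', '!']
```

With the first pattern the hashtag arrives at the tagger as two separate
tokens, which is the situation I described; with the second it stays intact,
so a model trained on tweets could learn a dedicated tag for it.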

Therefore, building Twitter-based models would also be useful, since much
of the current work in academia and industry focuses on Twitter data.

Best regards,

Mondher



On Tue, Jun 21, 2016 at 12:45 AM, Jason Baldridge <[email protected]>
wrote:

> It would be fantastic to have these numbers. This is an example of
> something that would be a great contribution by someone trying to
> contribute to open source and who is maybe just getting into machine
> learning and natural language processing.
>
> For Twitter-ish text, it'd be great to look at models trained and evaluated
> on the Tweet NLP resources:
>
> http://www.cs.cmu.edu/~ark/TweetNLP/
>
> And comparing to how their models performed, etc. Also, it's worth looking
> at spaCy (Python NLP modules) for further comparisons.
>
> https://spacy.io/
>
> -Jason
>
> On Mon, 20 Jun 2016 at 10:41 Jeffrey Zemerick <[email protected]>
> wrote:
>
> > I saw the same question on the users list on June 17. At least I thought
> > it was the same question -- sorry if it wasn't.
> >
> > On Mon, Jun 20, 2016 at 11:37 AM, Mattmann, Chris A (3980) <
> > [email protected]> wrote:
> >
> > > Well, hold on. He sent that mail (as of the time of this mail) 4
> > > mins previously. Maybe some folks need some time to reply ^_^
> > >
> > > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > > Chris Mattmann, Ph.D.
> > > Chief Architect
> > > Instrument Software and Science Data Systems Section (398)
> > > NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> > > Office: 168-519, Mailstop: 168-527
> > > Email: [email protected]
> > > WWW:  http://sunset.usc.edu/~mattmann/
> > > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > > Director, Information Retrieval and Data Science Group (IRDS)
> > > Adjunct Associate Professor, Computer Science Department
> > > University of Southern California, Los Angeles, CA 90089 USA
> > > WWW: http://irds.usc.edu/
> > > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > >
> > > On 6/20/16, 8:23 AM, "Jeffrey Zemerick" <[email protected]> wrote:
> > >
> > > >Hi Mondher,
> > > >
> > > > >Since you didn't get any replies I'm guessing no one is aware of any
> > > > >resources related to what you need. Google Scholar is a good place to
> > > > >look for papers referencing OpenNLP and its methods (in case you haven't
> > > > >searched it already).
> > > >
> > > >Jeff
> > > >
> > > >On Mon, Jun 20, 2016 at 11:19 AM, Mondher Bouazizi <
> > > >[email protected]> wrote:
> > > >
> > > >> Hi,
> > > >>
> > > > >> Apologies if you received multiple copies of this email. I sent it to
> > > > >> the users list a while ago, and haven't had an answer yet.
> > > >>
> > > > >> I have been looking for a while if there is any relevant work that
> > > > >> performed tests on the OpenNLP tools (in particular the Lemmatizer,
> > > > >> Tokenizer and PoS-Tagger) when used with short and noisy texts such as
> > > > >> Twitter data, etc., and/or compared it to other libraries.
> > > >>
> > > > >> By performances, I mean accuracy/precision, rather than time of execution,
> > > > >> etc.
> > > >>
> > > > >> If anyone can refer me to a paper or a work done in this context, that
> > > > >> would be of great help.
> > > >>
> > > >> Thank you very much.
> > > >>
> > > >> Mondher
> > > >>
> > >
> >
>
