Re: Word Sense Disambiguator

Anthony Beylerian Tue, 08 Sep 2015 08:47:57 -0700

Hi,

Thanks, it is still being improved.


I am not sure what you mean by type or database ID.
Currently the sense source and the sense ID are returned.

For example:

"I went to the bank to deposit money."
target : bank (index : 4)
expected output : [WORDNET bank%1:14:00:: 21.6, WORDNET bank%1:21:01:: 5.8, ... etc]

Here "bank%1:14:00::" is a sense key, which you can use to look up the
sense definition in WordNet.

You can do this using the default dictionary :
Dictionary.getDefaultResourceInstance().getWordBySenseKey(sensekey).getSynset().getGloss();
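
If by "type" you mean the part of speech, it can also be read off the sense key itself, without a dictionary lookup. Below is a minimal sketch in plain Java, assuming each returned entry has the three whitespace-separated fields shown in the example above (source, sense key, score). The class and method names (WsdResultParser, Entry, parse, lemma, partOfSpeech) are made up for illustration and are not part of the sandbox code. The digit right after '%' is the WordNet ss_type (1 noun, 2 verb, 3 adjective, 4 adverb, 5 adjective satellite).

```java
/** Hypothetical helper, not part of OpenNLP: reads one "SOURCE SENSEKEY SCORE" entry. */
final class WsdResultParser {

    /** One parsed entry: sense source, WordNet sense key, and score. */
    static final class Entry {
        final String source;
        final String senseKey;
        final double score;

        Entry(String source, String senseKey, double score) {
            this.source = source;
            this.senseKey = senseKey;
            this.score = score;
        }
    }

    /** Splits an entry on whitespace; assumes exactly three fields. */
    static Entry parse(String result) {
        String[] parts = result.trim().split("\\s+");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected 'source senseKey score': " + result);
        }
        return new Entry(parts[0], parts[1], Double.parseDouble(parts[2]));
    }

    /** Returns the lemma, i.e. the part of the sense key before '%'. */
    static String lemma(String senseKey) {
        int percent = senseKey.indexOf('%');
        if (percent < 0) {
            throw new IllegalArgumentException("Not a sense key: " + senseKey);
        }
        return senseKey.substring(0, percent);
    }

    /** Maps the ss_type digit that follows '%' to a part-of-speech name. */
    static String partOfSpeech(String senseKey) {
        int percent = senseKey.indexOf('%');
        if (percent < 0 || percent + 1 >= senseKey.length()) {
            throw new IllegalArgumentException("Not a sense key: " + senseKey);
        }
        switch (senseKey.charAt(percent + 1)) {
            case '1': return "noun";
            case '2': return "verb";
            case '3': return "adjective";
            case '4': return "adverb";
            case '5': return "adjective satellite";
            default:
                throw new IllegalArgumentException("Unknown ss_type in: " + senseKey);
        }
    }

    public static void main(String[] args) {
        Entry e = parse("WORDNET bank%1:14:00:: 21.6");
        // prints "WORDNET | bank | noun | 21.6"
        System.out.println(e.source + " | " + lemma(e.senseKey) + " | "
                + partOfSpeech(e.senseKey) + " | " + e.score);
    }
}
```

The sense key from such an entry can then be passed to the dictionary call above to get the gloss.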

Hope this is what you are looking for, otherwise please clarify.

Anthony Beylerian

On Tue, Sep 8, 2015 at 5:34 PM, Cristian Petroaca <
cristian.petro...@gmail.com> wrote:

> Hi Anthony,
>
> I had a chance to test the wsd component. That's great work. Thanks.
> One question, is it possible to return the wordnet type (or database id) of
> the disambiguated word?
>
> Thanks,
> Cristian
>
> On Fri, Jul 24, 2015 at 1:14 PM, Anthony Beylerian <
> anthonybeyler...@hotmail.com> wrote:
>
> > Hi,
> >
> > To try out the ongoing implementations, after checking out the sandbox
> > repository please try these steps :
> > 1- Create a resource models directory:
> >
> > - src
> >   - test
> >     - resources
> >       + models
> >
> > 2- Include the following pre-trained models and dictionary in that
> > directory. You can find them here [1] if you like, or train your own
> > models.
> >
> > {
> > en-token.bin,
> > en-pos-maxent.bin,
> > en-sent.bin,
> > en-ner-person.bin,
> > en-lemmatizer.dict
> > }
> >
> > To train the IMS approach, you also need to include training data such as
> > Senseval-3 [2]. For now, please add these folders :
> > - src
> >   - test
> >     - resources
> >        - supervised
> >          + raw
> >          + models
> >          + dictionary
> >
> > You can find the data files here [2].
> >
> > 3- We included two examples [LeskTester.java] and [IMSTester.java] that
> > you can run directly, or make your own tests.
> >
> > To run a custom test, you minimally need a tokenized text or sentence,
> > for example for Lesk:
> >
> >           1>> String[] words = Loader.getTokenizer().tokenize(sentence);
> >
> > Choose the index of the word to disambiguate in the token array.
> >
> >           2>> int wordIndex= 6;
> >
> > Then just create a WSDisambiguator object, for example for Lesk :
> >
> >          3>> Lesk lesk = new Lesk();
> >
> > And you can call the default disambiguation method :
> >
> >          4>> lesk.disambiguate(words,wordIndex);
> >
> > You will get an array of strings with the following format :
> >
> > Lesk : [Source SenseKey Score]
> >
> > To read the sense definitions you can use the method :
> > [opennlp.tools.disambiguator.Constants.printResults]
> >
> > For using the variations of Lesk, you will need to create and configure a
> > parameters object:
> >           5>> LeskParameters leskParams = new LeskParameters();
> >           6>> leskParams.setLeskType(LeskParameters.LESK_TYPE.LESK_BASIC_CTXT_WIN_BF);
> >           7>> leskParams.setWin_b_size(4);
> >           8>> leskParams.setDepth(3);
> >           9>> lesk.setParams(leskParams);
> >
> > Typically, IMS should perform better than Lesk, since Lesk is a classic
> > method that is usually used as a baseline, along with the most frequent
> > sense (MFS).
> > However, we will be testing and adding more techniques.
> >
> > In any case, please feel free to ask for more details.
> >
> > Best,
> >
> > Anthony
> >
> > [1] : https://drive.google.com/folderview?id=0B67Iu3pf6WucfjdYNGhDc3hkTXd1a3FORnNUYzd3dV9YeWlyMFczeHU0SE1TcWwyU1lhZFU&usp=sharing
> > [2] : https://drive.google.com/file/d/0ByL0dmKXzHVfSXA3SVZiMnVfOGc/view?usp=sharing
> > > Date: Fri, 24 Jul 2015 09:54:02 +0200
> > > Subject: Re: Word Sense Disambiguator
> > > From: kottm...@gmail.com
> > > To: dev@opennlp.apache.org
> > >
> > > It would be nice if you could share instructions on how to run it.
> > > I also would like to give it a try.
> > >
> > > Jörn
> > >
> > > On Fri, Jul 24, 2015 at 4:54 AM, Anthony Beylerian <
> > > anthonybeyler...@hotmail.com> wrote:
> > >
> > > > Hello,
> > > > > Yes, for the moment we are only using WordNet for sense definitions.
> > > > > The plan is to complete the package by mid to late August, but if you
> > > > > like you can follow up on the progress from the sandbox.
> > > > Best regards,
> > > > Anthony
> > > > > Date: Thu, 23 Jul 2015 15:36:57 +0300
> > > > > Subject: Word Sense Disambiguator
> > > > > From: cristian.petro...@gmail.com
> > > > > To: dev@opennlp.apache.org
> > > > >
> > > > > Hi,
> > > > >
> > > > > > I saw that there are people actively working on a Word Sense
> > > > > > Disambiguator. Do you guys know when the module will be ready to
> > > > > > use? Also, I assume that WordNet is used to define the
> > > > > > disambiguated word meaning?
> > > > >
> > > > > Thanks,
> > > > > Cristian
> > > >
> > > >
> >
> >
>
