RE: Word Sense Disambiguator

Anthony Beylerian Fri, 24 Jul 2015 03:15:15 -0700

Hi,

To try out the ongoing implementations, check out the sandbox repository
and then follow these steps:
1- Create a models directory under the test resources:


- src
  - test
    - resources
      + models

2- Put the following pre-trained models and dictionary in that directory.
You can find them here [1], or you can train your own models.

{
en-token.bin,
en-pos-maxent.bin,
en-sent.bin,
en-ner-person.bin,
en-lemmatizer.dict
}
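Before running anything, it can help to confirm the files are actually in place. Here is a hypothetical helper (ModelCheck is not part of the sandbox) that lists which of the expected files are missing from a directory, assuming the layout above:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the sandbox: reports which of the
// expected model/dictionary files are missing from a directory.
class ModelCheck {
    static final String[] REQUIRED = {
        "en-token.bin", "en-pos-maxent.bin", "en-sent.bin",
        "en-ner-person.bin", "en-lemmatizer.dict"
    };

    static List<String> missingFiles(Path modelsDir) {
        List<String> missing = new ArrayList<>();
        for (String name : REQUIRED) {
            // Only regular files count; a directory with that name does not
            if (!Files.isRegularFile(modelsDir.resolve(name))) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        Path dir = Paths.get("src", "test", "resources", "models");
        List<String> missing = missingFiles(dir);
        System.out.println(missing.isEmpty() ? "All models found" : "Missing: " + missing);
    }
}
```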

To train the IMS approach, you need to include training data such as
Senseval-3 [2]. For now, please add these folders:
- src
  - test
    - resources
      - supervised
        + raw
        + models
        + dictionary

You can find the data files here [2].
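If you prefer to create these folders from code, something like the following would do it. This is a hypothetical setup helper (SupervisedDirs is not a sandbox class) that simply mirrors the layout above:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical setup helper: creates raw/, models/ and dictionary/
// under <resources>/supervised, mirroring the folder layout above.
class SupervisedDirs {
    static final String[] SUBDIRS = {"raw", "models", "dictionary"};

    static Path create(Path resourcesDir) throws IOException {
        Path supervised = resourcesDir.resolve("supervised");
        for (String sub : SUBDIRS) {
            // createDirectories also creates missing parents and does
            // nothing if the directory already exists
            Files.createDirectories(supervised.resolve(sub));
        }
        return supervised;
    }

    public static void main(String[] args) throws IOException {
        Path created = create(Paths.get("src", "test", "resources"));
        System.out.println("Created " + created.toAbsolutePath());
    }
}
```

The data files from [2] then go into the matching folders.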

3- We included two examples, [LeskTester.java] and [IMSTester.java], that you
can run directly, or you can write your own tests.

To run a custom test, at a minimum you need a tokenized text or sentence.
For example, for Lesk:

          String[] words = Loader.getTokenizer().tokenize(sentence);

Choose the index of the word to disambiguate in the token array:

          int wordIndex = 6;
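Rather than hard-coding the index, you could look it up in the token array. A hypothetical helper (WordIndex is not part of the sandbox) that returns the first case-insensitive match, or -1 if the word is absent:

```java
// Hypothetical helper: finds the position of the target word in the
// token array instead of hard-coding it. Returns -1 if not found.
class WordIndex {
    static int indexOf(String[] tokens, String target) {
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].equalsIgnoreCase(target)) {
                return i;
            }
        }
        return -1;
    }
}
```

Usage would be `int wordIndex = WordIndex.indexOf(words, "bank");`, checking for -1 before calling the disambiguator.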

Then create a WSDisambiguator object, for example for Lesk:

          Lesk lesk = new Lesk();

Then call the default disambiguation method:

          String[] results = lesk.disambiguate(words, wordIndex);

It returns an array of strings in the following format:

Lesk : [Source SenseKey Score]   
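If you want to work with the individual fields, a hypothetical helper like this one (LeskResult is not a sandbox class) could split one entry into its parts. It assumes the three fields are separated by whitespace and may be wrapped in square brackets, so please check the actual output before relying on it:

```java
// Hypothetical parser, assuming each entry holds "Source SenseKey Score"
// separated by whitespace, optionally wrapped in [ ].
class LeskResult {
    final String source;
    final String senseKey;
    final double score;

    LeskResult(String source, String senseKey, double score) {
        this.source = source;
        this.senseKey = senseKey;
        this.score = score;
    }

    static LeskResult parse(String raw) {
        String s = raw.trim();
        // Drop optional surrounding brackets
        if (s.startsWith("[") && s.endsWith("]")) {
            s = s.substring(1, s.length() - 1).trim();
        }
        String[] parts = s.split("\\s+");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Unexpected result format: " + raw);
        }
        return new LeskResult(parts[0], parts[1], Double.parseDouble(parts[2]));
    }
}
```

For example, `LeskResult.parse(results[0]).score` would give the score of the first entry.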

To read the sense definitions, you can use the method
[opennlp.tools.disambiguator.Constants.printResults].
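For intuition about what the basic Lesk approach computes, here is a toy version of the simplified Lesk idea: each candidate sense is scored by how many context words also appear in its gloss, and the highest-scoring sense wins. This is a self-contained illustration, not the sandbox implementation; the SimplifiedLesk class and the gloss map are hypothetical (in the sandbox the sense definitions come from WordNet):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy simplified Lesk: pick the sense whose gloss shares the most words
// with the sentence context. Illustration only, not the sandbox code.
class SimplifiedLesk {
    static int overlap(Set<String> context, String gloss) {
        Set<String> glossWords = new HashSet<>(Arrays.asList(gloss.toLowerCase().split("\\W+")));
        glossWords.retainAll(context); // keep only words shared with the context
        return glossWords.size();
    }

    // glosses maps a (hypothetical) sense id to its definition text
    static String bestSense(String[] words, int wordIndex, Map<String, String> glosses) {
        Set<String> context = new HashSet<>();
        for (int i = 0; i < words.length; i++) {
            if (i != wordIndex) {
                context.add(words[i].toLowerCase());
            }
        }
        String best = null;
        int bestScore = -1;
        for (Map.Entry<String, String> e : glosses.entrySet()) {
            int score = overlap(context, e.getValue());
            if (score > bestScore) { // ties keep the earlier sense
                best = e.getKey();
                bestScore = score;
            }
        }
        return best;
    }
}
```

A real implementation would at least remove stop words and usually extends glosses with related synsets, which is roughly what the Lesk variants configure.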

To use the Lesk variants, you need to create and configure a parameters
object:
          LeskParameters leskParams = new LeskParameters();
          leskParams.setLeskType(LeskParameters.LESK_TYPE.LESK_BASIC_CTXT_WIN_BF);
          leskParams.setWin_b_size(4);
          leskParams.setDepth(3);
          lesk.setParams(leskParams);
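The LESK_BASIC_CTXT_WIN_BF type presumably limits the context to a window around the target word. Assuming that win_b_size is the number of tokens taken before the word and that "BF" stands for a before/forward window (an assumption based only on the names), the window could be extracted along these lines. ContextWindow is hypothetical:

```java
// Hypothetical illustration of a before/forward context window around
// the target word; the target itself is excluded from the result.
class ContextWindow {
    static String[] window(String[] words, int wordIndex, int before, int forward) {
        if (wordIndex < 0 || wordIndex >= words.length || before < 0 || forward < 0) {
            throw new IllegalArgumentException("invalid index or window size");
        }
        // Clamp the window to the sentence boundaries
        int start = Math.max(0, wordIndex - before);
        int end = Math.min(words.length, wordIndex + forward + 1);
        String[] ctx = new String[(end - start) - 1];
        int k = 0;
        for (int i = start; i < end; i++) {
            if (i != wordIndex) {
                ctx[k++] = words[i];
            }
        }
        return ctx;
    }
}
```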

Typically, IMS should perform better than Lesk. Lesk is a classic method,
but it is usually used as a baseline, along with the most frequent sense
(MFS). We will also be testing and adding more techniques.
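For comparison, the MFS baseline ignores the context entirely and picks the sense that occurs most often in sense-annotated data. A minimal sketch, assuming per-sense counts are already available; the class name, sense ids and counts are made up for illustration:

```java
import java.util.Map;

// Toy most-frequent-sense (MFS) baseline: return the sense with the
// highest count, or null if there are no candidates.
class MostFrequentSense {
    static String pick(Map<String, Integer> senseCounts) {
        String best = null;
        int bestCount = Integer.MIN_VALUE;
        for (Map.Entry<String, Integer> e : senseCounts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }
}
```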

In any case, please feel free to ask for more details.

Best,

Anthony

[1] https://drive.google.com/folderview?id=0B67Iu3pf6WucfjdYNGhDc3hkTXd1a3FORnNUYzd3dV9YeWlyMFczeHU0SE1TcWwyU1lhZFU&usp=sharing
[2] https://drive.google.com/file/d/0ByL0dmKXzHVfSXA3SVZiMnVfOGc/view?usp=sharing
> Date: Fri, 24 Jul 2015 09:54:02 +0200
> Subject: Re: Word Sense Disambiguator
> From: kottm...@gmail.com
> To: dev@opennlp.apache.org
> 
> It would be nice if you could share instructions on how to run it.
> I also would like to give it a try.
> 
> Jörn
> 
> On Fri, Jul 24, 2015 at 4:54 AM, Anthony Beylerian <
> anthonybeyler...@hotmail.com> wrote:
> 
> > Hello,
> > Yes, for the moment we are only using WordNet for sense definitions. The
> > plan is to complete the package by mid to late August, but if you like you
> > can follow up on the progress from the sandbox.
> > Best regards,
> > Anthony
> > > Date: Thu, 23 Jul 2015 15:36:57 +0300
> > > Subject: Word Sense Disambiguator
> > > From: cristian.petro...@gmail.com
> > > To: dev@opennlp.apache.org
> > >
> > > Hi,
> > >
> > > I saw that there are people actively working on a Word Sense
> > > Disambiguator. Do you guys know when the module will be ready to use?
> > > Also, I assume that WordNet is used to define the disambiguated word
> > > meaning?
> > >
> > > Thanks,
> > > Cristian
> >
> >
