Re: Word Sense Disambiguator

Cristian Petroaca Thu, 17 Sep 2015 07:19:04 -0700

Hi Anthony,

Do you know when the WSD component will be available in an OpenNLP release?


Thanks,
Cristian

On Thu, Sep 10, 2015 at 10:32 AM, Cristian Petroaca <
cristian.petro...@gmail.com> wrote:

> Yes, that's what I was looking for.
> Thanks Aliaksandr.
>
> On Wed, Sep 9, 2015 at 9:39 PM, Aliaksandr Autayeu <aliaksa...@autayeu.com
> > wrote:
>
>> Cristian, the reference you gave basically uses synset offsets - 1740 is
>> entity, 1930 is physical entity, etc. However, in YAGO they seem to have
>> added 100000000 to those offsets.
>>
>> Synset offset is the fastest way to get into the WordNet dictionary, because
>> it is a direct file offset. The offset alone is not enough though; you also
>> need the POS (part of speech). Speed is probably the reason most people
>> access WordNet this way. However, the offset is not the best "key",
>> especially for indexing, because offsets change as WordNet evolves.
>> SenseKeys (e.g. bank%1:14:00:: and bank%1:21:01::) should be more suitable
>> for indexing.
>>
>> If you're looking to connect with YAGO above, you might do something along
>> the lines of
>> ....getWordBySenseKey(sensekey).getSynset().getOffset()
>> and then add 100000000 to get the YAGO ids.
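
The arithmetic described above can be sketched in plain Java. This is only an
illustration: the class and method names (YagoIdSketch, toYagoId) are made up
for this note, the 100000000 constant is the one reported in this thread for
the noun examples (1740, 1930), and the offset is assumed to have been looked
up in WordNet already.

```java
/** Sketch: derive a YAGO-style WordNet id from a synset offset. */
final class YagoIdSketch {
    /** Constant reported in this thread for the noun examples (assumption). */
    static final long YAGO_BASE = 100_000_000L;

    /** Adds the base to an offset that was already looked up in WordNet. */
    static long toYagoId(long synsetOffset) {
        if (synsetOffset < 0 || synsetOffset >= YAGO_BASE) {
            throw new IllegalArgumentException("offset out of range: " + synsetOffset);
        }
        return YAGO_BASE + synsetOffset;
    }

    public static void main(String[] args) {
        System.out.println(toYagoId(1740)); // entity
        System.out.println(toYagoId(1930)); // physical entity
    }
}
```

Since offsets change between WordNet versions, an id computed this way is only
meaningful against the WordNet version that YAGO itself was built on.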
>>
>> Aliaksandr
>>
>>
>> On 9 September 2015 at 09:51, Cristian Petroaca <
>> cristian.petro...@gmail.com
>> > wrote:
>>
>> > I am looking for the Sense Id of the word. It has the format used here:
>> >
>> > http://resources.mpi-inf.mpg.de/yago-naga/yago/download/yago/yagoWordnetIds.txt
>> >
>> > On Tue, Sep 8, 2015 at 6:47 PM, Anthony Beylerian <
>> > anthony.beyler...@gmail.com> wrote:
>> >
>> > > Hi,
>> > >
>> > > Thanks, it is still being improved.
>> > >
>> > > I am not sure what you mean by type or database ID.
>> > > Currently the sense source and the sense ID are returned.
>> > >
>> > > For example:
>> > >
>> > > "I went to the bank to deposit money."
>> > > target : bank (index : 4)
>> > > expected output : [WORDNET bank%1:14:00:: 21.6, WORDNET bank%1:21:01::
>> > > 5.8,... etc]
>> > >
>> > > Where "bank%1:14:00::" is a SenseKey, which you can use to query WordNet
>> > > for a sense definition.
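
For readers unfamiliar with the key layout, here is a small sketch that splits
a SenseKey into its fields. It assumes the standard WordNet layout
lemma%ss_type:lex_filenum:lex_id:head_word:head_id; the class name
SenseKeySketch is made up for this note and is not part of the WSD component.

```java
/** Sketch: split a WordNet sense key into its leading fields. */
final class SenseKeySketch {
    final String lemma;
    final int ssType;      // 1 noun, 2 verb, 3 adjective, 4 adverb, 5 adjective satellite
    final int lexFileNum;  // lexicographer file number
    final int lexId;       // distinguishes senses within one lexicographer file

    private SenseKeySketch(String lemma, int ssType, int lexFileNum, int lexId) {
        this.lemma = lemma;
        this.ssType = ssType;
        this.lexFileNum = lexFileNum;
        this.lexId = lexId;
    }

    static SenseKeySketch parse(String key) {
        int pct = key.lastIndexOf('%');
        if (pct <= 0) {
            throw new IllegalArgumentException("not a sense key: " + key);
        }
        // A limit of -1 keeps the trailing empty head_word and head_id fields.
        String[] fields = key.substring(pct + 1).split(":", -1);
        if (fields.length != 5) {
            throw new IllegalArgumentException("expected 5 fields after '%': " + key);
        }
        return new SenseKeySketch(key.substring(0, pct),
                Integer.parseInt(fields[0]),
                Integer.parseInt(fields[1]),
                Integer.parseInt(fields[2]));
    }

    public static void main(String[] args) {
        SenseKeySketch k = parse("bank%1:14:00::");
        System.out.println(k.lemma + " " + k.ssType + " " + k.lexFileNum + " " + k.lexId);
    }
}
```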
>> > >
>> > > You can do this using the default dictionary:
>> > >
>> > > Dictionary.getDefaultResourceInstance().getWordBySenseKey(sensekey).getSynset().getGloss();
>> > >
>> > > Hope this is what you are looking for, otherwise please clarify.
>> > >
>> > > Anthony Beylerian
>> > >
>> > > On Tue, Sep 8, 2015 at 5:34 PM, Cristian Petroaca <
>> > > cristian.petro...@gmail.com> wrote:
>> > >
>> > > > Hi Anthony,
>> > > >
>> > > > I had a chance to test the WSD component. That's great work. Thanks.
>> > > > One question: is it possible to return the WordNet type (or database
>> > > > id) of the disambiguated word?
>> > > >
>> > > > Thanks,
>> > > > Cristian
>> > > >
>> > > > On Fri, Jul 24, 2015 at 1:14 PM, Anthony Beylerian <
>> > > > anthonybeyler...@hotmail.com> wrote:
>> > > >
>> > > > > Hi,
>> > > > >
>> > > > > To try out the ongoing implementations, after checking out the
>> > > > > sandbox repository please try these steps:
>> > > > > 1- Create a resource models directory:
>> > > > >
>> > > > > - src
>> > > > >   - test
>> > > > >     - resources
>> > > > >       + models
>> > > > >
>> > > > > 2- Include the following pre-trained models and dictionary in that
>> > > > > directory. You can find them here [1] if you like, or train your own
>> > > > > models.
>> > > > >
>> > > > > {
>> > > > > en-token.bin,
>> > > > > en-pos-maxent.bin,
>> > > > > en-sent.bin,
>> > > > > en-ner-person.bin,
>> > > > > en-lemmatizer.dict
>> > > > > }
>> > > > >
>> > > > > To train the IMS approach, you also need to include training data
>> > > > > such as senseval3 [2].
>> > > > > For now, please add these folders:
>> > > > > - src
>> > > > >   - test
>> > > > >     - resources
>> > > > >       - supervised
>> > > > >         + raw
>> > > > >         + models
>> > > > >         + dictionary
>> > > > >
>> > > > > You can find the data files here [2].
>> > > > >
>> > > > > 3- We included two examples [LeskTester.java] and [IMSTester.java]
>> > > > > that you can run directly, or you can make your own tests.
>> > > > >
>> > > > > To run a custom test, you minimally need a tokenized text or
>> > > > > sentence, for example for Lesk:
>> > > > >
>> > > > >           String[] words = Loader.getTokenizer().tokenize(sentence);
>> > > > >
>> > > > > Choose the index of the word to disambiguate in the token array:
>> > > > >
>> > > > >           int wordIndex = 6;
>> > > > >
>> > > > > Then just create a WSDisambiguator object, for example for Lesk:
>> > > > >
>> > > > >           Lesk lesk = new Lesk();
>> > > > >
>> > > > > And you can call the default disambiguation method:
>> > > > >
>> > > > >           lesk.disambiguate(words, wordIndex);
>> > > > >
>> > > > > You will get an array of strings with the following format :
>> > > > >
>> > > > > Lesk : [Source SenseKey Score]
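
A minimal sketch of consuming such result strings, assuming each entry is
three whitespace-separated fields as in "WORDNET bank%1:14:00:: 21.6" and that
a higher score means a better match. The class WsdResultSketch is hypothetical
and is not part of the sandbox code.

```java
/** Sketch: parse "Source SenseKey Score" entries and pick the top-scoring one. */
final class WsdResultSketch {
    final String source;
    final String senseKey;
    final double score;

    private WsdResultSketch(String source, String senseKey, double score) {
        this.source = source;
        this.senseKey = senseKey;
        this.score = score;
    }

    static WsdResultSketch parse(String entry) {
        String[] parts = entry.trim().split("\\s+");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected 3 fields: " + entry);
        }
        return new WsdResultSketch(parts[0], parts[1], Double.parseDouble(parts[2]));
    }

    /** Returns the entry with the highest score (assumes higher is better). */
    static WsdResultSketch best(String[] entries) {
        if (entries.length == 0) {
            throw new IllegalArgumentException("no results");
        }
        WsdResultSketch top = parse(entries[0]);
        for (int i = 1; i < entries.length; i++) {
            WsdResultSketch candidate = parse(entries[i]);
            if (candidate.score > top.score) {
                top = candidate;
            }
        }
        return top;
    }

    public static void main(String[] args) {
        String[] results = {"WORDNET bank%1:14:00:: 21.6", "WORDNET bank%1:21:01:: 5.8"};
        System.out.println(best(results).senseKey);
    }
}
```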
>> > > > >
>> > > > > To read the sense definitions you can use the method :
>> > > > > [opennlp.tools.disambiguator.Constants.printResults]
>> > > > >
>> > > > > To use the variations of Lesk, you will need to create and
>> > > > > configure a parameters object:
>> > > > >
>> > > > >           LeskParameters leskParams = new LeskParameters();
>> > > > >           leskParams.setLeskType(LeskParameters.LESK_TYPE.LESK_BASIC_CTXT_WIN_BF);
>> > > > >           leskParams.setWin_b_size(4);
>> > > > >           leskParams.setDepth(3);
>> > > > >           lesk.setParams(leskParams);
>> > > > >
>> > > > > Typically, IMS should perform better than Lesk, since Lesk is a
>> > > > > classic method that is usually used as a baseline, along with the
>> > > > > most frequent sense (MFS).
>> > > > > However, we will be testing and adding more techniques.
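
To make the baseline idea concrete, here is a sketch of the textbook
"simplified Lesk" heuristic: pick the sense whose gloss shares the most words
with the sentence. It is not the sandbox implementation and does not model the
Lesk variants or parameters mentioned above. The sense labels and glosses are
invented for illustration, and no stop-word filtering is applied.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Sketch of simplified Lesk: score each sense by gloss/context word overlap. */
final class SimplifiedLeskSketch {
    /** Lower-cases the text and splits it into a set of alphabetic tokens. */
    static Set<String> tokens(String text) {
        Set<String> out = new HashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    /** Number of distinct words shared by the context and the gloss. */
    static int overlap(Set<String> context, String gloss) {
        Set<String> shared = tokens(gloss);
        shared.retainAll(context);
        return shared.size();
    }

    /** Returns the sense label with the largest overlap; the first one wins ties. */
    static String disambiguate(String sentence, Map<String, String> glossesBySense) {
        Set<String> context = tokens(sentence);
        String best = null;
        int bestScore = -1;
        for (Map.Entry<String, String> e : glossesBySense.entrySet()) {
            int score = overlap(context, e.getValue());
            if (score > bestScore) {
                bestScore = score;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical sense labels and glosses, not taken from WordNet.
        Map<String, String> glosses = new LinkedHashMap<>();
        glosses.put("bank#finance", "an institution that accepts a deposit of money");
        glosses.put("bank#river", "sloping land beside a body of water");
        System.out.println(disambiguate("I went to the bank to deposit money.", glosses));
    }
}
```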
>> > > > >
>> > > > > In any case, please feel free to ask for more details.
>> > > > >
>> > > > > Best,
>> > > > >
>> > > > > Anthony
>> > > > >
>> > > > > [1] :
>> > > > > https://drive.google.com/folderview?id=0B67Iu3pf6WucfjdYNGhDc3hkTXd1a3FORnNUYzd3dV9YeWlyMFczeHU0SE1TcWwyU1lhZFU&usp=sharing
>> > > > > [2] :
>> > > > > https://drive.google.com/file/d/0ByL0dmKXzHVfSXA3SVZiMnVfOGc/view?usp=sharing
>> > > > > > Date: Fri, 24 Jul 2015 09:54:02 +0200
>> > > > > > Subject: Re: Word Sense Disambiguator
>> > > > > > From: kottm...@gmail.com
>> > > > > > To: dev@opennlp.apache.org
>> > > > > >
>> > > > > > It would be nice if you could share instructions on how to run it.
>> > > > > > I also would like to give it a try.
>> > > > > >
>> > > > > > Jörn
>> > > > > >
>> > > > > > On Fri, Jul 24, 2015 at 4:54 AM, Anthony Beylerian <
>> > > > > > anthonybeyler...@hotmail.com> wrote:
>> > > > > >
>> > > > > > > Hello,
>> > > > > > > Yes, for the moment we are only using WordNet for sense
>> > > > > > > definitions. The plan is to complete the package by mid to late
>> > > > > > > August, but if you like you can follow the progress in the sandbox.
>> > > > > > > Best regards,
>> > > > > > > Anthony
>> > > > > > > > Date: Thu, 23 Jul 2015 15:36:57 +0300
>> > > > > > > > Subject: Word Sense Disambiguator
>> > > > > > > > From: cristian.petro...@gmail.com
>> > > > > > > > To: dev@opennlp.apache.org
>> > > > > > > >
>> > > > > > > > Hi,
>> > > > > > > >
>> > > > > > > > I saw that there are people actively working on a Word Sense
>> > > > > > > > Disambiguator. Do you guys know when the module will be ready
>> > > > > > > > to use? Also, I assume that WordNet is used to define the
>> > > > > > > > disambiguated word meaning?
>> > > > > > > >
>> > > > > > > > Thanks,
>> > > > > > > > Cristian
