Hi Anthony,

Do you know when the WSD component will be available in an OpenNLP release?

Thanks,
Cristian

On Thu, Sep 10, 2015 at 10:32 AM, Cristian Petroaca <cristian.petro...@gmail.com> wrote:

> Yes, that's what I was looking for.
> Thanks Aliaksandr.
>
> On Wed, Sep 9, 2015 at 9:39 PM, Aliaksandr Autayeu <aliaksa...@autayeu.com> wrote:
>
> > Cristian, the reference you gave basically uses synset offsets: 1740 is
> > entity, 1930 is physical entity, etc. However, in YAGO they seem to have
> > added 100000000 to those offsets.
> >
> > Synset offset is the fastest way to get into the WordNet dictionary,
> > because it is a direct file offset. The offset alone is not enough though;
> > you also need the POS (part of speech). Speed is probably the reason most
> > people access WordNet this way. However, the offset is not the best "key",
> > especially for indexing, because offsets change as WordNet evolves.
> > SenseKeys (e.g. bank%1:14:00:: and bank%1:21:01::) should be more suitable
> > for indexing.
> >
> > If you're looking to connect with YAGO above, you might do something along
> > the lines of
> > ....getWordBySenseKey(sensekey).getSynset().getOffset and then add
> > 100000000 to get the YAGO ids.
> >
> > Aliaksandr
> >
> > On 9 September 2015 at 09:51, Cristian Petroaca <cristian.petro...@gmail.com> wrote:
> >
> > > I am looking for the Sense Id of the word. It has this format here:
> > >
> > > http://resources.mpi-inf.mpg.de/yago-naga/yago/download/yago/yagoWordnetIds.txt
> > >
> > > On Tue, Sep 8, 2015 at 6:47 PM, Anthony Beylerian <anthony.beyler...@gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > Thanks, it is still being improved.
> > > >
> > > > I am not sure what you mean by type or database ID.
> > > > Currently the sense source and the sense ID are returned.
> > > >
> > > > For example:
> > > >
> > > > "I went to the bank to deposit money."
> > > > target : bank (index : 4)
> > > > expected output : [WORDNET bank%1:14:00:: 21.6, WORDNET bank%1:21:01:: 5.8,... etc]
> > > >
> > > > Where "bank%1:14:00::" is a SenseKey which you can query WordNet with
> > > > to give you a sense definition.
> > > >
> > > > You can do this using the default dictionary:
> > > >
> > > > Dictionary.getDefaultResourceInstance().getWordBySenseKey(sensekey).getSynset().getGloss();
> > > >
> > > > Hope this is what you are looking for, otherwise please clarify.
> > > >
> > > > Anthony Beylerian
> > > >
> > > > On Tue, Sep 8, 2015 at 5:34 PM, Cristian Petroaca <cristian.petro...@gmail.com> wrote:
> > > >
> > > > > Hi Anthony,
> > > > >
> > > > > I had a chance to test the wsd component. That's great work. Thanks.
> > > > > One question: is it possible to return the wordnet type (or database
> > > > > id) of the disambiguated word?
> > > > >
> > > > > Thanks,
> > > > > Cristian
> > > > >
> > > > > On Fri, Jul 24, 2015 at 1:14 PM, Anthony Beylerian <anthonybeyler...@hotmail.com> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > To try out the ongoing implementations, after checking out the
> > > > > > sandbox repository please try these steps:
> > > > > >
> > > > > > 1- Create a resource models directory:
> > > > > >
> > > > > > - src
> > > > > >   - test
> > > > > >     - resources
> > > > > >       + models
> > > > > >
> > > > > > 2- Include the following pre-trained models and dictionary in that
> > > > > > directory:
> > > > > > You can find those here [1] if you like or pre-train your own models.
> > > > > >
> > > > > > {
> > > > > > en-token.bin,
> > > > > > en-pos-maxent.bin,
> > > > > > en-sent.bin,
> > > > > > en-ner-person.bin,
> > > > > > en-lemmatizer.dict
> > > > > > }
> > > > > >
> > > > > > To train the IMS approach you need to include training data such as
> > > > > > senseval3 [2].
> > > > > > For now, please add these folders:
> > > > > >
> > > > > > - src
> > > > > >   - test
> > > > > >     - resources
> > > > > >       - supervised
> > > > > >         + raw
> > > > > >         + models
> > > > > >         + dictionary
> > > > > >
> > > > > > You can find the data files here [2].
> > > > > >
> > > > > > 3- We included two examples [LeskTester.java] and [IMSTester.java]
> > > > > > that you can run directly, or make your own tests.
> > > > > >
> > > > > > To run a custom test, minimally you need to have a tokenized text or
> > > > > > sentence, for example for Lesk:
> > > > > >
> > > > > >     String[] words = Loader.getTokenizer().tokenize(sentence);
> > > > > >
> > > > > > Choose the index of the word to disambiguate in the token array:
> > > > > >
> > > > > >     int wordIndex = 6;
> > > > > >
> > > > > > Then just create a WSDisambiguator object, for example for Lesk:
> > > > > >
> > > > > >     Lesk lesk = new Lesk();
> > > > > >
> > > > > > And you can call the default disambiguation method:
> > > > > >
> > > > > >     lesk.disambiguate(words, wordIndex);
> > > > > >
> > > > > > You will get an array of strings with the following format:
> > > > > >
> > > > > > Lesk : [Source SenseKey Score]
> > > > > >
> > > > > > To read the sense definitions you can use the method:
> > > > > > [opennlp.tools.disambiguator.Constants.printResults]
> > > > > >
> > > > > > To use the variations of Lesk, you will need to create and configure
> > > > > > a parameters object:
> > > > > >
> > > > > >     LeskParameters leskParams = new LeskParameters();
> > > > > >     leskParams.setLeskType(LeskParameters.LESK_TYPE.LESK_BASIC_CTXT_WIN_BF);
> > > > > >     leskParams.setWin_b_size(4);
> > > > > >     leskParams.setDepth(3);
> > > > > >     lesk.setParams(leskParams);
> > > > > >
> > > > > > Typically, IMS should perform better than Lesk, since Lesk is a
> > > > > > classic method that is usually used as a baseline along with the most
> > > > > > frequent sense (MFS).
> > > > > > However, we will be testing and adding more techniques.
> > > > > >
> > > > > > In any case, please feel free to ask for more details.
> > > > > >
> > > > > > Best,
> > > > > >
> > > > > > Anthony
> > > > > >
> > > > > > [1] : https://drive.google.com/folderview?id=0B67Iu3pf6WucfjdYNGhDc3hkTXd1a3FORnNUYzd3dV9YeWlyMFczeHU0SE1TcWwyU1lhZFU&usp=sharing
> > > > > > [2] : https://drive.google.com/file/d/0ByL0dmKXzHVfSXA3SVZiMnVfOGc/view?usp=sharing
> > > > > >
> > > > > > > Date: Fri, 24 Jul 2015 09:54:02 +0200
> > > > > > > Subject: Re: Word Sense Disambiguator
> > > > > > > From: kottm...@gmail.com
> > > > > > > To: dev@opennlp.apache.org
> > > > > > >
> > > > > > > It would be nice if you could share instructions on how to run it.
> > > > > > > I also would like to give it a try.
> > > > > > >
> > > > > > > Jörn
> > > > > > >
> > > > > > > On Fri, Jul 24, 2015 at 4:54 AM, Anthony Beylerian <anthonybeyler...@hotmail.com> wrote:
> > > > > > >
> > > > > > > > Hello,
> > > > > > > > Yes, for the moment we are only using WordNet for sense
> > > > > > > > definitions. The plan is to complete the package by mid to late
> > > > > > > > August, but if you like you can follow up on the progress from
> > > > > > > > the sandbox.
> > > > > > > > Best regards,
> > > > > > > > Anthony
> > > > > > > >
> > > > > > > > > Date: Thu, 23 Jul 2015 15:36:57 +0300
> > > > > > > > > Subject: Word Sense Disambiguator
> > > > > > > > > From: cristian.petro...@gmail.com
> > > > > > > > > To: dev@opennlp.apache.org
> > > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > I saw that there are people actively working on a Word Sense
> > > > > > > > > Disambiguator.
> > > > > > > > > Do you guys know when the module will be ready to use? Also, I
> > > > > > > > > assume that WordNet is used to define the disambiguated word
> > > > > > > > > meaning?
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Cristian
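
For readers who want to post-process the disambiguator output discussed in this thread, here is a minimal sketch in plain Java. It assumes each result string has the "Source SenseKey Score" shape quoted above (for example "WORDNET bank%1:14:00:: 21.6") and that YAGO ids are obtained by adding 100000000 to the synset offset, as Aliaksandr suggests. The class name WsdResultEntry and its methods are hypothetical helpers written for illustration; they are not part of OpenNLP or of any WordNet library, and looking up the actual synset offset for a sense key still requires a WordNet dictionary.

```java
/**
 * Hypothetical helper (not part of OpenNLP): holds one disambiguator result
 * of the form "Source SenseKey Score", e.g. "WORDNET bank%1:14:00:: 21.6".
 */
final class WsdResultEntry {
    // Shift that, according to the thread, YAGO appears to add to WordNet offsets.
    static final long YAGO_OFFSET_SHIFT = 100_000_000L;

    final String source;   // e.g. "WORDNET"
    final String senseKey; // e.g. "bank%1:14:00::"
    final double score;    // e.g. 21.6

    private WsdResultEntry(String source, String senseKey, double score) {
        this.source = source;
        this.senseKey = senseKey;
        this.score = score;
    }

    /** Splits a whitespace-separated "Source SenseKey Score" string into its fields. */
    static WsdResultEntry parse(String entry) {
        String[] parts = entry.trim().split("\\s+");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected 'Source SenseKey Score' but got: " + entry);
        }
        return new WsdResultEntry(parts[0], parts[1], Double.parseDouble(parts[2]));
    }

    /** Position of the '%' that separates the lemma from the rest of the sense key. */
    private int percentIndex() {
        int percent = senseKey.indexOf('%');
        if (percent < 0 || percent + 1 >= senseKey.length()) {
            throw new IllegalStateException("Not a WordNet sense key: " + senseKey);
        }
        return percent;
    }

    /** The lemma is the part of the sense key before '%'. */
    String lemma() {
        return senseKey.substring(0, percentIndex());
    }

    /**
     * The digit right after '%' is the WordNet synset type:
     * 1 noun, 2 verb, 3 adjective, 4 adverb, 5 adjective satellite.
     */
    int synsetType() {
        return Character.getNumericValue(senseKey.charAt(percentIndex() + 1));
    }

    /** Applies the offset shift described in the thread to a synset offset. */
    static long toYagoId(long synsetOffset) {
        return YAGO_OFFSET_SHIFT + synsetOffset;
    }

    public static void main(String[] args) {
        WsdResultEntry entry = parse("WORDNET bank%1:14:00:: 21.6");
        System.out.println(entry.source + " " + entry.lemma()
                + " type=" + entry.synsetType() + " score=" + entry.score);
        System.out.println(toYagoId(1740)); // offset 1740 is "entity" in the thread
    }
}
```

The sense key stays the stable identifier here, in line with the advice above that offsets change between WordNet versions; the offset shift is only applied at the point where a YAGO id is actually needed.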