Hi,

I have finished implementing most of the IMS approach. I also wrote a parser for the Senseval-3 data.
However, I am currently working on two main points:

- I am trying to figure out how to use the MaxEnt classifier. Unfortunately, there is not enough documentation, so I am looking at how the other OpenNLP components use it. Any recommendations?
- I am training on SemCor, and I will use it as soon as I finish implementing the "train", "load" and "disambiguate" methods of the IMS approach.

Regarding extJWNL, I worked with JWNL in a previous project. I checked extJWNL, and the queries I would run against WordNet won't be any different.

I will upload a patch as soon as I implement the aforementioned methods.

Best regards,

Mondher

On Fri, Jun 19, 2015 at 5:09 PM, Rodrigo Agerri <[email protected]> wrote:

> Hi Mondher,
>
> On Fri, Jun 12, 2015 at 1:01 PM, Mondher Bouazizi
> <[email protected]> wrote:
> > Dear Rodrigo,
> >
> > Here is what I am planning to do in the next step:
> >
> > 1- I am currently implementing the IMS method, and using Senseval 3 data.
>
> Hi, I guess you are training on semcor?
>
> http://web.eecs.umich.edu/~mihalcea/downloads.html#semcor
>
> > Since the disambiguation training set has to be very big (a few hundred
> > MBs if we want it to contain all the words), I thought maybe it would be
> > better to load only the model related to the word to disambiguate.
> > Therefore, I made the following:
> >
> > - A folder containing the ".xml" files where the data related to each
> > word are stored.
> > - A folder containing the ".bin" files (one for each word)
> >
> > The idea is that each time the module is called to disambiguate a word,
> > we first check if the ".bin" file exists. If it is there, we use the file
> > to disambiguate the word. Otherwise (i.e., the bin file is not there), we
> > check if the ".xml" file exists. If it does, the xml file data will be
> > used to train the model and create the ".bin" file (that way, the next
> > time the user wants to disambiguate the same word, the ".bin" file is
> > already there).
> >
> > Is that OK ?
>
> OK.
>
> > 2- For the implementation, I will refer to some external sources
> > (e.g., WordNet) to get the sense definition, because the classifier will
> > return only the ID of the sense. I have the choice either to query
> > WordNet for the senses (Note: in the case of Senseval 3 data, the senses
> > are already extracted and put in a separate file; however, when we
> > generalize the approach, the senses probably won't be given) or to
> > collect all the senses once, store them, and refer to them to get the
> > sense. I am planning to implement the first approach; however, since I am
> > experimenting on the Senseval 3 data set, I will first use the resources
> > I have there.
>
> Good. You can do that with extJWNL, right?
>
> Cheers,
>
> Rodrigo
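
For reference, the per-word lookup described in the quoted message (use the ".bin" file if it exists, otherwise train from the ".xml" file and save the result) could be sketched roughly as below. This is only an illustration using the Java standard library, not the actual patch: `WordModelStore`, `Trainer`, `modelFor` and the two-folder layout are hypothetical names, none of them come from OpenNLP, and the trainer is a stand-in for whatever classifier ends up being used.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

/**
 * Sketch of the lazy per-word model lookup: prefer "<word>.bin",
 * otherwise train from "<word>.xml" and cache the result as "<word>.bin".
 * All names here are hypothetical.
 */
final class WordModelStore {

    /** Hypothetical stand-in for the real trainer (e.g. a MaxEnt trainer). */
    interface Trainer {
        byte[] train(Path xmlTrainingData) throws IOException;
    }

    private final Path xmlDir;   // folder holding one ".xml" training file per word
    private final Path binDir;   // folder holding one ".bin" model file per word
    private final Trainer trainer;

    WordModelStore(Path xmlDir, Path binDir, Trainer trainer) {
        this.xmlDir = xmlDir;
        this.binDir = binDir;
        this.trainer = trainer;
    }

    /**
     * Returns the serialized model for the given word, or an empty Optional
     * when neither a ".bin" nor an ".xml" file exists for it.
     */
    Optional<byte[]> modelFor(String word) throws IOException {
        Path bin = binDir.resolve(word + ".bin");
        if (Files.exists(bin)) {
            // A model was already trained for this word: just load it.
            return Optional.of(Files.readAllBytes(bin));
        }

        Path xml = xmlDir.resolve(word + ".xml");
        if (!Files.exists(xml)) {
            // No model and no training data: the word cannot be handled.
            return Optional.empty();
        }

        // Train once, then store the model so the next call finds the ".bin" file.
        byte[] model = trainer.train(xml);
        Files.createDirectories(binDir);
        Files.write(bin, model);
        return Optional.of(model);
    }
}
```

With this layout, the first call to `modelFor("bank")` would pay the training cost and later calls would only read `bank.bin`. Returning raw bytes is just a simplification for the sketch; the real code would presumably return a deserialized model object.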
