Hi Mondher,

On Fri, Jun 12, 2015 at 1:01 PM, Mondher Bouazizi <[email protected]> wrote:

> Dear Rodrigo,
>
> Here is what I am planning to do in the next step:
>
> 1- I am currently implementing the IMS method, and using Senseval 3 data.

Hi, I guess you are training on SemCor?
http://web.eecs.umich.edu/~mihalcea/downloads.html#semcor

> Since the disambiguation training set has to be very big (a few hundred
> MB if we want it to contain all the words), I thought maybe it would be
> better to load only the model related to the word to disambiguate.
> Therefore, I made the following:
>
> - A folder containing the ".xml" files where the data related to each
>   word are stored.
> - A folder containing the ".bin" files (one for each word).
>
> The idea is that each time the module is called to disambiguate a word, we
> first check if the ".bin" file exists. If it is there, we use the file to
> disambiguate the word. Otherwise (i.e., the ".bin" file is not there), we
> check if the ".xml" file exists. If it does, the ".xml" file data will be
> used to train the model and create the ".bin" file (that way, the next time
> the user wants to disambiguate the same word, the ".bin" file is already
> there).
>
> Is that OK?

OK.

> 2- For the implementation, I will refer to some external sources
> (e.g., WordNet) to get the sense definition, because the classifier will
> return only the ID of the sense. I have the choice either to query WordNet
> for the senses (note: in the case of Senseval 3 data, the senses are
> already extracted and put in a separate file; however, when we generalize
> the approach, the senses probably won't be given) or to collect all the
> senses once, store them, and refer to them to get the sense. I am planning
> to implement the first approach; however, since I am experimenting on the
> Senseval 3 data set, I will first use the resources I have there.

Good. You can do that with extJWN, right?

Cheers,
Rodrigo
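
P.S. To make sure we mean the same thing by the per-word ".bin"/".xml" lookup, here is a rough sketch of that logic. It is only an illustration in Python, not the actual module: `load_or_train` and `train_model` are hypothetical names, the trainer is a stand-in that does no real IMS training, and pickle is assumed as the ".bin" format purely for the example.

```python
from pathlib import Path
import pickle


def train_model(xml_path: Path) -> dict:
    """Hypothetical stand-in for the real trainer: records where the data came from."""
    text = xml_path.read_text(encoding="utf-8")
    return {"trained_from": xml_path.name, "size": len(text)}


def load_or_train(word: str, xml_dir: Path, bin_dir: Path):
    """Return the model for `word`, training and caching it on first use.

    Returns None when neither a cached model nor training data exists.
    """
    bin_path = bin_dir / f"{word}.bin"
    if bin_path.exists():
        # A cached model is already there: load it and skip training.
        with bin_path.open("rb") as fh:
            return pickle.load(fh)

    xml_path = xml_dir / f"{word}.xml"
    if not xml_path.exists():
        # No training data for this word, so nothing can be built.
        return None

    # Train once, then store the result so the next call finds the ".bin" file.
    model = train_model(xml_path)
    bin_dir.mkdir(parents=True, exist_ok=True)
    with bin_path.open("wb") as fh:
        pickle.dump(model, fh)
    return model
```

The point of the layout is that only the model for the requested word is ever held in memory, and the training cost is paid at most once per word.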
