Dear Rodrigo,

Here is what I am planning to do in the next step:

1- I am currently implementing the IMS method using the Senseval 3 data. Since the disambiguation training set has to be very big (a few hundred MB if we want it to cover all the words), I thought it might be better to load only the model for the word being disambiguated. Therefore, I set up the following:

- A folder containing the ".xml" files, where the data related to each word is stored.
- A folder containing the ".bin" files (one for each word).

The idea is that each time the module is called to disambiguate a word, we first check whether the ".bin" file exists. If it does, we use that file to disambiguate the word. Otherwise (i.e., the ".bin" file is not there), we check whether the ".xml" file exists. If it does, its data is used to train the model and create the ".bin" file (that way, the next time the user wants to disambiguate the same word, the ".bin" file is already there). Is that OK?

2- For the implementation, I will need to refer to some external source (e.g., WordNet) to get the sense definitions, because the classifier returns only the ID of the sense. I can either query WordNet for the senses, or collect all the senses once, store them, and look them up when needed. (Note: in the case of the Senseval 3 data, the senses are already extracted and put in a separate file; however, when we generalize the approach, the senses probably won't be given.) I am planning to implement the first approach; however, since I am experimenting on the Senseval 3 data set, I will first use the resources I have there.

Thank you in advance.

Yours sincerely,
Mondher
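
P.S. To make the check in point 1 concrete, here is a rough Python sketch of the lookup order (".bin" first, then ".xml", then train and save). It is only an illustration: the function names, the folder arguments, the `<instance sense="...">` layout, the most-frequent-sense "trainer", and the use of pickle for the ".bin" files are all placeholders made up for this example, not the actual IMS code or the real Senseval 3 format.

```python
import pickle
import tempfile
import xml.etree.ElementTree as ET
from collections import Counter
from pathlib import Path


def train_from_xml(xml_path):
    """Stand-in trainer (hypothetical): remembers the most frequent sense ID.

    Assumes a made-up layout with <instance sense="..."> elements.
    """
    root = ET.parse(str(xml_path)).getroot()
    counts = Counter(inst.get("sense") for inst in root.iter("instance"))
    if not counts:
        raise ValueError(f"no training instances found in {xml_path}")
    return {"most_frequent_sense": counts.most_common(1)[0][0]}


def load_model(word, xml_dir, bin_dir):
    """Return the model for `word`, training and caching it when needed.

    Returns None when neither the ".bin" nor the ".xml" file exists.
    """
    bin_path = Path(bin_dir) / f"{word}.bin"
    xml_path = Path(xml_dir) / f"{word}.xml"

    # 1) A cached model already exists: just load it.
    if bin_path.exists():
        with bin_path.open("rb") as f:
            return pickle.load(f)

    # 2) No cached model, but training data exists: train, then cache.
    if xml_path.exists():
        model = train_from_xml(xml_path)
        with bin_path.open("wb") as f:
            pickle.dump(model, f)
        return model

    # 3) Nothing available for this word.
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        xml_dir, bin_dir = Path(tmp) / "xml", Path(tmp) / "bin"
        xml_dir.mkdir()
        bin_dir.mkdir()
        (xml_dir / "bank.xml").write_text(
            '<corpus><instance sense="s1"/><instance sense="s1"/>'
            '<instance sense="s2"/></corpus>'
        )
        print(load_model("bank", xml_dir, bin_dir))   # trains and writes bank.bin
        print(load_model("bank", xml_dir, bin_dir))   # loads bank.bin directly
        print(load_model("river", xml_dir, bin_dir))  # no files for this word
```

With this layout only the model for the requested word is ever held in memory, and the training cost for a word is paid once, on its first lookup.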
