Dear Rodrigo,

Here is what I am planning to do in the next step:

1- I am currently implementing the IMS method, using the Senseval 3 data.
Since the disambiguation training set has to be very big (a few hundred
MB if we want it to cover all the words), I thought it would be better to
load only the model for the word being disambiguated. Therefore, I set up
the following:

   - A folder containing the ".xml" files where the training data for each
   word is stored.
   - A folder containing the ".bin" model files (one for each word).

The idea is that each time the module is called to disambiguate a word, we
first check whether the ".bin" file exists. If it does, we use it to
disambiguate the word. Otherwise, we check whether the ".xml" file exists.
If it does, its data is used to train the model and create the ".bin" file
(that way, the next time the user wants to disambiguate the same word, the
".bin" file is already there).
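
To make the lookup order concrete, here is a rough sketch in Python. It is
only an illustration, not the actual module: the names load_or_train and
train_fn are hypothetical, and pickle stands in for whatever model format
the ".bin" files really use.

```python
import pickle
from pathlib import Path


def load_or_train(word, xml_dir, bin_dir, train_fn):
    """Return the model for `word`, training and caching it on first use.

    train_fn is a hypothetical callable that takes the path of the word's
    ".xml" file and returns a trained (picklable) model.
    """
    bin_path = Path(bin_dir) / f"{word}.bin"
    xml_path = Path(xml_dir) / f"{word}.xml"

    # 1. A cached model exists: load it and skip training.
    if bin_path.exists():
        with bin_path.open("rb") as f:
            return pickle.load(f)

    # 2. No cached model, but training data exists: train, then cache.
    if xml_path.exists():
        model = train_fn(xml_path)
        bin_path.parent.mkdir(parents=True, exist_ok=True)
        with bin_path.open("wb") as f:
            pickle.dump(model, f)
        return model

    # 3. Neither file exists: the word cannot be disambiguated this way.
    return None
```

With this layout, only the first request for a given word pays the
training cost; later requests just read the ".bin" file.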

Is that OK?

2- For the implementation, I will need to refer to some external source
(e.g., WordNet) to get the sense definitions, because the classifier
returns only the ID of the sense. I can either query WordNet for the
senses, or collect all the senses once, store them, and look them up from
there. (Note: in the case of the Senseval 3 data, the senses are already
extracted and put in a separate file; when we generalize the approach,
the senses probably won't be given.) I am planning to implement the first
approach. However, since I am experimenting on the Senseval 3 data set, I
will first use the resources I have there.
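
As a rough sketch of how the two sources could be combined (Python, for
illustration only): the function and parameter names below are
hypothetical, local_senses stands for the senses already extracted from
the Senseval 3 files, and query_wordnet is a placeholder for whatever
WordNet lookup ends up being used.

```python
def sense_definition(sense_id, local_senses, query_wordnet=None):
    """Map a sense ID returned by the classifier to its definition.

    local_senses: dict of sense ID -> definition, loaded once from a file.
    query_wordnet: optional hypothetical callable used as a fallback when
    the sense is not in the local store.
    """
    # Prefer the locally stored senses when they are available.
    if sense_id in local_senses:
        return local_senses[sense_id]

    # Otherwise fall back to the external resource, if one was supplied.
    if query_wordnet is not None:
        return query_wordnet(sense_id)

    # Unknown sense and no fallback.
    return None
```

Keeping the fallback as a parameter means the Senseval 3 experiments can
run on the local file alone, and the WordNet query can be plugged in later
without changing the callers.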

Thank you in advance.

Yours sincerely,

Mondher
