Hi everyone,

I finished the first iteration of the IMS approach for lexical sample
disambiguation. Please find the patch uploaded on the JIRA issue [1]. I
also created a tester (IMSTester) to run it.

As I mentioned before, the approach is as follows: each time the module is
called to disambiguate a word, it first checks whether the model file for
that word exists.

1- If the "model" file exists, it is used to disambiguate the word.

2- Otherwise, the module checks whether the training data file for that
word exists. If it does, the XML data is used to train the model and create
the model file.

3- If no training data exists, the most frequent sense (MFS) in WordNet is
returned.
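
The three steps above can be sketched roughly as follows. This is only a
minimal illustration of the decision order, not the code in the patch: the
class name, the Strategy enum, and the "<word>.model" / "<word>.xml" file
naming are assumptions made for the example.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class SenseFallbackSketch {

    /** The step that would handle a given word (hypothetical names). */
    enum Strategy { USE_MODEL, TRAIN_THEN_USE_MODEL, MOST_FREQUENT_SENSE }

    /** Pure decision logic: the model wins, then training data, then MFS. */
    static Strategy choose(boolean modelExists, boolean trainingDataExists) {
        if (modelExists) {
            return Strategy.USE_MODEL;              // step 1
        }
        if (trainingDataExists) {
            return Strategy.TRAIN_THEN_USE_MODEL;   // step 2
        }
        return Strategy.MOST_FREQUENT_SENSE;        // step 3
    }

    /**
     * File-based variant. The naming scheme "<word>.model" and "<word>.xml"
     * is an assumption for this sketch, not the layout used by the patch.
     */
    static Strategy chooseForWord(Path modelDir, Path trainingDir, String word) {
        boolean modelExists = Files.exists(modelDir.resolve(word + ".model"));
        boolean trainingDataExists = Files.exists(trainingDir.resolve(word + ".xml"));
        return choose(modelExists, trainingDataExists);
    }

    public static void main(String[] args) {
        // Directory names are placeholders; pass real ones as arguments.
        Path modelDir = Paths.get(args.length > 0 ? args[0] : "models");
        Path trainingDir = Paths.get(args.length > 1 ? args[1] : "training");
        String word = args.length > 2 ? args[2] : "bank";
        System.out.println(word + " -> " + chooseForWord(modelDir, trainingDir, word));
    }
}
```

Keeping the decision separate from the file checks makes the order of the
three steps easy to test without any model or training files on disk.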

For now I am using the training data I collected from the Senseval and
SemEval websites. However, I am currently looking into using SemCor as the
main reference.

Yours sincerely,

Mondher

[1] https://issues.apache.org/jira/browse/OPENNLP-757



On Thu, Jun 25, 2015 at 5:27 AM, Joern Kottmann <[email protected]> wrote:

> On Fri, 2015-06-19 at 21:42 +0900, Mondher Bouazizi wrote:
> > Hi,
> >
> > Actually I have finished the implementation of most of the parts of the
> > IMS approach. I also made a parser for the Senseval-3 data.
> >
> > However I am currently working on two main points:
> >
> > - I am trying to figure out how to use the MaxEnt classifier.
> > Unfortunately there is not enough documentation, so I am trying to see
> > how it is used by the other components of OpenNLP. Any recommendations?
>
> Yes, have a look at the doccat component. It should be easy to
> understand from it how it works. The classifier has to be trained with
> an event (outcome and features) and can then classify a set of features
> in the categories it has seen before as outcome.
>
> Jörn
>
