Hello,

There has been little public activity these last few days. We believe that it is
very important to step up in several directions with respect to what is already committed in svn:

1. Finish the WSDEvaluator.
2. Provide the classes required to run the WSD tools from the CLI like
any other component.
3. Formats: it would be useful to have at least converters for the
most common datasets used for evaluation and training, e.g., SemCor
and Senseval-3. You have mentioned that a converter was already
implemented, but I cannot find it in svn.
4. Write the documentation so that future users (and other dev members
here) can test the component.

These comments apply to both unsupervised and supervised WSD.
Specific to supervised WSD:

5. IMS: you mention in your previous email that the lexical sample
part is done and that you need to finish the all-words IMS
implementation. If this is the case, a JIRA issue should be opened
about it and made a priority.
Incidentally, I cannot find the IMSTester you mentioned in that email.

There is already an issue for the Evaluator (OPENNLP-790), but I
think each of the remaining tasks requires its own JIRA issue
(the existing issue also has pending unused imports, variables and
other things to clean up).

The aim before GSoC ends should be to give the WSD component the best
chance of being a good candidate for integration into the OpenNLP
tools. Also, by being able to test it, we can see the actual state of
the component with respect to performance on the usual datasets.

Can you please create such issues in JIRA and start addressing them separately?

Thanks,

Rodrigo



On Sun, Jun 28, 2015 at 6:33 PM, Mondher Bouazizi
<[email protected]> wrote:
> Hi everyone,
>
> I finished the first iteration of IMS approach for lexical sample
> disambiguation. Please find the patch uploaded on the jira issue [1]. I
> also created a tester (IMSTester) to run it.
>
> As I mentioned before, the approach is as follows: each time the module is
> called to disambiguate a word, it first checks whether the model file for
> that word exists.
>
> 1- If the "model" file exists, it is used to disambiguate the word
>
> 2- Otherwise, if the model file does not exist, the module checks if the
> training data file for that word exists. If it does, the XML data file
> will be used to train the model and create the model file.
>
> 3- If no training data exists, the most frequent sense (MFS) in WordNet is
> returned.
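The three-step fallback described above could be sketched roughly like this (in-memory maps stand in for the model and training files, and all names here are illustrative, not the actual patch):

```java
import java.util.HashMap;
import java.util.Map;

public class FallbackSketch {

    // Stand-ins for "the model file exists" / "training data exists".
    public static Map<String, Boolean> modelFiles = new HashMap<>();
    public static Map<String, Boolean> trainingData = new HashMap<>();

    /** Returns which of the three paths would be taken for a given word. */
    public static String disambiguationPath(String word) {
        if (modelFiles.getOrDefault(word, false)) {
            return "use-model";       // 1- load the model file and classify
        }
        if (trainingData.getOrDefault(word, false)) {
            return "train-then-use";  // 2- train from the XML data, save the model, classify
        }
        return "wordnet-mfs";         // 3- fall back to the most frequent sense in WordNet
    }

    public static void main(String[] args) {
        modelFiles.put("bank", true);
        trainingData.put("line", true);
        System.out.println(disambiguationPath("bank"));  // use-model
        System.out.println(disambiguationPath("line"));  // train-then-use
        System.out.println(disambiguationPath("plant")); // wordnet-mfs
    }
}
```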
>
> For now I am using the training data I collected from the Senseval and
> SemEval websites. However, I am currently checking SemCor to use it as a
> main reference.
>
> Yours sincerely,
>
> Mondher
>
> [1] https://issues.apache.org/jira/browse/OPENNLP-757
>
>
>
> On Thu, Jun 25, 2015 at 5:27 AM, Joern Kottmann <[email protected]> wrote:
>
>> On Fri, 2015-06-19 at 21:42 +0900, Mondher Bouazizi wrote:
>> > Hi,
>> >
>> > Actually I have finished the implementation of most of the parts of the
>> IMS
>> > approach. I also made a parser for the Senseval-3 data.
>> >
>> > However I am currently working on two main points:
>> >
>> > - I am trying to figure out how to use the MaxEnt classifier.
>> > Unfortunately, there is not enough documentation, so I am trying to
>> > see how it is used by the other components of OpenNLP. Any
>> > recommendations?
>>
>> Yes, have a look at the doccat component. It should be easy to
>> understand from it how the classifier works. It has to be trained with
>> events (an outcome plus features) and can then classify a set of
>> features into the outcomes it has seen before.
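To make the event idea concrete: the toy class below is not the OpenNLP API (the names and the scoring are made up); it only illustrates the contract described above, i.e. training on (outcome, features) events and then classifying a feature set into one of the outcomes seen during training. The actual component should use the MaxEnt trainer the way doccat does.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class EventSketch {

    /** One training event: an outcome (e.g. a sense id) plus its features. */
    public static class Event {
        final String outcome;
        final List<String> features;
        public Event(String outcome, List<String> features) {
            this.outcome = outcome;
            this.features = features;
        }
    }

    // outcome -> feature -> count, accumulated from the training events
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();

    public void train(List<Event> events) {
        for (Event e : events) {
            Map<String, Integer> fc =
                counts.computeIfAbsent(e.outcome, k -> new HashMap<>());
            for (String f : e.features) {
                fc.merge(f, 1, Integer::sum);
            }
        }
    }

    /** Picks the seen outcome whose training features overlap most with the input. */
    public String classify(List<String> features) {
        String best = null;
        int bestScore = -1;
        for (Map.Entry<String, Map<String, Integer>> e : counts.entrySet()) {
            int score = 0;
            for (String f : features) {
                score += e.getValue().getOrDefault(f, 0);
            }
            if (score > bestScore) {
                bestScore = score;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        EventSketch clf = new EventSketch();
        clf.train(Arrays.asList(
            new Event("bank/finance", Arrays.asList("money", "deposit")),
            new Event("bank/river", Arrays.asList("water", "shore"))));
        System.out.println(clf.classify(Arrays.asList("deposit", "money"))); // bank/finance
    }
}
```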
>>
>> Jörn
>>
