Re: GSoC 2015 - WSD Module

Mondher Bouazizi Mon, 08 Jun 2015 06:50:52 -0700

Dear Rodrigo,

As Anthony mentioned in his previous email, I have already started
implementing the IMS approach. Pre-processing and feature extraction are
finished. According to the author, the approach shows some potential,
although the proposed features are few and basic. I think it could be
enhanced by adding more context-specific features from other approaches.
(That will require running many experiments with different combinations of
features, but that should not be a problem.)
However, the approach requires a linear SVM classifier and, as far as I
know, OpenNLP only has a Maximum Entropy classifier. Is it OK to use libsvm?
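
For illustration, here is a minimal sketch of the kind of feature extraction
involved, based on the three feature families described in the IMS paper
(surrounding words, POS tags of neighboring tokens, local collocations). It is
not the code from the patch: the class and method names (ImsFeatureSketch,
extract, collocation), the "NULL" padding token, the window size and the
choice of three collocations are all assumptions made for the example. The
resulting string features would still have to be mapped to numeric indices
before being passed to a linear SVM.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of IMS-style features; names and windows are assumptions.
final class ImsFeatureSketch {

    /** Builds string features for the token at index {@code target}. */
    static List<String> extract(String[] tokens, String[] posTags, int target, int posWindow) {
        List<String> features = new ArrayList<>();

        // Surrounding words: every other token of the sentence, lowercased.
        // (No lemmatization or stop-word filtering in this sketch.)
        for (int i = 0; i < tokens.length; i++) {
            if (i != target) {
                features.add("word=" + tokens[i].toLowerCase(Locale.ROOT));
            }
        }

        // POS tags in a window around the target, padded with "NULL".
        for (int offset = -posWindow; offset <= posWindow; offset++) {
            int i = target + offset;
            String tag = (i >= 0 && i < posTags.length) ? posTags[i] : "NULL";
            features.add("pos[" + offset + "]=" + tag);
        }

        // Local collocations: a small, arbitrary selection of token spans.
        features.add("colloc[-1,-1]=" + collocation(tokens, target, -1, -1));
        features.add("colloc[1,1]=" + collocation(tokens, target, 1, 1));
        features.add("colloc[-1,1]=" + collocation(tokens, target, -1, 1));
        return features;
    }

    /** Joins the tokens at offsets from..to (relative to target) with '_'. */
    static String collocation(String[] tokens, int target, int from, int to) {
        StringBuilder sb = new StringBuilder();
        for (int offset = from; offset <= to; offset++) {
            int i = target + offset;
            if (sb.length() > 0) {
                sb.append('_');
            }
            sb.append(i >= 0 && i < tokens.length
                    ? tokens[i].toLowerCase(Locale.ROOT)
                    : "NULL");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] tokens = {"I", "deposited", "money", "at", "the", "bank"};
        String[] pos = {"PRP", "VBD", "NN", "IN", "DT", "NN"};
        // Print the features extracted for the last token, "bank".
        extract(tokens, pos, 5, 1).forEach(System.out::println);
    }
}
```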

Regarding the training data, I have started collecting some from different
sources. Most of the existing rich corpora are licensed (including the ones
mentioned in the paper). The free ones I have so far come from the Senseval
and SemEval websites. However, these were built only to evaluate the methods
proposed in the workshops, so they cover few target words, although the
training data for each word is rich enough.

In any case, the first tests on the collected Senseval and SemEval data
should be finished soon. However, I am not sure whether there is a dataset
rich enough to train the model for the WSD module in the OpenNLP library.
If you have any recommendations, I would be grateful for your help on
this point.

On the other hand, we are cleaning up our implementation of the different
variations of Lesk. We are currently using JWNL; if there are no
objections, we will migrate to extJWNL.
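
To make the Lesk idea concrete, here is a minimal sketch of the simplified
variant: pick the sense whose gloss shares the most words with the context.
It is only an illustration, not our implementation. The class name
(SimplifiedLeskSketch), the sense keys and the toy glosses are hypothetical,
and the glosses are supplied in a plain map rather than read from WordNet
through JWNL or extJWNL.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of simplified Lesk over an in-memory gloss map.
final class SimplifiedLeskSketch {

    /** Lowercases the text and splits it into a set of alphabetic words. */
    static Set<String> bagOfWords(String text) {
        Set<String> words = new HashSet<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!w.isEmpty()) {
                words.add(w);
            }
        }
        return words;
    }

    /**
     * Returns the key of the sense whose gloss overlaps most with the context.
     * Ties go to the sense seen first; an empty map yields null.
     * No stop-word filtering is applied in this sketch.
     */
    static String disambiguate(String context, Map<String, String> glossesBySense) {
        Set<String> contextWords = bagOfWords(context);
        String best = null;
        int bestOverlap = -1;
        for (Map.Entry<String, String> entry : glossesBySense.entrySet()) {
            Set<String> overlap = bagOfWords(entry.getValue());
            overlap.retainAll(contextWords); // keep only the shared words
            if (overlap.size() > bestOverlap) {
                bestOverlap = overlap.size();
                best = entry.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy glosses written for this example, not taken from WordNet.
        Map<String, String> glosses = new LinkedHashMap<>();
        glosses.put("bank#1", "a financial institution that accepts deposits and lends money");
        glosses.put("bank#2", "sloping land beside a body of water such as a river");
        System.out.println(disambiguate("I went to the bank to deposit money", glosses));
    }
}
```

The other Lesk variations mainly change what is compared (for example,
extending the glosses with those of related synsets) and how the overlap is
scored.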

As Jörn mentioned sending an initial patch, should we separate our code
and upload two different patches to the two issues we created on Jira
(which would mean a lot of redundant code), or should we keep everything
in one project and upload that? If we opt for the latter, which issue
should we upload the patch to?

Thanks,

Mondher, Anthony

On Mon, Jun 8, 2015 at 7:51 PM, Rodrigo Agerri <rage...@apache.org> wrote:

> Hello,
>
> +1 for using extJWNL instead of JWNL, I use it in some other projects
> too and it is very nice IMHO.
>
> R
>
> On Sat, Jun 6, 2015 at 12:55 PM, Aliaksandr Autayeu
> <aliaksa...@autayeu.com> wrote:
> > Thinking of impartiality... Anyway, I'm the author of extJWNL in case you
> > have questions.
> >
> > Aliaksandr
> >
> > On 6 June 2015 at 11:43, Richard Eckart de Castilho <
> > richard.eck...@gmail.com> wrote:
> >
> >> On 05.06.2015, at 14:24, Anthony Beylerian <anthonybeyler...@hotmail.com>
> >> wrote:
> >>
> >> > So just to make sure, we are currently relying on JWNL to access
> >> > WordNet as a resource.
> >>
> >> There is a more modern fork of JWNL available called extJWNL:
> >> http://extjwnl.sourceforge.net .
> >> It includes provisions for loading WordNet from the classpath, e.g.
> >> from Maven dependencies. It might be a nice replacement for JWNL and is
> >> also licensed under the BSD license. Pre-packaged WordNet Maven
> >> artifacts are also available.
> >>
> >> Cheers,
> >>
> >> -- Richard
>
