RE: WSD - Supervised techniques

Anthony Beylerian Mon, 13 Jul 2015 10:28:02 -0700

Dear Rodrigo, 

Thank you for the feedback.


I have opened issues [1][2][3] for the points below.

Concerning the testers (IMSTester etc.), they should be in src/test/java/....
We can add docs to them explaining how to use each implementation.

Actually, I am using the Senseval-3 parser that Mondher mentioned in
[LeskEvaluatorTest]; the functionality was included in DataExtractor.
I believe it would be best to separate that out and have two parser/converter
classes along these lines:

disambiguator.reader.SemCorReader,
disambiguator.reader.SensevalReader.

That should be clearer. What do you think?

Anthony

[1]: https://issues.apache.org/jira/browse/OPENNLP-794
[2]: https://issues.apache.org/jira/browse/OPENNLP-795
[3]: https://issues.apache.org/jira/browse/OPENNLP-796

> From: [email protected]
> Date: Mon, 13 Jul 2015 15:50:00 +0200
> Subject: Re: WSD - Supervised techniques
> To: [email protected]
> 
> Hello,
> 
> There has been little public activity these last few days. We believe that
> it is very important to step up in two directions wrt what is already
> committed in svn:
> 
> 1. Finish the WSDEvaluator.
> 2. Provide the classes required to run the WSD tools from the CLI, like
> any other component.
> 3. Formats: it would be good to have at least a converter for the
> most common datasets used for evaluation and training, e.g., SemCor and
> Senseval-3. You have mentioned that a converter was already
> implemented but I cannot find it in svn.
> 4. Write the documentation so that future users (and other dev members
> here) can test the component.
> 
> These comments were general for both unsupervised and supervised WSD.
> Specific to supervised WSD:
> 
> 5. IMS: you mentioned in your previous email that the lexical sample
> part is done and that you need to finish the all-words IMS
> implementation. If this is the case, a JIRA issue should be opened for
> it and made a priority.
> Incidentally, I cannot find the IMSTester you mentioned in the email.
> 
> There is already an issue for the Evaluator (OPENNLP-790) but I
> think that each of the remaining tasks requires its own JIRA issue
> (that issue still has pending unused imports, variables and other things).
> 
> The aim before GSoC ends should be to give the WSD component the best
> chance of being a good candidate for integration into the opennlp
> tools. Also, by being able to test it we can see the actual state of
> the component with respect to performance on the usual datasets.
> 
> Can you please create such issues in JIRA and start addressing them 
> separately?
> 
> Thanks,
> 
> Rodrigo
> 
> 
> 
> On Sun, Jun 28, 2015 at 6:33 PM, Mondher Bouazizi
> <[email protected]> wrote:
> > Hi everyone,
> >
> > I finished the first iteration of IMS approach for lexical sample
> > disambiguation. Please find the patch uploaded on the jira issue [1]. I
> > also created a tester (IMSTester) to run it.
> >
> > As I mentioned before, the approach is as follows: each time the module is
> > called to disambiguate a word, it first checks if the model file for that
> > word exists.
> >
> > 1- If the "model" file exists, it is used to disambiguate the word
> >
> > 2- Otherwise, if the file does not exist, the module checks if the training
> > data file for that word exists. If it does, the xml file data will be used
> > to train the model and create the model file.
> >
> > 3- If no training data exist, the most frequent sense (mfs) in WordNet is
> > returned.
> >
> > For now I am using the training data I collected from the Senseval and
> > SemEval websites. However, I am currently looking into SemCor to use it
> > as the main reference.
> >
> > Yours sincerely,
> >
> > Mondher
> >
> > [1] https://issues.apache.org/jira/browse/OPENNLP-757
> >
> >
> >
> > On Thu, Jun 25, 2015 at 5:27 AM, Joern Kottmann <[email protected]> wrote:
> >
> >> On Fri, 2015-06-19 at 21:42 +0900, Mondher Bouazizi wrote:
> >> > Hi,
> >> >
> >> > Actually I have finished the implementation of most of the parts of
> >> > the IMS approach. I also made a parser for the Senseval-3 data.
> >> >
> >> > However I am currently working on two main points:
> >> >
> >> > - I am trying to figure out how to use the MaxEnt classifier.
> >> > Unfortunately there is not enough documentation, so I am trying to see
> >> > how it is used by the other components of OpenNLP. Any recommendations?
> >>
> >> Yes, have a look at the doccat component. It should be easy to
> >> understand from it how it works. The classifier has to be trained with
> >> events (each an outcome plus features) and can then classify a set of
> >> features into the categories it has seen before as outcomes.
> >>
> >> Jörn
> >>
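
For reference, the three-step fallback Mondher describes in the quoted message (use the model file if it exists, otherwise train from the word's XML training data, otherwise return the most frequent sense) could be sketched roughly as below. This is only an illustration of the decision order, not the code from the OPENNLP-757 patch: the class name, the Strategy enum, and the file layout (<word>.model and <word>.xml in two directories) are all assumptions made for the example, and the actual training and WordNet lookup are left out.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Sketch of the per-word fallback: model file, then training data, then MFS. */
final class ImsFallbackSketch {

    /** Which of the three steps applies to a given word. */
    enum Strategy { USE_EXISTING_MODEL, TRAIN_THEN_USE_MODEL, MOST_FREQUENT_SENSE }

    // Hypothetical layout: <modelDir>/<word>.model and <trainingDir>/<word>.xml
    static Strategy choose(Path modelDir, Path trainingDir, String word) {
        if (Files.exists(modelDir.resolve(word + ".model"))) {
            return Strategy.USE_EXISTING_MODEL;      // step 1: model already trained
        }
        if (Files.exists(trainingDir.resolve(word + ".xml"))) {
            return Strategy.TRAIN_THEN_USE_MODEL;    // step 2: train, save, then use
        }
        return Strategy.MOST_FREQUENT_SENSE;         // step 3: fall back to MFS
    }

    public static void main(String[] args) throws IOException {
        // Temporary directories stand in for the real model/training locations.
        Path models = Files.createTempDirectory("models");
        Path training = Files.createTempDirectory("training");
        Files.createFile(models.resolve("bank.model"));
        Files.createFile(training.resolve("plant.xml"));

        System.out.println(choose(models, training, "bank"));
        System.out.println(choose(models, training, "plant"));
        System.out.println(choose(models, training, "zebra"));
    }
}
```

Keeping the decision separate from the training and classification code would make the fallback order easy to cover with a unit test under src/test/java.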
