Dear Rodrigo,

Thank you for the feedback.

I have added issues [1][2][3] for the points below.

Concerning the testers (IMSTester etc.), they should be in src/test/java/.... We can add docs in those to explain how to use each implementation.

Actually, I am using the parser for Senseval-3 that Mondher mentioned in [LeskEvaluatorTest]; the functionality was included in DataExtractor. I believe it would be best to separate that out and have two parser/converter classes of the sort disambiguator.reader.SemCorReader and disambiguator.reader.SensevalReader. That should be clearer. What do you think?

Anthony

[1]: https://issues.apache.org/jira/browse/OPENNLP-794
[2]: https://issues.apache.org/jira/browse/OPENNLP-795
[3]: https://issues.apache.org/jira/browse/OPENNLP-796

> From: [email protected]
> Date: Mon, 13 Jul 2015 15:50:00 +0200
> Subject: Re: WSD - Supervised techniques
> To: [email protected]
>
> Hello,
>
> There has been little public activity these last days. We believe that it is
> very important to step up in several directions wrt what is already committed
> in svn:
>
> 1. Finishing the WSDEvaluator
> 2. Provide the classes required to run the WSD tools from the CLI as
>    any other component.
> 3. Formats: it will be interesting to have at least a converter for the
>    most common datasets used for evaluation and training. E.g., semcor and
>    senseval-3. You have mentioned that a converter was already
>    implemented but I cannot find it in svn.
> 4. Write the documentation so that future users (and other dev members
>    here) can test the component.
>
> These comments were general for both unsupervised and supervised WSD.
> Specific to supervised WSD:
>
> 5. IMS: you mention in your previous email that the lexical sample
>    part is done and that you need to finish the all words IMS
>    implementation. If this is the case, a JIRA issue should be opened about
>    it and made a priority.
>    Incidentally, I cannot find the IMSTester you mentioned in the email.
>
> There is an issue already there for the Evaluator (OPENNLP-790) but I
> think that each of the remaining tasks requires its own JIRA issue
> (this issue has pending unused imports, variables and other things).
>
> The aim before GSOC ends should be to have the best chance of having the
> WSD component as a good candidate for its integration in the opennlp
> tools. Also, by being able to test it we can see the actual state of
> the component with respect to performance on the usual datasets.
>
> Can you please create such issues in JIRA and start addressing them
> separately?
>
> Thanks,
>
> Rodrigo
>
>
>
> On Sun, Jun 28, 2015 at 6:33 PM, Mondher Bouazizi
> <[email protected]> wrote:
> > Hi everyone,
> >
> > I finished the first iteration of the IMS approach for lexical sample
> > disambiguation. Please find the patch uploaded on the jira issue [1]. I
> > also created a tester (IMSTester) to run it.
> >
> > As I mentioned before, the approach is as follows: each time the module is
> > called to disambiguate a word, it first checks whether the model file for
> > that word exists.
> >
> > 1- If the "model" file exists, it is used to disambiguate the word.
> >
> > 2- Otherwise, if the file does not exist, the module checks if the training
> > data file for that word exists. If it does, the XML data will be used
> > to train the model and create the model file.
> >
> > 3- If no training data exist, the most frequent sense (mfs) in WordNet is
> > returned.
> >
> > For now I am using the training data I collected from the Senseval and
> > SemEval websites. However, I am currently checking SemCor to use it as a
> > main reference.
> >
> > Yours sincerely,
> >
> > Mondher
> >
> > [1] https://issues.apache.org/jira/browse/OPENNLP-757
> >
> >
> > On Thu, Jun 25, 2015 at 5:27 AM, Joern Kottmann <[email protected]> wrote:
> >
> >> On Fri, 2015-06-19 at 21:42 +0900, Mondher Bouazizi wrote:
> >> > Hi,
> >> >
> >> > Actually I have finished the implementation of most of the parts of the
> >> > IMS approach. I also made a parser for the Senseval-3 data.
> >> >
> >> > However I am currently working on two main points:
> >> >
> >> > - I am trying to figure out how to use the MaxEnt classifier.
> >> > Unfortunately there is not enough documentation, so I am trying to see
> >> > how it is used by the other components of OpenNLP. Any recommendation?
> >>
> >> Yes, have a look at the doccat component. It should be easy to
> >> understand from it how it works. The classifier has to be trained with
> >> events (each an outcome plus its features) and can then classify a set of
> >> features into the categories it has seen before as outcomes.
> >>
> >> Jörn
> >>
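
The three-step fallback Mondher describes in the thread above (use the model file if it exists, otherwise train from the word's training data, otherwise return the most frequent sense) can be sketched roughly as follows. This is only an illustration of the decision order, not the code from the OPENNLP-757 patch: the class name WsdFallbackSketch, the Strategy enum, the directory arguments and the ".model" / ".xml" file naming are all assumptions made for the example, and the actual training and disambiguation steps are left out.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch of the per-word fallback order described in the thread.
// None of these names come from the OpenNLP code base.
public class WsdFallbackSketch {

    /** The three possible outcomes, in order of preference. */
    enum Strategy { USE_EXISTING_MODEL, TRAIN_FROM_DATA, MOST_FREQUENT_SENSE }

    /** Pure decision logic: model first, then training data, then MFS. */
    static Strategy choose(boolean modelExists, boolean trainingDataExists) {
        if (modelExists) {
            return Strategy.USE_EXISTING_MODEL;   // step 1
        }
        if (trainingDataExists) {
            return Strategy.TRAIN_FROM_DATA;      // step 2
        }
        return Strategy.MOST_FREQUENT_SENSE;      // step 3
    }

    /**
     * Checks the file system for a given word.
     * The "<word>.model" and "<word>.xml" naming is an assumption.
     */
    static Strategy chooseFor(String word, Path modelDir, Path trainingDir) {
        Path modelFile = modelDir.resolve(word + ".model");
        Path trainingFile = trainingDir.resolve(word + ".xml");
        return choose(Files.exists(modelFile), Files.exists(trainingFile));
    }

    public static void main(String[] args) {
        // Directory names are placeholders for wherever the files would live.
        Strategy s = chooseFor("bank", Paths.get("models"), Paths.get("training"));
        System.out.println("Strategy for 'bank': " + s);
    }
}
```

Keeping the decision in a pure method (choose) separate from the file checks (chooseFor) would make the order easy to cover in a unit test under src/test/java without needing any model or training files on disk.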
