Hello Mondher (my response is about supervised WSD),

Thanks for the info, it is quite interesting. Apart from the comment
by Jörn, which I think is very important if we want to achieve
something given the time constraints of GSoC, I have a couple of
recommendations/comments of my own:

1. Rather than targeting the lexical sample task or all-words WSD, I think
it would be more practical to choose an approach/algorithm and try to
implement it in OpenNLP. One of the most (if not the most) popular
approaches is the "It Makes Sense" (IMS) system:

http://www.comp.nus.edu.sg/~nlp/sw/README.txt
https://www.comp.nus.edu.sg/~nght/pubs/ims.pdf

I think that is achievable in the GSoC time frame.
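
To give an idea of the scope: as I read the IMS paper, the classifier
relies on three kinds of features, namely the POS tags of surrounding
words, the surrounding words themselves, and local collocations. Below
is a minimal Java sketch of such extractors. The class and method names
are hypothetical (they are neither IMS nor OpenNLP code), and window
sizes, lemmatisation and stop-word filtering are left out, so please
check the paper for the exact settings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper for illustration; not part of IMS or OpenNLP. */
final class ImsStyleFeatures {

    /** POS tags in a +/-window around the target; out-of-sentence slots become "NULL". */
    static List<String> posFeatures(String[] posTags, int target, int window) {
        List<String> feats = new ArrayList<>();
        for (int offset = -window; offset <= window; offset++) {
            int i = target + offset;
            String tag = (i >= 0 && i < posTags.length) ? posTags[i] : "NULL";
            feats.add("pos[" + offset + "]=" + tag);
        }
        return feats;
    }

    /** Binary bag of surrounding words: lower-cased, de-duplicated, target excluded. */
    static List<String> surroundingWords(String[] tokens, int target) {
        List<String> feats = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (i == target) {
                continue;
            }
            String f = "sw=" + tokens[i].toLowerCase(Locale.ROOT);
            if (!feats.contains(f)) {
                feats.add(f);
            }
        }
        return feats;
    }

    /** Local collocation: tokens at offsets from..to (relative to the target) joined by "_". */
    static String collocation(String[] tokens, int target, int from, int to) {
        StringBuilder sb = new StringBuilder("col[" + from + "," + to + "]=");
        for (int i = target + from; i <= target + to; i++) {
            if (i > target + from) {
                sb.append('_');
            }
            sb.append(i >= 0 && i < tokens.length
                    ? tokens[i].toLowerCase(Locale.ROOT) : "NULL");
        }
        return sb.toString();
    }
}
```

The resulting strings could then be fed to any of the classifiers
already available in OpenNLP, with one model trained per target word.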

2. As an aside, research has been moving towards supersense tagging
(SST), given the difficulty of WSD.

http://ttic.uchicago.edu/~altun/pubs/CiaAlt_EMNLP06.pdf

As you can see in the above paper, SST is approached as a sequence
labelling task rather than classification. This means that we could
reimplement the features of Ciaramita and Altun (2006) as
AdaptiveFeatureGenerators and create a module structurally similar
to the NameFinder, but for SST.
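
A minimal sketch of what such a feature generator could look like is
below. To keep it self-contained it declares a local interface whose
single method has the same shape as createFeatures in OpenNLP's
AdaptiveFeatureGenerator. The interface name, the class name and the
particular features (word, word shape, neighbouring words, previous
label) are assumptions for illustration only; the actual feature set
should be taken from the paper. The labels in the usage comment assume
BIO-encoded WordNet supersenses such as B-noun.person.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Local stand-in mirroring the method shape of OpenNLP's AdaptiveFeatureGenerator. */
interface TokenFeatureGenerator {
    void createFeatures(List<String> features, String[] tokens, int index,
                        String[] previousOutcomes);
}

/** Hypothetical SST feature generator; the feature choice is illustrative only. */
final class SstFeatureSketch implements TokenFeatureGenerator {

    /** Collapsed word shape: upper -> X, lower -> x, digit -> d, e.g. "Paris" -> "Xx". */
    static String shape(String token) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            char s = Character.isUpperCase(c) ? 'X'
                    : Character.isLowerCase(c) ? 'x'
                    : Character.isDigit(c) ? 'd' : c;
            // Append only when the shape character changes, so runs collapse.
            if (sb.length() == 0 || sb.charAt(sb.length() - 1) != s) {
                sb.append(s);
            }
        }
        return sb.toString();
    }

    @Override
    public void createFeatures(List<String> features, String[] tokens, int index,
                               String[] previousOutcomes) {
        features.add("w=" + tokens[index].toLowerCase(Locale.ROOT));
        features.add("sh=" + shape(tokens[index]));
        features.add("w-1=" + (index > 0
                ? tokens[index - 1].toLowerCase(Locale.ROOT) : "BOS"));
        features.add("w+1=" + (index + 1 < tokens.length
                ? tokens[index + 1].toLowerCase(Locale.ROOT) : "EOS"));
        // Label assigned to the previous token; "O" when there is none yet.
        boolean hasPrev = index > 0 && previousOutcomes != null
                && previousOutcomes.length >= index;
        features.add("po=" + (hasPrev ? previousOutcomes[index - 1] : "O"));
    }
}
```

For example, calling createFeatures at index 3 on the tokens
{"Clara", "Harris", "visited", "Paris"} with previous outcomes
{"B-noun.person", "I-noun.person", "B-verb.motion"} adds "w=paris",
"sh=Xx", "w-1=visited", "w+1=EOS" and "po=B-verb.motion" to the list.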

This also has the advantage of letting us move beyond the old SemCor
and Senseval datasets and use current datasets such as tweets. See
this recent paper on SST on tweets:

http://aclweb.org/anthology/S14-1001

I think that for supervised WSD we should pursue option 1 or 2 and
start defining the interface as Jörn has suggested.

Best,

Rodrigo

On Mon, May 18, 2015 at 2:14 PM, Anthony Beylerian
<anthonybeyler...@hotmail.com> wrote:
> Dear all,
>
> In the context of building a Word Sense Disambiguation (WSD) module, after
> doing a survey on WSD techniques, we realized the following points :
>
> - WSD techniques can be split into three sets (supervised,
> unsupervised/knowledge based, hybrid)
>
> - WSD is used for different directly related objectives such as all-words
> disambiguation, lexical sample disambiguation, multi/cross-lingual
> approaches etc.
>
> - Senseval/Semeval seem to be good references to compare different
> techniques for WSD since many of them were tested on the same data (but
> different one each event).
>
> - For the sake of making a first solution, we propose to start with
> supporting the "lexical sample" type of disambiguation, meaning to
> disambiguate single/limited word(s) from an input text.
>
>
> Therefore, we have decided to collect information about the different
> techniques in the literature (such as  references, performance, parameters
> etc.) in this spreadsheet here.
> Otherwise we have also collected the results of all the senseval/semeval
> exercises here.
> (Note that each document has many sheets)
> The collected results could help decide which techniques to start with
> as main models for each set of techniques (supervised/unsupervised).
>
> We also propose a general approach for the package in the figure attached.
> The main components are as follows :
>
> 1- The different resources publicly available : WordNet, BabelNet,
> Wikipedia, etc.
> However, we would also like to allow the users to use their own local
> resources, by maybe defining a type of connector to the resource interface.
>
> 2- The resource interface will have the role to provide both a sense
> inventory that the user can query and a knowledge base (such as semantic or
> syntactic info. etc.) that might be used depending on the technique.
> We might even later consider building a local cache for remote services.
>
> 3- The WSD algorithms/techniques themselves that will make use of the
> resource interface to access the resources required.
> These techniques will be split into two main packages as in the left side of
> the figure :  Supervised/Unsupervised.
> The utils package includes common tools used in both types of techniques.
> The details mentioned in each package should be common to all
> implementations of these abstract models.
>
> 4- I/O could be processed in different formats (XML/JSON etc) or a simpler
> structure following your recommendations.
>
> If you have any suggestions or recommendations, we would really appreciate
> discussing them and would like your guidance to iterate on this tool-set.
>
> Best regards,
>
> Anthony Beylerian, Mondher Bouazizi
