Hello,

One of the tasks we should start with is defining the interface for the WSD
component.

Please have a look at the other components in OpenNLP and try to propose an
interface in a similar style.
Can we use one interface for all the different implementations?
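
To make the question concrete, here is a rough sketch of what a single shared
interface could look like. Everything in it is an assumption for discussion:
the names WSDisambiguator and FirstSenseDisambiguator, the method signature,
and the plain Map used as a stand-in sense inventory are hypothetical and are
not an existing OpenNLP API. It only follows the general shape of components
such as the tokenizer or POS tagger, which take plain strings or token arrays.

```java
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical common interface for all WSD implementations.
 * The name and signature are assumptions, not an existing OpenNLP API.
 */
interface WSDisambiguator {
  /**
   * Disambiguates a single target word (the "lexical sample" case).
   *
   * @param tokens      the tokenized input text
   * @param targetIndex index of the token to disambiguate
   * @return a sense identifier, or null if no sense is known
   */
  String disambiguate(String[] tokens, int targetIndex);
}

/**
 * Toy implementation used only to show the interface in use:
 * it ignores the context and returns the first listed sense.
 */
class FirstSenseDisambiguator implements WSDisambiguator {
  // Stand-in for a sense inventory: lower-cased word -> ordered sense ids.
  private final Map<String, String[]> senseInventory;

  FirstSenseDisambiguator(Map<String, String[]> senseInventory) {
    this.senseInventory = senseInventory;
  }

  @Override
  public String disambiguate(String[] tokens, int targetIndex) {
    String word = tokens[targetIndex].toLowerCase(Locale.ROOT);
    String[] senses = senseInventory.get(word);
    return (senses == null || senses.length == 0) ? null : senses[0];
  }
}
```

If something like this works, supervised and unsupervised techniques would
both implement the same interface and differ only in how they are constructed
(trained model versus resource access).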

Jörn


On Mon, May 18, 2015 at 3:27 PM, Mondher Bouazizi <
mondher.bouaz...@gmail.com> wrote:

> Dear all,
>
> Sorry if you received multiple copies of this email (The links were
> embedded). Here are the actual links:
>
> *Figure:*
>
> https://drive.google.com/file/d/0B7ON7bq1zRm3Sm1YYktJTVctLWs/view?usp=sharing
> *Semeval/senseval results summary:*
>
> https://docs.google.com/spreadsheets/d/1NCiwXBQs0rxUwtZ3tiwx9FZ4WELIfNCkMKp8rlnKObY/edit?usp=sharing
> *Literature survey of WSD techniques:*
>
> https://docs.google.com/spreadsheets/d/1WQbJNeaKjoT48iS_7oR8ifZlrd4CfhU1Tay_LLPtlCM/edit?usp=sharing
>
> Yours faithfully
>
> On Mon, May 18, 2015 at 10:17 PM, Anthony Beylerian <
> anthonybeyler...@hotmail.com> wrote:
>
> > Please excuse the duplicate email, we could not attach the mentioned
> > figure.
> > Kindly find it here.
> > Thank you.
> >
> > From: anthonybeyler...@hotmail.com
> > To: dev@opennlp.apache.org
> > Subject: GSoC 2015 - WSD Module
> > Date: Mon, 18 May 2015 22:14:43 +0900
> >
> >
> >
> >
> > Dear all,
> > In the context of building a Word Sense Disambiguation (WSD) module, after
> > doing a survey on WSD techniques, we realized the following points:
> > - WSD techniques can be split into three sets (supervised,
> >   unsupervised/knowledge-based, hybrid).
> > - WSD is used for different directly related objectives such as all-words
> >   disambiguation, lexical sample disambiguation, multi/cross-lingual
> >   approaches, etc.
> > - Senseval/Semeval seem to be good references to compare different
> >   techniques for WSD, since many of them were tested on the same data (but
> >   different data at each event).
> > - For the sake of making a first solution, we propose to start with
> >   supporting the "lexical sample" type of disambiguation, meaning to
> >   disambiguate single/limited word(s) from an input text.
> >
> > Therefore, we have decided to collect information about the different
> > techniques in the literature (such as references, performance, parameters,
> > etc.) in this spreadsheet here. We have also collected the results
> > of all the Senseval/Semeval exercises here. (Note that each document has
> > many sheets.) The collected results could help decide which techniques to
> > start with as main models for each set of techniques
> > (supervised/unsupervised).
> >
> > We also propose a general approach for the package in the figure
> > attached. The main components are as follows:
> > 1- The different resources publicly available: WordNet, BabelNet,
> > Wikipedia, etc. However, we would also like to allow users to use their
> > own local resources, perhaps by defining a type of connector to the
> > resource interface.
> > 2- The resource interface will have the role of providing both a sense
> > inventory that the user can query and a knowledge base (such as semantic or
> > syntactic info, etc.) that might be used depending on the technique. We
> > might even later consider building a local cache for remote services.
> > 3- The WSD algorithms/techniques themselves, which will make use of the
> > resource interface to access the resources required. These techniques will
> > be split into two main packages, as on the left side of the figure:
> > Supervised/Unsupervised. The utils package includes common tools used in
> > both types of techniques. The details mentioned in each package should be
> > common to all implementations of these abstract models.
> > 4- I/O could be processed in different formats (XML/JSON, etc.) or a simpler
> > structure following your recommendations.
> >
> > If you have any suggestions or recommendations, we would really appreciate
> > discussing them and would like your guidance to iterate on this tool-set.
> > Best regards,
> >
> > Anthony Beylerian, Mondher Bouazizi
> >
>
