[ https://issues.apache.org/jira/browse/OPENNLP-758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14600872#comment-14600872 ]
Joern Kottmann commented on OPENNLP-758:
----------------------------------------
- Add a simple pom.xml to get the build working
- Add more javadoc to the central WSDisambiguator interface
- Add at least a one-sentence javadoc description to each class explaining what
it is
- The preprocessing should not be done by the wsd component; please leave that
to the user as much as it makes sense.
Usually the input is first tokenized and split into sentences, and the user can
also run a POS tagger over the sentences
and then pass all that data to the wsd component
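
One way to picture the last point, as a rough sketch only: the caller tokenizes
and POS-tags the sentence, and the disambiguator just consumes the resulting
arrays. The names Disambiguator and GlossOverlapDisambiguator, the sense ids and
the two glosses are all hypothetical and are not taken from the patch; the
gloss-overlap scoring is a toy stand-in for a real algorithm.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: the WSD step receives data the caller already preprocessed. */
final class PreprocessedInputSketch {

    /** Hypothetical interface; NOT the WSDisambiguator interface from the patch. */
    interface Disambiguator {
        /**
         * @param tokens      tokens of one sentence, produced by the caller
         * @param posTags     one POS tag per token, produced by the caller
         * @param targetIndex index of the ambiguous token
         * @return the id of the chosen sense, or null if no senses are known
         */
        String disambiguate(String[] tokens, String[] posTags, int targetIndex);
    }

    /** Toy scorer: picks the sense whose gloss shares the most words with the sentence. */
    static final class GlossOverlapDisambiguator implements Disambiguator {
        private final Map<String, String> glossBySenseId; // sense id -> gloss text

        GlossOverlapDisambiguator(Map<String, String> glossBySenseId) {
            this.glossBySenseId = glossBySenseId;
        }

        @Override
        public String disambiguate(String[] tokens, String[] posTags, int targetIndex) {
            if (tokens.length != posTags.length) {
                throw new IllegalArgumentException("need exactly one POS tag per token");
            }
            // The POS tags are accepted but unused in this toy scorer; a real
            // implementation might use them to filter candidate senses.
            Set<String> context = new HashSet<>();
            for (int i = 0; i < tokens.length; i++) {
                if (i != targetIndex) {
                    context.add(tokens[i].toLowerCase(Locale.ROOT));
                }
            }
            String best = null;
            int bestOverlap = -1;
            for (Map.Entry<String, String> sense : glossBySenseId.entrySet()) {
                Set<String> glossWords = new HashSet<>(Arrays.asList(
                        sense.getValue().toLowerCase(Locale.ROOT).split("\\s+")));
                glossWords.retainAll(context); // keep only words also in the sentence
                if (glossWords.size() > bestOverlap) {
                    bestOverlap = glossWords.size();
                    best = sense.getKey();
                }
            }
            return best;
        }
    }

    /** Usage: tokens and tags are hard-coded here to stand in for the caller's own pipeline. */
    static String demo() {
        Map<String, String> glosses = new LinkedHashMap<>();
        glosses.put("bank.finance", "an institution that accepts deposits and lends money");
        glosses.put("bank.river", "sloping land beside a river or lake");

        String[] tokens = {"He", "sat", "on", "the", "bank", "of", "the", "river"};
        String[] posTags = {"PRP", "VBD", "IN", "DT", "NN", "IN", "DT", "NN"};

        Disambiguator wsd = new GlossOverlapDisambiguator(glosses);
        return wsd.disambiguate(tokens, posTags, 4); // index 4 is "bank"
    }
}
```

With this shape the user is free to choose the tokenizer, sentence detector and
POS tagger, and the wsd component stays independent of them.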
Do you both have an ICLA on file?
> Unsupervised WSD techniques
> ---------------------------
>
> Key: OPENNLP-758
> URL: https://issues.apache.org/jira/browse/OPENNLP-758
> Project: OpenNLP
> Issue Type: New Feature
> Components: POS Tagger, Sentence Detector, Stemmer
> Reporter: Mondher Bouazizi
> Labels: gsoc, gsoc2015, java, nlp, wsd
> Attachments: lesk_parameters.patch, opennlp-tools-disambiguator.patch
>
>
> The objective of Word Sense Disambiguation (WSD) is to determine which sense
> of a word is meant in a particular context. Therefore, WSD is a
> classification task, where the classes are the different senses of the
> ambiguous word.
> Different techniques are proposed in the academic literature, which fall
> mainly into two categories: Supervised and Unsupervised.
> For this component, we focus on unsupervised techniques: these methods are
> based on unlabeled data, and do not exploit any manually tagged data.
> The object of this project is to create a WSD solution (for English) that
> implements some unsupervised techniques. For example:
> - Context Clustering
> - Word Clustering
> - Cooccurrence Graphs
> - Overlap of Sense Definitions
> - Selectional Preferences
> - Structural Approaches
> - Etc.