Hi guys,

Per Aliaksandr's suggestion, below are the minutes of our conversation with
Jorn about the Similarity component and other related issues.

1) Prepare Similarity for release from the sandbox:

      a) improve readme.txt, add: 'The entry point to the Similarity component is

            SentencePairMatchResult matchRes =
                sm.assessRelevance(sentence1, sentence2);

         where matchRes includes the similarity score (the weighted number of
         common terms) and the set of maximum common parse trees.'
      b) improve caching. It is currently implemented via Java object
         serialization; switch to CSV files
      c) find a proper location for cache files and resources
            joernkottmann: src/test/resources
      d) verify the Porter stemmer (remove Lucene dependencies, remove the
         Porter stemmer from /similarity)
      e) re-format the code, using the Eclipse template for re-formatting
            joernkottmann: http://opennlp.apache.org/code-conventions.html
      f) package into a separate jar/src using Maven

2) Next major feature of Similarity: taxonomy auto-learning, and using the
   taxonomy to improve search relevance

      a) see how the Similarity component can help with search tasks
      b) integration with Solr (compare/complement github.com/tamingtext of
         Grant Ingersoll with Similarity). There are some JIRA issues open for
         hooking some of the tamingtext stuff into the analyzers modules in
         Solr

3) More examples and docs for the Similarity component

      a) examples for finding similar news at allvoices.com: email the code
         which generates the search query for news articles
      b) email the link to the papers on
            joernkottmann: https://cwiki.apache.org/OPENNLP/nlp-papers.html

4) Other future features/improvements for Similarity

      a) how can we create a more accurate Parse object by running the chunker
         separately and then applying an alignment algorithm
      b) Coreference component
            joernkottmann: TreebankNameFinder
      c) apply machine learning to parse trees + coreferences.
         "parse forest": is it a good name?
            joernkottmann: CorefSample.
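
For item 1b, a CSV-backed cache could look roughly like the sketch below. This is only an illustration, not the existing Similarity code: the class name CsvCacheSketch, its save/load methods, and the one-row-per-key layout (key first, then its cached values) are all assumptions. It also assumes the cached strings contain no line breaks.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical CSV-backed cache: one row per key, first column is the key. */
public class CsvCacheSketch {

  /** Wraps a field in double quotes and doubles any embedded quotes. */
  public static String quote(String field) {
    return "\"" + field.replace("\"", "\"\"") + "\"";
  }

  /** Splits one CSV line into fields, honoring quoted fields. */
  public static List<String> parseLine(String line) {
    List<String> fields = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    boolean inQuotes = false;
    for (int i = 0; i < line.length(); i++) {
      char c = line.charAt(i);
      if (inQuotes) {
        if (c == '"' && i + 1 < line.length() && line.charAt(i + 1) == '"') {
          current.append('"'); // doubled quote inside a quoted field
          i++;
        } else if (c == '"') {
          inQuotes = false;    // closing quote
        } else {
          current.append(c);
        }
      } else if (c == '"') {
        inQuotes = true;       // opening quote
      } else if (c == ',') {
        fields.add(current.toString());
        current.setLength(0);
      } else {
        current.append(c);
      }
    }
    fields.add(current.toString());
    return fields;
  }

  /** Writes each cache entry as one CSV row: key, value1, value2, ... */
  public static void save(Map<String, List<String>> cache, Path file)
      throws IOException {
    try (BufferedWriter out =
        Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
      for (Map.Entry<String, List<String>> e : cache.entrySet()) {
        StringBuilder row = new StringBuilder(quote(e.getKey()));
        for (String v : e.getValue()) {
          row.append(',').append(quote(v));
        }
        out.write(row.toString());
        out.newLine();
      }
    }
  }

  /** Reads the rows back into a map, preserving row order. */
  public static Map<String, List<String>> load(Path file) throws IOException {
    Map<String, List<String>> cache = new LinkedHashMap<>();
    try (BufferedReader in =
        Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
      String line;
      while ((line = in.readLine()) != null) {
        List<String> fields = parseLine(line);
        cache.put(fields.get(0),
            new ArrayList<>(fields.subList(1, fields.size())));
      }
    }
    return cache;
  }
}
```

Compared with object serialization, a file like this can be opened and edited by hand, and it does not break when the cached classes change.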

Regards,
Boris

                                          
