fixed items for Similarity / RE: minutes of the skype call on Similarity component

Boris Galitsky Thu, 05 Apr 2012 17:25:37 -0700



Hi guys

I want to point out which of the items listed
in my previous status email are now fixed:

1) Prepare Similarity for release from the sandbox:

      a) improve readme.txt, add 'The entry point to the
Similarity component is

      SentencePairMatchResult matchRes =
          sm.assessRelevance(sentence1, sentence2);

where matchRes includes the similarity score (a weighted number of common terms)
and the set of maximum common parse trees.'
>>> Done
      b) improve caching. It is currently implemented via Java object serialization;
make it use CSV files instead
>>> Done
      c) proper location for cache files and resources
         joernkottmann: src/test/resources
>>> Done
      d) verify Porter stemmer (remove Lucene dependencies, remove Porter stemmer
from /similarity)
>>> That will be done outside of Similarity. Right now the downloadable
>>> opennlp-tools 1.5.2 does not have a Porter stemmer, so I temporarily keep
>>> it within Similarity.
      e) re-format code, use the Eclipse template for re-formatting
         joernkottmann: http://opennlp.apache.org/code-conventions.html
>>> Done
      f) package into a separate jar/src using Maven
 2) Next major feature of Similarity: taxonomy auto-learning and using the
taxonomy to improve search relevance
      a) see how the Similarity component can help with search tasks
>>> Done.
 3) More examples and docs for the Similarity component
      a) examples for finding similar news at allvoices.com
>>> Started, but not easy to integrate into Similarity because it is tightly
>>> coupled with the original project
         email the code which generates search queries for news articles
      b) email the link to the papers on
         joernkottmann: https://cwiki.apache.org/OPENNLP/nlp-papers.html
>>> I extended the list with a new section on papers on similarity.
 4) Other future features/improvements for Similarity
>>> These are FUTURE items
      a) how can we create a more accurate Parse object by running the chunker
separately and then applying an alignment algorithm
      b) Coreference component
         joernkottmann: TreebankNameFinder
      c) apply machine learning to parse trees + coreferences. "Parse forest":
is it a good name?
         joernkottmann: CorefSample.
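As a rough illustration of the "weighted number of common terms" mentioned in the readme text of item 1a, the toy Java sketch below scores a sentence pair by summing per-term weights over the words both sentences share. It is a hypothetical example, not the Similarity component's code or API: the class name WeightedTermOverlap, the tokenizer, and the weight table (0.1 for a few function words, 1.0 for everything else) are all assumptions. It also does not build or compare parse trees, which the real assessRelevance result includes.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical illustration only, NOT the OpenNLP Similarity API:
 * scores a sentence pair by summing per-term weights over shared terms.
 */
public class WeightedTermOverlap {

    // Hypothetical weights: a few function words count for little.
    private static final Map<String, Double> TERM_WEIGHTS = Map.of(
            "the", 0.1, "a", 0.1, "an", 0.1, "of", 0.1, "to", 0.1, "is", 0.1);

    // Every term not listed above counts fully (assumption).
    private static final double DEFAULT_WEIGHT = 1.0;

    // Lowercase the sentence and split on runs of non-letter, non-digit characters.
    public static Set<String> terms(String sentence) {
        Set<String> result = new TreeSet<>();
        for (String token : sentence.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) { // a leading separator yields one empty token
                result.add(token);
            }
        }
        return result;
    }

    // Sum the weights of the terms that occur in both sentences.
    public static double score(String sentence1, String sentence2) {
        Set<String> common = terms(sentence1);
        common.retainAll(terms(sentence2));
        double total = 0.0;
        for (String term : common) {
            total += TERM_WEIGHTS.getOrDefault(term, DEFAULT_WEIGHT);
        }
        return total;
    }

    public static void main(String[] args) {
        String s1 = "The camera has a good zoom lens.";
        String s2 = "A good lens is the key part of the camera.";
        System.out.println(terms(s1) + " / " + terms(s2));
        System.out.println(score(s1, s2));
    }
}
```

In the sample pair, "camera", "good" and "lens" each contribute 1.0, while the shared articles "a" and "the" add only 0.1 each. A fuller version would compare lemmas inside aligned parse-tree nodes rather than flat word sets; that part is deliberately left out here.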
Regards,
Boris