[GitHub] [commons-text] kinow commented on issue #103: TEXT-126: Adding Sorensen-Dice similarity algoritham

GitBox Fri, 08 Mar 2019 20:24:48 -0800

kinow commented on issue #103: TEXT-126: Adding Sorensen-Dice similarity 
algoritham
URL: https://github.com/apache/commons-text/pull/103#issuecomment-471144174
 
 
   @ameyjadiye see the last comment from @aherbert about whether empty strings should return `0` or `1`.
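   For context, the coefficient is usually defined on the token sets of the two inputs. Here X and Y stand for the bigram sets of the two strings; this notation is only for illustration:

```latex
\mathrm{DSC}(X, Y) = \frac{2\,|X \cap Y|}{|X| + |Y|},
\qquad
X = Y = \emptyset \;\Rightarrow\; \mathrm{DSC} = \frac{0}{0}
```

   Two empty strings give 0/0, so the implementation has to choose a convention (`0` or `1`). The same 0/0 case arises whenever both inputs are shorter than two characters, because such strings have no bigrams.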
   
   @aherbert, while we are discussing #109, do you think that is a blocker for this pull request? So far I think at least the API proposed here would be kept, right?
   
   If so, this could be merged once the last comment is resolved, and then we can discuss how to organize the classes and where the Sorensen-Dice coefficient is calculated.
   
   I think the only thing left is deciding on the class name: whether it should include `Bigram` or just be `SorensenDiceSimilarity`.
   
   I like the idea of a descriptive name such as `BigramSorensenDiceSimilarity` (or with `Bigram` in another position). However, I think we should also consider what users would expect. That is, in other libraries, is the Sorensen-Dice similarity always computed over bigrams? If other implementations in Python/JS/Java use bigrams, then we could keep the name `SorensenDiceSimilarity` and either add another method/constructor/etc. to customize the similarity, or add another class...
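   To make the naming question concrete, here is a hypothetical sketch of the "constructor to customize" option. The class `DiceSketch`, its constructor and its `bigrams` helper are illustrative names only, not the API proposed in this PR or anything in commons-text. The coefficient is computed over whatever token sets a caller-supplied tokenizer returns, with character bigrams as the default. It uses sets of distinct bigrams; some libraries count repeated bigrams instead, which can give different scores. The empty-input rule below is one possible answer to the `0` vs. `1` question, assumed here for illustration.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Function;

/**
 * Hypothetical sketch (not the commons-text API): Sorensen-Dice similarity
 * over token sets, with the tokenizer supplied by the caller.
 */
public final class DiceSketch {

    private final Function<CharSequence, Set<String>> tokenizer;

    public DiceSketch(Function<CharSequence, Set<String>> tokenizer) {
        this.tokenizer = tokenizer;
    }

    /** Default tokenizer: the set of distinct adjacent character pairs. */
    public static Set<String> bigrams(CharSequence s) {
        Set<String> result = new HashSet<>();
        for (int i = 0; i + 1 < s.length(); i++) {
            result.add(s.subSequence(i, i + 2).toString());
        }
        return result;
    }

    /** Returns 2|X n Y| / (|X| + |Y|) for the token sets X and Y. */
    public double apply(CharSequence left, CharSequence right) {
        if (left == null || right == null) {
            throw new IllegalArgumentException("Inputs must not be null");
        }
        Set<String> x = tokenizer.apply(left);
        Set<String> y = tokenizer.apply(right);
        if (x.isEmpty() && y.isEmpty()) {
            // The formula would be 0/0 here. Assumed convention for this
            // sketch: identical inputs score 1.0, anything else 0.0.
            return left.toString().equals(right.toString()) ? 1.0 : 0.0;
        }
        Set<String> common = new HashSet<>(x);
        common.retainAll(y); // keep only tokens present in both sets
        return 2.0 * common.size() / (x.size() + y.size());
    }

    public static void main(String[] args) {
        DiceSketch dice = new DiceSketch(DiceSketch::bigrams);
        // "night" -> {ni, ig, gh, ht}, "nacht" -> {na, ac, ch, ht}: 2*1/8
        System.out.println(dice.apply("night", "nacht")); // 0.25
        System.out.println(dice.apply("", ""));           // 1.0
    }
}
```

   With this split, a class named `SorensenDiceSimilarity` could default to bigrams while still letting users pass a different tokenizer, so `Bigram` would not need to be in the name.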
   
   What do you think?

