kinow commented on issue #103: TEXT-126: Adding Sorensen-Dice similarity algoritham
URL: https://github.com/apache/commons-text/pull/103#issuecomment-471144174

@ameyjadiye see the last comment from @aherbert about empty strings and `0` vs. `1`.

@aherbert, while we are discussing #109, do you think it is a blocker for this pull request? So far, I think at least the API proposed here would be kept, right? If so, this could be merged once the last comment is resolved, and then we can discuss how to organize the classes and where the Sorensen-Dice coefficient is calculated.

I think the only thing missing is deciding on the name of the classes: whether the name should include `Bigram` or be just `SorensenDiceSimilarity`. I like the idea of a descriptive name such as `BigramSorensenDiceSimilarity` (or with `Bigram` in another position). However, I think we should also consider what users would expect, i.e. in other libraries, is the Sorensen-Dice similarity always computed on bigrams? If other implementations (Python/JS/Java) use bigrams, then we could leave it as `SorensenDiceSimilarity` and either add another method/constructor/etc. to customize the similarity, or have another class...

What do you think?
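
To make the bigram question concrete, here is a minimal sketch of a Sorensen-Dice similarity computed over sets of distinct character bigrams, i.e. `2 * |A ∩ B| / (|A| + |B|)`. It is not the code from this pull request: the class name `BigramDiceSketch` and both method names are hypothetical, and the handling of inputs that produce no bigrams (return `1` when the inputs are equal, `0` otherwise) is only one possible convention, which is exactly the `0` vs. `1` point still under discussion.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration only; not the implementation proposed in the PR.
class BigramDiceSketch {

    // Collects the distinct character bigrams of the input.
    static Set<String> bigrams(CharSequence s) {
        Set<String> out = new HashSet<>();
        for (int i = 0; i + 1 < s.length(); i++) {
            out.add(s.subSequence(i, i + 2).toString());
        }
        return out;
    }

    // Sorensen-Dice coefficient over bigram sets: 2 * |A ∩ B| / (|A| + |B|).
    static double similarity(CharSequence left, CharSequence right) {
        if (left == null || right == null) {
            throw new IllegalArgumentException("Inputs must not be null");
        }
        Set<String> a = bigrams(left);
        Set<String> b = bigrams(right);
        if (a.isEmpty() && b.isEmpty()) {
            // Assumed convention: with no bigrams on either side (empty or
            // single-character inputs), equal inputs score 1, others score 0.
            return left.toString().contentEquals(right) ? 1.0 : 0.0;
        }
        int total = a.size() + b.size();
        a.retainAll(b); // a now holds the intersection
        return 2.0 * a.size() / total;
    }
}
```

With this sketch, `similarity("night", "nacht")` yields `0.25`: each word has four distinct bigrams and they share only `ht`. A more general n-gram variant would need the n-gram size as a parameter, which is where the choice between a `Bigram` prefix and a configurable `SorensenDiceSimilarity` comes in.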
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

With regards,
Apache Git Services
