[ 
https://issues.apache.org/jira/browse/TEXT-158?focusedWorklogId=464809&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-464809
 ]

ASF GitHub Bot logged work on TEXT-158:
---------------------------------------

                Author: ASF GitHub Bot
            Created on: 31/Jul/20 00:23
            Start Date: 31/Jul/20 00:23
    Worklog Time Spent: 10m 
      Work Description: kinow merged pull request #142:
URL: https://github.com/apache/commons-text/pull/142


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 464809)
    Time Spent: 0.5h  (was: 20m)

> Incorrect values for Jaccard similarity with empty strings
> ----------------------------------------------------------
>
>                 Key: TEXT-158
>                 URL: https://issues.apache.org/jira/browse/TEXT-158
>             Project: Commons Text
>          Issue Type: Bug
>    Affects Versions: 1.6, 1.9
>            Reporter: Bruno P. Kinoshita
>            Assignee: Bruno P. Kinoshita
>            Priority: Minor
>             Fix For: 1.9.1
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> In a discussion that was part of TEXT-126, it was 
> [pointed out|https://github.com/apache/commons-text/pull/103#discussion_r263988298]
>  that, for two empty strings, the Jaccard similarity returns 0.0 and the 
> distance 1.0, whereas other libraries return the opposite for each.
> {code:java}
> package br.eti.kinoshita.tests.text;
> import java.util.Collections;
> public class EditDistances {
>     public static void main(String[] args) {
>         System.out.println("Testing jaccard sim/dis with empty strings");
>         System.out.println("---");
>         org.simmetrics.metrics.Jaccard<String> j1 = new org.simmetrics.metrics.Jaccard<>();
>         float s1 = j1.compare(Collections.emptySet(), Collections.emptySet());
>         System.out.println("Simmetrics Jaccard similarity: " + s1);
>         float d1 = j1.distance(Collections.emptySet(), Collections.emptySet());
>         System.out.println("Simmetrics Jaccard distance: " + d1);
>         
>         System.out.println("---");
>         
>         info.debatty.java.stringsimilarity.Jaccard j2 = new info.debatty.java.stringsimilarity.Jaccard();
>         double s2 = j2.similarity("", "");
>         System.out.println("javastringsimilarity Jaccard similarity: " + s2);
>         double d2 = j2.distance("", "");
>         System.out.println("javastringsimilarity Jaccard distance: " + d2);
>         
>         System.out.println("---");
>         
>         org.apache.commons.text.similarity.JaccardSimilarity j3_1 = new org.apache.commons.text.similarity.JaccardSimilarity();
>         double s3 = j3_1.apply("", "");
>         System.out.println("commons-text Jaccard similarity: " + s3);
>         org.apache.commons.text.similarity.JaccardDistance j3_2 = new org.apache.commons.text.similarity.JaccardDistance();
>         double d3 = j3_2.apply("", "");
>         System.out.println("commons-text Jaccard distance: " + d3);
>     }
> }{code}
> Produces:
> {noformat}
> Testing jaccard sim/dis with empty strings
> ---
> Simmetrics Jaccard similarity: 1.0
> Simmetrics Jaccard distance: 0.0
> ---
> javastringsimilarity Jaccard similarity: 1.0
> javastringsimilarity Jaccard distance: 0.0
> ---
> commons-text Jaccard similarity: 0.0
> commons-text Jaccard distance: 1.0{noformat}
> We need to confirm the correct output for similarity and distance with 
> empty strings, and then either document why we return the current values 
> or fix it as a bug for the next release.
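
For context on why the empty-string case is ambiguous: under the usual definition, Jaccard similarity is |A ∩ B| / |A ∪ B|, so two empty sets give 0/0 and the result is a matter of convention. The block below is a minimal sketch, not the commons-text implementation. The class name `JaccardSketch` and its methods are hypothetical, and it assumes strings are compared as sets of characters. It adopts the convention reported above for the other two libraries (similarity 1.0, distance 0.0 for two empty inputs).

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration only; not taken from commons-text or any other library.
final class JaccardSketch {

    // Collect the distinct characters of a sequence into a set.
    static Set<Character> chars(CharSequence s) {
        Set<Character> set = new HashSet<>();
        for (int i = 0; i < s.length(); i++) {
            set.add(s.charAt(i));
        }
        return set;
    }

    // Jaccard similarity over character sets: |A ∩ B| / |A ∪ B|.
    static double similarity(CharSequence left, CharSequence right) {
        Set<Character> a = chars(left);
        Set<Character> b = chars(right);
        // Assumed convention: two empty inputs are identical, so return 1.0
        // instead of evaluating the undefined ratio 0/0.
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0;
        }
        Set<Character> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<Character> union = new HashSet<>(a);
        union.addAll(b);
        return (double) intersection.size() / union.size();
    }

    // Jaccard distance is the complement of the similarity.
    static double distance(CharSequence left, CharSequence right) {
        return 1.0 - similarity(left, right);
    }

    public static void main(String[] args) {
        System.out.println("similarity(\"\", \"\") = " + similarity("", ""));
        System.out.println("distance(\"\", \"\") = " + distance("", ""));
        System.out.println("similarity(\"\", \"a\") = " + similarity("", "a"));
    }
}
```

With this convention only the both-empty case is special: comparing an empty string with a non-empty one still yields a similarity of 0.0, because the intersection is empty while the union is not.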



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
