AlexanderSaydakov commented on issue #414:
URL: 
https://github.com/apache/datasketches-java/issues/414#issuecomment-1252897851

   Yes, I believe the Theta sketch is the state of the art for approximate 
intersections.
   The problem is not quite that the "cardinality difference is high". A 
better way to describe it is that the Jaccard similarity (the size of the 
intersection divided by the size of the union) is very low. This can also 
happen when intersecting large sets that have a very small overlap. I am 
afraid this is a fundamental problem with approximate set operations. You 
could try improving accuracy by increasing the sketch size, which brings 
you closer to the brute-force "exact" solution.
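
   To make the low-Jaccard case concrete, here is a minimal toy sketch in 
Python. It is not the DataSketches implementation or its API: the names 
hash01, make_sketch, intersect_estimate, jaccard and demo are hypothetical, 
and the "sketch" simply keeps the k smallest hash values, with theta taken 
as the (k+1)-th smallest. It only illustrates why a tiny overlap is hard to 
estimate from a sample.

```python
import hashlib

def hash01(item):
    """Map an item to a pseudo-uniform float in [0, 1)."""
    digest = hashlib.sha256(str(item).encode()).digest()
    # Keep the top 53 bits so the result is exactly representable and < 1.0.
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53

def make_sketch(items, k):
    """Toy theta sketch: returns (theta, set of retained hashes below theta)."""
    hashes = sorted({hash01(x) for x in items})
    if len(hashes) <= k:
        return 1.0, set(hashes)          # exact mode: nothing was discarded
    return hashes[k], set(hashes[:k])    # theta = (k+1)-th smallest hash

def intersect_estimate(a, b):
    """Estimate |A ∩ B| from two toy sketches."""
    theta = min(a[0], b[0])
    common = [v for v in a[1] & b[1] if v < theta]
    return len(common) / theta

def jaccard(x, y):
    """Exact Jaccard similarity of two Python sets."""
    return len(x & y) / len(x | y)

def demo(n=100_000, overlap=100, k=4096):
    # Two sets of n items each that share only `overlap` items (assumed sizes).
    set_a = set(range(0, n))
    set_b = set(range(n - overlap, 2 * n - overlap))
    est = intersect_estimate(make_sketch(set_a, k), make_sketch(set_b, k))
    print("Jaccard similarity:", jaccard(set_a, set_b))
    print("true intersection: ", overlap)
    print("estimate:          ", est)

if __name__ == "__main__":
    demo()
```

   In this toy model the estimate is always a whole number of surviving 
common hashes divided by theta, so it can only move in steps of 1/theta. 
When the overlap is tiny, only a handful of common items survive the 
sampling, and one item more or less changes the answer by a large relative 
amount. A larger k raises theta and makes the steps smaller, at the cost of 
a bigger sketch.
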
   On the other hand, if the overlap of two sets is very small (orders of 
magnitude smaller than the sets themselves), is it really so important that 
the relative accuracy is bad? Say the intersection of two sets with a 
billion items each is estimated at one hundred items. Even if that answer 
is 100% off (say, the true answer is 50), ask yourself whether it is a 
problem in practice. How much would you pay for better accuracy?
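
   The arithmetic behind that example, as a small Python sketch (the helper 
name relative_error is made up for illustration; the numbers are the ones 
from the paragraph above):

```python
def relative_error(estimate, truth):
    """Error measured against the true value."""
    return abs(estimate - truth) / truth

estimate = 100              # what the sketch reports
truth = 50                  # the real intersection size
set_size = 1_000_000_000    # size of each input set

rel = relative_error(estimate, truth)
abs_fraction = abs(estimate - truth) / set_size   # error relative to one input set

print(f"relative error: {rel:.0%}")
print(f"error as a fraction of one set: {abs_fraction:.1e}")
```

   The same 50-item miss is a 100% relative error on the intersection but a 
negligible fraction of either input set.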


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

