patelprateek commented on issue #414:
URL: 
https://github.com/apache/datasketches-java/issues/414#issuecomment-1253037370

    The intersection of any two disjoint subsets is always empty: this is true 
regardless of duplication or universe size.
   The main idea we were exploring is how to approximate the cardinality of 
Intersection(A, B), since Intersection(theta_sketch(A), theta_sketch(B)) can 
suffer from a high error rate when the Jaccard similarity is low.
   One approach we compared against the above was:
   1) |Intersection(A, B)| ~= Cardinality(HLL(A)) + Cardinality(HLL(B)) - 
Cardinality(Union(HLL(A), HLL(B))). This seems to perform worse, as 
@AlexanderSaydakov mentioned when he shared pointers to some experiment results.
   The other idea we were talking about is:
   2) |Intersection(A, B)| ~= Cardinality(universe) - 
Cardinality(Union(HLL(~A), HLL(~B))). This can work if we are able to compute 
the universe cardinality and the complement cardinalities. Since unions seem to 
work well with HLL, I would be interested to see how the error compares with 
the above approaches when feasible.
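
As a sanity check on the two identities above, here is a minimal sketch in plain Java. It uses exact sets in place of HLL sketches, so it only shows that the algebra holds (inclusion-exclusion for approach 1, De Morgan's law for approach 2) and says nothing about estimation error. The class and method names are hypothetical and are not part of the DataSketches API; in practice each cardinality argument would be a sketch estimate.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper class, for illustration only (not DataSketches code).
class IntersectionIdentities {

    // Approach 1 (inclusion-exclusion): |A n B| = |A| + |B| - |A u B|.
    public static long inclusionExclusion(long cardA, long cardB, long cardUnion) {
        return cardA + cardB - cardUnion;
    }

    // Approach 2 (De Morgan): |A n B| = |U| - |~A u ~B|.
    public static long viaComplements(long cardUniverse, long cardUnionOfComplements) {
        return cardUniverse - cardUnionOfComplements;
    }

    // Integers in the half-open range [from, to).
    static Set<Integer> range(int from, int to) {
        Set<Integer> s = new HashSet<>();
        for (int i = from; i < to; i++) {
            s.add(i);
        }
        return s;
    }

    static Set<Integer> union(Set<Integer> x, Set<Integer> y) {
        Set<Integer> u = new HashSet<>(x);
        u.addAll(y);
        return u;
    }

    // Elements of the universe that are not in x.
    static Set<Integer> complement(Set<Integer> universe, Set<Integer> x) {
        Set<Integer> c = new HashSet<>(universe);
        c.removeAll(x);
        return c;
    }

    public static void main(String[] args) {
        Set<Integer> universe = range(0, 1000);
        Set<Integer> a = range(0, 600);    // |A| = 600
        Set<Integer> b = range(500, 900);  // |B| = 400, true overlap is [500, 600)

        long first = inclusionExclusion(a.size(), b.size(), union(a, b).size());
        long second = viaComplements(
                universe.size(),
                union(complement(universe, a), complement(universe, b)).size());

        System.out.println(first + " " + second); // prints "100 100"
    }
}
```

With sketches, both forms subtract large estimated quantities to get a small one, so the absolute error of the result is driven by the size of the union term rather than by the size of the intersection.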
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

