jltqst27 opened a new issue #7444: Are there documentation on the limitation of intersection using theta sketch?
URL: https://github.com/apache/incubator-druid/issues/7444

Not sure if this is the right place to ask this, but I am testing set intersection on Druid using theta sketch, and I am having trouble producing results with good accuracy after the second or third intersection. Below is some sample code that I used to play with theta sketch directly:

    // Four theta sketches, each built with lgK = 26
    UpdateSketch sketch1 = UpdateSketch.builder().setLogNominalEntries(26).build();
    UpdateSketch sketch2 = UpdateSketch.builder().setLogNominalEntries(26).build();
    UpdateSketch sketch3 = UpdateSketch.builder().setLogNominalEntries(26).build();
    UpdateSketch sketch4 = UpdateSketch.builder().setLogNominalEntries(26).build();

    // Exact sets kept alongside the sketches as ground truth
    Set<Integer> k1 = new HashSet<>();
    Set<Integer> k2 = new HashSet<>();
    Set<Integer> k3 = new HashSet<>();
    Set<Integer> k4 = new HashSet<>();

    Random rand = new Random();
    for (int key = 0; key < 1200000; key++) {
        int n1 = rand.nextInt(2000000);
        int n2 = rand.nextInt(2000000);
        int n3 = rand.nextInt(2000000);
        int n4 = rand.nextInt(2000000);
        sketch1.update(n1);
        sketch2.update(n2);
        sketch3.update(n3);
        sketch3.update(n4); // as posted: n4 goes into sketch3, so sketch4 receives no updates
        k1.add(n1);
        k2.add(n2);
        k3.add(n3);
        k4.add(n4);
    }

    // Exact intersection sizes computed from the hash sets
    int count1_2 = 0;
    int count1_2_3 = 0;
    int count1_2_3_4 = 0;
    for (Integer k : k1) {
        if (k2.contains(k)) {
            count1_2 += 1;
        }
        if (k2.contains(k) && k3.contains(k)) {
            count1_2_3 += 1;
        }
        if (k2.contains(k) && k3.contains(k) && k4.contains(k)) {
            count1_2_3_4 += 1;
        }
    }

    // Sketch intersections, taking a result after each additional sketch
    Intersection intersection = SetOperation.builder().buildIntersection();
    intersection.update(sketch1);
    intersection.update(sketch2);
    Sketch intersectionResult1_2 = intersection.getResult();
    intersection.update(sketch3);
    Sketch intersectionResult1_2_3 = intersection.getResult();
    intersection.update(sketch4);
    Sketch intersectionResult1_2_3_4 = intersection.getResult();

    System.out.println(count1_2);
    System.out.println(count1_2_3);
    System.out.println(count1_2_3_4);
    System.out.println(intersectionResult1_2.toString());
    System.out.println(intersectionResult1_2_3.toString());
    System.out.println(intersectionResult1_2_3_4.toString());

I am seeing that the first intersection is very accurate, but the second and third intersections are not so much. Is there documentation about what kind of result we can expect for intersection?
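For reference when reading the printed numbers, here is a minimal sketch (plain Java, no sketch library) of the exact intersection sizes the test setup above should produce on average. It assumes each set is filled by 1,200,000 independent uniform draws from [0, 2000000), as in the loop above. The class and method names (`ExpectedOverlap`, `hitProbability`, `expectedIntersection`, `nominalEntries`) are hypothetical helpers for illustration and are not part of any sketch API. It also prints 2^26, the nominal entry count implied by `setLogNominalEntries(26)`, for comparison with the 2,000,000 possible distinct values.

```java
public class ExpectedOverlap {

    /** Probability that a fixed value in [0, range) appears at least once in `draws` uniform draws. */
    public static double hitProbability(long range, long draws) {
        // 1 - (1 - 1/range)^draws, computed with log1p/expm1 for numerical stability
        return -Math.expm1(draws * Math.log1p(-1.0 / range));
    }

    /** Expected number of values present in all of `sets` independently filled sets. */
    public static double expectedIntersection(long range, long draws, int sets) {
        // Independence across sets: a value is in every set with probability p^sets
        return range * Math.pow(hitProbability(range, draws), sets);
    }

    /** Nominal entries k corresponding to a log-base-2 setting lgK. */
    public static long nominalEntries(int lgK) {
        return 1L << lgK;
    }

    public static void main(String[] args) {
        long range = 2_000_000L;  // upper bound passed to rand.nextInt(...)
        long draws = 1_200_000L;  // loop iterations per set

        for (int sets = 2; sets <= 4; sets++) {
            System.out.printf("expected exact size of %d-way intersection: %.0f%n",
                    sets, expectedIntersection(range, draws, sets));
        }
        System.out.println("nominal entries at lgK=26: " + nominalEntries(26));
    }
}
```

Under these assumptions the expected exact counts come out to roughly 407,000, 184,000 and 83,000 for the two-, three- and four-way intersections, which gives a baseline to compare both the hash-set counts and the sketch estimates against.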
