yytInHfut opened a new issue, #395:
URL: https://github.com/apache/datasketches-cpp/issues/395

   Hello,
   I just benchmarked the tuple sketch in this library and found that "update"
on array_of_doubles_intersection performs poorly as the size (number of
distinct entries) of the array-of-doubles sketches scales up.
   
   The Java version, by contrast, keeps high performance. Maybe there is a bug?
@AlexanderSaydakov 
   
   For example:
   ```c++
   #include <chrono>
   #include <iostream>
   // datasketches headers (names assumed from the tuple module of datasketches-cpp)
   #include <array_of_doubles_sketch.hpp>
   #include <array_of_doubles_union.hpp>
   #include <array_of_doubles_intersection.hpp>
   
   double DUM[] = {0};
   int common_size = 50000;
   std::cout << "common_size: " << common_size << std::endl;
   
   // Two update sketches with one double value per key and lg_k = 14
   datasketches::update_array_of_doubles_sketch::builder builder(1);
   datasketches::update_array_of_doubles_sketch sk1 = builder.set_lg_k(14).build();
   datasketches::update_array_of_doubles_sketch sk2 = builder.set_lg_k(14).build();
   
   // sk1 holds keys [0, 2 * common_size)
   for (int key = 0; key < 2 * common_size; key++) {
       sk1.update(key, DUM);
   }
   
   // sk2 holds keys [common_size, 3 * common_size), so common_size keys overlap
   for (int key = common_size; key < 3 * common_size; key++) {
       sk2.update(key, DUM);
   }
   
   datasketches::compact_array_of_doubles_sketch sketch1 = sk1.compact();
   datasketches::compact_array_of_doubles_sketch sketch2 = sk2.compact();
   
   datasketches::array_of_doubles_intersection<datasketches::array_of_doubles_union_policy> inter;
   
   // Time only the intersection updates and the result extraction
   auto startTime = std::chrono::high_resolution_clock::now();
   inter.update(sketch1);
   inter.update(sketch2);
   auto interResult = inter.get_result();
   auto endTime = std::chrono::high_resolution_clock::now();
   auto elapsedTime = std::chrono::duration_cast<std::chrono::milliseconds>(endTime - startTime);
   
   // Output the intersection sketch estimates
   std::cout << "Intersection took " << elapsedTime.count() << " milliseconds to run." << std::endl;
   std::cout << "Intersection unique count estimate: " << interResult.get_estimate() << std::endl;
   std::cout << "Intersection unique count lower bound (95% confidence): " << interResult.get_lower_bound(2) << std::endl;
   std::cout << "Intersection unique count upper bound (95% confidence): " << interResult.get_upper_bound(2) << std::endl;
   ```
   Outputs:
   ```
   common_size: 50000
   Intersection took 13065 milliseconds to run.
   Intersection unique count estimate: 49500.2
   Intersection unique count lower bound (95% confidence): 48801.7
   Intersection unique count upper bound (95% confidence): 50208.7
   ```
   
   ```
   common_size: 500000
   Intersection took 6394 milliseconds to run.
   Intersection unique count estimate: 501315
   Intersection unique count lower bound (95% confidence): 492070
   Intersection unique count upper bound (95% confidence): 510733
   ```
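
To narrow down which call dominates the elapsed time reported above, each step could be timed on its own. A minimal sketch, assuming a hypothetical helper `time_ms` (not part of datasketches) built only on the standard library:

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper (not part of datasketches): runs `work` once and
// returns the elapsed wall-clock time in milliseconds.
template <typename Work>
double time_ms(Work&& work) {
    // steady_clock is monotonic, so the measured interval cannot be negative
    const auto start = std::chrono::steady_clock::now();
    std::forward<Work>(work)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

For instance, wrapping `inter.update(sketch1)`, `inter.update(sketch2)`, and the call to `inter.get_result()` each in a separate `time_ms` lambda would show which of the three accounts for most of the time.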


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
