yytInHfut opened a new issue, #395:
URL: https://github.com/apache/datasketches-cpp/issues/395
Hello,
I just benchmarked the tuple sketch in this library and found that `update`
on array_of_doubles_intersection performs poorly as the number of distinct
entries in the array-of-doubles (AoD) sketches scales up, while the Java
version stays fast. Could this be a bug?
@AlexanderSaydakov
For example:
```c++
#include <chrono>
#include <iostream>
#include <array_of_doubles_sketch.hpp>
#include <array_of_doubles_union.hpp>
#include <array_of_doubles_intersection.hpp>

int main() {
  double DUM[] = {0};       // one dummy value per key, matching builder(1)
  int common_size = 50000;  // number of keys present in both sketches
  datasketches::update_array_of_doubles_sketch::builder builder(1);
  datasketches::update_array_of_doubles_sketch sk1 = builder.set_lg_k(14).build();
  datasketches::update_array_of_doubles_sketch sk2 = builder.set_lg_k(14).build();
  // sk1 gets keys [0, 2 * common_size), sk2 gets [common_size, 3 * common_size),
  // so exactly common_size keys overlap.
  for (int key = 0; key < 2 * common_size; key++) {
    sk1.update(key, DUM);
  }
  for (int key = common_size; key < 3 * common_size; key++) {
    sk2.update(key, DUM);
  }
  datasketches::compact_array_of_doubles_sketch sketch1 = sk1.compact();
  datasketches::compact_array_of_doubles_sketch sketch2 = sk2.compact();
  datasketches::array_of_doubles_intersection<datasketches::array_of_doubles_union_policy> inter;

  // Time only the two intersection updates and the result extraction.
  auto startTime = std::chrono::high_resolution_clock::now();
  inter.update(sketch1);
  inter.update(sketch2);
  auto interResult = inter.get_result();
  auto endTime = std::chrono::high_resolution_clock::now();
  auto elapsedTime =
      std::chrono::duration_cast<std::chrono::milliseconds>(endTime - startTime);

  // Output the intersection sketch estimates
  std::cout << "common_size: " << common_size << std::endl;
  std::cout << "Intersection took " << elapsedTime.count()
            << " milliseconds to run." << std::endl;
  std::cout << "Intersection unique count estimate: "
            << interResult.get_estimate() << std::endl;
  std::cout << "Intersection unique count lower bound (95% confidence): "
            << interResult.get_lower_bound(2) << std::endl;
  std::cout << "Intersection unique count upper bound (95% confidence): "
            << interResult.get_upper_bound(2) << std::endl;
  return 0;
}
```
Outputs:
```
common_size: 50000
Intersection took 13065 milliseconds to run.
Intersection unique count estimate: 49500.2
Intersection unique count lower bound (95% confidence): 48801.7
Intersection unique count upper bound (95% confidence): 50208.7
```
```
common_size: 500000
Intersection took 6394 milliseconds to run.
Intersection unique count estimate: 501315
Intersection unique count lower bound (95% confidence): 492070
Intersection unique count upper bound (95% confidence): 510733
```
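The timings above cover both `update` calls and `get_result()` together, so they do not show which call is slow. A minimal sketch of how each step could be timed on its own is below. `time_ms` is a hypothetical helper written for this illustration, not part of datasketches, and it uses only the standard library.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper (not part of datasketches): runs `fn` once and
// returns the elapsed time in milliseconds, measured with a monotonic clock.
template <typename Fn>
double time_ms(Fn&& fn) {
  const auto start = std::chrono::steady_clock::now();
  std::forward<Fn>(fn)();
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Intended use with the objects from the snippet above:
//   double t1 = time_ms([&] { inter.update(sketch1); });
//   double t2 = time_ms([&] { inter.update(sketch2); });
//   double t3 = time_ms([&] { auto r = inter.get_result(); (void)r; });
```

Printing `t1`, `t2`, and `t3` separately would show whether the time goes into the first update, the second update, or building the result.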