ZacBlanco commented on PR #554: URL: https://github.com/apache/datasketches-java/pull/554#issuecomment-2093460711
> Is it fair to assume that the QPO task is always performed prior to any UQ? In other words, you have the opportunity to learn a lot about the user's tables prior to the user using them.

We don't necessarily always have the statistics for QPO before UQ, though we recommend that most users spend the resources to collect them, as it can make queries execute faster. It depends on the use case, though.

I think we would be okay with a heuristic. Your suggestion of calculating a bound for the memory would work for us. Collecting a sketch of data sizes alongside the actual sketch could also be a feasible solution: it would give us the memory bounds even if the sketch size is unknown, and it would be less overhead than the current approach.

I appreciated you sharing the graphs of the sketch size for a given N. However, I couldn't identify a formula in the KLL paper[^1] that corresponds to the curves you shared. Could you help point me in the right direction so I could incorporate them?

Given that I can compute a rough bound on the size of the sketch, I think that parts of this PR would not be necessary. However, I do believe that adding `isFixedWidth` to the `ArrayOfItemsSerde` interface would still be beneficial for `KllItemsSketch`, as it would improve the performance of `getSerializedSizeBytes()`. I can modify the PR to include only that part if you agree.

> One more question I wanted to ask you is what Java version are you using? We are in the planning stages to move to Java 17 and 21, but we will have to draw a hard line in the sand for Java 17. In other words, at a specific DataSketches Version (perhaps version 7 or 8), Java 17+ will be required. If you will still need Java versions < 17, you will have to use earlier versions of the library. How will this impact you?

Currently our build actually relies on Java 8. We are moving to Java 11 soon.
Personally, I would like our project to move to more modern versions, but I don't have enough influence to make that happen. It's also a complex decision for us to make, as many companies use [our system](https://github.com/prestodb/presto). If you move Java versions upwards, we would either rely on patch builds to a version before 17 or simply wait until we move to a later Java version to incorporate new versions of the datasketches library.

[^1]: https://arxiv.org/pdf/1603.05346v2

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
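
For readers following the `isFixedWidth` proposal in the comment above, here is a minimal sketch of why such a flag could speed up a serialized-size computation. It is only an illustration under stated assumptions: `ItemSerde`, `LongSerde`, `Utf8StringSerde`, and `SizeEstimator` are hypothetical names invented for this example, and the 4-byte length prefix is assumed. None of this is the actual DataSketches API or the code in the PR.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical, simplified stand-in for a serde interface (NOT the DataSketches API).
interface ItemSerde<T> {
    // Number of bytes a single item occupies when serialized.
    int sizeOf(T item);

    // Assumed addition: true when every item serializes to the same number of bytes.
    default boolean isFixedWidth() {
        return false;
    }
}

// Fixed-width example: every Long takes 8 bytes.
final class LongSerde implements ItemSerde<Long> {
    @Override
    public int sizeOf(Long item) {
        return Long.BYTES;
    }

    @Override
    public boolean isFixedWidth() {
        return true;
    }
}

// Variable-width example: an assumed 4-byte length prefix followed by the UTF-8 payload.
final class Utf8StringSerde implements ItemSerde<String> {
    @Override
    public int sizeOf(String item) {
        return Integer.BYTES + item.getBytes(StandardCharsets.UTF_8).length;
    }
}

final class SizeEstimator {
    // Total serialized bytes of the retained items.
    static <T> long serializedItemBytes(ItemSerde<T> serde, T[] items) {
        if (items.length == 0) {
            return 0L;
        }
        if (serde.isFixedWidth()) {
            // Fast path: one size lookup multiplied by the item count.
            return (long) serde.sizeOf(items[0]) * items.length;
        }
        // Slow path: every item has to be measured individually.
        long total = 0L;
        for (T item : items) {
            total += serde.sizeOf(item);
        }
        return total;
    }
}
```

With a fixed-width serde the size comes from a single multiplication, while a variable-width serde still requires a pass over every retained item.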
