ZacBlanco commented on PR #554:
URL: 
https://github.com/apache/datasketches-java/pull/554#issuecomment-2093460711

   > Is it fair to assume that the QPO task is always performed prior to any 
UQ? In other words, you have the opportunity to learn a lot about the user's 
tables prior to the user using them.
   
   It's not necessarily the case that we always have the statistics for QPO 
before UQ, though we recommend most users spend the resources to do so as it 
can make the queries execute quicker. It depends on the use case though. I 
think we would be okay with a heuristic.
   
   I think your suggestion of calculating a bound for the memory would work for 
us. I think collecting a sketch of data sizes alongside the actual sketch could 
also be a feasible solution that would give us the memory bounds even if the 
sketch size is unknown and would be less overhead than the current approach. I 
appreciate you sharing the graphs of the sketch size for a given N. However, I 
couldn't find a formula in the KLL paper[^1] that corresponds to the 
curves you shared. Could you point me in the right direction 
so I can incorporate the formula?
   
   Given that I can compute a rough bound on the size of the sketch, I think 
that parts of this PR would not be necessary. However, I do believe that the 
addition to the `ArrayOfItemsSerde` interface of `isFixedWidth` would still be 
beneficial for `KllItemsSketch` to improve the performance of 
`getSerializedSizeBytes()`. I can modify the PR to include only that part if 
you agree with that.
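   
   As a rough illustration of why a fixed-width hint helps, here is a minimal 
sketch. It is not the code in this PR and not the real `ArrayOfItemsSerde` API: 
`ItemSerde`, `sizeOf`, and `itemsSizeBytes` are hypothetical stand-ins, and 
only the `isFixedWidth` name comes from the proposal. When every item has the 
same serialized width, the total is one multiplication instead of a pass over 
all retained items.

```java
import java.nio.charset.StandardCharsets;

public class FixedWidthSizeSketch {

  // Hypothetical stand-in for a serde; NOT the real ArrayOfItemsSerde interface.
  interface ItemSerde<T> {
    // Number of bytes a single item occupies when serialized.
    int sizeOf(T item);

    // Assumed semantics: true if every item serializes to the same byte count.
    default boolean isFixedWidth() {
      return false;
    }
  }

  // Longs always take 8 bytes, so this serde can report a fixed width.
  static final ItemSerde<Long> LONG_SERDE = new ItemSerde<Long>() {
    @Override
    public int sizeOf(Long item) {
      return Long.BYTES;
    }

    @Override
    public boolean isFixedWidth() {
      return true;
    }
  };

  // Strings vary in length: a 4-byte length prefix plus the UTF-8 bytes.
  static final ItemSerde<String> STRING_SERDE = new ItemSerde<String>() {
    @Override
    public int sizeOf(String item) {
      return Integer.BYTES + item.getBytes(StandardCharsets.UTF_8).length;
    }
  };

  // Total serialized size of the items, with a constant-time fast path.
  static <T> long itemsSizeBytes(ItemSerde<T> serde, T[] items) {
    if (items.length == 0) {
      return 0L;
    }
    if (serde.isFixedWidth()) {
      // Fast path: one multiplication, no per-item work.
      return (long) items.length * serde.sizeOf(items[0]);
    }
    // Slow path: every item has to be measured individually.
    long total = 0L;
    for (T item : items) {
      total += serde.sizeOf(item);
    }
    return total;
  }

  public static void main(String[] args) {
    System.out.println(itemsSizeBytes(LONG_SERDE, new Long[] {1L, 2L, 3L}));
    System.out.println(itemsSizeBytes(STRING_SERDE, new String[] {"a", "bcd"}));
  }
}
```

   The default of `false` keeps existing serde implementations on the per-item 
path, so only serdes that opt in take the shortcut.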
   
   > One more question I wanted to ask you is what Java version are you using? 
We are in the planning stages to move to Java 17 and 21, but we will have to 
draw a hard line in the sand for Java 17. In other words, at a specific 
DataSketches Version (perhaps version 7 or 8), Java 17+ will be required. If 
you will still need Java versions < 17, you will have to use earlier versions 
of the library. How will this impact you?
   
   Currently our build actually relies on Java 8. We are moving to Java 11 
soon. Personally, I would like our project to move to more modern versions, but 
I don't have enough influence to make that happen. It's also a complex decision 
for us to make, as many companies use [our 
system](https://github.com/prestodb/presto). If you raise the minimum Java 
version, we would either rely on patch builds of a version that predates the 
Java 17 requirement, or just wait until we move to a later Java version to 
incorporate new versions of the DataSketches library.
   
   [^1]: https://arxiv.org/pdf/1603.05346v2


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

