thomasrebele commented on issue #693:
URL: 
https://github.com/apache/datasketches-java/issues/693#issuecomment-3671099867

   Thank you for your suggestion. The place where the KLL sketches are merged 
in Hive is a regular Java function, so the order of the sketches can be 
enforced. I expect the number of sketches to be merged to be less than one 
million.
   
   The Hive project also indirectly uses the KLL sketch results in the output 
of the various `EXPLAIN` commands in the q.out files. The q.out files are 
compared with the expected version. If the results are not stable, they can be 
masked. However, if the purpose is to check whether the statistics have been 
calculated correctly, masking them does not help. Switching to a comparison 
that tolerates some uncertainty raises many other questions: How should the 
uncertainty be evaluated? How should the threshold at which the comparison 
fails be defined? How can flaky tests be avoided? Hive's tests for a PR take 
several hours on a cluster, and it is quite annoying when a test fails for a 
reason unrelated to the PR.
   
   I would even go as far as to say that it would be nice to have an 
(additional!) deterministic update method to facilitate testing in Hive.
   
   Please be reassured: I want to KEEP the existing behavior. I just want to 
add another method that allows 3rd-party libraries to override the RNG when 
needed. The javadoc of the new methods should clearly state that the caller is 
responsible for evaluating whether it is safe to provide their own RNG.
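   
   To illustrate the shape of such an addition, here is a minimal sketch. It 
is not the datasketches-java API: `ToyCompactor` and both `compact` methods are 
hypothetical names, and the "keep every other item, starting at a random 
offset" step is only a simplified stand-in for a randomized step inside a 
sketch. The existing entry point keeps its behavior, and an overload accepts a 
caller-supplied `java.util.Random`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical toy class for illustration; NOT part of datasketches-java. */
final class ToyCompactor {
    private ToyCompactor() {}

    /** Existing-style behavior: the randomness comes from an internal, unseeded source. */
    static List<Integer> compact(List<Integer> sorted) {
        return compact(sorted, ThreadLocalRandom.current());
    }

    /**
     * Additional overload (an assumption of this example): the caller supplies the RNG.
     * The caller is responsible for evaluating whether it is safe to do so.
     */
    static List<Integer> compact(List<Integer> sorted, Random rng) {
        // One random bit decides whether the even- or the odd-indexed items survive.
        int offset = rng.nextBoolean() ? 1 : 0;
        List<Integer> kept = new ArrayList<>();
        for (int i = offset; i < sorted.size(); i += 2) {
            kept.add(sorted.get(i));
        }
        return kept;
    }
}
```

   With such an overload, a test could call 
`ToyCompactor.compact(items, new Random(42L))` twice and compare the two 
results for exact equality, while production code would keep calling the 
one-argument method unchanged.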


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

