[GitHub] [datasketches-cpp] jmalkin commented on issue #361: Determinism

via GitHub Fri, 21 Apr 2023 15:21:07 -0700


jmalkin commented on issue #361:
URL: 
https://github.com/apache/datasketches-cpp/issues/361#issuecomment-1518387544


   We've debated this before since it seems like a reasonable thing to do. The 
challenge is that in most cases it will break the sketch error distribution -- 
and still not actually be deterministic!
   
   Determinism would depend on feeding data to the sketch in a deterministic 
order and, if merging across nodes, also merging in a deterministic order. I 
think presenting data in sorted order, however, tends to be less good for the 
results. It would also be a Very Bad thing for every sketch to use the same 
seed, since errors would be correlated across nodes, so you'd need to ensure 
that each node processing data has a deterministic but unique seed.
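
   To make the seed point concrete, here is a minimal sketch of one way to 
derive a deterministic but unique seed per node. It is only an illustration: 
`mix64`, `node_seed` and `make_node_seeds` are hypothetical names and not part 
of the datasketches-cpp API, and it assumes each node already has a stable 
integer id. The mixer is the well-known splitmix64 finalizer.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// splitmix64 finalizer. Every step is invertible on 64-bit values, so the
// whole function is a bijection: distinct inputs give distinct outputs.
inline std::uint64_t mix64(std::uint64_t x) {
    x += 0x9e3779b97f4a7c15ULL;
    x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
    x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
    return x ^ (x >> 31);
}

// Hypothetical helper: the same (base_seed, node_id) pair always yields the
// same seed, and for a fixed base_seed no two node ids share a seed.
inline std::uint64_t node_seed(std::uint64_t base_seed, std::uint64_t node_id) {
    return mix64(base_seed ^ mix64(node_id));
}

// Hypothetical helper: seeds for nodes 0 .. num_nodes-1 under one base seed.
inline std::vector<std::uint64_t> make_node_seeds(std::uint64_t base_seed,
                                                  std::size_t num_nodes) {
    std::vector<std::uint64_t> seeds;
    seeds.reserve(num_nodes);
    for (std::size_t id = 0; id < num_nodes; ++id) {
        seeds.push_back(node_seed(base_seed, id));
    }
    return seeds;
}
```

   This only covers the seed. The result would still depend on input order 
and merge order, and on node ids staying stable between runs.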
   
   There might be other gotchas, but that's what comes to mind off the top of 
my head. We've been quite reluctant to provide "functionality" that makes 
things so fragile, especially if it's in ways that most library consumers won't 
be aware of.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


