jmalkin commented on issue #361: URL: https://github.com/apache/datasketches-cpp/issues/361#issuecomment-1518387544
We've debated this before, since it seems like a reasonable thing to do. The challenge is that in most cases it will break the sketch error distribution, and the result still won't actually be deterministic! Determinism would depend on feeding data to the sketch in a deterministic order and, if merging across nodes, on merging in a deterministic order as well. I think presenting data in sorted order, however, tends to be less good for the results. It would also be a Very Bad thing for every sketch to use the same seed, since errors would be correlated across nodes, so you'd need to ensure that each node processing data has a deterministic but unique seed. There might be other gotchas, but that's what comes to mind off the top of my head. We've been quite reluctant to provide "functionality" that makes things so fragile, especially if it's in ways that most library consumers won't be aware of.
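
To make the "deterministic but unique seed" requirement concrete, here is a minimal sketch of one way a caller could derive per-node seeds from a shared base seed. It is an illustration only: `mix64` and `derive_node_seed` are hypothetical helper names, not part of the DataSketches API, and the mixing step is the widely used splitmix64 finalizer rather than anything the library prescribes. It also assumes each node has a stable numeric ID.

```cpp
#include <cstdint>

// splitmix64-style mixing step. Each operation (adding a constant,
// xor-shift, multiplying by an odd constant) is invertible on 64 bits,
// so the whole function is a bijection on uint64_t.
constexpr std::uint64_t mix64(std::uint64_t x) {
    x += 0x9E3779B97F4A7C15ULL;
    x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9ULL;
    x = (x ^ (x >> 27)) * 0x94D049BB133111EBULL;
    return x ^ (x >> 31);
}

// Hypothetical helper (assumption, not a library function):
// the same (base_seed, node_id) pair always yields the same seed, and
// for a fixed base_seed, distinct node_ids yield distinct seeds because
// mix64 is a bijection.
constexpr std::uint64_t derive_node_seed(std::uint64_t base_seed,
                                         std::uint64_t node_id) {
    return mix64(base_seed ^ mix64(node_id));
}
```

This only covers the seed. It does nothing about input order or merge order, which would still have to be fixed separately for results to be reproducible, and it does not address the concern about breaking the error distribution.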
