[GitHub] [accumulo] ctubbsii opened a new issue, #3651: Consider using DataSketches to precompute quantiles or other values to aid with more rapid split point computation

via GitHub Mon, 24 Jul 2023 11:18:49 -0700


ctubbsii opened a new issue, #3651:
URL: https://github.com/apache/accumulo/issues/3651


   DataSketches is useful for computing approximate distribution statistics, 
such as quantiles, in a single pass over the data. If we use it when we write 
a file, we could precompute these statistics and store them in the file 
metadata to make split point computation faster. For this to be useful, we 
would need to be able to aggregate the precomputed statistics across locality 
groups within a file and across files. An approximate midpoint could then be 
found very efficiently when automatically splitting tablets, by reading only 
the precomputed data.
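
To make the aggregation requirement concrete, here is a minimal sketch of the idea in plain Java. It deliberately does not use the DataSketches library or any Accumulo API: `KeySummary`, `fromSortedKeys`, `merge`, and `approximateMidpoint` are hypothetical names, and the evenly spaced weighted sample is a simple stand-in for a real quantile sketch. It assumes each file can produce a small summary while its sorted keys are written, and that summaries from several files (or locality groups) are combined later without rereading the data.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Toy mergeable key summary; a stand-in for a real quantile sketch. */
final class KeySummary {

    /** A sampled row key and the number of entries it represents. */
    static final class Weighted {
        final String key;
        final long weight;

        Weighted(String key, long weight) {
            this.key = key;
            this.weight = weight;
        }
    }

    final List<Weighted> samples;

    KeySummary(List<Weighted> samples) {
        this.samples = samples;
    }

    /**
     * Builds a summary from keys that are already sorted, as they would be
     * while a file is written. Keeps roughly k keys: the last key of each
     * block, weighted by the block size.
     */
    static KeySummary fromSortedKeys(List<String> sortedKeys, int k) {
        List<Weighted> out = new ArrayList<>();
        int n = sortedKeys.size();
        int step = Math.max(1, n / k);
        for (int i = 0; i < n; i += step) {
            int end = Math.min(i + step, n);
            out.add(new Weighted(sortedKeys.get(end - 1), end - i));
        }
        return new KeySummary(out);
    }

    /** Combines per-file summaries without touching the underlying data. */
    static KeySummary merge(List<KeySummary> parts) {
        List<Weighted> all = new ArrayList<>();
        for (KeySummary part : parts) {
            all.addAll(part.samples);
        }
        all.sort(Comparator.comparing((Weighted w) -> w.key));
        return new KeySummary(all);
    }

    long totalWeight() {
        long total = 0;
        for (Weighted w : samples) {
            total += w.weight;
        }
        return total;
    }

    /** Returns the first sampled key whose cumulative weight reaches half, or null if empty. */
    String approximateMidpoint() {
        long half = (totalWeight() + 1) / 2;
        long seen = 0;
        for (Weighted w : samples) {
            seen += w.weight;
            if (seen >= half) {
                return w.key;
            }
        }
        return null;
    }

    /** Generates illustrative keys "rowNNN" for from <= NNN < to. */
    static List<String> demoKeys(int from, int to) {
        List<String> keys = new ArrayList<>();
        for (int i = from; i < to; i++) {
            keys.add(String.format("row%03d", i));
        }
        return keys;
    }

    public static void main(String[] args) {
        // Two hypothetical files, each summarized independently at write time.
        KeySummary fileA = fromSortedKeys(demoKeys(0, 50), 10);
        KeySummary fileB = fromSortedKeys(demoKeys(50, 100), 10);
        KeySummary tablet = merge(List.of(fileA, fileB));
        System.out.println(tablet.approximateMidpoint());
    }
}
```

Only the small summaries are read at split time, which is the property the proposal depends on. A real sketch would additionally bound the summary size after merging and give explicit error guarantees, which this toy version does not.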


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
