ctubbsii opened a new issue, #3651:
URL: https://github.com/apache/accumulo/issues/3651

   DataSketches is useful for computing approximate distribution statistics 
in a single pass over data. If we used it when writing a file, we could 
precompute these statistics and store them in the file metadata to make split 
point computation faster. For this to be useful, we would need to be able to 
aggregate the precomputed statistics across locality groups within a file and 
across files. Approximate midpoints could then be calculated very efficiently, 
reading only the precomputed data to find a suitable midpoint when 
automatically splitting tablets.
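
To make the idea concrete, here is a minimal sketch of a mergeable key summary.
It is a simplified stand-in, not the DataSketches algorithm and not Accumulo 
code: `KeySummary`, `approximate_midpoint`, and the `max_samples` parameter are 
hypothetical names chosen for illustration. Each summary is built in one pass 
while keys arrive in sorted order (as they would when a file is written), and 
the midpoint is later estimated from the summaries alone.

```python
class KeySummary:
    """Hypothetical per-file (or per-locality-group) summary of sorted keys.

    Keeps every `stride`-th key. When the sample list reaches twice
    `max_samples`, every other sample is dropped and the stride doubles,
    so memory stays bounded no matter how many keys are written.
    """

    def __init__(self, max_samples=8):
        self.max_samples = max_samples
        self.stride = 1      # each kept sample stands for `stride` keys
        self.count = 0       # total keys seen so far
        self.samples = []    # sampled keys, in sorted order

    def add(self, key):
        """Record one key; keys are assumed to arrive in sorted order."""
        self.count += 1
        if self.count % self.stride == 0:
            self.samples.append(key)
            if len(self.samples) == 2 * self.max_samples:
                # Keep the samples at positions 2*stride, 4*stride, ...
                self.samples = self.samples[1::2]
                self.stride *= 2


def approximate_midpoint(summaries):
    """Estimate the median key across summaries without re-reading data.

    Treats each sample as a key weighted by its summary's stride and
    returns the weighted median, or None if there are no samples.
    """
    weighted = sorted((key, s.stride) for s in summaries for key in s.samples)
    total = sum(weight for _, weight in weighted)
    running = 0
    for key, weight in weighted:
        running += weight
        if 2 * running >= total:
            return key
    return None


# Two "files" of different sizes covering adjacent key ranges.
file_a = KeySummary()
for i in range(100):
    file_a.add(f"row{i:03d}")

file_b = KeySummary()
for i in range(100, 300):
    file_b.add(f"row{i:03d}")

midpoint = approximate_midpoint([file_a, file_b])
print(midpoint)
```

For this input the estimate is row147, close to the true middle of the 300 
keys (between row149 and row150), and it is found by reading only 24 sampled 
keys. A real quantiles sketch would additionally give error bounds and 
support unsorted input, which this stand-in does not.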

