ctubbsii opened a new issue, #3651: URL: https://github.com/apache/accumulo/issues/3651
DataSketches is useful for precomputing distribution statistics over data that is read exactly once. If we used it when writing a file, we could precompute these statistics and store them in the file metadata to make split point computation faster. For this to be useful, we would need to be able to aggregate the precomputed statistics across locality groups within a file and across files. An approximate midpoint could then be calculated very efficiently when automatically splitting tablets, by reading only this precomputed data.
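To make the aggregation requirement concrete, here is a minimal sketch of the idea in plain Java. It deliberately does not use the DataSketches API. Instead it substitutes a much simpler stand-in: a fixed-stride weighted sample of the keys seen while writing. The class `KeySummary`, its methods, the `stride` parameter, and the `row%03d` key format are all hypothetical names chosen for illustration, not anything in Accumulo or DataSketches. A real quantile sketch would offer bounded error and bounded size, which this stand-in does not.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical stand-in for a mergeable quantile summary; NOT the DataSketches API. */
final class KeySummary {
    /** One retained key that stands in for `weight` keys of the original stream. */
    static final class Entry {
        final String key;
        final long weight;
        Entry(String key, long weight) { this.key = key; this.weight = weight; }
    }

    private final int stride;                  // retain one key per `stride` keys written
    private final List<Entry> entries = new ArrayList<>();
    private long pending = 0;                  // keys seen since the last retained entry
    private String lastKey = null;

    KeySummary(int stride) { this.stride = stride; }

    /** Called once per key, in sorted order, while a file (or locality group) is written. */
    void update(String key) {
        lastKey = key;
        pending++;
        if (pending == stride) flush();
    }

    private void flush() {
        if (pending > 0) {
            entries.add(new Entry(lastKey, pending));
            pending = 0;
        }
    }

    /** Returns the summary that would be stored in the file metadata. */
    List<Entry> finish() {
        flush();
        return new ArrayList<>(entries);
    }

    /** Aggregates summaries from several locality groups or files without rereading data. */
    static List<Entry> merge(List<List<Entry>> summaries) {
        List<Entry> merged = new ArrayList<>();
        for (List<Entry> s : summaries) merged.addAll(s);
        merged.sort(Comparator.comparing((Entry e) -> e.key));
        return merged;
    }

    /** Returns the first retained key at which at least half the total weight has been seen. */
    static String approximateMidpoint(List<Entry> merged) {
        long total = 0;
        for (Entry e : merged) total += e.weight;
        if (total == 0) return null;
        long seen = 0;
        for (Entry e : merged) {
            seen += e.weight;
            if (2 * seen >= total) return e.key;
        }
        return merged.get(merged.size() - 1).key;
    }

    public static void main(String[] args) {
        KeySummary fileA = new KeySummary(10);
        for (int i = 0; i < 100; i++) fileA.update(String.format("row%03d", i));
        KeySummary fileB = new KeySummary(10);
        for (int i = 100; i < 300; i++) fileB.update(String.format("row%03d", i));
        List<Entry> merged = merge(Arrays.asList(fileA.finish(), fileB.finish()));
        System.out.println(approximateMidpoint(merged)); // prints row149
    }
}
```

In this toy example the two files hold 100 and 200 keys, and the midpoint comes from 30 retained entries rather than 300 keys. The property that matters for the proposal is that `merge` operates only on the stored summaries, which is the same property the precomputed statistics would need across locality groups and files.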
