[GitHub] [accumulo] milleruntime commented on pull request #2368: Create GenerateSplits utility

GitBox Wed, 01 Dec 2021 10:26:48 -0800


milleruntime commented on pull request #2368:
URL: https://github.com/apache/accumulo/pull/2368#issuecomment-983938428



   @keith-turner my last update uses the DataSketches library to get the 
splits from indices and a full scan, which should be more efficient than 
having to read all rows into memory. However, in one of the tests it splits 
the data differently. The unit test that asks for 4 splits out of the 6 was 
returning `r2, r3, r4, r5` with your algorithm, but produces `r1, r3, r5, r6` 
with the DataSketches library. Based on your comments, it sounded like you 
were avoiding selecting the first split. Why would that be? And can you think 
of a reason why one set of splits would be preferred over the other?
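
   To make the difference concrete, here is a minimal sketch of two ways of 
picking 4 split points from 6 sorted rows. Both rules are hypothetical and were 
chosen only because they reproduce the two outputs quoted above; neither is 
confirmed to be the PR's original algorithm or what the DataSketches library 
does internally. The class and method names (`SplitSelection`, 
`interiorSplits`, `endpointSplits`) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitSelection {

  // Hypothetical rule A: place numSplits points at evenly spaced interior
  // positions, which never picks the first row for this input.
  static List<String> interiorSplits(List<String> sortedRows, int numSplits) {
    List<String> splits = new ArrayList<>();
    int n = sortedRows.size();
    for (int i = 1; i <= numSplits; i++) {
      // Integer division floors the evenly spaced position i * n / (numSplits + 1).
      splits.add(sortedRows.get(i * n / (numSplits + 1)));
    }
    return splits;
  }

  // Hypothetical rule B: take evenly spaced ranks that include both the
  // smallest and the largest row, similar in spirit to asking for evenly
  // spaced quantiles from minimum to maximum.
  static List<String> endpointSplits(List<String> sortedRows, int numSplits) {
    if (numSplits < 2) {
      throw new IllegalArgumentException("numSplits must be at least 2");
    }
    List<String> splits = new ArrayList<>();
    int n = sortedRows.size();
    for (int i = 0; i < numSplits; i++) {
      // Clamp the last rank so it lands on the final row.
      int idx = Math.min(i * n / (numSplits - 1), n - 1);
      splits.add(sortedRows.get(idx));
    }
    return splits;
  }

  public static void main(String[] args) {
    List<String> rows = List.of("r1", "r2", "r3", "r4", "r5", "r6");
    System.out.println(interiorSplits(rows, 4)); // [r2, r3, r4, r5]
    System.out.println(endpointSplits(rows, 4)); // [r1, r3, r5, r6]
  }
}
```

   In this sketch the only visible difference between the two rules is whether 
the smallest and largest rows are eligible as split points.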
   


