milleruntime commented on pull request #2368: URL: https://github.com/apache/accumulo/pull/2368#issuecomment-983938428
@keith-turner my last update uses the DataSketches library to get the splits from both the indices and a full scan, which should be more efficient than reading all rows into memory. In one of the tests, however, it splits the data differently. The unit test that asks for 4 splits out of the 6 was returning `r2, r3, r4, r5` with your algorithm, but produces `r1, r3, r5, r6` with the DataSketches library. Based on your comments, it sounded like you were avoiding selecting the first split. Why would that be? And can you think of a reason why one set of splits would be preferred over the other?
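
To make the difference concrete, here is a rough sketch in plain Java with no DataSketches dependency. It is not the code from this PR: the class and method names are hypothetical, and it assumes the test data is six sorted rows `r1` through `r6`. It only illustrates how two index-selection rules over the same rows can yield the two results above: one takes evenly spaced interior points and skips both endpoints, the other takes evenly spaced ranks from the minimum to the maximum, which is roughly how an evenly spaced quantile query behaves.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitSelection {

  // Hypothetical rule 1: pick k evenly spaced interior rows,
  // never the first row. Assumes rows are sorted and 1 <= k <= rows.size().
  static List<String> interiorSplits(List<String> rows, int k) {
    int n = rows.size();
    if (k < 1 || k > n) {
      throw new IllegalArgumentException("need 1 <= k <= rows.size()");
    }
    List<String> splits = new ArrayList<>();
    for (int i = 1; i <= k; i++) {
      // Integer division places point i at i/(k+1) of the way through the rows.
      splits.add(rows.get(i * n / (k + 1)));
    }
    return splits;
  }

  // Hypothetical rule 2: pick k rows at evenly spaced ranks 0, 1/(k-1), ..., 1,
  // so the minimum and maximum rows are always included.
  // Assumes rows are sorted and 2 <= k <= rows.size().
  static List<String> quantileSplits(List<String> rows, int k) {
    int n = rows.size();
    if (k < 2 || k > n) {
      throw new IllegalArgumentException("need 2 <= k <= rows.size()");
    }
    List<String> splits = new ArrayList<>();
    for (int i = 0; i < k; i++) {
      // Clamp so that rank 1.0 maps to the last row instead of running off the end.
      int idx = Math.min(n - 1, i * n / (k - 1));
      splits.add(rows.get(idx));
    }
    return splits;
  }

  public static void main(String[] args) {
    List<String> rows = List.of("r1", "r2", "r3", "r4", "r5", "r6");
    System.out.println(interiorSplits(rows, 4)); // [r2, r3, r4, r5]
    System.out.println(quantileSplits(rows, 4)); // [r1, r3, r5, r6]
  }
}
```

Under these assumptions the first rule includes neither the smallest nor the largest row, while the second includes both, which matches the two outputs seen in the unit test.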
