[GitHub] [accumulo] keith-turner commented on pull request #2368: Create GenerateSplits utility

GitBox Wed, 01 Dec 2021 11:27:28 -0800


keith-turner commented on pull request #2368:
URL: https://github.com/apache/accumulo/pull/2368#issuecomment-983983979



   > Based on your comments, it sounded like you were avoiding selecting the first split, why would that be?
   
   That comment was based on a bug in the first cut of the selection algorithm, 
which always used the first source row.  For larger amounts of data this would 
not be desirable.  If there are 10,000 source rows from which you want to 
derive 100 splits, you would not want to use the first source row as the first 
split.  You would want roughly the 100th source row as the first split.  For 
the case you are testing, with 6 source rows and 4 desired splits, it probably 
does not matter if the first source row is chosen.  
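
To make the arithmetic concrete, here is a minimal sketch of evenly spaced selection. It is not the code in this pull request: the class `EvenSplitSketch`, the method `selectSplits`, and the index formula `i * n / (desired + 1)` are assumptions made up for illustration. It only shows why, with 10,000 source rows and 100 desired splits, the first split lands near the 100th row rather than on the first one.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the GenerateSplits implementation.
public class EvenSplitSketch {

  // Picks `desired` evenly spaced elements from a sorted list of source rows.
  // When desired < n, index 0 (the first source row) is never selected.
  public static <T> List<T> selectSplits(List<T> sortedRows, int desired) {
    if (desired <= 0) {
      return new ArrayList<>();
    }
    int n = sortedRows.size();
    if (desired >= n) {
      // Not enough source rows to be selective: use them all.
      return new ArrayList<>(sortedRows);
    }
    List<T> splits = new ArrayList<>(desired);
    for (int i = 1; i <= desired; i++) {
      // The splits divide the rows into (desired + 1) roughly equal ranges.
      // Long arithmetic avoids int overflow for large inputs.
      int index = (int) ((long) i * n / (desired + 1));
      splits.add(sortedRows.get(index));
    }
    return splits;
  }

  public static void main(String[] args) {
    List<Integer> rows = new ArrayList<>();
    for (int r = 1; r <= 10_000; r++) {
      rows.add(r); // row r stands in for the r-th source row
    }
    List<Integer> splits = selectSplits(rows, 100);
    System.out.println(splits.get(0)); // prints 100, i.e. the 100th source row
  }
}
```

With 6 source rows and 4 desired splits the same formula selects the 2nd through 5th rows, so whether the first row is included makes little practical difference at that size.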


