[GitHub] [accumulo] keith-turner commented on issue #2361: Utility to generate splits

GitBox Wed, 17 Nov 2021 14:22:25 -0800


keith-turner commented on issue #2361:
URL: https://github.com/apache/accumulo/issues/2361#issuecomment-972149849



   > And getting that index through the Rfile reader here
   
   Yeah, that is the code I was thinking about.  I looked around and found the 
following code that the tserver uses to find a single split point by inspecting 
indexes.
   
   
https://github.com/apache/accumulo/blob/f8bb900ae080fe0f54dfe04f9e1ad8c4dd2e7930/server/base/src/main/java/org/apache/accumulo/server/util/FileUtil.java#L289
   
   The code makes two passes.  First, it counts the number of index entries.  
Second, it reads through them again using a merged view of the indexes and takes 
the count/2 entry.  Could possibly do something similar for N entries: do one 
pass over the index data to count the entries, then another pass to take every 
count/N entry.  The code above falls back to scanning the data in the rfiles 
instead of the index; would probably need to do that sometimes for this use 
case also.
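
A rough sketch of that two-pass idea, to make it concrete.  This is not the FileUtil code and does not use any Accumulo API.  Each rfile index is stood in for by an already-sorted `List<String>` of keys, and the names `SplitSketch`, `countEntries`, `findSplits`, and `numTablets` are made up for illustration.  It assumes N means the number of tablets wanted, so it emits up to N-1 split points.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch: sorted lists of strings stand in for rfile indexes.
class SplitSketch {

    // Pass 1: count the index entries across all files.
    static long countEntries(List<List<String>> indexes) {
        long count = 0;
        for (List<String> index : indexes) {
            count += index.size();
        }
        return count;
    }

    // One cursor per index; the heap of cursors gives the merged view.
    private static final class Cursor implements Comparable<Cursor> {
        final Iterator<String> it;
        String current;

        Cursor(Iterator<String> it) {
            this.it = it;
            this.current = it.next();
        }

        boolean advance() {
            if (!it.hasNext()) {
                return false;
            }
            current = it.next();
            return true;
        }

        @Override
        public int compareTo(Cursor other) {
            return current.compareTo(other.current);
        }
    }

    // Pass 2: walk the merged view and take roughly every count/numTablets entry.
    static List<String> findSplits(List<List<String>> indexes, int numTablets) {
        List<String> splits = new ArrayList<>();
        long count = countEntries(indexes);
        if (numTablets < 2 || count == 0) {
            return splits;
        }

        PriorityQueue<Cursor> heap = new PriorityQueue<>();
        for (List<String> index : indexes) {
            if (!index.isEmpty()) {
                heap.add(new Cursor(index.iterator()));
            }
        }

        long seen = 0;
        int next = 1; // looking for the next-th split point
        while (!heap.isEmpty() && next < numTablets) {
            Cursor c = heap.poll();
            String key = c.current;
            seen++;
            // The next-th split sits at merged position next * count / numTablets.
            if (seen >= next * count / numTablets) {
                // Skip a key already emitted (the same key can appear in several indexes).
                if (splits.isEmpty() || !splits.get(splits.size() - 1).equals(key)) {
                    splits.add(key);
                }
                next++;
            }
            if (c.advance()) {
                heap.add(c);
            }
        }
        return splits;
    }
}
```

With `numTablets` set to 2 this reduces to taking the count/2 entry, like the single split point case.  A real utility would also need the fallback to scanning rfile data when the indexes do not have enough entries, which this sketch leaves out.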


