milleruntime commented on issue #2361:
URL: https://github.com/apache/accumulo/issues/2361#issuecomment-973228335


   > when a tablet is only using a sub range of an rfile and that subrange falls between index entries.
   
   OK, so if I am looking at the entire rfile and not just a range, then maybe I don't have to worry about this case. The use case I was thinking of was one where the user just has a file (or set of files) and wants the splits across the whole file.
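
For the whole-file case, here is a minimal sketch of a two-pass approach. It assumes the "scan the file twice" approach mentioned below means a first pass that counts entries and a second pass that takes every N-th row. Plain strings stand in for the rows read from an rfile. `CountSplits` and `splitsByCount` are hypothetical names, not Accumulo APIs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Accumulo.
class CountSplits {
  // Pass 1 counts entries; pass 2 emits a split every `step` entries so the
  // split points are spread evenly across the whole input.
  static List<String> splitsByCount(Iterable<String> rows, int numSplits) {
    List<String> splits = new ArrayList<>();
    if (numSplits <= 0) {
      return splits;
    }
    // Pass 1: count every entry (the extra scan a size-based fallback
    // would try to avoid).
    long total = 0;
    for (String ignored : rows) {
      total++;
    }
    // numSplits split points divide the data into numSplits + 1 ranges.
    long step = Math.max(1, total / (numSplits + 1));
    // Pass 2: walk the entries again and take every step-th row.
    long seen = 0;
    for (String row : rows) {
      seen++;
      boolean sameAsLast = !splits.isEmpty() && splits.get(splits.size() - 1).equals(row);
      // An rfile holds many entries per row, so skip a repeat of the last split row.
      if (seen % step == 0 && splits.size() < numSplits && !sameAsLast) {
        splits.add(row);
      }
    }
    return splits;
  }
}
```

With ten rows `a` through `j` and `numSplits = 4`, the step is 2 and the splits are `b`, `d`, `f`, `h`.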
   
   Also, I thought I could fall back to calculating the splits based only on the file size (to avoid scanning the file twice), but it doesn't look like that works.
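   <pre>
   long fileSize = fs.getFileStatus(file).getLen(); // length of the rfile in bytes
   long splitSize = fileSize / numSplits;           // target bytes per split
   ...
   // accumulate the size of each key/value pair as the file is scanned
   int size = key.getSize() + val.getSize();
   count += size;
   if (count > splitSize) {
       splits.add(stripOffEmptyByte(key.getRow()));
       count = 0; // reset, otherwise every later key also becomes a split
   }
   </pre>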
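
One plausible reason the size-based fallback misbehaves (the comment does not say why) is a units mismatch. Assuming the rfile is compressed, `getLen()` reports the stored length, while `key.getSize() + val.getSize()` adds up uncompressed bytes. As a result, `splitSize` is too small relative to `count`, and the loop emits more splits than requested. The self-contained sketch below reproduces the loop with a hypothetical `Entry` type in place of Accumulo's `Key`/`Value`, so the effect can be seen with made-up sizes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the size-based loop above, not Accumulo code.
class SizeSplits {
  // Stand-in for a Key/Value pair: the row plus its uncompressed size in bytes.
  static final class Entry {
    final String row;
    final int size;

    Entry(String row, int size) {
      this.row = row;
      this.size = size;
    }
  }

  static List<String> splitsBySize(List<Entry> entries, long fileSize, int numSplits) {
    long splitSize = fileSize / numSplits; // target bytes per split
    List<String> splits = new ArrayList<>();
    long count = 0;
    for (Entry e : entries) {
      count += e.size;
      if (count > splitSize) {
        splits.add(e.row);
        count = 0; // start measuring the next split
      }
    }
    return splits;
  }
}
```

Ten 100-byte entries with `fileSize = 1000` and `numSplits = 4` give splits at `r2`, `r5`, `r8`. If the same data were stored in a 300-byte compressed file, `splitSize` drops to 75 and every entry becomes a split.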

