milleruntime commented on issue #2361:
URL: https://github.com/apache/accumulo/issues/2361#issuecomment-973228335
> when a tablet is only using a sub range of an rfile and that subrange
> falls between index entries.
OK, so if I am looking at the entire rfile and not just a range, then maybe I
don't have to worry about this case. The use case I was thinking of was one where
the user just has a file (or set of files) and wants the splits across the whole
file.
Also, I thought I could fall back to calculating the splits just based on the
size of the file (to avoid having to scan the file twice), but it doesn't look
like that works.
<pre>
long fileSize = fs.getFileStatus(file).getLen();
long splitSize = fileSize / numSplits;
...
int size = key.getSize() + val.getSize();
count += size;
if (count > splitSize) {
  splits.add(stripOffEmptyByte(key.getRow()));
}
</pre>
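For comparison, here is a rough, standalone sketch of the same idea that takes its target from the summed key/value sizes rather than from the file length. One likely reason the file-length version misbehaves (an assumption, not something verified here) is that the on-disk length of a compressed rfile is smaller than the sum of `key.getSize() + val.getSize()`, so the threshold is crossed too early. The sketch also resets the counter after each split. `SizeBasedSplits`, `Entry`, and `computeSplits` are hypothetical names for illustration, not Accumulo API, and dividing by `numSplits + 1` is a choice made here so that N split points give N + 1 roughly equal chunks.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Accumulo. It picks split rows from
// sorted (row, size) pairs using summed entry sizes, not the file length.
class SizeBasedSplits {

  // One key/value pair reduced to its row and its serialized size in bytes.
  static final class Entry {
    final String row;
    final long size;

    Entry(String row, long size) {
      this.row = row;
      this.size = size;
    }
  }

  // Returns at most numSplits rows that cut the entries into roughly
  // equal-sized chunks. The entries must already be sorted by row.
  static List<String> computeSplits(List<Entry> sortedEntries, int numSplits) {
    if (numSplits < 1) {
      throw new IllegalArgumentException("numSplits must be at least 1");
    }

    // First pass: total size of all entries (this is the second scan that
    // the file-length shortcut was trying to avoid).
    long total = 0;
    for (Entry e : sortedEntries) {
      total += e.size;
    }

    // N split points produce N + 1 chunks; never let the target drop to 0.
    long splitSize = Math.max(1, total / (numSplits + 1));

    List<String> splits = new ArrayList<>();
    long count = 0;
    for (Entry e : sortedEntries) {
      count += e.size;
      boolean sameAsLastSplit =
          !splits.isEmpty() && splits.get(splits.size() - 1).equals(e.row);
      if (count >= splitSize && splits.size() < numSplits && !sameAsLastSplit) {
        splits.add(e.row);
        count = 0; // start measuring the next chunk
      }
    }
    return splits;
  }
}
```

With six equally sized entries for rows a through f and `numSplits = 2`, this picks rows b and d. The cost is the extra pass over the data to get the total, which is exactly the trade-off mentioned above.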