keith-turner opened a new pull request, #5833:
URL: https://github.com/apache/accumulo/pull/5833

   Attempting to split a tablet that referenced files containing no data for 
that tablet would cause an error. There were two bugs. The first bug was that 
the split code would fail if a file went to zero child tablets. The second bug 
was that if a file had a fence range disjoint from the data in the file, the 
FencedRFile code would fail. This happened because the code would compute a 
range whose start was after its end.
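To make the second bug concrete, here is a minimal sketch that assumes a simplified model: rows are plain strings and a range is an inclusive [start, end] pair. The `RowRange` and `NaiveFence` names are hypothetical and do not correspond to Accumulo's actual FencedRFile or Range code. The sketch only shows how clamping a fence range to a file's data range silently yields start > end when the two ranges are disjoint.

```java
import java.util.Objects;

// Hypothetical, simplified row range with inclusive start and end rows.
// Accumulo's real Range supports keys, exclusive bounds, and infinite ends.
final class RowRange {
    final String start;
    final String end;

    RowRange(String start, String end) {
        this.start = Objects.requireNonNull(start);
        this.end = Objects.requireNonNull(end);
    }

    // True when the start row sorts after the end row, i.e. the range is invalid.
    boolean isInverted() {
        return start.compareTo(end) > 0;
    }

    @Override
    public String toString() {
        return "[" + start + ", " + end + "]";
    }
}

final class NaiveFence {
    // Naive intersection: take the larger start and the smaller end.
    // When the fence and the file's data are disjoint this produces start > end.
    static RowRange naiveIntersect(RowRange fence, RowRange fileData) {
        String start = fence.start.compareTo(fileData.start) >= 0 ? fence.start : fileData.start;
        String end = fence.end.compareTo(fileData.end) <= 0 ? fence.end : fileData.end;
        return new RowRange(start, end);
    }

    public static void main(String[] args) {
        RowRange fence = new RowRange("a", "f");    // tablet range used as the fence
        RowRange fileData = new RowRange("m", "z"); // rows actually present in the file
        RowRange r = naiveIntersect(fence, fileData);
        System.out.println(r + " inverted=" + r.isInverted()); // prints "[m, f] inverted=true"
    }
}
```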
   
   Both of these situations can occur over time with concurrent splits, merges, 
and bulk imports. For example, the following sequence could happen:
   
    1. Bulk import calculates the tablets that files go to.
    2. A split adds more tablets.
    3. Bulk import adds files to the ranges it calculated before the split 
happened. This could result in a tablet pointing to a file that has no data 
for it.
    4. Tablets are merged and fence ranges are added. If the file has no data 
in the tablet range, then the fence range will be disjoint with the range of 
data in the file.
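Step 3 can be illustrated with a minimal sketch of how a split might decide which child tablets keep a reference to a file. It assumes Accumulo's tablet convention that an extent holds rows r with prevEndRow < r <= endRow, but the `Extent` and `SplitSketch` classes and the `childrenForFile` method are hypothetical simplifications, not the real split code. A file whose rows all fall outside the parent tablet overlaps none of the children, which is the zero-child case the split code has to tolerate.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical tablet extent holding rows r with prevEndRow < r <= endRow.
// A null prevEndRow means no lower bound; a null endRow means no upper bound.
final class Extent {
    final String prevEndRow;
    final String endRow;

    Extent(String prevEndRow, String endRow) {
        this.prevEndRow = prevEndRow;
        this.endRow = endRow;
    }

    // True if any row in the inclusive span [firstRow, lastRow] falls in this extent.
    boolean overlaps(String firstRow, String lastRow) {
        boolean aboveLower = prevEndRow == null || lastRow.compareTo(prevEndRow) > 0;
        boolean belowUpper = endRow == null || firstRow.compareTo(endRow) <= 0;
        return aboveLower && belowUpper;
    }

    @Override
    public String toString() {
        return "(" + prevEndRow + ", " + endRow + "]";
    }
}

final class SplitSketch {
    // Returns the children that should keep a reference to a file whose data spans
    // [firstRow, lastRow]. The result can legitimately be empty.
    static List<Extent> childrenForFile(List<Extent> children, String firstRow, String lastRow) {
        List<Extent> result = new ArrayList<>();
        for (Extent child : children) {
            if (child.overlaps(firstRow, lastRow)) {
                result.add(child);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Parent tablet (null, "h"] is split at "d" into two children.
        List<Extent> children = List.of(new Extent(null, "d"), new Extent("d", "h"));
        // A bulk-imported file whose rows all sort after the parent's end row.
        System.out.println(childrenForFile(children, "m", "q")); // prints []
        // A file whose rows straddle the split point goes to both children.
        System.out.println(childrenForFile(children, "b", "f"));
    }
}
```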
   
   To fix this, a new FileRange class was added that represents a tablet range 
or an empty range. This code replaces two methods for getting a file's first 
and last row that returned null when the file was empty. The null was 
confusing; explicitly representing the empty case in the class makes the code 
easier to understand.
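The following is a hypothetical sketch of what a class like FileRange might look like. Only the name FileRange comes from this PR; the `EMPTY` constant, the `of`, `isEmpty` and `intersect` methods, and the use of plain String rows are assumptions for illustration. It shows the idea of intersecting a file's data with a disjoint fence and getting back an explicit empty value instead of an inverted range or a null.

```java
import java.util.Objects;

// Hypothetical FileRange-like value: either EMPTY (no rows) or an inclusive
// [firstRow, lastRow] span. Rows are plain Strings here purely for illustration.
final class FileRange {
    static final FileRange EMPTY = new FileRange(null, null);

    private final String firstRow; // null only for EMPTY
    private final String lastRow;  // null only for EMPTY

    private FileRange(String firstRow, String lastRow) {
        this.firstRow = firstRow;
        this.lastRow = lastRow;
    }

    // Builds a non-empty range; rejects inverted input rather than storing it.
    static FileRange of(String firstRow, String lastRow) {
        Objects.requireNonNull(firstRow, "firstRow");
        Objects.requireNonNull(lastRow, "lastRow");
        if (firstRow.compareTo(lastRow) > 0) {
            throw new IllegalArgumentException("firstRow sorts after lastRow");
        }
        return new FileRange(firstRow, lastRow);
    }

    boolean isEmpty() {
        return this == EMPTY;
    }

    String firstRow() {
        if (isEmpty()) {
            throw new IllegalStateException("empty range has no first row");
        }
        return firstRow;
    }

    String lastRow() {
        if (isEmpty()) {
            throw new IllegalStateException("empty range has no last row");
        }
        return lastRow;
    }

    // Clamps this range to the inclusive fence [fenceStart, fenceEnd].
    // Disjoint inputs produce EMPTY instead of a range whose start is after its end.
    FileRange intersect(String fenceStart, String fenceEnd) {
        if (isEmpty()) {
            return EMPTY;
        }
        String start = firstRow.compareTo(fenceStart) >= 0 ? firstRow : fenceStart;
        String end = lastRow.compareTo(fenceEnd) <= 0 ? lastRow : fenceEnd;
        return start.compareTo(end) > 0 ? EMPTY : new FileRange(start, end);
    }

    @Override
    public String toString() {
        return isEmpty() ? "EMPTY" : "[" + firstRow + ", " + lastRow + "]";
    }
}

final class FileRangeDemo {
    public static void main(String[] args) {
        FileRange data = FileRange.of("m", "z");      // rows present in the file
        System.out.println(data.intersect("a", "f")); // prints EMPTY
        System.out.println(data.intersect("a", "p")); // prints [m, p]
    }
}
```

Making the empty case a distinct value means callers handle "no data" through `isEmpty()` rather than by checking for null first and last rows.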
   
   Using this new FileRange class, the split code and the FencedRFile code 
were fixed.
   
   These problems were found when running the bulk randomwalk test.


-- 
This is an automated message from the Apache Git Service.