keith-turner opened a new pull request, #5833: URL: https://github.com/apache/accumulo/pull/5833
Attempting to split a tablet that had files containing no data for that tablet would cause an error. There were two bugs. The first was that the split code would fail if a file went to zero child tablets. The second was that if a file had a fence range that was disjoint from the data in the file, the FencedRFile code would fail. This happened because the code would compute a range whose start was after its end.

Both of these situations can occur over time with concurrent splits, merges, and bulk imports. For example, the following could happen:

1. A bulk import calculates the tablets that files go to.
2. A split adds more tablets.
3. The bulk import adds files to the ranges it calculated before the split happened. This could result in a tablet pointing to a file that has no data for it.
4. Tablets are merged and fence ranges are added. If the file has no data in the tablet range, then the fence range will be disjoint from the range of data in the file.

To fix this, a new FileRange class was added that represents a tablet range or an empty range. This code replaces two methods for getting a file's first and last row that returned null when the file was empty. The null was really confusing; explicitly representing empty in the class makes the code easier to understand. Using this new FileRange class, the split code and fenced rfile code were fixed.

These problems were found when running the bulk randomwalk test.
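The idea of representing "empty" explicitly rather than with null can be illustrated with a minimal sketch. This is not the FileRange class from the pull request: the name `FileRangeSketch`, its methods, and the use of plain strings with inclusive bounds for rows are all assumptions made for illustration. It shows how clipping a file's data range to a disjoint fence can yield an explicit empty range instead of a range whose start is after its end, and how a file can legitimately map to zero child tablets.

```java
import java.util.List;

/** Hypothetical sketch only; not the FileRange class added by the pull request. */
final class FileRangeSketch {
  // Single shared instance that stands for "no rows at all".
  private static final FileRangeSketch EMPTY = new FileRangeSketch(null, null);

  // Inclusive row bounds (an assumption); both are null only for EMPTY.
  private final String firstRow;
  private final String lastRow;

  private FileRangeSketch(String firstRow, String lastRow) {
    this.firstRow = firstRow;
    this.lastRow = lastRow;
  }

  static FileRangeSketch empty() {
    return EMPTY;
  }

  static FileRangeSketch of(String firstRow, String lastRow) {
    if (firstRow == null || lastRow == null) {
      throw new IllegalArgumentException("rows must be non-null");
    }
    if (firstRow.compareTo(lastRow) > 0) {
      throw new IllegalArgumentException("start row is after end row");
    }
    return new FileRangeSketch(firstRow, lastRow);
  }

  boolean isEmpty() {
    return this == EMPTY;
  }

  String firstRow() {
    if (isEmpty()) {
      throw new IllegalStateException("empty range has no first row");
    }
    return firstRow;
  }

  String lastRow() {
    if (isEmpty()) {
      throw new IllegalStateException("empty range has no last row");
    }
    return lastRow;
  }

  /** Clips this range to another; a disjoint pair yields the empty range. */
  FileRangeSketch intersect(FileRangeSketch other) {
    if (isEmpty() || other.isEmpty()) {
      return EMPTY;
    }
    String start = firstRow.compareTo(other.firstRow) >= 0 ? firstRow : other.firstRow;
    String end = lastRow.compareTo(other.lastRow) <= 0 ? lastRow : other.lastRow;
    // Without this check a disjoint fence would produce start > end.
    return start.compareTo(end) > 0 ? EMPTY : new FileRangeSketch(start, end);
  }

  /** Counts the child tablets a file has data for; zero is a valid answer. */
  static int countOverlappingChildren(FileRangeSketch fileData, List<FileRangeSketch> children) {
    int count = 0;
    for (FileRangeSketch child : children) {
      if (!fileData.intersect(child).isEmpty()) {
        count++;
      }
    }
    return count;
  }

  public static void main(String[] args) {
    FileRangeSketch data = FileRangeSketch.of("m", "p");
    FileRangeSketch fence = FileRangeSketch.of("a", "c");
    System.out.println("fenced range empty: " + data.intersect(fence).isEmpty());
    List<FileRangeSketch> children =
        List.of(FileRangeSketch.of("a", "f"), FileRangeSketch.of("g", "k"));
    System.out.println("children with data: " + countOverlappingChildren(data, children));
  }
}
```

In this sketch callers check `isEmpty()` before asking for rows, so the "file has no data here" case is visible in the type rather than hidden behind a null return.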