cshannon opened a new pull request, #3418: URL: https://github.com/apache/accumulo/pull/3418
This allows treating each fenced range of an RFile as a separate TabletFile for reading purposes. This PR is part of #1327 and is the latest attempt to add fencing to an RFile.

The changes here build on #3401 by adding a Range to `AbstractTabletFile`. This allows `RFileOperations` to easily access the Range and wrap the Reader inside a FencedReader. The idea is to associate a range/fence with a TabletFile so that the combination of an RFile and a fence can be treated as a unique file. That means fewer changes to the rest of the code base when there are multiple ranges for a single file, because the code just thinks they are unique files. For more information see the comments [here](https://github.com/apache/accumulo/issues/1327#issuecomment-1509174746) and [here](https://github.com/apache/accumulo/issues/1327#issuecomment-1509338370).

So, for example, if we had 5 ranges defined for an RFile, we'd load up 5 "files", each fenced off by one range. The rest of the code would just get a list of 5 readers; it wouldn't know that they were actually the same file and wouldn't care when iterating. The 5 fenced files (which are really just subsets of the same file) are treated everywhere else in the code exactly like 5 unique files.

One thing to note is that inside FileManager we track reserved readers by TabletFile, so each unique range for the same file would get its own reader in the cache. This should be fine, as we want to treat them as unique, and the actual file data on disk is still cached by the block cache and won't be duplicated when there are multiple ranges. We still want to limit the number of iterators/scans at one time even if it's the same file. In fact, this isn't new: FileManager already supported multiple readers for the same file in case there are multiple concurrent reads. This change just adds another way to have a reference to the same file.
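The "one file plus one fence equals one unique file" idea can be illustrated with a small, self-contained sketch. This is not the Accumulo code: `FencedFileSketch`, `FencedFile`, `openFenced`, `scanAll`, and `distinctReaderKeys` are hypothetical names, a `TreeMap` stands in for the sorted contents of an RFile, and the fence is simplified to a half-open row range `[startRow, endRow)` rather than a real Accumulo `Range`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Objects;
import java.util.TreeMap;

class FencedFileSketch {

  // Hypothetical stand-in for a TabletFile that carries a fence.
  // Equality covers the path AND the fence, so the same path with
  // two different fences counts as two different "files".
  static final class FencedFile {
    final String path;
    final String startRow; // inclusive (assumption for this sketch)
    final String endRow;   // exclusive (assumption for this sketch)

    FencedFile(String path, String startRow, String endRow) {
      this.path = path;
      this.startRow = startRow;
      this.endRow = endRow;
    }

    @Override
    public boolean equals(Object o) {
      if (!(o instanceof FencedFile)) {
        return false;
      }
      FencedFile other = (FencedFile) o;
      return path.equals(other.path) && startRow.equals(other.startRow)
          && endRow.equals(other.endRow);
    }

    @Override
    public int hashCode() {
      return Objects.hash(path, startRow, endRow);
    }
  }

  // Hypothetical stand-in for wrapping a reader in a fenced reader:
  // only rows inside the fence are visible to the caller.
  static NavigableMap<String,String> openFenced(NavigableMap<String,String> fileData,
      FencedFile file) {
    return fileData.subMap(file.startRow, true, file.endRow, false);
  }

  // The caller iterates over a list of readers and never needs to know
  // that several of them are views over the same underlying file.
  static List<String> scanAll(NavigableMap<String,String> fileData, List<FencedFile> files) {
    List<String> rows = new ArrayList<>();
    for (FencedFile file : files) {
      rows.addAll(openFenced(fileData, file).keySet());
    }
    return rows;
  }

  // Mimics a reader cache keyed by file: each distinct (path, fence)
  // pair gets its own entry, while repeats of the same pair share one.
  static int distinctReaderKeys(List<FencedFile> files) {
    Map<FencedFile,Integer> cache = new HashMap<>();
    for (FencedFile file : files) {
      cache.merge(file, 1, Integer::sum);
    }
    return cache.size();
  }

  public static void main(String[] args) {
    NavigableMap<String,String> data = new TreeMap<>();
    for (char c = 'a'; c <= 'j'; c++) {
      data.put(String.valueOf(c), "value-" + c);
    }
    List<FencedFile> files = List.of(new FencedFile("f1.rf", "a", "c"),
        new FencedFile("f1.rf", "c", "e"), new FencedFile("f1.rf", "g", "h"));
    System.out.println(scanAll(data, files));
    System.out.println(distinctReaderKeys(files));
  }
}
```

In this sketch the three fenced views share one backing map, which loosely mirrors how the data on disk is shared through the block cache while each fenced file still gets its own reader entry.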
I marked this as a work in progress for now, as I wasn't sure how much to update in this PR vs. future PRs. The main purpose of this PR is just to add the fencing iterator, but I also updated FileOperations and RFileScanner to use it to demonstrate that it works.

The PR includes the following:

1. An iterator to fence off an RFile by range
2. An iterator to also fence off an RFile index
3. A test class, FencedRFileTest, that demonstrates the fencing
4. RFileScanner was updated so clients can also pass a range for an RFile. The matching classes (RFileScannerBuilder, etc.) were updated as well. Two tests were added to RFileClientTest to demonstrate fencing: one uses the client Scanner and the other uses FileOperations. The FileOperations test probably belongs somewhere else, but this was mostly just to demonstrate that it works. Note that an RFileScanner for clients already takes a range, but that range is an overall range across multiple files, whereas this shows passing a range per RFile. Ultimately we may decide we don't need to fence RFileScanner, but it demonstrates we can if we want to.

There is more work to do in this PR and/or follow-on PRs:

1. Make sure all places that read an RFile and need fencing also pass in a range for the fenced iterator
2. Add some tests to verify the changes in RFileOperations and FileManager work when opening a ranged file
3. I will create a separate PR to handle writing the new DataFileValue metadata, which will include adding a range to the CQ and storing a separate DFV for each combination of file and range.
4. After we can persist the ranges, we need to update everywhere that uses a reader to be able to pass in ranges (compaction, scanners, etc.). It would also be good to have some ITs showing that metadata table entries can contain ranges and that those ranges are read and used to fence off files.
5. Actually update the merge code to use all the changes
