cshannon opened a new pull request, #3385: URL: https://github.com/apache/accumulo/pull/3385
The goal of this PR is to start to move away from using a String representation everywhere for the file path reference in the metadata table in order to support easier changes in the future when adding more information besides just a path. This is to support the future changes #1327 The current plan for supporting no chop merges is to start associating a range with a file and treating each file that is fenced off by a range as unique. This is going to lead to storing more than one metadata entry for the same file if there is multiple ranges so we will need to start handling the combination of the file reference and Range in the code. This PR is an incremental step and introduces an object to encapsulate the file metadata instead of using a String everywhere. There is a new object that is now used as part of `TabletFile` and `StoredTabletFile` called `TabletFileMetadataEntry` which will now represent the value that is stored in the file column. Currently the only thing it has is just a file path like before but now with the addition of this class we can add a field to it (a range or other information) and it will be much easier to update since we are not using a String everywhere. This PR doesn't go too crazy yet, it introduces the new object and updates the `StoredTabletFile` and `TabletFile` objects but many places in the code still just call the [getter](https://github.com/cshannon/accumulo/blob/991fe838c059e5f2675e4e89c2d8a33b80e58e9e/core/src/main/java/org/apache/accumulo/core/metadata/schema/TabletFileMetadataEntry.java#L55) on the new object to get the string representation. Besides a [new constructor](https://github.com/cshannon/accumulo/blob/991fe838c059e5f2675e4e89c2d8a33b80e58e9e/core/src/main/java/org/apache/accumulo/core/metadata/StoredTabletFile.java#L44) for the new object type, a string is still allowed to be passed to a [constructor](https://github.com/cshannon/accumulo/blob/991fe838c059e5f2675e4e89c2d8a33b80e58e9e/core/src/main/java/org/apache/accumulo/core/metadata/StoredTabletFile.java#L53) for StoredTabletFile and then just gets converted to the new object but we could stop doing that in the future to prevent errors. I wasn't sure how f ar to take this in this PR. We could go much further with more changes (at a higher risk) and push the object as far as we can and stop using Strings as much as possible. The main issue with doing that now is it would cause a lot of cascading changes to the code from everything to file handling to external compactions, basically anything using a file reference and I'm not quite sure yet how things are going to progress with the no chop merge changes. A notable example would be changes to in `FileManager`. Instead of [tracking](https://github.com/apache/accumulo/blob/56d49f15a05db9a46dbceb845918497760601c11/server/base/src/main/java/org/apache/accumulo/server/fs/FileManager.java#L484) Strings for the files when range information is added we will need to track the new object `TabletFileMetadataEntry`. However, I think that the changes for that may be better done in my upcoming PR for adding range information because I haven't decided yet if I plan to use `TabletFileMetadataEntry` or actually just `TabletFile` itself. It might make more sense to just track `TabletFiles` (which will now contain `TabletFileMetadataEntry`). Anyways, I think we could probably merge this into 3.0 but could also wait to 3.1. Either way I plan to make my other no-chop merge PRs based on this and continue. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
