cshannon opened a new pull request, #4317: URL: https://github.com/apache/accumulo/pull/4317
This adds a new column to the metadata table for tracking unsplittable tablets, instead of tracking them in memory. The tablet management iterator can use this information to decide whether a tablet needs to split. It checks the column and compares it to the tablet's current metadata, which avoids repeatedly trying to split a tablet that can't be split.

The data is stored as JSON. It includes a hash of the tablet's file set plus the split-related config: the split threshold, the table's max end row size, and max open files. If any of these change, the iterator will try to split the tablet again. If a split does happen later (because the metadata changed, such as the file set, and the iterator attempted another split), the column is cleaned up by FATE in the UpdateTablets repo.

For the schema, I wasn't sure of the best approach, so I went ahead and created a new Split column family with an unsplittable column qualifier. I thought a dedicated family might be useful if we want to store more split-related data in the future. However, I can easily change the schema if desired. The new metadata could instead be stored as a column in an existing column family, or as a standalone column family/qualifier like the UserCompactionRequested or Merged column families.

I marked this as a draft for now because it still needs more tests (only one test has been modified so far), some other cleanup work, and a decision on whether the schema is the best approach.
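The check described above can be sketched in plain Java. This is a minimal, hypothetical illustration, not the code in #4317: the `UnsplittableSketch` class, the `Marker` record, its field names, the JSON layout, and the choice of an order-independent SHA-256 hash (Base64-encoded) of the file names are all assumptions. A `null` stored marker stands in for "the column is absent". The JSON is built by hand only to keep the sketch dependency-free, and it needs Java 16+ for the record.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;

public class UnsplittableSketch {

  // Hypothetical value for the split:unsplittable column (names are illustrative).
  record Marker(String filesHash, long splitThreshold, long maxEndRowSize, int maxOpenFiles) {

    // Build a marker from the tablet's current files and split-related config.
    static Marker of(Collection<String> files, long splitThreshold, long maxEndRowSize,
        int maxOpenFiles) {
      return new Marker(hashFiles(files), splitThreshold, maxEndRowSize, maxOpenFiles);
    }

    // Flat JSON object; no escaping is needed because the hash is Base64 text.
    String toJson() {
      return String.format(
          "{\"filesHash\":\"%s\",\"splitThreshold\":%d,\"maxEndRowSize\":%d,\"maxOpenFiles\":%d}",
          filesHash, splitThreshold, maxEndRowSize, maxOpenFiles);
    }
  }

  // Order-independent SHA-256 hash of the file set, Base64-encoded.
  static String hashFiles(Collection<String> files) {
    try {
      MessageDigest md = MessageDigest.getInstance("SHA-256");
      for (String file : new TreeSet<>(files)) { // sort so iteration order does not matter
        md.update(file.getBytes(StandardCharsets.UTF_8));
        md.update((byte) 0); // separator: {"ab","c"} must not hash like {"a","bc"}
      }
      return Base64.getEncoder().encodeToString(md.digest());
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("SHA-256 should always be available", e);
    }
  }

  // Returns false only when a stored marker matches the current files and config,
  // i.e. the previous split attempt would fail again. stored == null means no column.
  static boolean shouldTrySplit(Marker stored, Marker current) {
    return stored == null || !stored.equals(current);
  }

  public static void main(String[] args) {
    // Illustrative file names and config values, not Accumulo defaults.
    Marker stored = Marker.of(List.of("F0001.rf", "F0002.rf"), 1_000_000L, 1_000L, 20);
    System.out.println(stored.toJson());

    // Same files (different order) and same config: skip the split attempt.
    Marker same = Marker.of(List.of("F0002.rf", "F0001.rf"), 1_000_000L, 1_000L, 20);
    System.out.println(shouldTrySplit(stored, same)); // false

    // The file set changed, so the hash differs: try to split again.
    Marker compacted = Marker.of(List.of("A0003.rf"), 1_000_000L, 1_000L, 20);
    System.out.println(shouldTrySplit(stored, compacted)); // true

    // A split-related config value changed: try to split again.
    Marker newConfig = Marker.of(List.of("F0001.rf", "F0002.rf"), 2_000_000L, 1_000L, 20);
    System.out.println(shouldTrySplit(stored, newConfig)); // true

    // No marker stored: the tablet is evaluated normally.
    System.out.println(shouldTrySplit(null, same)); // true
  }
}
```

Comparing whole records keeps the rule simple: any change to the file set or to one of the split-related settings invalidates the marker. This matches the PR's behavior of retrying the split whenever that state changes.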
