cshannon opened a new pull request, #4317:
URL: https://github.com/apache/accumulo/pull/4317

   This adds a new metadata table column for tracking unsplittable tablets, 
replacing the in-memory tracking. The tablet management iterator can compare 
the column against the tablet's current metadata to decide whether the tablet 
needs to split, and so avoid repeatedly trying to split a tablet that can't be 
split. The stored data includes a hash of the file set and the split-related 
config; if either changes, the iterator will try to split again.
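
   As a rough illustration of the check described above (not the code in this 
PR), the sketch below hashes the file set together with the split-related 
settings and only skips the split attempt when the stored value still matches. 
The class name, method names, and the choice of SHA-256 are assumptions made 
for the example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Collection;
import java.util.TreeSet;

// Hypothetical sketch; these are not the classes added by the pull request.
final class UnsplittableSketch {

  // Hash the sorted file names together with the split-related settings so
  // that the same inputs always produce the same marker.
  static String marker(Collection<String> files, long splitThreshold,
      long maxEndRowSize, int maxOpenFiles) {
    try {
      MessageDigest digest = MessageDigest.getInstance("SHA-256");
      for (String file : new TreeSet<>(files)) {
        digest.update(file.getBytes(StandardCharsets.UTF_8));
        digest.update((byte) 0); // separator between file names
      }
      String config = splitThreshold + "," + maxEndRowSize + "," + maxOpenFiles;
      digest.update(config.getBytes(StandardCharsets.UTF_8));
      StringBuilder hex = new StringBuilder();
      for (byte b : digest.digest()) {
        hex.append(String.format("%02x", b));
      }
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      // SHA-256 is required to be available on every Java platform.
      throw new IllegalStateException(e);
    }
  }

  // Attempt a split unless the stored marker matches the current state.
  // A null stored marker means the tablet was never marked unsplittable.
  static boolean shouldAttemptSplit(String storedMarker, String currentMarker) {
    return storedMarker == null || !storedMarker.equals(currentMarker);
  }
}
```

   This covers only the marker comparison; whether the tablet is over the split 
threshold in the first place would be decided separately.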
   
   The metadata is stored as JSON. It includes the file set hash and the 
following configs: split threshold, table max end row size, and max open files. 
If the tablet later splits (because the metadata changed, such as the file set, 
and the iterator attempted to split again), the column will be cleaned up by 
FATE in the UpdateTablets repo.
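
   To give a feel for what the stored value might look like, here is a small 
sketch that renders those four fields as JSON. The field names and the class 
are made up for illustration and may not match the PR; real code would more 
likely use a JSON library than string concatenation.

```java
// Hypothetical sketch; the field names are assumptions, not the PR's schema.
final class UnsplittableJsonSketch {

  // Render the file set hash and the three split-related settings as JSON.
  // The hash is assumed to be a hex string, so it needs no escaping.
  static String toJson(String fileSetHash, long splitThreshold,
      long maxEndRowSize, int maxOpenFiles) {
    return "{\"fileSetHash\":\"" + fileSetHash + "\""
        + ",\"splitThreshold\":" + splitThreshold
        + ",\"maxEndRowSize\":" + maxEndRowSize
        + ",\"maxOpenFiles\":" + maxOpenFiles + "}";
  }
}
```

   For example, `toJson("abc123", 1024L, 10240L, 100)` produces 
`{"fileSetHash":"abc123","splitThreshold":1024,"maxEndRowSize":10240,"maxOpenFiles":100}`.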
   
   For the schema, I wasn't sure of the best approach, so I just went ahead and 
created a new Split column family and an unsplittable column qualifier. I 
thought the new column family might be useful if we want to store more 
split-related things in the future. However, I can easily change the schema if 
desired: store the new metadata as a column in an existing column family, or 
make it a standalone column family/qualifier like the UserCompactionRequested 
or Merged column families.
   
   I marked this as a draft for now because it still needs more tests (only one 
test has been modified so far) and some other cleanup work. Also, as I said, I 
wasn't sure if the schema was the best approach.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
