mjwall commented on pull request #2293: URL: https://github.com/apache/accumulo/pull/2293#issuecomment-930102235
So this gets a list of the table IDs (tableIDsBefore) before scanning the metadata table and then gets a list of the table IDs afterwards (tableIDsAfter). While scanning the metadata, we keep track of the tables we have seen. The idea is to fail if we miss an entire table, which can happen with a single scan if the metadata table has custom splits at table boundaries.

There are two cases to consider, so I want to document my assumptions for review/discussion:

1 - A table is added after tableIDsBefore is captured. This means it is added after the delete markers were gathered, so there should be no delete markers for the new table that we would need to keep because they are still in use. Based on that, we don't care if we miss scanning the metadata for the new table.
2 - A table is deleted after tableIDsBefore is captured. Once a table has been deleted, we can safely remove all delete markers associated with it, so scanning the metadata for the deleted table is not needed.

I am worried my assumptions are wrong here, but assuming they are correct, the logic in ensureAllTablesChecked is as follows. tableIDsAfter will contain extra IDs for tables that were added and will be missing the IDs of tables that were deleted. So intersect tableIDsBefore and tableIDsAfter, then remove all the table IDs we saw. If anything is left, we missed scanning a table in the metadata, so we should throw an error and abort the GC cycle.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
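To make the set arithmetic concrete, here is a minimal sketch of the check described above: intersect the before/after table ID sets, subtract the IDs seen during the metadata scan, and fail if anything remains. This is not the code from the PR. Table IDs are plain Strings rather than Accumulo's own table ID type, and the class name, the method signature, and the choice of IllegalStateException are all hypothetical stand-ins.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: String table IDs and this signature are assumptions,
// not the actual ensureAllTablesChecked from the PR.
class GcTableCheckSketch {

  static void ensureAllTablesChecked(Set<String> tableIDsBefore, Set<String> tableIDsAfter,
      Set<String> tableIDsSeen) {
    // Keep only tables present both before and after the scan. Tables created
    // mid-scan (case 1) and tables deleted mid-scan (case 2) drop out here.
    Set<String> tableIDsToCheck = new HashSet<>(tableIDsBefore);
    tableIDsToCheck.retainAll(tableIDsAfter);

    // Remove every table the metadata scan actually visited.
    tableIDsToCheck.removeAll(tableIDsSeen);

    // Anything left existed for the whole scan but was never seen, so abort.
    if (!tableIDsToCheck.isEmpty()) {
      throw new IllegalStateException(
          "Metadata scan missed tables " + tableIDsToCheck + "; aborting GC cycle");
    }
  }

  public static void main(String[] args) {
    // Table "3" was deleted and table "5" was added during the scan; neither is flagged.
    ensureAllTablesChecked(Set.of("1", "2", "3"), Set.of("1", "2", "5"), Set.of("1", "2"));
    System.out.println("all tables checked");

    try {
      // Table "2" existed throughout but the scan never saw it.
      ensureAllTablesChecked(Set.of("1", "2"), Set.of("1", "2"), Set.of("1"));
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

The sketch copies tableIDsBefore into a new HashSet before calling retainAll and removeAll, so the caller's sets are left unmodified.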
