mjwall commented on pull request #2293:
URL: https://github.com/apache/accumulo/pull/2293#issuecomment-930102235


   So this gets a list of the table IDs (tableIDsBefore) before scanning the 
metadata table and then gets a list of the table IDs afterwards (tableIDsAfter). 
 While scanning the metadata, we keep track of the tables we have seen.  The 
idea here is to fail if we miss an entire table, which can happen with one scan 
if the metadata has custom splits at table boundaries.
   
   There are two cases to consider here, so I want to document my assumptions 
for review/discussion:
   1 - a table is added after tableIDsBefore.  This means it is added after 
grabbing the delete markers, so there should be no delete markers for the added 
table that we need to remove if they are still in use.  Based on that, we don't 
care if we miss scanning the metadata for the new table.
   2 - a table is deleted after tableIDsBefore.  If the table has been deleted, 
we can safely remove all delete markers associated with that table.  That means 
scanning the metadata for the deleted table is not needed.
   
   I am worried my assumptions are wrong here, but assuming they are correct, 
the logic in ensureAllTablesChecked is this.
   
   tableIDsAfter will have extra IDs for tables that were added and will be 
missing table IDs for deleted tables.  So intersect tableIDsBefore and 
tableIDsAfter.  Then remove all the table IDs we saw.  If anything is left, we 
missed scanning a table in the metadata and should throw an error and abort the 
GC cycle.
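
   To make the set logic above concrete, here is a minimal sketch of the check. 
It is not the code in the PR: the class name, the use of plain strings for 
table IDs, the parameter name tableIdsSeen, and the exception type are all 
assumptions made for illustration.

```java
import java.util.Set;
import java.util.TreeSet;

public class TableCheckSketch {

  // Hypothetical stand-in for ensureAllTablesChecked; table IDs are plain
  // strings here, and the exception type is an assumption.
  static void ensureAllTablesChecked(Set<String> tableIdsBefore,
      Set<String> tableIdsSeen, Set<String> tableIdsAfter) {
    // Keep only tables that existed both before and after the metadata scan.
    // Tables added or deleted during the scan drop out of this intersection.
    Set<String> missed = new TreeSet<>(tableIdsBefore);
    missed.retainAll(tableIdsAfter);

    // Remove every table that was seen while scanning the metadata.
    missed.removeAll(tableIdsSeen);

    // Anything left existed the whole time but was never seen in the scan.
    if (!missed.isEmpty()) {
      throw new IllegalStateException(
          "Saw no metadata entries for tables " + missed + ", aborting GC cycle");
    }
  }

  public static void main(String[] args) {
    // Table "3" was added and table "2" was deleted during the scan, so
    // only table "1" has to have been seen. This call does not throw.
    ensureAllTablesChecked(Set.of("1", "2"), Set.of("1"), Set.of("1", "3"));
    System.out.println("all tables checked");
  }
}
```

   With these inputs the intersection is just table 1, which was seen, so the 
check passes.  If table 2 had still existed after the scan but was never seen, 
the check would throw.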


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


