Hi folks,

I was debugging TestTruncateTableProcedure when I started thinking about this. (That's one mean test! What nice fault-tolerant tests!)

So the specific case: if we fail after adding new regions to meta (TRUNCATE_TABLE_ADD_TO_META <https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/TruncateTableProcedure.java#L127>), then on recovery the AM (AssignmentManager) treats those regions, which have a null state, as offline and begins assigning them by itself. That is wrong, since the truncate action is not complete. (The procedure will also try to assign them on recovery, and there are locks to avoid simultaneous assigns, etc.) The simple fix is to add the regions with an initial state of CLOSED.

Then, looking in other places, CreateTableProcedure seems like it should suffer the same fate (CREATE_TABLE_ADD_TO_META <https://github.com/apache/hbase/blob/677c1f2c635273eb823b91903dffdb2e587f5181/hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/CreateTableProcedure.java#L104>). Should we add regions as CLOSED there too? (The weird part is that it's not failing; I'm looking into it.)

So the main question is: shouldn't we always add new regions to meta with state CLOSED? Whatever operation is adding them will also be opening them if needed, right? And no operation should be relying on this weird AM assumption to complete its half-done job.

Food for thought: some operations that add regions are truncate table, create table, modify table, clone snapshot, and restore snapshot. Can you imagine a case where not adding a new region as CLOSED makes sense?

-- Appy
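
To make the failure mode above concrete, here is a toy model of it. This is not HBase code: `MetaStateSketch`, `Meta`, `addRegion`, and `regionsAssignedByAmOnRecovery` are all made-up names, and the recovery rule is only an approximation of the behavior described in this mail (a null state is read as offline, and offline regions get assigned by the AM on its own).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MetaStateSketch {
    enum State { OFFLINE, CLOSED, OPEN }

    // Hypothetical stand-in for the meta table: region name -> persisted state.
    // A null value means the region row was written without any state.
    static final class Meta {
        final Map<String, State> rows = new LinkedHashMap<>();

        void addRegion(String region, State initialState) {
            rows.put(region, initialState);
        }
    }

    // Assumed recovery rule, taken from the description in this mail:
    // a null state is treated as OFFLINE, and OFFLINE regions are picked up
    // for assignment by the AM itself, independent of any procedure.
    static List<String> regionsAssignedByAmOnRecovery(Meta meta) {
        List<String> assigned = new ArrayList<>();
        for (Map.Entry<String, State> row : meta.rows.entrySet()) {
            State effective = row.getValue() == null ? State.OFFLINE : row.getValue();
            if (effective == State.OFFLINE) {
                assigned.add(row.getKey());
            }
        }
        return assigned;
    }

    public static void main(String[] args) {
        // Crash right after the add-to-meta step, regions written with no state.
        Meta withoutState = new Meta();
        withoutState.addRegion("region-a", null);
        withoutState.addRegion("region-b", null);

        // Same crash point, but regions were written as CLOSED.
        Meta withClosed = new Meta();
        withClosed.addRegion("region-a", State.CLOSED);
        withClosed.addRegion("region-b", State.CLOSED);

        System.out.println(regionsAssignedByAmOnRecovery(withoutState)); // [region-a, region-b]
        System.out.println(regionsAssignedByAmOnRecovery(withClosed));   // []
    }
}
```

In this model, regions written as CLOSED are left alone on recovery, so only the procedure that added them decides when they get opened.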
