On Wed, Dec 13, 2017 at 1:14 AM, Apekshit Sharma <[email protected]> wrote:
> Hi folks, > > Was debugging TestTruncateTableProcedure when starting thinking about this. > (That's one mean test! What nice fault tolerant tests!) > > So the specific case: If we fail after adding new regions to meta ( > TRUNCATE_TABLE_ADD_TO_META > <https://github.com/apache/hbase/blob/master/hbase- > server/src/main/java/org/apache/hadoop/hbase/master/procedure/ > TruncateTableProcedure.java#L127>), > then on recovery, AM assumes those regions with null state as offline and > begins assigning them by itself which is wrong since truncate action is not > complete (and it'll try to assign them too on recovery, and there are locks > to avoid simultaneous assigns etc.) > Simple fix is, add regions with initial state as CLOSED. > > Then looking in other places, CreateTableProcedure seems like it should > suffer the same fate (CREATE_TABLE_ADD_TO_META > <https://github.com/apache/hbase/blob/677c1f2c635273eb823b91903dffdb > 2e587f5181/hbase-server/src/main/java/org/apache/hadoop/ > hbase/master/procedure/CreateTableProcedure.java#L104>). > Should we add region as CLOSED there too? > (Weird part is, it's not failing, looking into it) > > So the main question is, shouldn't we always add new regions to meta with > state as CLOSED? > > I've been here recently. OFFLINE? > Whatever operation is adding them will also be opening them if needed, > right? And no operation should be relying on this weird AM assumption to > complete it's half done job. > > Food for thought - Some operations adding regions are: truncate table, > create table, modify table, clone snapshot, restore snapshot. > > Can you imagine a case where not adding a new region as CLOSED makes sense? > > None. There is adding and then AMv2 takes control. These tests that double-kill to ensure each step recoverable are great, yeah, at finding dirty bugs. I think truncate table though a little silly. If you proposed killing it, I'd +1 it. St.Ack > -- Appy >
