Question on HBaseFsck#checkRegionConsistency()

Stephen Jiang Fri, 27 Mar 2015 00:17:29 -0700

I am sure the following logic is a bug, but I'd like to know the rationale
behind it so that I can fix it correctly.


In HBaseFsck#checkRegionConsistency(), we skip regions that were recently
modified. This is undesirable, at least in the situation I am testing.

I can easily reproduce the problem by modifying an existing unit test,
TestHBaseFsck#testOverlapAndOrphan():
- All the unit tests pass in 0 as the recently-modified time lag; the
default is 60 seconds. I changed it to the default value of 60 seconds.
- Then run the UT: it generates an orphaned HDFS region by removing the
regioninfo file from the region directory.
- The HBCK repair code creates a new region to repair the problem.
- However, the new region is skipped in HBaseFsck#checkRegionConsistency(),
so it is neither assigned nor added to META.
- At the end, the UT fails because the repair did not fix the error.

{code}
private void checkRegionConsistency(final String key, final HbckInfo hbi) {
    ...
    boolean recentlyModified = inHdfs && hbi.getModTime() + timelag > System.currentTimeMillis();
    ...
    } else if (recentlyModified) {
      // the branch that skips the region created by the repair
      LOG.warn("Region " + descriptiveName + " was recently modified -- skipping");
      return;
    }
    ...
}
{code}
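To make the timing concrete, here is a minimal standalone sketch of the predicate above. It assumes only what the excerpt shows: a region counts as recently modified when it is in HDFS and its modification time plus the time lag is later than the current time. The class and method names are hypothetical, and the current time is passed in instead of calling System.currentTimeMillis() so the check is deterministic.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical standalone version of the "recently modified" check in the excerpt.
final class RecentlyModifiedCheck {
    private RecentlyModifiedCheck() {}

    // Mirrors: inHdfs && hbi.getModTime() + timelag > System.currentTimeMillis()
    // with "now" supplied by the caller.
    static boolean isRecentlyModified(boolean inHdfs, long modTimeMs, long timelagMs, long nowMs) {
        return inHdfs && modTimeMs + timelagMs > nowMs;
    }

    public static void main(String[] args) {
        long repairTime = 1_000_000L;         // region dir written by the repair
        long checkTime = repairTime + 500L;   // consistency check runs 0.5 s later
        long defaultLag = TimeUnit.SECONDS.toMillis(60);

        // With the lag used by the unit tests (0), the repaired region is checked.
        System.out.println(isRecentlyModified(true, repairTime, 0L, checkTime));         // false
        // With the 60-second default, the same region is skipped.
        System.out.println(isRecentlyModified(true, repairTime, defaultLag, checkTime)); // true
    }
}
```

Any region that the repair writes during the same run falls inside the window, which matches the failure seen in the UT.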

If I change the time lag from 0 to the default 60 seconds and run the UTs in
TestHBaseFsck, many of them fail. I think this is a valid customer
scenario: people usually do not change a default value unless they know
what they are doing.
(Surprisingly, I could not find any complaints in a Google search. Maybe
HBase is so reliable that we never hit this particular corruption in
production :-)
Note: the workaround is to run hbck repair twice; the second run fixes
the issue. Maybe our customers just always run hbck multiple times
before reporting issues.)

I have not gone back through the history to find out why this logic was
implemented in the first place. Does anyone on this list know the reasoning
behind it? Should I simply remove it, or do I need to add some information
to hbi to indicate that we should not skip a target region?

Thanks
Stephen