I found a critical issue when running ITBLL against branch-3, and it
affects all active branches.

We create the WAL directory when initializing a WAL instance. Since we
now have some lazily initialized WALProviders, such as the WALProvider
for the meta table, this can break our fencing when a region server is
force killed.

Our way of fencing on the master side is to rename the WAL directory
of the given region server, so when the 'dead' region server tries to
roll the WAL, it gets a 'parent does not exist' error and quits.
But if we happen to move the meta region to this region server after
the rename, the newly initialized meta WAL instance will recreate the
WAL directory for that region server, so the WAL roll can succeed and
cause very serious data inconsistency problems.
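
To make the sequence concrete, here is a minimal sketch of the race. It
uses the local filesystem through java.nio.file as a stand-in for HDFS,
so it only illustrates the ordering of operations. The class name, the
method names, the "rs1" directory and the "-splitting" suffix are all
hypothetical and chosen for illustration; none of this is HBase code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative only: local directories stand in for the WAL directory on HDFS.
class WalFencingSketch {

    // A "WAL roll" here is just creating a new file inside the WAL directory.
    // It fails if the parent directory does not exist.
    static boolean tryRoll(Path walDir, String fileName) {
        try {
            Files.createFile(walDir.resolve(fileName));
            return true;
        } catch (IOException e) {
            // NoSuchFileException lands here when the parent is gone.
            return false;
        }
    }

    // Master-side fencing: rename the region server's WAL directory.
    static Path fence(Path walDir) throws IOException {
        Path renamed = walDir.resolveSibling(walDir.getFileName() + "-splitting");
        return Files.move(walDir, renamed);
    }

    // Returns whether the fenced ("dead") server can still roll its WAL.
    // createDirOnInit models a lazily initialized WAL instance that creates
    // the WAL directory during its own initialization.
    static boolean rollSucceedsAfterFencing(boolean createDirOnInit) throws IOException {
        Path root = Files.createTempDirectory("wal-fencing");
        Path walDir = root.resolve("rs1");
        Files.createDirectories(walDir);   // region server startup
        tryRoll(walDir, "wal.1");          // normal operation before the kill

        fence(walDir);                     // master renames the directory

        if (createDirOnInit) {
            // Lazy WAL init after fencing recreates the directory.
            Files.createDirectories(walDir);
        }
        return tryRoll(walDir, "wal.2");   // the 'dead' server tries to roll
    }

    public static void main(String[] args) throws IOException {
        System.out.println("init creates dir:   roll succeeds = "
            + rollSucceedsAfterFencing(true));
        System.out.println("init does not:      roll succeeds = "
            + rollSucceedsAfterFencing(false));
    }
}
```

With directory creation in the init path, the roll after fencing goes
through; without it, the roll fails and the server would quit, which is
the behavior the fencing relies on.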

The fix is easy: just remove the creation of the WAL directory from WAL
initialization. I've already opened a PR. The biggest challenge is
fixing the broken UTs, so we still need some time.

Since this problem affects all active branches, I suggest we make new
releases for 2.6.x and 2.5.x immediately after fixing this issue.

Thoughts? Thanks.
