taklwu commented on pull request #2237:
URL: https://github.com/apache/hbase/pull/2237#issuecomment-673253206
First of all, thanks again, Duo.
> I think for the scenario here, we just need to write the cluster id and
> other things to zookeeper? Just make sure that the current code in HBase will
> not consider us as a fresh new cluster. We do not need to rebuild meta?
So, let me confirm your suggestion: if we add one more field to the
ZNode, e.g. a boolean `completedMetaBoostrap`, and we find both `clusterId` and
`completedMetaBoostrap` in ZK, then we will not delete the meta directory?
As a follow-up: if ZK ZNode data is used to determine whether this is a fresh
new cluster, can we skip deleting the meta directory when `clusterId` and
`completedMetaBoostrap` were never set but we find a meta directory? This is the
cloud use case, where we don't have ZK data to make the decision, so we don't
know whether the meta is partial. IMO, we should just leave the meta directory
in place, and if anything bad happens, the operator can still run HBCK. (If we
do it the other way around and always delete the meta, we lose the possibility
that the cluster can heal itself, and we still cannot confirm whether the meta
was partial, can we?)
> For the InitMetaProcedure, the assumption is that, if we found that the
> meta table directory is there, then it means the procedure itself has crashed
> before finishing the creation of meta table, i.e, the meta table is 'partial'.
> So it is safe to just remove it and create again. I think this is a very common
> trick in distributed system for handling failures?
Do you mean that being `idempotent` is the trick? `InitMetaProcedure` may be
idempotent and can bring `hbase:meta` online (as an empty table), but I don't
think the cluster/HM itself is automatically `idempotent`. And yes, the content
of the original meta can be rebuilt with the help of HBCK, but if HM continues
the flow with some existing data, e.g. the namespace table (sorry, on branch-2
we have a namespace table), and HM restarts with an empty meta, then based on
the experiment I did, the cluster hangs and HM cannot be initialized.
If we step back and just think about the definition of `partial` meta, it would
be great if the meta table itself could tell us whether it is partial, because
it is still a table in HBase and HFiles are immutable. E.g. can we tell whether
a user table is partial by looking at its data? I may be wrong, but it seems
that we cannot tell from the HFiles themselves, and that we need ZK and the WAL
to define it.
So, again, IMO the data content of a table is sensitive, especially for the
meta table, and I'm proposing not to delete meta if possible here (deleting it
is effectively the same as running hbck to delete and rebuild it).
Based on our discussion here, IMO two proposals have been mentioned for
defining `partial meta`:
1. add a boolean to the WAL as proc-level data
2. write a boolean to the ZNode to tell whether the bootstrap has completed
*. no matter whether we choose 1) or 2) above, we have an additional condition:
if we don't find any WAL or ZK record of this state, we should not delete the
meta table.
It seems 2) + *) should be the simplest solution. What do you guys think?
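To make 2) + *) concrete, here is a rough sketch of the decision I have in mind. This is not actual HBase code: the class name, the `Action` enum, and the `decide` method are all hypothetical, and the ZNode flag is modelled as an `Optional<Boolean>` that is empty when ZK holds no record at all.

```java
import java.util.Optional;

// Hypothetical sketch of proposal 2) + *); none of these names exist in HBase.
final class MetaBootstrapDecision {

    public enum Action { CREATE_META, KEEP_META, DELETE_AND_RECREATE_META }

    /**
     * @param metaDirExists      whether the meta table directory was found on the filesystem
     * @param bootstrapCompleted the boolean read from the ZNode; empty if ZK has no record
     */
    public static Action decide(boolean metaDirExists, Optional<Boolean> bootstrapCompleted) {
        if (!metaDirExists) {
            // Nothing on disk: a genuinely fresh cluster, so bootstrap meta.
            return Action.CREATE_META;
        }
        if (!bootstrapCompleted.isPresent()) {
            // Condition *): no record in ZK, so we cannot prove the meta is partial.
            // Leave it alone and let the operator fall back to HBCK if needed.
            return Action.KEEP_META;
        }
        // ZK has a record: only a bootstrap known to be unfinished is treated as partial.
        return bootstrapCompleted.get() ? Action.KEEP_META : Action.DELETE_AND_RECREATE_META;
    }

    public static void main(String[] args) {
        System.out.println(decide(false, Optional.empty()));   // fresh cluster
        System.out.println(decide(true, Optional.empty()));    // cloud case: meta dir, no ZK data
        System.out.println(decide(true, Optional.of(true)));   // bootstrap finished
        System.out.println(decide(true, Optional.of(false)));  // bootstrap crashed midway
    }
}
```

The only branch that deletes meta is the one where ZK explicitly says the bootstrap started but did not finish; every ambiguous case keeps the existing directory.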