anoopsjohn commented on pull request #2237:
URL: https://github.com/apache/hbase/pull/2237#issuecomment-673347350
Thanks, Duo. I was about to write up the flow of why we do InitMeta. In the
cluster-recreate case that @taklwu describes, neither the WAL data nor the zk
data is there. So, as you said, anyone who wants to do this has to do two things:
1. Before the initial cluster is dropped, the WAL fs (HDFS backed by managed
disk in the cloud) needs to be backed up. The zk data (the meta server
location) also needs to be backed up.
2. When the cluster is recreated from the existing data, some tool (hbck or
something similar) needs to recreate the zk node and put the old location value
back. Before that, it has to restore the WAL FS data onto the new HDFS cluster.
Backing up and restoring the WAL fs data makes sense. But IMHO the zk data part
looks like another hack. Up to 2.1.6 this was not needed: even if the meta
location was not in zk, InitMeta would kick in, create the meta region from the
existing data, and assign it to some RS. But the meta cleanup causes the entire
existing data to be deleted, and I don't think that is good. We keep adding
more things to the META table these days. The NS info itself is in another CF
in the META table. There is also some discussion around adding the committed
HFiles data into META (not concluded yet, but it looks like we keep increasing
the responsibility of the META table). So dropping all this information is not
good.
So my thinking was to add such cluster recreate as a first-class feature in
HBase itself, so that anyone can use it anywhere. As long as the data is
persisted, we can drop the cluster and recreate it later. There are two
blockers for that, and the biggest one is this META dir delete as part of InitMetaProc.
I agree with the intent of adding that cleanup. But if we also have to support
this, how can we avoid the delete? Recreating the entire META using some hbck
options should be the very last option to consider, IMO. If we can solve it in
other ways, why not?