[ https://issues.apache.org/jira/browse/HBASE-24833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17409167#comment-17409167 ]

Josh Elser commented on HBASE-24833:
------------------------------------

So, I've been testing this on 2.4. I can confirm that the patch seems to work 
as Stephen has described.

Recovery in the "cloud storage reattach" use-case is still pretty hard. In the 
case where we don't have ZK and WALs....
 # Stop HBase (if running)
 # Drop ${hbase.rootdir}/MasterRegion
 # Make sure ZK is empty (should be given the scenario)
 # For each table in "storage" (cloud storage); in my case, "hbase:namespace" 
and "t" are my tables:
  {noformat}hbase hbck -j hbase-hbck2...jar addFsRegionsMissingInMeta hbase:namespace default:t {noformat}
 # Restart the active master (for good measure, as the addFsRegionsMissingInMeta 
command tells you to)
 # Run the corresponding {{assigns}} command that the previous 
addFsRegionsMissingInMeta command prints out (rough shape shown below)
 # Profit.
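
For reference, that {{assigns}} invocation has roughly the following shape; the 
encoded region names are placeholders here, the real ones come from the 
addFsRegionsMissingInMeta output:
{noformat}
hbase hbck -j hbase-hbck2...jar assigns <encoded-region-name-1> <encoded-region-name-2> ...
{noformat}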

I think there's some value in what Stephen has here as a patch over the current 
state (as long as we also capture the above steps somewhere, so folks know how 
to actually recover the rest of the way).

> Bootstrap should not delete the META table directory if it's not partial
> ------------------------------------------------------------------------
>
>                 Key: HBASE-24833
>                 URL: https://issues.apache.org/jira/browse/HBASE-24833
>             Project: HBase
>          Issue Type: Sub-task
>    Affects Versions: 3.0.0-alpha-1, 2.3.0, 2.3.1, 2.3.3
>            Reporter: Tak-Lon (Stephen) Wu
>            Assignee: Tak-Lon (Stephen) Wu
>            Priority: Major
>
> This issue was discussed in [PR#2113|https://github.com/apache/hbase/pull/2113] 
> as part of HBASE-24286, and it is a dependency we need to resolve before we 
> solve HBASE-24286.
> The changes were introduced in 
> [HBASE-24471|https://github.com/apache/hbase/commit/4d5efec76718032a1e55024fd5133409e4be3cb8#diff-21659161b1393e6632730dcbea205fd8R70-R89], 
> where the notion of a partial meta was introduced and `partial` was defined 
> as: InitMetaProcedure did not succeed and INIT_META_ASSIGN_META was not 
> completed.
> {code:java}
>   private static void writeFsLayout(Path rootDir, Configuration conf) throws IOException {
>     LOG.info("BOOTSTRAP: creating hbase:meta region");
>     FileSystem fs = rootDir.getFileSystem(conf);
>     Path tableDir = CommonFSUtils.getTableDir(rootDir, TableName.META_TABLE_NAME);
>     if (fs.exists(tableDir) && !fs.delete(tableDir, true)) {
>       LOG.warn("Can not delete partial created meta table, continue...");
>     }
> {code}
> However, in the cloud use case where HFiles are stored on S3, WALs are stored 
> on HDFS, and ZK data is stored within the cluster, this partial-meta handling 
> becomes a blocker when a cluster is recreated on existing HFiles. Here, ZK 
> data and WALs cannot be retained (HDFS was associated with the cloud instances 
> and was terminated together with them) when the cluster is recreated on the 
> flushed HFiles, so the existing meta is always considered partial and is 
> deleted in `INIT_META_WRITE_FS_LAYOUT` during bootstrap. As a result, the 
> recreated cluster starts with an empty meta table, and either hangs during 
> master initialization (branch-2) because the table state of the namespace 
> table cannot be assigned, or starts as a fresh cluster without any regions 
> assigned or tables opened (and may need HBCK to rebuild the meta).
> Potential solution suggested by Anoop
> {quote}In case of HM start and the bootstrap we create the ClusterID and 
> write to FS and then to zk and then create the META table FS layout. So in a 
> cluster recreate, we will see clusterID is there in FS and also the META FS 
> layout but no clusterID in zk. Ya seems we can use this as indication for 
> cluster recreate over existing data. In HM start, this is some thing we need 
> to check at 1st itself and track. If this mode is true, later when (if) we do 
> INIT_META_WRITE_FS_LAYOUT , we should not delete the META dir. As part of the 
> Bootstrap when we write that proc to MasterProcWal, we can include this mode 
> (boolean) info also. This is a protobuf message anyways. So even if this HM 
> got killed and restarted (at a point where the clusterId was written to zk 
> but the Meta FS layout part was not reached) we can use the info added as 
> part of the bootstrap wal entry and make sure NOT to delete the meta dir.
> {quote}
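> A rough sketch of this idea (illustrative only, not the actual patch; the 
> clusterRecreate parameter and the way it would be computed are assumptions 
> here):
> {code:java}
>   // Sketch only: clusterRecreate would be computed once at HM start (cluster
>   // ID present on the FS but missing in ZK) and carried in the procedure's
>   // protobuf state so it survives a master restart mid-bootstrap.
>   private static void writeFsLayout(Path rootDir, Configuration conf,
>       boolean clusterRecreate) throws IOException {
>     FileSystem fs = rootDir.getFileSystem(conf);
>     Path tableDir = CommonFSUtils.getTableDir(rootDir, TableName.META_TABLE_NAME);
>     if (clusterRecreate && fs.exists(tableDir)) {
>       // Recreate over existing data: keep the meta directory instead of
>       // treating it as a partial bootstrap.
>       LOG.info("Cluster recreate detected, keeping existing hbase:meta directory");
>       return;
>     }
>     if (fs.exists(tableDir) && !fs.delete(tableDir, true)) {
>       LOG.warn("Can not delete partial created meta table, continue...");
>     }
>   }
> {code}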
> In this JIRA, we're going to fix the `partial` definition for the case where 
> we find that the cluster ID is stored with the HFiles but ZK was deleted or 
> is fresh when the cluster is recreated.



