[ https://issues.apache.org/jira/browse/HBASE-27698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17709991#comment-17709991 ]

Rajeshbabu Chintaguntla edited comment on HBASE-27698 at 4/9/23 8:09 PM:
-------------------------------------------------------------------------

[~vjasani] [~taklwu] 
During an express upgrade, mainly from HBase 1.x to HBase 2.x, as I mentioned 
in the comment above, the region state info for hbase:meta cannot be identified 
properly: neither ZooKeeper nor the WAL files hold the meta location, because 
no RegionServer was hosting meta at shutdown. In that case it is better to 
proceed with meta assignment, instead of throwing an exception, when the 
hbase:meta directory is not partially created (i.e. it is complete).

With this change of not throwing an exception (raised PR 
[https://github.com/apache/hbase/pull/5167]) when hbase:meta is not 
partial, meta is initialised properly without any further rebuilding 
or ZooKeeper znode creation. I verified the upgrade path from HBase 1.x to 
2.5.2 in a cluster where the master needed to be restarted only once, after 
updating the meta table schema. After starting the master, all tables and 
regions came up without any issues.
{noformat}
2023-04-04 19:41:13,013 INFO  [master/host023:16000:becomeActiveMaster] 
hbase.ChoreService: Chore ScheduledChore name=SnapshotCleaner, period=1800000, 
unit=MILLISECONDS is enabled.
2023-04-04 19:41:13,077 WARN  [PEWorker-10] procedure.InitMetaProcedure: Can 
not delete partial created meta table, continue...
2023-04-04 19:41:13,093 INFO  [PEWorker-10] regionserver.HRegion: creating 
{ENCODED => 1588230740, NAME => 'hbase:meta,,1', STARTKEY => '', ENDKEY => ''}, 
tableDescriptor='hbase:meta', {TABLE_ATTRIBUTES => {IS_META => 'true', 
coprocessor$1 => 
'|org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint|536870911|'}}, 
{NAME => 'info', INDEX_BLOCK_ENCODING => 'NONE', VERSIONS => '10', 
KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', 
MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'NONE', IN_MEMORY 
=> 'true', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '8192 B 
(8KB)', METADATA => {'CACHE_DATA_IN_L1' => 'true'}}, 
regionDir=hdfs://host023:8020/apps/hbase/data
2023-04-04 19:41:13,098 WARN  [PEWorker-10] regionserver.HRegionFileSystem: 
Trying to create a region that already exists on disk: 
hdfs://host023:8020/apps/hbase/data/data/hbase/meta/1588230740
{noformat}
{noformat}
org.apache.hadoop.hbase.PleaseRestartMasterException: Aborting active master 
after missing CFs are successfully added in meta. Subsequent active master 
initialization should be uninterrupted
        at 
org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1218)
        at 
org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2394)
        at 
org.apache.hadoop.hbase.master.HMaster.lambda$null$0(HMaster.java:563)
        at org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:187)
        at org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:177)
        at org.apache.hadoop.hbase.master.HMaster.lambda$run$1(HMaster.java:560)
        at java.lang.Thread.run(Thread.java:750)
2023-04-04 19:41:51,837 ERROR [master/sl73caehmapd023:16000:becomeActiveMaster] 
master.HMaster: Master server abort: loaded coprocessors are: []
2023-04-04 19:41:51,837 ERROR [master/sl73caehmapd023:16000:becomeActiveMaster] 
master.HMaster: ***** ABORTING master 
sl73caehmapd023.visa.com,16000,1680637260893: Unhandled exception. Starting 
shutdown. *****
org.apache.hadoop.hbase.PleaseRestartMasterException: Aborting active master 
after missing CFs are successfully added in meta. Subsequent active master 
initialization should be uninterrupted
        at 
org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1218)
        at 
org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2394)
        at 
org.apache.hadoop.hbase.master.HMaster.lambda$null$0(HMaster.java:563)
        at org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:187)
        at org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:177)
        at org.apache.hadoop.hbase.master.HMaster.lambda$run$1(HMaster.java:560)
        at java.lang.Thread.run(Thread.java:750)
{noformat}
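The gist of the change can be sketched roughly as follows. This is purely an illustrative sketch of the decision described above; MetaDirState, decide(), and the action strings are hypothetical names, not the actual InitMetaProcedure code or any HBase API.

```java
// Illustrative sketch of the meta-initialisation decision described above.
// MetaDirState and decide() are hypothetical, not HBase APIs.
enum MetaDirState { MISSING, PARTIAL, COMPLETE }

class InitMetaSketch {
  // Returns which action the master should take for the hbase:meta directory.
  static String decide(MetaDirState state) {
    switch (state) {
      case MISSING:
        return "create";    // fresh cluster: create meta from scratch
      case PARTIAL:
        return "recreate";  // half-created directory: delete and create again
      case COMPLETE:
      default:
        // upgraded 1.x cluster: the meta data on disk is intact, but neither
        // ZooKeeper nor the WALs record a location; proceed straight to
        // assignment instead of throwing an exception
        return "assign";
    }
  }
}
```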



> Migrating meta locations from zookeeper to master data may not always be possible 
> when migrating from HBase 1.x
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-27698
>                 URL: https://issues.apache.org/jira/browse/HBASE-27698
>             Project: HBase
>          Issue Type: Bug
>          Components: migration
>    Affects Versions: 2.5.0
>            Reporter: Rajeshbabu Chintaguntla
>            Assignee: Rajeshbabu Chintaguntla
>            Priority: Major
>
> In HBase 1.x versions, the meta server location is removed from zookeeper 
> when the server is stopped. In such cases, migrating to the 2.5.x branches may not 
> create any meta entries in master data. So, if we cannot find the 
> meta location in zookeeper, we can get it from the WAL directories 
> with the .meta extension and add it to master data.
> {noformat}
>   private void tryMigrateMetaLocationsFromZooKeeper() throws IOException, KeeperException {
>     // try migrate data from zookeeper
>     try (ResultScanner scanner =
>       masterRegion.getScanner(new Scan().addFamily(HConstants.CATALOG_FAMILY))) {
>       if (scanner.next() != null) {
>         // notice that all replicas for a region are in the same row, so the migration can be
>         // done with in a one row put, which means if we have data in catalog family then we can
>         // make sure that the migration is done.
>         LOG.info("The {} family in master local region already has data in it, skip migrating...",
>           HConstants.CATALOG_FAMILY_STR);
>         return;
>       }
>     }
>     // start migrating
>     byte[] row = CatalogFamilyFormat.getMetaKeyForRegion(RegionInfoBuilder.FIRST_META_REGIONINFO);
>     Put put = new Put(row);
>     List<String> metaReplicaNodes = zooKeeper.getMetaReplicaNodes();
>     StringBuilder info = new StringBuilder("Migrating meta locations:");
>     for (String metaReplicaNode : metaReplicaNodes) {
>       int replicaId = zooKeeper.getZNodePaths().getMetaReplicaIdFromZNode(metaReplicaNode);
>       RegionState state = MetaTableLocator.getMetaRegionState(zooKeeper, replicaId);
>       info.append(" ").append(state);
>       put.setTimestamp(state.getStamp());
>       MetaTableAccessor.addRegionInfo(put, state.getRegion());
>       if (state.getServerName() != null) {
>         MetaTableAccessor.addLocation(put, state.getServerName(), HConstants.NO_SEQNUM, replicaId);
>       }
>       put.add(CellBuilderFactory.create(CellBuilderType.SHALLOW_COPY).setRow(put.getRow())
>         .setFamily(HConstants.CATALOG_FAMILY)
>         .setQualifier(RegionStateStore.getStateColumn(replicaId)).setTimestamp(put.getTimestamp())
>         .setType(Cell.Type.Put).setValue(Bytes.toBytes(state.getState().name())).build());
>     }
>     if (!put.isEmpty()) {
>       LOG.info(info.toString());
>       masterRegion.update(r -> r.put(put));
>     } else {
>       LOG.info("No meta location available on zookeeper, skip migrating...");
>     }
>   }
> {noformat}
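The WAL-directory fallback suggested in the description can be sketched with plain java.nio.file code. This is a hedged illustration only: the class name MetaWalLocator and the method findMetaWalServers are invented here, and the sketch simply assumes a WAL root containing one directory per server, where a server that hosted hbase:meta has a WAL file carrying the ".meta" suffix; it does not parse HBase ServerName encodings.

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.*;
import java.util.stream.*;

// Hypothetical sketch: scan WAL server directories for files with the
// ".meta" suffix to recover which server(s) last hosted hbase:meta when
// ZooKeeper no longer has the location. Not real HBase code.
public class MetaWalLocator {
  // Returns the names of server directories under walRoot that contain at
  // least one meta WAL file (a file whose name ends with ".meta").
  public static List<String> findMetaWalServers(Path walRoot) throws IOException {
    try (Stream<Path> serverDirs = Files.list(walRoot)) {
      return serverDirs
        .filter(Files::isDirectory)
        .filter(dir -> {
          try (Stream<Path> wals = Files.list(dir)) {
            return wals.anyMatch(w -> w.getFileName().toString().endsWith(".meta"));
          } catch (IOException e) {
            return false; // unreadable directory: treat as not hosting meta
          }
        })
        .map(dir -> dir.getFileName().toString())
        .sorted()
        .collect(Collectors.toList());
    }
  }
}
```

In the real migration, the recovered server name would then be written into master data the same way tryMigrateMetaLocationsFromZooKeeper writes the ZooKeeper-derived location.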



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
