[
https://issues.apache.org/jira/browse/HBASE-22193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812981#comment-16812981
]
Allan Yang commented on HBASE-22193:
------------------------------------
We set ASSIGN_MAX_ATTEMPTS to Integer.MAX_VALUE because we can't let the SCP
(ServerCrashProcedure) fail: if it fails, the region will remain unassigned
forever (unless the HBCK2 tool is used to schedule another SCP). As for the
procedure log problem, I think we considered it. Before we changed
ASSIGN_MAX_ATTEMPTS to Integer.MAX_VALUE, we asked [~Apache9] whether purging
the procedure logs would work; so many undeleted logs should not be left
behind, even after lots of retries. Is this a bug, [~Apache9]?
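The purging behavior in question can be illustrated with a deliberately simplified, hypothetical model. It is not WALProcedureStore's real tracker logic. In this model, each log file records which procedures it wrote state for, and logs are deleted oldest-first, and only when none of them still holds the latest state of an unfinished procedure. Under those assumptions, a procedure that keeps updating itself on every retry does not pin old logs, but one that is written once and never updated again does. All class and method names here are made up for the sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of procedure-log cleanup; NOT WALProcedureStore's real logic.
public class ProcLogPurgeSketch {
  // Each log file is modeled as the set of procedure ids whose state it recorded.
  private final List<Set<Long>> logs = new ArrayList<>();
  // Procedure id -> index of the log that holds that procedure's latest state.
  private final Map<Long, Integer> latestLog = new HashMap<>();
  private final Set<Long> finished = new HashSet<>();

  /** Starts a new, active log file (a "roll"). */
  public void roll() {
    logs.add(new HashSet<>());
  }

  /** Persists a new state for procId in the active (newest) log. */
  public void update(long procId) {
    int active = logs.size() - 1;
    logs.get(active).add(procId);
    latestLog.put(procId, active);
  }

  /** Marks a procedure as completed, so its old states are no longer needed. */
  public void finish(long procId) {
    finished.add(procId);
  }

  /** Counts how many of the oldest logs could be deleted, scanning oldest-first. */
  public int purgeableOldLogs() {
    int purgeable = 0;
    for (int i = 0; i < logs.size() - 1; i++) { // the active log is never purged
      for (long procId : logs.get(i)) {
        boolean stillNeeded = !finished.contains(procId) && latestLog.get(procId) == i;
        if (stillNeeded) {
          return purgeable; // in this model, this log pins itself and every newer log
        }
      }
      purgeable++;
    }
    return purgeable;
  }

  public static void main(String[] args) {
    ProcLogPurgeSketch store = new ProcLogPurgeSketch();
    store.roll();
    store.update(2L); // procedure written once, then never updated again
    store.update(1L); // procedure that keeps retrying
    for (int retry = 0; retry < 4; retry++) {
      store.roll();
      store.update(1L); // each retry persists procedure 1's new state in a newer log
    }
    System.out.println(store.purgeableOldLogs()); // 0: log 0 holds procedure 2's only state
    store.finish(2L);
    System.out.println(store.purgeableOldLogs()); // 4: only the active log is still needed
  }
}
```

Deleting strictly oldest-first keeps the sketch simple; it is also why, in this model, a single stale entry in the oldest log blocks deletion of every later log.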
> Reduce the default ASSIGN_MAX_ATTEMPTS config
> ---------------------------------------------
>
> Key: HBASE-22193
> URL: https://issues.apache.org/jira/browse/HBASE-22193
> Project: HBase
> Issue Type: Improvement
> Reporter: Guanghao Zhang
> Priority: Major
>
>
> {code:java}
> public static final String ASSIGN_MAX_ATTEMPTS =
> "hbase.assignment.maximum.attempts";
> private static final int DEFAULT_ASSIGN_MAX_ATTEMPTS = Integer.MAX_VALUE;
> {code}
> Now the default config is Integer.MAX_VALUE.
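Here is a minimal sketch of how this setting bounds a retry loop. It makes two assumptions. First, a plain Map stands in for Hadoop's Configuration. Second, a hypothetical openWithRetries() stands in for the procedure's real retry logic. The printed "Retry=N of max=M" text only mimics the format of the TransitRegionStateProcedure log line in this issue.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntPredicate;

public class AssignRetrySketch {
  public static final String ASSIGN_MAX_ATTEMPTS = "hbase.assignment.maximum.attempts";
  private static final int DEFAULT_ASSIGN_MAX_ATTEMPTS = Integer.MAX_VALUE;

  // Stand-in for Hadoop's Configuration.getInt(key, default), kept self-contained with a Map.
  static int getInt(Map<String, String> conf, String key, int defaultValue) {
    String value = conf.get(key);
    return value == null ? defaultValue : Integer.parseInt(value.trim());
  }

  // Hypothetical retry loop: calls tryOpen until it succeeds or maxAttempts is used up.
  // Returns the successful attempt number, or -1 if it gave up.
  static int openWithRetries(IntPredicate tryOpen, int maxAttempts) {
    int attempt = 0;
    // Checking before incrementing avoids int overflow when maxAttempts is Integer.MAX_VALUE.
    while (attempt < maxAttempts) {
      attempt++;
      if (tryOpen.test(attempt)) {
        return attempt;
      }
      System.out.printf("Retry=%d of max=%d%n", attempt, maxAttempts);
    }
    return -1;
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    // Unset: falls back to the current default.
    System.out.println(getInt(conf, ASSIGN_MAX_ATTEMPTS, DEFAULT_ASSIGN_MAX_ATTEMPTS)); // 2147483647
    conf.put(ASSIGN_MAX_ATTEMPTS, "3"); // hypothetical finite value
    int maxAttempts = getInt(conf, ASSIGN_MAX_ATTEMPTS, DEFAULT_ASSIGN_MAX_ATTEMPTS);
    // A region that never opens: prints three Retry lines, then gives up.
    System.out.println(openWithRetries(attempt -> false, maxAttempts)); // -1
  }
}
```

With a finite maximum, the loop gives up instead of retrying indefinitely. As the comment at the top points out, giving up inside an SCP leaves the region unassigned until an operator intervenes (for example with HBCK2). That is the trade-off behind the current default.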
>
> {code:java}
> 2019-04-09,10:50:44,921 INFO
> org.apache.hadoop.hbase.master.assignment.TransitRegionStateProcedure:
> Retry=170813 of max=2147483647; pid=2849, ppid=2846,
> state=RUNNABLE:REGION_STATE_TRANSITION_CONFIRM_OPENED, locked=true;
> TransitRegionStateProcedure table=IntegrationTestBigLinkedList,
> region=634feb79a583480597e1843647d11228, REOPEN/MOVE; rit=OPENING,
> location=c4-hadoop-tst-st26.bj,29100,1554262369262
> {code}
> The ITBLL (IntegrationTestBigLinkedList) failed to open the region because of
> HBASE-22163 and retried 170813 times to reopen it. After I fixed the problem
> and restarted the master, I found that it took a long time to initialize the
> old procedure logs because there were too many of them.
> The relevant code in WALProcedureStore.java:
>
> {code:java}
> private long initOldLogs(FileStatus[] logFiles) throws IOException {
>   if (logFiles == null || logFiles.length == 0) {
>     return 0L;
>   }
>   long maxLogId = 0;
>   // Every old log file is visited once, so startup time grows with the number of files.
>   for (int i = 0; i < logFiles.length; ++i) {
>     final Path logPath = logFiles[i].getPath();
>     // Recover the HDFS lease so a file possibly still held by a previous writer can be read.
>     leaseRecovery.recoverFileLease(fs, logPath);
>     if (!isRunning()) {
>       throw new IOException("wal aborting");
>     }
>     maxLogId = Math.max(maxLogId, getLogIdFromName(logPath.getName()));
>     // Open the old log; a null result means the file is not tracked.
>     ProcedureWALFile log = initOldLog(logFiles[i], this.walArchiveDir);
>     if (log != null) {
>       this.logs.add(log);
>     }
>   }
>   initTrackerFromOldLogs();
>   return maxLogId;
> }
> {code}
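To make the startup cost concrete, here is a hypothetical stand-alone sketch of the same pattern. It makes one pass over every old log, does some per-file work, and tracks the largest log id. The "pv2-<20-digit id>.log" file-name format and all helper names are assumptions for illustration. The per-file work is reduced to a callback; in the code above, that work is lease recovery and opening the log.

```java
import java.util.List;

public class OldLogScanSketch {
  // Assumed naming scheme for procedure WAL files, e.g. "pv2-00000000000000000042.log".
  private static final String LOG_PREFIX = "pv2-";
  private static final String LOG_SUFFIX = ".log";

  // Hypothetical stand-in for getLogIdFromName(): parses the numeric id out of a file name.
  static long getLogIdFromName(String name) {
    if (!name.startsWith(LOG_PREFIX) || !name.endsWith(LOG_SUFFIX)) {
      throw new IllegalArgumentException("not a procedure log: " + name);
    }
    return Long.parseLong(name.substring(LOG_PREFIX.length(), name.length() - LOG_SUFFIX.length()));
  }

  // Same shape as initOldLogs(): fixed work per file plus a running maximum,
  // so the total cost is linear in the number of old log files.
  static long maxLogId(List<String> logNames, Runnable perFileWork) {
    long maxLogId = 0;
    for (String name : logNames) {
      perFileWork.run(); // stands in for lease recovery and opening the log
      maxLogId = Math.max(maxLogId, getLogIdFromName(name));
    }
    return maxLogId;
  }

  public static void main(String[] args) {
    List<String> names = List.of(
        "pv2-00000000000000000007.log",
        "pv2-00000000000000000042.log",
        "pv2-00000000000000000013.log");
    int[] calls = {0};
    System.out.println(maxLogId(names, () -> calls[0]++)); // 42
    System.out.println(calls[0]); // 3
  }
}
```

Because the loop does fixed work per file, restart time scales linearly with the number of old logs. This is why the large backlog of old procedure logs described above slowed master initialization.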
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)