[jira] [Commented] (HBASE-23694) After RegionProcedureStore completes migration of WALProcedureStore, still running WALProcedureStore.syncThread keeps trying to delete now inexistent log files.

Hudson (Jira) Thu, 16 Jan 2020 09:51:40 -0800


    [ 
https://issues.apache.org/jira/browse/HBASE-23694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017354#comment-17017354
 ]


Hudson commented on HBASE-23694:
--------------------------------

Results for branch master
        [build #1599 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/1599/]: (x) 
*{color:red}-1 overall{color}*
----
details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/1599//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/1599//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/1599//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> After RegionProcedureStore completes migration of WALProcedureStore, still 
> running WALProcedureStore.syncThread keeps trying to delete now inexistent 
> log files.  
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-23694
>                 URL: https://issues.apache.org/jira/browse/HBASE-23694
>             Project: HBase
>          Issue Type: Bug
>          Components: proc-v2, RegionProcedureStore
>    Affects Versions: 3.0.0
>            Reporter: Wellington Chevreuil
>            Assignee: Wellington Chevreuil
>            Priority: Major
>             Fix For: 3.0.0, 2.3.0
>
>
> With the introduction of the new procedure store to save procs on a table, 
> instead of proc WALs, we added some migration logic to enable a smooth 
> upgrade of clusters under previous versions that used to have proc wal files 
> as the underlying procedure store.
> While running a fresh install with latest master branch version, I had 
> noticed that once the old _WALProcedureStore_ directory is detected, it 
> creates and start a WPS instance 
> [here|https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/procedure2/store/region/RegionProcedureStore.java#L311],
>  passing a built-in _ProcedureLoader_ 
> [here|https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/procedure2/store/region/RegionProcedureStore.java#L315]
>  to perform the conversion from proc wal files to table based.
> At the end of this conversion, the builtin loader deletes the proc wal dir 
> [here|https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/procedure2/store/region/RegionProcedureStore.java#L363].
>  This may cause a race condition where a running _WALProcedureStore_ internal 
> thread may be trying to delete the first proc wal file that may had already 
> rolled, then failing with FNFE.
> Even worse is that _WALProcedureStore.syncLoop_ keeps trying indefinitely, 
> flooding Master logs with messages such as below:
> {noformat}
> 2020-01-14 06:13:23,331 INFO  [master/hbase01:16000:becomeActiveMaster] 
> region.RegionProcedureStore: The old WALProcedureStore wal directory 
> file:/root/hbase-3.0.0-SNAPSHOT/hbase-data/MasterProcWALs exists, migrating...
> 2020-01-14 06:13:23,361 DEBUG [master/hbase01:16000:becomeActiveMaster] 
> wal.WALProcedureStore: Starting WAL Procedure Store lease recovery
> 2020-01-14 06:13:23,453 INFO  [master/hbase01:16000:becomeActiveMaster] 
> wal.WALProcedureStore: Rolled new Procedure Store WAL, id=1
> 2020-01-14 06:13:23,454 DEBUG [master/hbase01:16000:becomeActiveMaster] 
> wal.WALProcedureStore: Lease acquired for flushLogId=1
> 2020-01-14 06:13:23,454 DEBUG [master/hbase01:16000:becomeActiveMaster] 
> wal.WALProcedureStore: No state logs to replay.
> 2020-01-14 06:13:23,454 INFO  [master/hbase01:16000:becomeActiveMaster] 
> region.RegionProcedureStore: The WALProcedureStore max pid is 0, and the max 
> pid of all loaded procedures is -1
> 2020-01-14 06:13:23,455 INFO  [master/hbase01:16000:becomeActiveMaster] 
> region.RegionProcedureStore: Migration of WALProcedureStore finished
> 2020-01-14 06:13:23,455 INFO  [master/hbase01:16000:becomeActiveMaster] 
> procedure2.ProcedureExecutor: Recovered RegionProcedureStore lease in 784 msec
> 2020-01-14 06:13:23,471 TRACE [WALProcedureStoreSyncThread] 
> wal.WALProcedureStore: no active procedures
> 2020-01-14 06:13:23,516 INFO  [master/hbase01:16000:becomeActiveMaster] 
> procedure2.ProcedureExecutor: Loaded RegionProcedureStore in 61 msec
> 2020-01-14 06:13:23,516 INFO  [master/hbase01:16000:becomeActiveMaster] 
> procedure2.RemoteProcedureDispatcher: Instantiated, coreThreads=128 
> (allowCoreThreadTimeOut=true), queueMaxSize=32, operationDelay=150
> 2020-01-14 06:13:23,628 INFO  [WALProcedureStoreSyncThread] 
> wal.WALProcedureStore: Rolled new Procedure Store WAL, id=2
> 2020-01-14 06:13:23,634 INFO  [WALProcedureStoreSyncThread] 
> wal.WALProcedureStore: Remove all state logs with ID less than 1, since no 
> active procedures
> 2020-01-14 06:13:23,634 TRACE [WALProcedureStoreSyncThread] 
> wal.WALProcedureStore: Removing 
> log=file:/root/hbase-3.0.0-SNAPSHOT/hbase-data/MasterProcWALs/pv2-00000000000000000001.log
> 2020-01-14 06:13:23,634 INFO  [WALProcedureStoreSyncThread] 
> wal.ProcedureWALFile: Archiving 
> file:/root/hbase-3.0.0-SNAPSHOT/hbase-data/MasterProcWALs/pv2-00000000000000000001.log
>  to 
> file:/root/hbase-3.0.0-SNAPSHOT/hbase-data/oldWALs/pv2-00000000000000000001.log
> 2020-01-14 06:13:23,634 ERROR [WALProcedureStoreSyncThread] 
> wal.WALProcedureStore: Unable to remove log: 
> file:/root/hbase-3.0.0-SNAPSHOT/hbase-data/MasterProcWALs/pv2-00000000000000000001.log
> java.io.FileNotFoundException: File 
> file:/root/hbase-3.0.0-SNAPSHOT/hbase-data/MasterProcWALs/pv2-00000000000000000001.log
>  does not exist
> ...
> 2020-01-14 06:13:23,635 ERROR [WALProcedureStoreSyncThread] 
> wal.WALProcedureStore: Unable to remove log: 
> file:/root/hbase-3.0.0-SNAPSHOT/hbase-data/MasterProcWALs/pv2-00000000000000000001.log
> java.io.FileNotFoundException: File 
> file:/root/hbase-3.0.0-SNAPSHOT/hbase-data/MasterProcWALs/pv2-00000000000000000001.log
>  does not exist
>         at 
> org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:635)
> ...
> 2020-01-14 06:13:37,352 ERROR [WALProcedureStoreSyncThread] 
> wal.WALProcedureStore: Unable to remove log: 
> file:/root/hbase-3.0.0-SNAPSHOT/hbase-data/MasterProcWALs/pv2-00000000000000000001.log
> java.io.FileNotFoundException: File 
> file:/root/hbase-3.0.0-SNAPSHOT/hbase-data/MasterProcWALs/pv2-00000000000000000001.log
>  does not exist
>         at 
> org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:635)
> {noformat}
> I believe a simple solution in this case is to explicitly call 
> _WALProcedureStore.stop_ method at the end of the migration, which stops 
> _WALProcedureStore_ internal _syncLoop_ thread. 
> Am opening a PR soon.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Commented] (HBASE-23694) After RegionProcedureStore completes migration of WALProcedureStore, still running WALProcedureStore.syncThread keeps trying to delete now inexistent log files.

Reply via email to