[ 
https://issues.apache.org/jira/browse/HDDS-1228?focusedWorklogId=325325&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-325325
 ]

ASF GitHub Bot logged work on HDDS-1228:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 08/Oct/19 20:52
            Start Date: 08/Oct/19 20:52
    Worklog Time Spent: 10m 
      Work Description: adoroszlai commented on pull request #1622: HDDS-1228. Chunk Scanner Checkpoints
URL: https://github.com/apache/hadoop/pull/1622
 
 
   ## What changes were proposed in this pull request?
   
   Save the timestamp of the last successful data scan for each container (in 
the `.container` file).  After a datanode restart, resume data scanning with 
the container that was least recently scanned.
   
   Newly closed containers have no timestamp and are thus scanned first during 
the next iteration.  This will be changed in 
[HDDS-1369](https://issues.apache.org/jira/browse/HDDS-1369), which proposes to 
scan newly closed containers immediately.
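
The ordering described above can be sketched as follows. This is only an illustration under stated assumptions, not the code from the pull request: `Container`, `last_scanned`, and `scan_order` are hypothetical names, and a missing timestamp is assumed to be represented as `None`. The sample timestamps are the ones the containers had at the moment of the restart in the test log.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class Container:
    """Hypothetical stand-in for a closed container and its persisted scan timestamp."""
    container_id: int
    last_scanned: Optional[datetime] = None  # None means "never scanned"


def scan_order(containers: List[Container]) -> List[Container]:
    """Order containers the way a resumed scanner would visit them.

    Never-scanned containers come first (in their input order, since the
    sort is stable); the rest follow from least to most recently scanned.
    """
    # Placeholder value for the sort key of never-scanned containers.
    epoch = datetime.min.replace(tzinfo=timezone.utc)
    return sorted(
        containers,
        key=lambda c: (c.last_scanned is not None, c.last_scanned or epoch),
    )


def utc(h: int, m: int, s: int, ms: int) -> datetime:
    """Build a UTC timestamp on 2019-10-08 (helper for the example only)."""
    return datetime(2019, 10, 8, h, m, s, ms * 1000, tzinfo=timezone.utc)


# State at the time of the restart, taken from the test log.
containers = [
    Container(1, utc(19, 38, 57, 443)),
    Container(2, utc(19, 38, 57, 402)),
    Container(3, utc(19, 39, 2, 402)),
    Container(4, utc(19, 39, 2, 430)),
    Container(5),  # scan was interrupted, so no timestamp was saved
    Container(6),
]

print([c.container_id for c in scan_order(containers)])  # prints [5, 6, 2, 1, 3, 4]
```

The resulting order matches the sequence of "Scanning container" messages logged after the restart.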
   
   https://issues.apache.org/jira/browse/HDDS-1228
   
   ## How was this patch tested?
   
   Created and closed containers.  Restarted the datanode while scanning was in 
progress.  Verified that after the restart, the scanner resumed from the 
container where it was interrupted (container 5 in the log below).
   
   ```
   datanode_1  | STARTUP_MSG: Starting HddsDatanodeService
   datanode_1  | 2019-10-08 19:37:07 DEBUG ContainerDataScanner:148 - Scanning container 1, last scanned never
   datanode_1  | 2019-10-08 19:37:07 DEBUG ContainerDataScanner:155 - Completed scan of container 1 at 2019-10-08T19:37:07.570Z
   datanode_1  | 2019-10-08 19:37:07 INFO  ContainerDataScanner:122 - Completed an iteration of container data scrubber in 0 minutes. Number of iterations (since the data-node restart) : 1, Number of containers scanned in this iteration : 1, Number of unhealthy containers found in this iteration : 0
   datanode_1  | 2019-10-08 19:37:17 DEBUG ContainerDataScanner:148 - Scanning container 2, last scanned never
   datanode_1  | 2019-10-08 19:38:57 DEBUG ContainerDataScanner:155 - Completed scan of container 2 at 2019-10-08T19:38:57.402Z
   datanode_1  | 2019-10-08 19:38:57 DEBUG ContainerDataScanner:148 - Scanning container 1, last scanned at 2019-10-08T19:37:07.570Z
   datanode_1  | 2019-10-08 19:38:57 DEBUG ContainerDataScanner:155 - Completed scan of container 1 at 2019-10-08T19:38:57.443Z
   datanode_1  | 2019-10-08 19:38:57 INFO  ContainerDataScanner:122 - Completed an iteration of container data scrubber in 1 minutes. Number of iterations (since the data-node restart) : 2, Number of containers scanned in this iteration : 2, Number of unhealthy containers found in this iteration : 0
   datanode_1  | 2019-10-08 19:38:57 DEBUG ContainerDataScanner:148 - Scanning container 3, last scanned never
   datanode_1  | 2019-10-08 19:39:02 DEBUG ContainerDataScanner:155 - Completed scan of container 3 at 2019-10-08T19:39:02.402Z
   datanode_1  | 2019-10-08 19:39:02 DEBUG ContainerDataScanner:148 - Scanning container 4, last scanned never
   datanode_1  | 2019-10-08 19:39:02 DEBUG ContainerDataScanner:155 - Completed scan of container 4 at 2019-10-08T19:39:02.430Z
   datanode_1  | 2019-10-08 19:39:02 DEBUG ContainerDataScanner:148 - Scanning container 5, last scanned never
   datanode_1  | 2019-10-08 19:39:11 ERROR HddsDatanodeService:75 - RECEIVED SIGNAL 15: SIGTERM
   datanode_1  | STARTUP_MSG: Starting HddsDatanodeService
   datanode_1  | 2019-10-08 19:39:22 DEBUG ContainerDataScanner:148 - Scanning container 5, last scanned never
   datanode_1  | 2019-10-08 19:40:18 DEBUG ContainerDataScanner:155 - Completed scan of container 5 at 2019-10-08T19:40:18.268Z
   datanode_1  | 2019-10-08 19:40:18 DEBUG ContainerDataScanner:148 - Scanning container 6, last scanned never
   datanode_1  | 2019-10-08 19:40:31 DEBUG ContainerDataScanner:155 - Completed scan of container 6 at 2019-10-08T19:40:31.735Z
   datanode_1  | 2019-10-08 19:40:31 DEBUG ContainerDataScanner:148 - Scanning container 2, last scanned at 2019-10-08T19:38:57.402Z
   datanode_1  | 2019-10-08 19:42:12 DEBUG ContainerDataScanner:155 - Completed scan of container 2 at 2019-10-08T19:42:12.128Z
   datanode_1  | 2019-10-08 19:42:12 DEBUG ContainerDataScanner:148 - Scanning container 1, last scanned at 2019-10-08T19:38:57.443Z
   datanode_1  | 2019-10-08 19:42:12 DEBUG ContainerDataScanner:155 - Completed scan of container 1 at 2019-10-08T19:42:12.140Z
   datanode_1  | 2019-10-08 19:42:12 DEBUG ContainerDataScanner:148 - Scanning container 3, last scanned at 2019-10-08T19:39:02.402Z
   datanode_1  | 2019-10-08 19:42:16 DEBUG ContainerDataScanner:155 - Completed scan of container 3 at 2019-10-08T19:42:16.629Z
   datanode_1  | 2019-10-08 19:42:16 DEBUG ContainerDataScanner:148 - Scanning container 4, last scanned at 2019-10-08T19:39:02.430Z
   datanode_1  | 2019-10-08 19:42:16 DEBUG ContainerDataScanner:155 - Completed scan of container 4 at 2019-10-08T19:42:16.669Z
   datanode_1  | 2019-10-08 19:42:16 INFO  ContainerDataScanner:122 - Completed an iteration of container data scrubber in 2 minutes. Number of iterations (since the data-node restart) : 1, Number of containers scanned in this iteration : 6, Number of unhealthy containers found in this iteration : 0
   ```
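
One way to check the resume point in a log like this is to collect the container IDs from the "Scanning container" messages, starting a new group at each startup message. The sketch below is illustrative only: `scan_runs` is a hypothetical helper, it assumes each log entry sits on a single line, and the sample is a short excerpt of the log around the restart.

```python
import re
from typing import Iterable, List

# Matches the message that ContainerDataScanner logs when it starts on a container.
SCAN_RE = re.compile(r"Scanning container (\d+), last scanned")
STARTUP = "STARTUP_MSG: Starting HddsDatanodeService"


def scan_runs(lines: Iterable[str]) -> List[List[int]]:
    """Group the IDs of scanned containers by datanode process run."""
    runs: List[List[int]] = []
    for line in lines:
        if STARTUP in line:
            runs.append([])  # a new process started
            continue
        match = SCAN_RE.search(line)
        if match:
            if not runs:  # log excerpt began after the startup message
                runs.append([])
            runs[-1].append(int(match.group(1)))
    return runs


log = [
    "datanode_1  | STARTUP_MSG: Starting HddsDatanodeService",
    "datanode_1  | 2019-10-08 19:39:02 DEBUG ContainerDataScanner:148 - Scanning container 4, last scanned never",
    "datanode_1  | 2019-10-08 19:39:02 DEBUG ContainerDataScanner:148 - Scanning container 5, last scanned never",
    "datanode_1  | 2019-10-08 19:39:11 ERROR HddsDatanodeService:75 - RECEIVED SIGNAL 15: SIGTERM",
    "datanode_1  | STARTUP_MSG: Starting HddsDatanodeService",
    "datanode_1  | 2019-10-08 19:39:22 DEBUG ContainerDataScanner:148 - Scanning container 5, last scanned never",
]

runs = scan_runs(log)
print(runs)  # prints [[4, 5], [5]]
# The scanner resumed where it was interrupted if the first container of the
# new run equals the last container started in the previous run.
print(runs[0][-1] == runs[1][0])  # prints True
```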
   
   Also tested upgrade from Ozone 0.4.0.  (Downgrade does not work, see 
[HDDS-2268](https://issues.apache.org/jira/browse/HDDS-2268).)
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

            Worklog Id:     (was: 325325)
    Remaining Estimate: 0h
            Time Spent: 10m

> Chunk Scanner Checkpoints
> -------------------------
>
>                 Key: HDDS-1228
>                 URL: https://issues.apache.org/jira/browse/HDDS-1228
>             Project: Hadoop Distributed Data Store
>          Issue Type: Sub-task
>          Components: Ozone Datanode
>            Reporter: Supratim Deka
>            Assignee: Attila Doroszlai
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 0.5.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Checkpoint the progress of the chunk verification scanner.
> Save the checkpoint persistently so that the scanner can resume from it 
> after a datanode restart.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
