[ 
https://issues.apache.org/jira/browse/HDDS-11171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan resolved HDDS-11171.
-------------------------------
    Resolution: Won't Fix

> [DN] Add EC Block Recover Audit Log
> -----------------------------------
>
>                 Key: HDDS-11171
>                 URL: https://issues.apache.org/jira/browse/HDDS-11171
>             Project: Apache Ozone
>          Issue Type: Improvement
>          Components: Ozone Datanode
>            Reporter: Shilun Fan
>            Assignee: Shilun Fan
>            Priority: Major
>              Labels: pull-request-available
>
> In our internal Ozone deployment, we rely heavily on EC (Erasure Coding). 
> When a DataNode (DN) disk fails, some EC replica data is lost and must be 
> reconstructed on other DataNodes. This reconstruction may succeed or fail. 
> To make the outcome of each EC block reconstruction easy to check, I intend 
> to add audit logging dedicated to EC reconstruction. This is especially 
> important on failure, where the audit entry helps pinpoint the cause of the 
> reconstruction failure quickly.
> Success log:
> {code:java}
> 2024-07-13 12:06:25,371 | INFO  | DNAudit | user=null | ip=null | 
> op=RECOVER_EC_BLOCK { blockId={blockID={conID: 964637 locID: 
> 113750155051714398 bcsId: 0}, length=4766503, offset=0, token=null, 
> pipeline=Pipeline[ Id: 622e027d-ed89-4b25-9704-17b71ed0cf6b, Nodes: 
> df941469-8358-402a-8600-0d3f508f9cda(bigdata-ozone-m1/xx.xx.xxx.xx) 
> 7c557397-6e8e-413f-ad0c-282634ce84f9(bigdata-ozone-m2/xx.xx.xxx.xx) 
> d8f3179c-7629-48f2-9030-45a89de389ab(bigdata-ozone-m3/xx.xx.xxx.xx) 
> ca5b50fd-4538-430f-85f3-6b2b61ae51d0(bigdata-ozone-m4/xx.xx.xxx.xx) 
> 7c8f10a6-8027-488c-b187-8e4b3afadce3(bigdata-ozone-m5/xx.xx.xxx.xx) 
> 6a0dbf31-d80b-464a-aba8-b964d807e5c3(bigdata-ozone-m6/xx.xx.xxx.xx) 
> 791f3257-bffb-4e46-b0bb-c122192bb0ba(bigdata-ozone-m7/xx.xx.xxx.xx) 
> b3a06978-c73e-4f17-af0b-a890aca2d51c(bigdata-ozone-m8/xx.xx.xxx.xx), 
> excludedSet: , ReplicationConfig: EC{rs-6-3-1024k}, State:CLOSED, leaderId:, 
> CreationTimestamp2024-07-13T12:05:55.014859701+08:00[Asia/Shanghai]], 
> createVersion=0, partNumber=0}} | ret=SUCCESS |
> {code}
> Failure log:
> {code:java}
> 2024-07-13 12:06:25,751 | ERROR | DNAudit | user=null | ip=null | 
> op=RECOVER_EC_BLOCK {blockId={blockID={conID: 964637 locID: 
> 113750155051715549 bcsId: 0}, length=163577856, offset=0, token=null, 
> pipeline=Pipeline[Id: 622e027d-ed89-4b25-9704-17b71ed0cf6b, Nodes: 
> df941469-8358-402a-8600-0d3f508f9cda(bigdata-ozone-m1/xx.xx.xxx.xx) 
> 7c557397-6e8e-413f-ad0c-282634ce84f9(bigdata-ozone-m2/xx.xx.xxx.xx) 
> d8f3179c-7629-48f2-9030-45a89de389ab(bigdata-ozone-m3/xx.xx.xxx.xx) 
> ca5b50fd-4538-430f-85f3-6b2b61ae51d0(bigdata-ozone-m4/xx.xx.xxx.xx) 
> 7c8f10a6-8027-488c-b187-8e4b3afadce3(bigdata-ozone-m5/xx.xx.xxx.xx) 
> 6a0dbf31-d80b-464a-aba8-b964d807e5c3(bigdata-ozone-m6/xx.xx.xxx.xx) 
> 791f3257-bffb-4e46-b0bb-c122192bb0ba(bigdata-ozone-m7/xx.xx.xxx.xx) 
> b3a06978-c73e-4f17-af0b-a890aca2d51c(bigdata-ozone-m8/xx.xx.xxx.xx), 
> excludedSet: , ReplicationConfig: EC{rs-6-3-1024k}, State:CLOSED, leaderId:, 
> CreationTimestamp2024-07-13T12:05:55.014859701+08:00[Asia/Shanghai]], 
> createVersion=0, partNumber=0}} | ret=FAILURE | 
> java.lang.IllegalArgumentException: The chunk list has 26 entries, but the 
> checksum chunks has 27 entries. They should be equal in size.
> at com.google.common.base.Preconditions.checkArgument(Preconditions.java:143) 
> at 
> org.apache.hadoop.hdds.scm.storage.ECBlockOutputStream.executePutBlock(ECBlockOutputStream.java:147)
> {code}
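
As a rough illustration of how the entries above could be consumed, the following is a minimal sketch that splits one audit entry into named fields. It is not part of Ozone: the class name `AuditLineParser`, the field names in the returned map, and the shortened sample entry are all assumptions made for this example, and the layout (fields separated by ` | `, with any exception text following `ret=...`) is inferred only from the two samples quoted above. Wrapped lines of an entry are assumed to have been joined into one string first.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not an Ozone class: parses one DNAudit entry
// whose layout is inferred from the sample logs in this issue.
class AuditLineParser {

    /** Splits a single audit entry (already joined onto one line) into fields. */
    static Map<String, String> parse(String entry) {
        // Fields are separated by '|' with optional surrounding whitespace.
        // A trailing '|' (as in the success sample) yields no extra field.
        String[] parts = entry.trim().split("\\s*\\|\\s*");
        if (parts.length < 4) {
            throw new IllegalArgumentException("Not an audit entry: " + entry);
        }
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("timestamp", parts[0]);
        fields.put("level", parts[1]);
        fields.put("logger", parts[2]);
        for (int i = 3; i < parts.length; i++) {
            if (fields.containsKey("ret")) {
                // Assumption: everything after ret=... is exception text.
                fields.merge("exception", parts[i], (a, b) -> a + " | " + b);
                continue;
            }
            int eq = parts[i].indexOf('=');
            if (eq <= 0) {
                continue; // not a key=value field, skip it
            }
            String key = parts[i].substring(0, eq);
            String value = parts[i].substring(eq + 1);
            if (key.equals("op")) {
                // Separate the operation name from its {...} parameter block.
                int brace = value.indexOf('{');
                if (brace >= 0) {
                    fields.put("params", value.substring(brace).trim());
                    value = value.substring(0, brace).trim();
                }
            }
            fields.put(key, value);
        }
        return fields;
    }

    public static void main(String[] args) {
        // Shortened version of the failure sample; pipeline details omitted.
        String entry = "2024-07-13 12:06:25,751 | ERROR | DNAudit | user=null | ip=null | "
            + "op=RECOVER_EC_BLOCK {blockId={blockID={conID: 964637 locID: "
            + "113750155051715549 bcsId: 0}, length=163577856}} | ret=FAILURE | "
            + "java.lang.IllegalArgumentException: The chunk list has 26 entries, "
            + "but the checksum chunks has 27 entries. They should be equal in size.";
        Map<String, String> fields = parse(entry);
        System.out.println(fields.get("op") + " -> " + fields.get("ret"));
        System.out.println(fields.get("exception"));
    }
}
```

Filtering on `ret=FAILURE` and reading the `exception` field is the use case the description has in mind. A real deployment would more likely rely on existing log tooling than on a hand-written parser like this one.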



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
