[
https://issues.apache.org/jira/browse/HDDS-11171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Shilun Fan resolved HDDS-11171.
-------------------------------
Resolution: Won't Fix
> [DN] Add EC Block Recover Audit Log
> -----------------------------------
>
> Key: HDDS-11171
> URL: https://issues.apache.org/jira/browse/HDDS-11171
> Project: Apache Ozone
> Issue Type: Improvement
> Components: Ozone Datanode
> Reporter: Shilun Fan
> Assignee: Shilun Fan
> Priority: Major
> Labels: pull-request-available
>
> We rely heavily on EC (Erasure Coding) in our internal Ozone deployment. When
> a DN (DataNode) disk fails, some EC replica data is lost and has to be
> reconstructed on other DNs. Reconstruction can either succeed or fail. To
> make the outcome of each EC block reconstruction easy to check, I propose
> adding an audit log dedicated to EC reconstruction. This matters most when
> reconstruction fails, so that the cause of the failure can be pinpointed
> quickly.
> Success log:
> {code:java}
> 2024-07-13 12:06:25,371 | INFO | DNAudit | user=null | ip=null |
> op=RECOVER_EC_BLOCK { blockId={blockID={conID: 964637 locID:
> 113750155051714398 bcsId: 0}, length=4766503, offset=0, token=null,
> pipeline=Pipeline[ Id: 622e027d-ed89-4b25-9704-17b71ed0cf6b, Nodes:
> df941469-8358-402a-8600-0d3f508f9cda(bigdata-ozone-m1/xx.xx.xxx.xx)
> 7c557397-6e8e-413f-ad0c-282634ce84f9(bigdata-ozone-m2/xx.xx.xxx.xx)
> d8f3179c-7629-48f2-9030-45a89de389ab(bigdata-ozone-m3/xx.xx.xxx.xx)
> ca5b50fd-4538-430f-85f3-6b2b61ae51d0(bigdata-ozone-m4/xx.xx.xxx.xx)
> 7c8f10a6-8027-488c-b187-8e4b3afadce3(bigdata-ozone-m5/xx.xx.xxx.xx)
> 6a0dbf31-d80b-464a-aba8-b964d807e5c3(bigdata-ozone-m6/xx.xx.xxx.xx)
> 791f3257-bffb-4e46-b0bb-c122192bb0ba(bigdata-ozone-m7/xx.xx.xxx.xx)
> b3a06978-c73e-4f17-af0b-a890aca2d51c(bigdata-ozone-m8/xx.xx.xxx.xx),
> excludedSet: , ReplicationConfig: EC{rs-6-3-1024k}, State:CLOSED, leaderId:,
> CreationTimestamp2024-07-13T12:05:55.014859701+08:00[Asia/Shanghai]],
> createVersion=0, partNumber=0}} | ret=SUCCESS |
> {code}
> Failure log:
> {code:java}
> 2024-07-13 12:06:25,751 | ERROR | DNAudit | user=null | ip=null |
> op=RECOVER_EC_BLOCK {blockId={blockID={conID: 964637 locID:
> 113750155051715549 bcsId: 0}, length=163577856, offset=0, token=null,
> pipeline=Pipeline[Id: 622e027d-ed89-4b25-9704-17b71ed0cf6b, Nodes:
> df941469-8358-402a-8600-0d3f508f9cda(bigdata-ozone-m1/xx.xx.xxx.xx)
> 7c557397-6e8e-413f-ad0c-282634ce84f9(bigdata-ozone-m2/xx.xx.xxx.xx)
> d8f3179c-7629-48f2-9030-45a89de389ab(bigdata-ozone-m3/xx.xx.xxx.xx)
> ca5b50fd-4538-430f-85f3-6b2b61ae51d0(bigdata-ozone-m4/xx.xx.xxx.xx)
> 7c8f10a6-8027-488c-b187-8e4b3afadce3(bigdata-ozone-m5/xx.xx.xxx.xx)
> 6a0dbf31-d80b-464a-aba8-b964d807e5c3(bigdata-ozone-m6/xx.xx.xxx.xx)
> 791f3257-bffb-4e46-b0bb-c122192bb0ba(bigdata-ozone-m7/xx.xx.xxx.xx)
> b3a06978-c73e-4f17-af0b-a890aca2d51c(bigdata-ozone-m8/xx.xx.xxx.xx),
> excludedSet: , ReplicationConfig: EC{rs-6-3-1024k}, State:CLOSED, leaderId:,
> CreationTimestamp2024-07-13T12:05:55.014859701+08:00[Asia/Shanghai]],
> createVersion=0, partNumber=0}} | ret=FAILURE |
> java.lang.IllegalArgumentException: The chunk list has 26 entries, but the
> checksum chunks has 27 entries. They should be equal in size.
> at com.google.common.base.Preconditions.checkArgument(Preconditions.java:143)
> at org.apache.hadoop.hdds.scm.storage.ECBlockOutputStream.executePutBlock(ECBlockOutputStream.java:147)
> {code}
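
The two entries above share a pipe-delimited layout, so the outcome of a reconstruction could in principle be read back with a small script. The following is only a minimal sketch and is not part of Ozone: it assumes each audit entry sits on a single line in the DN audit log (the samples above are wrapped by the mail client), the sample strings are shortened copies of the entries above, and the names `AUDIT_RE` and `parse_audit_line` are hypothetical.

```python
import re

# Assumed layout, taken from the samples above:
# timestamp | LEVEL | logger | user=.. | ip=.. | op=NAME {params} | ret=RESULT | [exception]
AUDIT_RE = re.compile(
    r"^(?P<ts>\S+ \S+) \| (?P<level>\w+)\s*\| (?P<logger>\w+) \| "
    r"user=(?P<user>\S+) \| ip=(?P<ip>\S+) \| "
    r"op=(?P<op>\w+)\s*\{(?P<params>.*)\} \| ret=(?P<ret>\w+) \|\s*(?P<error>.*)$",
    re.DOTALL,
)


def parse_audit_line(line):
    """Return a dict of fields for one audit entry, or None if it does not match."""
    m = AUDIT_RE.match(line.strip())
    if m is None:
        return None
    rec = m.groupdict()
    # Pull the container and local block IDs out of the blockID parameter.
    ids = re.search(r"conID: (\d+) locID: (\d+)", rec["params"])
    rec["container_id"] = int(ids.group(1)) if ids else None
    rec["local_id"] = int(ids.group(2)) if ids else None
    return rec


# Shortened copies of the sample entries (pipeline details omitted).
SUCCESS_LINE = (
    "2024-07-13 12:06:25,371 | INFO | DNAudit | user=null | ip=null | "
    "op=RECOVER_EC_BLOCK { blockId={blockID={conID: 964637 locID: "
    "113750155051714398 bcsId: 0}, length=4766503, offset=0} | ret=SUCCESS |"
)
FAILURE_LINE = (
    "2024-07-13 12:06:25,751 | ERROR | DNAudit | user=null | ip=null | "
    "op=RECOVER_EC_BLOCK {blockId={blockID={conID: 964637 locID: "
    "113750155051715549 bcsId: 0}, length=163577856, offset=0} | ret=FAILURE | "
    "java.lang.IllegalArgumentException: The chunk list has 26 entries, but the "
    "checksum chunks has 27 entries. They should be equal in size."
)

if __name__ == "__main__":
    for line in (SUCCESS_LINE, FAILURE_LINE):
        rec = parse_audit_line(line)
        print(rec["op"], rec["container_id"], rec["local_id"], rec["ret"], rec["error"])
```

Filtering the parsed records on `ret == "FAILURE"` would then list the blocks whose reconstruction failed together with the recorded exception message.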