[ https://issues.apache.org/jira/browse/HDDS-15301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Attila Doroszlai updated HDDS-15301:
------------------------------------
    Fix Version/s: 2.2.0
       Resolution: Fixed
           Status: Resolved  (was: Patch Available)

> Malformed PutBlock request can mark container UNHEALTHY
> -------------------------------------------------------
>
>                 Key: HDDS-15301
>                 URL: https://issues.apache.org/jira/browse/HDDS-15301
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: Ozone Client, Ozone Datanode
>            Reporter: Chu Cheng Li
>            Assignee: Chu Cheng Li
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 2.2.0
>
>
> h2. Summary
> A malformed client {{PutBlock}} request can cause the datanode to mark the
> target container {{UNHEALTHY}}. The request should be rejected as a
> client-side malformed request, but it is currently mapped to
> {{IO_EXCEPTION}}, which {{HddsDispatcher}} treats as a container write
> failure.
> This means a bad or misbehaving client can poison the active container and
> cause the pipeline to be closed.
> h2. Environment
>  * Ozone version: {{2.2.0-SNAPSHOT}}
>  * Cluster: {{MiniOzoneCluster}}
>  * Pipeline: single-node Ratis pipeline
>  * Client: custom Rust Ozone client reproducing the Java client's incremental
> chunk-list semantics
> h2. Repro
> Send an incremental {{PutBlock}} request where {{BlockData.size}} does not
> equal the sum of the lengths ({{len}}) of the chunks included in the request.
> Example from the failing request:
> {code:java}
> putBlock {
>   blockData {
>     blockID { 
>       containerID: 2 
>       localID: 117883640217600002 
>       blockCommitSequenceId: 0 
>     }
>     metadata { key: "incremental" }
>     chunks {
>       chunkName: "117883640217600002_chunk_16"
>       offset: 16777216
>       len: 1048576
>       metadata { key: "full" }
>       checksumData { type: NONE bytesPerChecksum: 0 }
>     }
>     size: 17825792
>   }
>   eof: false
> } {code}
> The request includes only one {{1 MiB}} chunk (at offset 16 MiB, so it covers
> bytes 16 MiB to 17 MiB of the block), but {{size}} is {{17 MiB}} (17825792 bytes).
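A minimal Java sketch of the size invariant this request violates, assuming simplified stand-in types: {{Chunk}}, {{Block}}, {{sumOfChunks}} and {{isConsistent}} are hypothetical names for illustration, not Ozone's {{BlockData}} or {{ChunkInfo}} API. It plugs in the values from the request above.

```java
import java.util.List;

/** Hypothetical sketch: checks that a PutBlock's declared size matches its chunk list. */
public class PutBlockSizeCheck {

    /** Hypothetical stand-in for a chunk entry (not Ozone's ChunkInfo). */
    record Chunk(String name, long offset, long len) {}

    /** Hypothetical stand-in for BlockData (not Ozone's BlockData). */
    record Block(long size, List<Chunk> chunks) {}

    /** Sums the lengths of the chunks carried in the request, failing on overflow. */
    static long sumOfChunks(Block block) {
        long sum = 0;
        for (Chunk c : block.chunks()) {
            sum = Math.addExact(sum, c.len());
        }
        return sum;
    }

    /** True only if the declared size equals the sum of the chunk lengths. */
    static boolean isConsistent(Block block) {
        return block.size() == sumOfChunks(block);
    }

    public static void main(String[] args) {
        long mib = 1L << 20;
        // Values taken from the failing request: one 1 MiB chunk at offset 16 MiB, size 17 MiB.
        Block failing = new Block(17 * mib,
            List.of(new Chunk("117883640217600002_chunk_16", 16 * mib, mib)));
        System.out.println("size=" + failing.size()
            + " sumOfChunks=" + sumOfChunks(failing)
            + " consistent=" + isConsistent(failing));
    }
}
```

With these values it prints {{size=17825792 sumOfChunks=1048576 consistent=false}}, the same two numbers that appear in the {{CodecException}} message quoted under Actual Behavior.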
> h2. Actual Behavior
> The datanode rejects the protobuf with a {{CodecException}}:
> {code:java}
> Caused by: org.apache.hadoop.hdds.utils.db.CodecException:
> Size mismatch: size (=17825792) != sum of chunks (=1048576){code}
> That exception is caught in {{KeyValueHandler.handlePutBlock}} as an
> {{IOException}} and returned as {{IO_EXCEPTION}}:
> {code:java}
> Operation: PutBlock, Message: Put Key failed, Result: IO_EXCEPTION{code}
> Then {{HddsDispatcher}} treats the failed write as a container write failure:
> {code:java}
> Marked container UNHEALTHY from OPEN: KeyValueContainerData #2{code}
> After that, subsequent writes fail with:
> {code:java}
> Container 2 in UNHEALTHY state{code}
> SCM closes the pipeline, and clients may later see retry/failover noise such 
> as:
> {code:java}
> not leader; suggested_leader_present=false
> exhausted retry-window resend attempts {code}
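The Summary says this request should be rejected as a client-side malformed request instead of surfacing as {{IO_EXCEPTION}}. A minimal Java sketch of that separation, assuming a hypothetical {{MalformedRequestException}} subtype, a simplified {{Result}} enum, and a simplified unhealthy policy in {{shouldMarkUnhealthy}}. These names and rules are illustrative only; they are not the actual {{KeyValueHandler}} or {{HddsDispatcher}} code, nor the committed fix.

```java
import java.io.IOException;

/** Hypothetical sketch: keep malformed requests from being treated as container write failures. */
public class PutBlockFailurePolicy {

    /** Hypothetical subset of result codes, named after the ones discussed in this issue. */
    enum Result { SUCCESS, MALFORMED_REQUEST, IO_EXCEPTION }

    /** Hypothetical exception for requests that are invalid regardless of container state. */
    static final class MalformedRequestException extends IOException {
        MalformedRequestException(String message) { super(message); }
    }

    /** Validates the size invariant before any container I/O is attempted. */
    static void validate(long size, long sumOfChunks) throws MalformedRequestException {
        if (size != sumOfChunks) {
            throw new MalformedRequestException("Size mismatch: size (=" + size
                + ") != sum of chunks (=" + sumOfChunks + ")");
        }
    }

    /** Maps a failure to a result code, checking the more specific type first. */
    static Result toResult(IOException e) {
        if (e instanceof MalformedRequestException) {
            return Result.MALFORMED_REQUEST;
        }
        return Result.IO_EXCEPTION;
    }

    /** Simplified policy: only genuine I/O failures mark the container UNHEALTHY. */
    static boolean shouldMarkUnhealthy(Result result) {
        return result == Result.IO_EXCEPTION;
    }

    public static void main(String[] args) {
        try {
            // Sizes from the failing request in this issue.
            validate(17825792L, 1048576L);
        } catch (MalformedRequestException e) {
            Result r = toResult(e);
            System.out.println(r + " markUnhealthy=" + shouldMarkUnhealthy(r));
        }
    }
}
```

Running {{main}} prints {{MALFORMED_REQUEST markUnhealthy=false}}. A plain {{IOException}} still maps to {{IO_EXCEPTION}} and, under this simplified policy, would still mark the container {{UNHEALTHY}}, so genuine write failures keep their existing handling.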



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
