[ 
https://issues.apache.org/jira/browse/HDDS-4667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved HDDS-4667.
---------------------------------------
    Fix Version/s: 1.1.0
       Resolution: Fixed

> BlockInputStream should give up read retry if pipeline is not updated
> ---------------------------------------------------------------------
>
>                 Key: HDDS-4667
>                 URL: https://issues.apache.org/jira/browse/HDDS-4667
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>          Components: Ozone Client
>            Reporter: Marton Elek
>            Assignee: Attila Doroszlai
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.1.0
>
>
> Found while using a data generator.
>  1. I accidentally uploaded keys without checksum data.
>  2. When reading such a key, the client enters an endless loop instead of 
> giving up after the first unexpected exception:
> {code}
> 2021-01-11 13:01:50,031 INFO  storage.BlockInputStream 
> (BlockInputStream.java:refreshPipeline(166)) - Unable to read information for 
> block conID: 2 locID: 185 bcsId: 0 from pipeline 
> PipelineID=206da15d-62f6-4e24-93d1-e2e805fc1376: Unexpected OzoneException: 
> org.apache.hadoop.ozone.common.OzoneChecksumException: Original checksumData 
> has no checksums
> 2021-01-11 13:01:50,047 ERROR scm.XceiverClientGrpc 
> (XceiverClientGrpc.java:sendCommandWithRetry(408)) - Failed to execute 
> command cmdType: ReadChunk
> traceID: ""
> containerID: 2
> datanodeUuid: "2c124e08-e8a5-4493-a41e-84797984e6a6"
> readChunk {
>   blockID {
>     containerID: 2
>     localID: 185
>     blockCommitSequenceId: 0
>   }
>   chunkData {
>     chunkName: "chunk0"
>     offset: 0
>     len: 4194304
>     checksumData {
>       type: CRC32
>       bytesPerChecksum: 1048576
>     }
>   }
> }
>  on the pipeline Pipeline[ Id: 7d5ed2da-7453-4113-b766-4100458dcc16, Nodes: 
> 2c124e08-e8a5-4493-a41e-84797984e6a6{ip: 127.0.0.1, host: localhost, 
> networkLocation: /default-rack, certSerialId: null, persistedOpState: 
> IN_SERVICE, persistedOpStateExpiryEpochSec: 0}, Type:STAND_ALONE, 
> Factor:THREE, State:OPEN, leaderId:, 
> CreationTimestamp2021-01-11T12:01:50.032Z].
> 2021-01-11 13:01:50,047 INFO  storage.BlockInputStream 
> (BlockInputStream.java:refreshPipeline(166)) - Unable to read information for 
> block conID: 2 locID: 185 bcsId: 0 from pipeline 
> PipelineID=7d5ed2da-7453-4113-b766-4100458dcc16: Unexpected OzoneException: 
> org.apache.hadoop.ozone.common.OzoneChecksumException: Original checksumData 
> has no checksums
> 2021-01-11 13:01:50,062 ERROR scm.XceiverClientGrpc 
> (XceiverClientGrpc.java:sendCommandWithRetry(408)) - Failed to execute 
> command cmdType: ReadChunk
> traceID: ""
> containerID: 2
> datanodeUuid: "2c124e08-e8a5-4493-a41e-84797984e6a6"
> readChunk {
>   blockID {
>     containerID: 2
>     localID: 185
>     blockCommitSequenceId: 0
>   }
>   chunkData {
>     chunkName: "chunk0"
>     offset: 0
>     len: 4194304
>     checksumData {
>       type: CRC32
>       bytesPerChecksum: 1048576
>     }
>   }
> }
>  on the pipeline Pipeline[ Id: 3a4b5032-6b2f-4297-8c4b-89d715175bb1, Nodes: 
> 2c124e08-e8a5-4493-a41e-84797984e6a6{ip: 127.0.0.1, host: localhost, 
> networkLocation: /default-rack, certSerialId: null, persistedOpState: 
> IN_SERVICE, persistedOpStateExpiryEpochSec: 0}, Type:STAND_ALONE, 
> Factor:THREE, State:OPEN, leaderId:, 
> CreationTimestamp2021-01-11T12:01:50.048Z].
> {code}
> Please note that the two attempts happen within milliseconds of each other.
> The problematic part seems to be in the BlockInputStream:
> {code}
>       try {
>         numBytesRead = current.read(b, off, numBytesToRead);
>       } catch (IOException e) {
>         handleReadError(e);
>         continue;
>       }
> {code}
> For system exceptions we should "break" out of the loop instead of 
> "continue".
> (Normally this cannot happen in a production cluster, as the data was created 
> with a bad client. But it has a security implication: a malicious user can 
> create similar keys to mount a DoS attack, since all clients will retry 
> without sleeping...)
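
The behaviour asked for in the issue title (give up the read retry if the pipeline is not updated) can be sketched as follows. This is a minimal illustration, not the fix that was committed for HDDS-4667 and not the real BlockInputStream code: the class BoundedRetryRead, the interfaces ChunkReader and PipelineRefresher, the method readWithRetry and the maxRetries parameter are all hypothetical names introduced here. The sketch assumes that a pipeline refresh can report whether it actually returned a different pipeline.

```java
import java.io.IOException;

/** Hypothetical sketch of a bounded read retry; not the actual Ozone code. */
public class BoundedRetryRead {

  /** Stand-in for the stream being read (hypothetical interface). */
  interface ChunkReader {
    int read(byte[] b, int off, int len) throws IOException;
  }

  /** Stand-in for the pipeline refresh; returns true only if the pipeline changed. */
  interface PipelineRefresher {
    boolean refresh() throws IOException;
  }

  /**
   * Retries a failed read only while the refresh yields a new pipeline,
   * and never more than maxRetries times. Otherwise the original
   * exception is rethrown instead of looping forever.
   */
  static int readWithRetry(ChunkReader reader, PipelineRefresher refresher,
      byte[] b, int off, int len, int maxRetries) throws IOException {
    int failures = 0;
    while (true) {
      try {
        return reader.read(b, off, len);
      } catch (IOException e) {
        failures++;
        // Give up if the retry budget is spent or nothing changed:
        // reading again from the same pipeline would fail the same way.
        if (failures > maxRetries || !refresher.refresh()) {
          throw e;
        }
      }
    }
  }

  public static void main(String[] args) {
    int[] calls = {0};
    // Simulates the key from the report: every read fails the same way.
    ChunkReader alwaysFails = (b, off, len) -> {
      calls[0]++;
      throw new IOException("Original checksumData has no checksums");
    };
    // Simulates a refresh that returns the same pipeline again.
    PipelineRefresher unchanged = () -> false;
    try {
      readWithRetry(alwaysFails, unchanged, new byte[16], 0, 16, 3);
    } catch (IOException e) {
      System.out.println("gave up after " + calls[0] + " read attempt(s): "
          + e.getMessage());
    }
  }
}
```

With an unchanged pipeline the loop ends after the first failed read. The retry cap additionally bounds the case where every refresh reports a new pipeline but the read still fails, which is the pattern visible in the log above (a different pipeline Id on each attempt).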



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
