[
https://issues.apache.org/jira/browse/HDDS-4667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Shashikant Banerjee resolved HDDS-4667.
---------------------------------------
Fix Version/s: 1.1.0
Resolution: Fixed
> BlockInputStream should give up read retry if pipeline is not updated
> ---------------------------------------------------------------------
>
> Key: HDDS-4667
> URL: https://issues.apache.org/jira/browse/HDDS-4667
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: Ozone Client
> Reporter: Marton Elek
> Assignee: Attila Doroszlai
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.1.0
>
>
> Found this while using a data generator.
> 1. I accidentally uploaded keys without checksum data.
> 2. When reading such a key, the client enters an endless loop instead
> of giving up after the first unexpected exception:
> {code}
> 2021-01-11 13:01:50,031 INFO storage.BlockInputStream
> (BlockInputStream.java:refreshPipeline(166)) - Unable to read information for
> block conID: 2 locID: 185 bcsId: 0 from pipeline
> PipelineID=206da15d-62f6-4e24-93d1-e2e805fc1376: Unexpected OzoneException:
> org.apache.hadoop.ozone.common.OzoneChecksumException: Original checksumData
> has no checksums
> 2021-01-11 13:01:50,047 ERROR scm.XceiverClientGrpc
> (XceiverClientGrpc.java:sendCommandWithRetry(408)) - Failed to execute
> command cmdType: ReadChunk
> traceID: ""
> containerID: 2
> datanodeUuid: "2c124e08-e8a5-4493-a41e-84797984e6a6"
> readChunk {
> blockID {
> containerID: 2
> localID: 185
> blockCommitSequenceId: 0
> }
> chunkData {
> chunkName: "chunk0"
> offset: 0
> len: 4194304
> checksumData {
> type: CRC32
> bytesPerChecksum: 1048576
> }
> }
> }
> on the pipeline Pipeline[ Id: 7d5ed2da-7453-4113-b766-4100458dcc16, Nodes:
> 2c124e08-e8a5-4493-a41e-84797984e6a6{ip: 127.0.0.1, host: localhost,
> networkLocation: /default-rack, certSerialId: null, persistedOpState:
> IN_SERVICE, persistedOpStateExpiryEpochSec: 0}, Type:STAND_ALONE,
> Factor:THREE, State:OPEN, leaderId:,
> CreationTimestamp2021-01-11T12:01:50.032Z].
> 2021-01-11 13:01:50,047 INFO storage.BlockInputStream
> (BlockInputStream.java:refreshPipeline(166)) - Unable to read information for
> block conID: 2 locID: 185 bcsId: 0 from pipeline
> PipelineID=7d5ed2da-7453-4113-b766-4100458dcc16: Unexpected OzoneException:
> org.apache.hadoop.ozone.common.OzoneChecksumException: Original checksumData
> has no checksums
> 2021-01-11 13:01:50,062 ERROR scm.XceiverClientGrpc
> (XceiverClientGrpc.java:sendCommandWithRetry(408)) - Failed to execute
> command cmdType: ReadChunk
> traceID: ""
> containerID: 2
> datanodeUuid: "2c124e08-e8a5-4493-a41e-84797984e6a6"
> readChunk {
> blockID {
> containerID: 2
> localID: 185
> blockCommitSequenceId: 0
> }
> chunkData {
> chunkName: "chunk0"
> offset: 0
> len: 4194304
> checksumData {
> type: CRC32
> bytesPerChecksum: 1048576
> }
> }
> }
> on the pipeline Pipeline[ Id: 3a4b5032-6b2f-4297-8c4b-89d715175bb1, Nodes:
> 2c124e08-e8a5-4493-a41e-84797984e6a6{ip: 127.0.0.1, host: localhost,
> networkLocation: /default-rack, certSerialId: null, persistedOpState:
> IN_SERVICE, persistedOpStateExpiryEpochSec: 0}, Type:STAND_ALONE,
> Factor:THREE, State:OPEN, leaderId:,
> CreationTimestamp2021-01-11T12:01:50.048Z].
> {code}
> Please note that the two attempts happen within milliseconds of each other.
> The problematic part seems to be in the BlockInputStream:
> {code}
> try {
>   numBytesRead = current.read(b, off, numBytesToRead);
> } catch (IOException e) {
>   handleReadError(e);
>   continue;
> }
> {code}
> In case of system exceptions, we should "break" out of the loop instead of
> using "continue".
> (Normally this cannot happen in a production cluster, because the data was
> written by a faulty client. But it has a security implication: a malicious
> user could create similar keys to mount a DoS attack, since every client
> reading them would retry without any sleep...)
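
The idea in the issue title (give up the read retry if the pipeline is not updated) can be sketched roughly as follows. This is a minimal illustration, not the actual BlockInputStream code or the committed fix: BoundedRetryReader, ChunkSource and the datanode lookup are hypothetical names. In the log above the pipeline ID differs on every attempt while the datanode stays the same, so the sketch compares the set of datanodes rather than pipeline IDs.

```java
import java.io.IOException;
import java.util.Set;
import java.util.function.Supplier;

/** Hypothetical stand-in for a block reader; not the real BlockInputStream. */
class BoundedRetryReader {

  /** Hypothetical chunk source that may fail on any read. */
  interface ChunkSource {
    int read(byte[] b, int off, int len) throws IOException;
  }

  private final ChunkSource source;
  // Assumption: returns the datanode UUIDs of the block's current pipeline.
  private final Supplier<Set<String>> datanodeLookup;
  private Set<String> datanodes;
  private int attempts;

  BoundedRetryReader(ChunkSource source, Supplier<Set<String>> datanodeLookup) {
    this.source = source;
    this.datanodeLookup = datanodeLookup;
    this.datanodes = datanodeLookup.get();
  }

  /** Number of read attempts made so far. */
  int attempts() {
    return attempts;
  }

  int read(byte[] b, int off, int len) throws IOException {
    while (true) {
      attempts++;
      try {
        return source.read(b, off, len);
      } catch (IOException e) {
        Set<String> refreshed = datanodeLookup.get();
        if (refreshed.equals(datanodes)) {
          // Same datanodes as before: the read would fail the same way,
          // so rethrow instead of looping with "continue".
          throw e;
        }
        // The pipeline really changed, so one more attempt is justified.
        datanodes = refreshed;
      }
    }
  }
}
```

With a source that always throws and a lookup that always returns the same datanodes, read() fails after a single attempt instead of spinning. A production version would probably also want an upper bound on attempts or a delay between them, since the lookup could in principle keep returning different datanodes.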
--
This message was sent by Atlassian Jira
(v8.3.4#803005)