[
https://issues.apache.org/jira/browse/HDDS-4667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Marton Elek updated HDDS-4667:
------------------------------
Description:
Found while using a data generator.
1. I accidentally uploaded keys without checksum data.
2. When reading such a key, the client enters an endless loop instead of
giving up after the first unexpected exception:
{code}
2021-01-11 13:01:50,031 INFO storage.BlockInputStream
(BlockInputStream.java:refreshPipeline(166)) - Unable to read information for
block conID: 2 locID: 185 bcsId: 0 from pipeline
PipelineID=206da15d-62f6-4e24-93d1-e2e805fc1376: Unexpected OzoneException:
org.apache.hadoop.ozone.common.OzoneChecksumException: Original checksumData
has no checksums
2021-01-11 13:01:50,047 ERROR scm.XceiverClientGrpc
(XceiverClientGrpc.java:sendCommandWithRetry(408)) - Failed to execute command
cmdType: ReadChunk
traceID: ""
containerID: 2
datanodeUuid: "2c124e08-e8a5-4493-a41e-84797984e6a6"
readChunk {
blockID {
containerID: 2
localID: 185
blockCommitSequenceId: 0
}
chunkData {
chunkName: "chunk0"
offset: 0
len: 4194304
checksumData {
type: CRC32
bytesPerChecksum: 1048576
}
}
}
on the pipeline Pipeline[ Id: 7d5ed2da-7453-4113-b766-4100458dcc16, Nodes:
2c124e08-e8a5-4493-a41e-84797984e6a6{ip: 127.0.0.1, host: localhost,
networkLocation: /default-rack, certSerialId: null, persistedOpState:
IN_SERVICE, persistedOpStateExpiryEpochSec: 0}, Type:STAND_ALONE, Factor:THREE,
State:OPEN, leaderId:, CreationTimestamp2021-01-11T12:01:50.032Z].
2021-01-11 13:01:50,047 INFO storage.BlockInputStream
(BlockInputStream.java:refreshPipeline(166)) - Unable to read information for
block conID: 2 locID: 185 bcsId: 0 from pipeline
PipelineID=7d5ed2da-7453-4113-b766-4100458dcc16: Unexpected OzoneException:
org.apache.hadoop.ozone.common.OzoneChecksumException: Original checksumData
has no checksums
2021-01-11 13:01:50,062 ERROR scm.XceiverClientGrpc
(XceiverClientGrpc.java:sendCommandWithRetry(408)) - Failed to execute command
cmdType: ReadChunk
traceID: ""
containerID: 2
datanodeUuid: "2c124e08-e8a5-4493-a41e-84797984e6a6"
readChunk {
blockID {
containerID: 2
localID: 185
blockCommitSequenceId: 0
}
chunkData {
chunkName: "chunk0"
offset: 0
len: 4194304
checksumData {
type: CRC32
bytesPerChecksum: 1048576
}
}
}
on the pipeline Pipeline[ Id: 3a4b5032-6b2f-4297-8c4b-89d715175bb1, Nodes:
2c124e08-e8a5-4493-a41e-84797984e6a6{ip: 127.0.0.1, host: localhost,
networkLocation: /default-rack, certSerialId: null, persistedOpState:
IN_SERVICE, persistedOpStateExpiryEpochSec: 0}, Type:STAND_ALONE, Factor:THREE,
State:OPEN, leaderId:, CreationTimestamp2021-01-11T12:01:50.048Z].
{code}
Please note that the two attempts happen within milliseconds of each other.
The problematic part seems to be in the BlockInputStream:
{code}
try {
  numBytesRead = current.read(b, off, numBytesToRead);
} catch (IOException e) {
  handleReadError(e);
  // jumps to the next loop iteration, i.e. retries the read
  continue;
}
{code}
In case of system exceptions we should "break" out of the loop instead of
using "continue".
(Normally this is not possible in a production cluster, as the data was created
by a bad client. But it has a security implication: a malicious user can create
similar keys to mount a DoS attack, because all clients will retry without
sleeping.)
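As a rough illustration of the idea (not the actual Ozone code or the eventual
fix), a read loop could give up immediately on errors that retrying cannot fix
and cap the retries for everything else. All names in this sketch
(ReadRetrySketch, NonRetriableReadException, ChunkSource, MAX_ATTEMPTS,
BACKOFF_MILLIS) are hypothetical and chosen only for the example:

```java
import java.io.IOException;
import java.io.InterruptedIOException;

final class ReadRetrySketch {

  /** Hypothetical marker for errors a retry cannot fix, e.g. missing checksums. */
  static class NonRetriableReadException extends IOException {
    NonRetriableReadException(String message) {
      super(message);
    }
  }

  /** Hypothetical stand-in for the stream that is read inside the loop. */
  interface ChunkSource {
    int read(byte[] b, int off, int len) throws IOException;
  }

  // Assumed values for the sketch, not Ozone configuration keys.
  static final int MAX_ATTEMPTS = 3;
  static final long BACKOFF_MILLIS = 10;

  static int readWithBoundedRetry(ChunkSource source, byte[] b, int off, int len)
      throws IOException {
    IOException last = null;
    for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
      try {
        return source.read(b, off, len);
      } catch (NonRetriableReadException e) {
        // Equivalent of "break": retrying cannot help, so fail at once.
        throw e;
      } catch (IOException e) {
        // Possibly transient: remember the cause and try again.
        last = e;
      }
      if (attempt < MAX_ATTEMPTS) {
        try {
          // Wait a little longer after each failure instead of retrying at once.
          Thread.sleep(BACKOFF_MILLIS * attempt);
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
          throw new InterruptedIOException("Interrupted while waiting to retry");
        }
      }
    }
    throw new IOException("Read failed after " + MAX_ATTEMPTS + " attempts", last);
  }
}
```

With this shape, a key without checksum data would fail on the first attempt,
while a transient error costs at most MAX_ATTEMPTS reads with a short pause
between them.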
> XCeiverClientGrpc should give up if unexpected exception is thrown from read
> path
> ---------------------------------------------------------------------------------
>
> Key: HDDS-4667
> URL: https://issues.apache.org/jira/browse/HDDS-4667
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: Ozone Client
> Reporter: Marton Elek
> Priority: Major
>