[
https://issues.apache.org/jira/browse/HDDS-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mark Gui updated HDDS-6295:
---------------------------
Description:
We hit a bug when trying to write a small key of 321 bytes (/etc/hosts on my box).
{code:java}
java.lang.IllegalArgumentException
    at com.google.common.base.Preconditions.checkArgument(Preconditions.java:130)
    at org.apache.hadoop.ozone.client.io.ECKeyOutputStream.close(ECKeyOutputStream.java:543)
    at org.apache.hadoop.ozone.client.io.OzoneOutputStream.close(OzoneOutputStream.java:61)
    at org.apache.hadoop.ozone.shell.keys.PutKeyHandler.execute(PutKeyHandler.java:107)
    at org.apache.hadoop.ozone.shell.Handler.call(Handler.java:98)
    at org.apache.hadoop.ozone.shell.Handler.call(Handler.java:44)
    at picocli.CommandLine.executeUserObject(CommandLine.java:1953)
    at picocli.CommandLine.access$1300(CommandLine.java:145)
    at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2352)
    at picocli.CommandLine$RunLast.handle(CommandLine.java:2346)
    at picocli.CommandLine$RunLast.handle(CommandLine.java:2311)
    at picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:2172)
    at picocli.CommandLine.parseWithHandlers(CommandLine.java:2550)
    at picocli.CommandLine.parseWithHandler(CommandLine.java:2485)
    at org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:96)
    at org.apache.hadoop.ozone.shell.OzoneShell.lambda$execute$17(OzoneShell.java:55)
    at org.apache.hadoop.hdds.tracing.TracingUtil.executeInNewSpan(TracingUtil.java:159)
    at org.apache.hadoop.ozone.shell.OzoneShell.execute(OzoneShell.java:53)
    at org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:87)
    at org.apache.hadoop.ozone.shell.OzoneShell.main(OzoneShell.java:47)
{code}
Commands that we ran:
{code:java}
./bin/ozone sh bucket create vol1/bucket1 --layout=FILE_SYSTEM_OPTIMIZED --replication=rs-10-4-1024k --type EC
./bin/ozone sh key put /vol1/bucket1/hosts /etc/hosts{code}
We tested more cases that trigger the same problem, such as:
{code:java}
dd if=/dev/zero of=dd.10.1M bs=1K count=10241
./bin/ozone sh key put /vol1/bucket1/dd.10.1M dd.10.1M{code}
And some examples that succeed:
- key of aligned size
{code:java}
dd if=/dev/zero of=dd.10M bs=1K count=10240
./bin/ozone sh key put /vol1/bucket1/dd.10M ../dd.10M {code}
- bucket with policy rs-3-2-1024k
{code:java}
./bin/ozone sh bucket create vol1/bucket2 --layout=FILE_SYSTEM_OPTIMIZED --replication=rs-3-2-1024k --type EC
./bin/ozone sh key put /vol1/bucket2/dd.10M ../dd.10M {code}
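The failing and succeeding dd sizes above differ only in stripe alignment. A minimal sketch of the arithmetic, assuming rs-10-4-1024k means 1024 KB cells striped across 10 data blocks (so a full stripe holds 10 MB of data):

```java
public class StripeAlignment {
  public static void main(String[] args) {
    // Assumed rs-10-4-1024k layout: 1024 KB cells, 10 data blocks.
    long cell = 1024L * 1024;
    int dataBlocks = 10;
    long stripe = dataBlocks * cell; // 10 MB of data per full stripe

    long failing = 10241L * 1024; // dd count=10241: one KB past a full stripe
    long passing = 10240L * 1024; // dd count=10240: exactly one full stripe

    System.out.println(failing % stripe); // 1024 -> unaligned remainder
    System.out.println(passing % stripe); // 0    -> stripe-aligned
  }
}
```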
Digging into the code, I found a potential int overflow in ECKeyOutputStream:
{code:java}
private void handleOutputStreamWrite(int currIdx, long len,
    boolean isFullCell, boolean isParity) {
  BlockOutputStreamEntry current =
      blockOutputStreamEntryPool.getCurrentStreamEntry();
  int writeLengthToCurrStream =
      Math.min((int) len, (int) current.getRemaining()); // <-- int overflow happens
  currentBlockGroupLen += isParity ? 0 : writeLengthToCurrStream;
  if (isFullCell) {
    ByteBuffer bytesToWrite = isParity ?
        ecChunkBufferCache.getParityBuffers()[currIdx - numDataBlks] :
        ecChunkBufferCache.getDataBuffers()[currIdx];
    try {
      // Since it's a fullcell, let's write all content from buffer.
      writeToOutputStream(current, len, bytesToWrite.array(),
          bytesToWrite.limit(), 0, isParity);
    } catch (Exception e) {
      markStreamAsFailed(e);
    }
  }
} {code}
This happens because BlockOutputStreamEntry#getRemaining() ought to return the remaining bytes in the current stream entry, but getLength() is overridden in ECBlockOutputStreamEntry to report the whole block group length:
{code:java}
// BlockOutputStreamEntry
long getRemaining() {
  return getLength() - getCurrentPosition();
} {code}
{code:java}
// ECBlockOutputStreamEntry
ECBlockOutputStreamEntry(BlockID blockID, String key,
    XceiverClientFactory xceiverClientManager, Pipeline pipeline, long length,
    BufferPool bufferPool, Token<OzoneBlockTokenIdentifier> token,
    OzoneClientConfig config) {
  super(blockID, key, xceiverClientManager, pipeline, length, bufferPool,
      token, config);
  assertInstanceOf(
      pipeline.getReplicationConfig(), ECReplicationConfig.class);
  this.replicationConfig =
      (ECReplicationConfig) pipeline.getReplicationConfig();
  this.length = replicationConfig.getData() * length; // <-- a new length defined for EC
}

@Override
public long getLength() {
  return length;
}
{code}
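The overflow can be reproduced in isolation. With rs-10-4 the EC entry reports getLength() as data * blockLength; assuming a typical 256 MB block, that is 2.5 GB, which exceeds Integer.MAX_VALUE, so the (int) cast inside Math.min wraps negative (the 256 MB block size and the variable names here are assumptions for illustration):

```java
public class OverflowDemo {
  public static void main(String[] args) {
    // Assumed: 256 MB block, rs-10-4 (10 data blocks),
    // so the EC entry reports length = 10 * 256 MB = 2.5 GB.
    long blockLength = 256L * 1024 * 1024;
    long ecEntryLength = 10 * blockLength; // 2684354560 > Integer.MAX_VALUE
    long position = 0;                     // nothing written yet
    long remaining = ecEntryLength - position;

    int len = 321; // the small key from the report
    // The narrowing cast wraps: (int) 2684354560L == -1610612736,
    // so Math.min picks the bogus negative value instead of 321.
    int writeLength = Math.min(len, (int) remaining);
    System.out.println((int) remaining); // -1610612736
    System.out.println(writeLength);     // -1610612736, not 321
  }
}
```

A negative writeLengthToCurrStream then corrupts the block group length accounting, which is what trips the Preconditions.checkArgument in ECKeyOutputStream.close().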
> EC: Fix unaligned stripe write failure due to wrong length defined.
> -------------------------------------------------------------------
>
> Key: HDDS-6295
> URL: https://issues.apache.org/jira/browse/HDDS-6295
> Project: Apache Ozone
> Issue Type: Sub-task
> Reporter: Mark Gui
> Assignee: Mark Gui
> Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.20.1#820001)