[ https://issues.apache.org/jira/browse/HDDS-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Gui updated HDDS-6295:
---------------------------
    Description: 
We hit a bug when trying to write a small key of length 321 bytes (/etc/hosts on my box).

 
{code:java}
java.lang.IllegalArgumentException
        at com.google.common.base.Preconditions.checkArgument(Preconditions.java:130)
        at org.apache.hadoop.ozone.client.io.ECKeyOutputStream.close(ECKeyOutputStream.java:543)
        at org.apache.hadoop.ozone.client.io.OzoneOutputStream.close(OzoneOutputStream.java:61)
        at org.apache.hadoop.ozone.shell.keys.PutKeyHandler.execute(PutKeyHandler.java:107)
        at org.apache.hadoop.ozone.shell.Handler.call(Handler.java:98)
        at org.apache.hadoop.ozone.shell.Handler.call(Handler.java:44)
        at picocli.CommandLine.executeUserObject(CommandLine.java:1953)
        at picocli.CommandLine.access$1300(CommandLine.java:145)
        at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2352)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:2346)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:2311)
        at picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:2172)
        at picocli.CommandLine.parseWithHandlers(CommandLine.java:2550)
        at picocli.CommandLine.parseWithHandler(CommandLine.java:2485)
        at org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:96)
        at org.apache.hadoop.ozone.shell.OzoneShell.lambda$execute$17(OzoneShell.java:55)
        at org.apache.hadoop.hdds.tracing.TracingUtil.executeInNewSpan(TracingUtil.java:159)
        at org.apache.hadoop.ozone.shell.OzoneShell.execute(OzoneShell.java:53)
        at org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:87)
        at org.apache.hadoop.ozone.shell.OzoneShell.main(OzoneShell.java:47)
{code}
 

Commands that we ran:

 
{code:java}
./bin/ozone sh bucket create vol1/bucket1 --layout=FILE_SYSTEM_OPTIMIZED --replication=rs-10-4-1024k --type EC
./bin/ozone sh key put /vol1/bucket1/hosts /etc/hosts{code}
 

We tested more cases that trigger the same problem, such as this key of unaligned size (see the stripe arithmetic after the snippet):

 
{code:java}
dd if=/dev/zero of=dd.10.1M bs=1K count=10241
./bin/ozone sh key put /vol1/bucket1/dd.10.1M dd.10.1M{code}
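
For context (my arithmetic, not part of the original report): with rs-10-4-1024k a full stripe spans 10 data chunks x 1024 KB = 10240 KB, so the 10241 KB file above leaves a 1 KB tail that makes its final stripe unaligned.
{code:java}
// Illustrative stripe arithmetic for rs-10-4-1024k (a sketch, not Ozone code).
public class StripeMath {
  public static void main(String[] args) {
    long chunkSize = 1024L * 1024;            // "1024k" in the policy name
    int dataBlocks = 10;                      // "rs-10-4" => 10 data blocks
    long stripeSize = dataBlocks * chunkSize; // 10240 KB per full stripe
    long keySize = 10241L * 1024;             // the dd.10.1M file above
    System.out.println(keySize % stripeSize); // 1024 -> a 1 KB unaligned tail
  }
}
{code}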
 

And here are some examples that succeed:

- a key of stripe-aligned size

 
{code:java}
dd if=/dev/zero of=dd.10M bs=1K count=10240
./bin/ozone sh key put /vol1/bucket1/dd.10M ../dd.10M {code}
- a bucket with policy rs-3-2-1024k

{code:java}
./bin/ozone sh bucket create vol1/bucket2 --layout=FILE_SYSTEM_OPTIMIZED --replication=rs-3-2-1024k --type EC
./bin/ozone sh key put /vol1/bucket1/dd.10M ../dd.10M {code}

Digging into the code, I found a potential int overflow in ECKeyOutputStream:
{code:java}
private void handleOutputStreamWrite(int currIdx, long len,
    boolean isFullCell, boolean isParity) {

  BlockOutputStreamEntry current =
      blockOutputStreamEntryPool.getCurrentStreamEntry();
  int writeLengthToCurrStream =
      Math.min((int) len, (int) current.getRemaining()); // <-- int overflow happens
  currentBlockGroupLen += isParity ? 0 : writeLengthToCurrStream;

  if (isFullCell) {
    ByteBuffer bytesToWrite = isParity ?
        ecChunkBufferCache.getParityBuffers()[currIdx - numDataBlks] :
        ecChunkBufferCache.getDataBuffers()[currIdx];
    try {
      // Since it's a fullcell, let's write all content from buffer.
      writeToOutputStream(current, len, bytesToWrite.array(),
          bytesToWrite.limit(), 0, isParity);
    } catch (Exception e) {
      markStreamAsFailed(e);
    }
  }
} {code}
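To make the marked cast concrete, assume the default 256 MB block size (my assumption; adjust for your ozone.scm.block.size) and the rs-10-4-1024k policy from the repro. getRemaining() on a fresh block group then returns 10 x 256 MB = 2684354560, which exceeds Integer.MAX_VALUE, so the (int) cast wraps to a negative value and Math.min() returns it; presumably that bogus write length is what trips the checkArgument in ECKeyOutputStream.close() seen in the stack trace. A standalone sketch:
{code:java}
// Standalone demo of the narrowing cast (a sketch, not Ozone code), assuming
// a 256 MB block and the rs-10-4-1024k policy from the repro above.
public class CastOverflowDemo {
  public static void main(String[] args) {
    long blockLength = 256L * 1024 * 1024;   // one block, 256 MB
    long remaining = 10 * blockLength;       // EC length = dataBlocks * length
    System.out.println(remaining);           // 2684354560 > Integer.MAX_VALUE

    int truncated = (int) remaining;         // wraps around on narrowing
    System.out.println(truncated);           // -1610612736

    // Math.min prefers the bogus negative value over the real 321 bytes.
    System.out.println(Math.min((int) 321L, truncated)); // -1610612736
  }
}
{code}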
This is because BlockOutputStreamEntry#getRemaining() ought to return the remaining bytes in the current stream entry, but getLength() is overridden in ECBlockOutputStreamEntry to return the length of the whole block group:
{code:java}
// BlockOutputStreamEntry
long getRemaining() {
  return getLength() - getCurrentPosition();
} {code}
{code:java}
// ECBlockOutputStreamEntry
  ECBlockOutputStreamEntry(BlockID blockID, String key,
      XceiverClientFactory xceiverClientManager, Pipeline pipeline, long length,
      BufferPool bufferPool, Token<OzoneBlockTokenIdentifier> token,
      OzoneClientConfig config) {
    super(blockID, key, xceiverClientManager, pipeline, length, bufferPool,
        token, config);
    assertInstanceOf(
        pipeline.getReplicationConfig(), ECReplicationConfig.class);
    this.replicationConfig =
        (ECReplicationConfig) pipeline.getReplicationConfig();
    this.length = replicationConfig.getData() * length; // <-- a new length defined for EC
  }


  @Override public long getLength() { return length; }
{code}
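A minimal sketch of one possible mitigation (my suggestion, not necessarily the actual patch): keep the comparison in long arithmetic and only narrow the already-bounded result. This also explains why the rs-3-2-1024k bucket succeeds above: 3 x 256 MB still fits in an int, so the cast there happens to be harmless.
{code:java}
// Hypothetical fix sketch, not the committed change: take the minimum as
// longs first, then cast. The result is <= len, which callers keep within
// the EC chunk size, so the narrowing cast can no longer wrap around.
int writeLengthToCurrStream =
    (int) Math.min(len, current.getRemaining());
{code}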



> EC: Fix unaligned stripe write failure due to wrong length defined.
> -------------------------------------------------------------------
>
>                 Key: HDDS-6295
>                 URL: https://issues.apache.org/jira/browse/HDDS-6295
>             Project: Apache Ozone
>          Issue Type: Sub-task
>            Reporter: Mark Gui
>            Assignee: Mark Gui
>            Priority: Major


