[ 
https://issues.apache.org/jira/browse/HDFS-12794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16261612#comment-16261612
 ] 

Anu Engineer commented on HDFS-12794:
-------------------------------------

[~shashikant] Thanks for updating this patch. I am afraid that my earlier concern,
bq. I am worried there is a latent bug here, which our tests are not exposing.
is indeed true.

Here is what I did:
* I added a new function to ContainerProtocolCalls.java:
{code}
  /**
   * Validates a response from a container protocol call. Any non-successful
   * return code is mapped to a corresponding exception and thrown.
   *
   * @param response container protocol call response
   * @param error force a failure for testing; always throw when true
   * @throws StorageContainerException if the container protocol call failed
   */
  public static void validateContainerResponse(
      ContainerCommandResponseProto response, boolean error) throws
      StorageContainerException {
    if (error) {
      throw new StorageContainerException(response.getMessage(),
          response.getResult());
    }
  }
{code}
* Then I changed ChunkOutputStream#writeChunkToContainer to call this function:
{{ContainerProtocolCalls.validateContainerResponse(reply, true);}}
* As expected, by the time the forced exception is thrown the calling function is already gone, and the exception surfaces on the netty handler thread instead:
{noformat}
Caused by: org.apache.hadoop.scm.container.common.helpers.StorageContainerException: 
        at org.apache.hadoop.scm.storage.ContainerProtocolCalls.validateContainerResponse(ContainerProtocolCalls.java:424)
        at org.apache.hadoop.scm.storage.ChunkOutputStream.lambda$writeChunkToContainer$0(ChunkOutputStream.java:245)
        at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
        at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
        at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
        at org.apache.hadoop.scm.XceiverClientHandler.channelRead0(XceiverClientHandler.java:88)
        at org.apache.hadoop.scm.XceiverClientHandler.channelRead0(XceiverClientHandler.java:44)
        at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
        at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
        at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:312)
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:286)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
        at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1302)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:646)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:581)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:498)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:460)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)
        at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
        at java.lang.Thread.run(Thread.java:745)
{noformat}
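
The trace makes sense once you look at how {{CompletableFuture}} runs dependent stages: a function registered with {{thenApply}} executes on whichever thread completes the future, and here that is the netty event-loop thread calling {{complete()}} from {{XceiverClientHandler#channelRead0}}. A minimal, self-contained sketch of the effect (the class, method and thread names below are made up for illustration; this is not the Ozone code):
{code}
import java.util.concurrent.CompletableFuture;

/** Hypothetical stand-alone demo, not Ozone code. */
public class CompletingThreadDemo {

  // Stand-in for ContainerProtocolCalls#validateContainerResponse.
  static void validate(String reply) {
    System.out.println("validation runs on: "
        + Thread.currentThread().getName());
    throw new IllegalStateException("simulated container error");
  }

  public static void main(String[] args) throws Exception {
    CompletableFuture<String> reply = new CompletableFuture<>();

    // Analogous to registering the validation step and then dropping the
    // resulting future on the floor.
    reply.thenApply(r -> {
      validate(r);
      return r;
    });

    // Analogous to the netty event loop delivering the response and
    // completing the future.
    Thread eventLoop = new Thread(() -> reply.complete("response"),
        "fake-event-loop");
    eventLoop.start();
    eventLoop.join();
    // Prints "validation runs on: fake-event-loop"; the exception is
    // captured only in the discarded dependent future, so no caller
    // thread ever observes it.
  }
}
{code}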


In other words, the exception handling in {{writeChunkToContainer}} seems to be 
wrong. Can you please take a look at this? I would propose an easy fix: make 
sure the future object is returned to the calling function, and complete it 
exceptionally (via {{completeExceptionally}}) when the response validation 
fails. Otherwise, the caller thread is gone and there is no thread left to 
handle the exception.
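
To make the proposal concrete, here is a minimal, self-contained sketch of the shape I have in mind (all names are stand-ins for illustration, not the actual ChunkOutputStream/XceiverClient API):
{code}
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

/** Hypothetical sketch of the proposed fix; names are stand-ins. */
public class AsyncWriteSketch {

  // Stand-in for the async call to the datanode.
  static CompletableFuture<String> sendCommandAsync(String chunk) {
    return CompletableFuture.supplyAsync(() -> "ERROR");  // simulated reply
  }

  // Stand-in for writeChunkToContainer: the future is returned to the
  // caller, and a bad response completes it exceptionally instead of
  // throwing on the completing (netty) thread with nobody listening.
  static CompletableFuture<String> writeChunkToContainer(String chunk) {
    return sendCommandAsync(chunk).thenApply(reply -> {
      if ("ERROR".equals(reply)) {
        throw new CompletionException(
            new IllegalStateException("container returned an error"));
      }
      return reply;
    });
  }

  public static void main(String[] args) {
    // The caller keeps the pending future (e.g. in a list of in-flight
    // writes) and joins it on flush()/close(), so the failure surfaces
    // on a thread that can actually handle it.
    CompletableFuture<String> pendingWrite = writeChunkToContainer("chunk-1");
    try {
      pendingWrite.join();
    } catch (CompletionException e) {
      System.out.println("caller sees: " + e.getCause());
    }
  }
}
{code}
Throwing a {{CompletionException}} inside {{thenApply}} has the same effect as calling {{completeExceptionally}} on the returned future: whoever holds the future sees the failure when they flush or close.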


> Ozone: Parallelize ChunkOutputStream Writes to container
> -------------------------------------------------------
>
>                 Key: HDFS-12794
>                 URL: https://issues.apache.org/jira/browse/HDFS-12794
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ozone
>    Affects Versions: HDFS-7240
>            Reporter: Shashikant Banerjee
>            Assignee: Shashikant Banerjee
>             Fix For: HDFS-7240
>
>         Attachments: HDFS-12794-HDFS-7240.001.patch, 
> HDFS-12794-HDFS-7240.002.patch, HDFS-12794-HDFS-7240.003.patch, 
> HDFS-12794-HDFS-7240.004.patch
>
>
> The ChunkOutputStream writes are synchronous in nature. Once one chunk of data gets 
> written, the next chunk write is blocked until the previous chunk is written 
> to the container.
> The ChunkOutputStream writes should be made async, and close() on the 
> OutputStream should ensure flushing of all dirty buffers to the container.


