[jira] [Commented] (HDFS-9040) Erasure coding: Refactor DFSStripedOutputStream (Move Namenode RPC Requests to Coordinator)

Zhe Zhang (JIRA) Mon, 28 Sep 2015 13:07:31 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-9040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14933899#comment-14933899
 ]


Zhe Zhang commented on HDFS-9040:
---------------------------------

I think the latest patch looks pretty good -- thanks Jing for the great work! A 
few comments below. Most of them can be addressed separately. If we all agree 
upon the direction of HDFS-9079 I'm happy to make the changes there too.

* When writing the first block in the file, or if the streamer is the fastest 
to finish a block, {{followingBlocks}} might not be ready when the code below 
is reached, for example if the {{addBlock}} RPC is slow, or if the client 
pauses between writing the last chunk of block_0 and the first chunk of 
block_1. Should we {{take}} instead of {{poll}}?
{code}
  /**
   * The upper level DFSStripedOutputStream will allocate the new block group.
   * Each striped data streamer only needs to fetch from the queue, which
   * should already be ready.
   */
  private LocatedBlock getFollowingBlock() throws IOException {
{code}
* The rest of the error-handling logic looks good. {{writeChunk}} => 
{{checkStreamerFailures}} is the key sync point here. I agree we should let 
this JIRA focus on the main logic and dedicate HDFS-9098 to testing.
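
To make the {{poll}} versus {{take}} question in the first comment concrete, here is a minimal sketch using a plain {{java.util.concurrent.BlockingQueue}}. It is not the patch's code: the class name, the helper methods, the {{String}} stand-in for {{LocatedBlock}}, and the timeout values are all assumptions made for illustration.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; String stands in for LocatedBlock.
class FollowingBlockQueueDemo {
  // Stand-in for the per-streamer followingBlocks queue (assumed shape).
  static final BlockingQueue<String> followingBlocks = new LinkedBlockingQueue<>();

  /** Non-blocking fetch: returns null if no block has been enqueued yet. */
  static String pollNow() {
    return followingBlocks.poll();
  }

  /**
   * Blocking fetch with an upper bound: waits up to timeoutMs for a block.
   * Returns null on timeout or interruption (interrupt flag is restored).
   */
  static String awaitBlock(long timeoutMs) {
    try {
      return followingBlocks.poll(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return null;
    }
  }

  public static void main(String[] args) throws Exception {
    // A fast streamer reaches the fetch before the block group is allocated.
    System.out.println("poll() before allocation: " + pollNow());

    // Simulate a slow addBlock RPC on the coordinator side (delay is arbitrary).
    Thread coordinator = new Thread(() -> {
      try {
        Thread.sleep(100);
        followingBlocks.offer("block_1");
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    coordinator.start();

    // A blocking fetch waits for the coordinator instead of seeing null.
    System.out.println("blocking fetch: " + awaitBlock(5000));
    coordinator.join();
  }
}
```

An unbounded {{take}} would behave like {{awaitBlock}} without the timeout; the bounded variant is shown only because it cannot hang forever if the coordinator fails before enqueuing a block. Which of these fits the streamer is exactly the open question above.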

Nits:
* {{callUpdatePipeline}} can now be folded into {{updatePipeline}}
* {{updatePipelineInternal}} is not an "internal" method of {{updatePipeline}}; 
maybe rename it to {{setupPipelineInternal}}?

Long-term:
* The current subclassing structure of {{DFSOutputStream}} and {{DataStreamer}} 
is not ideal. The striped subclasses inherit some unnecessary complexity, 
while the superclasses need hooks that only make sense for the striped 
subclasses. We can think about separating out a real superclass shared by 
both the contiguous and striped output logic.
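
One possible shape for that separation, as a rough sketch only: a small abstract base holds the shared write path, and each layout supplies its own steps, so neither inherits the other's hooks. Every class and method name below is hypothetical and does not come from the Hadoop code base, and the streamer count of 9 is just an illustrative value.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo entry point; none of these types exist in HDFS.
class HierarchySketchDemo {
  public static void main(String[] args) {
    BaseOutputSketch striped = new StripedOutputSketch(9);
    striped.write(new byte[4]);
    BaseOutputSketch contiguous = new ContiguousOutputSketch();
    contiguous.write(new byte[2]);
    System.out.println(striped.log);
    System.out.println(contiguous.log);
  }
}

/** Hypothetical common base: contains only what both layouts share. */
abstract class BaseOutputSketch {
  protected final List<String> log = new ArrayList<>();

  /** Shared write path; layout-specific steps are delegated to subclasses. */
  final void write(byte[] data) {
    checkFailures();
    log.add(describeWrite(data.length));
  }

  abstract void checkFailures();

  abstract String describeWrite(int len);
}

/** Contiguous layout: one pipeline, no striped-only hooks. */
class ContiguousOutputSketch extends BaseOutputSketch {
  @Override
  void checkFailures() {
    // Nothing striped-specific to check in this sketch.
  }

  @Override
  String describeWrite(int len) {
    return "contiguous:" + len;
  }
}

/** Striped layout: checks all of its streamers before each write. */
class StripedOutputSketch extends BaseOutputSketch {
  private final int numStreamers;

  StripedOutputSketch(int numStreamers) {
    this.numStreamers = numStreamers;
  }

  @Override
  void checkFailures() {
    log.add("checked " + numStreamers + " streamers");
  }

  @Override
  String describeWrite(int len) {
    return "striped:" + len;
  }
}
```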

+1 pending a clarification of the first comment.

> Erasure coding: Refactor DFSStripedOutputStream (Move Namenode RPC Requests 
> to Coordinator)
> -------------------------------------------------------------------------------------------
>
>                 Key: HDFS-9040
>                 URL: https://issues.apache.org/jira/browse/HDFS-9040
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Walter Su
>            Assignee: Jing Zhao
>         Attachments: HDFS-9040-HDFS-7285.002.patch, 
> HDFS-9040-HDFS-7285.003.patch, HDFS-9040-HDFS-7285.004.patch, 
> HDFS-9040-HDFS-7285.005.patch, HDFS-9040-HDFS-7285.005.patch, 
> HDFS-9040-HDFS-7285.006.patch, HDFS-9040.00.patch, HDFS-9040.001.wip.patch, 
> HDFS-9040.02.bgstreamer.patch
>
>
> The general idea is to simplify error handling logic.
> -Proposal 1:-
> -A BlockGroupDataStreamer to communicate with NN to allocate/update block, 
> and StripedDataStreamer s only have to stream blocks to DNs.-
> Proposal 2:
> See below the 
> [comment|https://issues.apache.org/jira/browse/HDFS-9040?focusedCommentId=14741388&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14741388]
>  from [~jingzhao].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
