[ https://issues.apache.org/jira/browse/HDFS-7729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14319136#comment-14319136 ]
Zhe Zhang commented on HDFS-7729:
---------------------------------
Thanks [~jingzhao] and [~szetszwo] for the in-depth thoughts. I hope the following
analysis helps in exploring the {{DataStreamer}} refactor.
Logically this JIRA needs to do 3 main things:
# Extend {{DFSOutputStream#streamer}} to have multiple {{streamers}}
# In {{DFSOutputStream#writeChunk}}, add the logic of distributing / striping
packets to different {{streamers}}
# In {{DataStreamer#nextBlockOutputStream}}, extend the {{addBlock}} logic to
allocate block groups and give the individual blocks to peer streamers.
A standalone {{DataStreamer}} class will work easily with #1 and #2. To handle
#3, we just need to move the {{locateFollowingBlock}} logic to
{{DFSOutputStream}}.
Subclassing {{DFSOutputStream}} is a good idea and it does seem feasible if we
separate out {{DataStreamer}}. We keep a single {{streamer}} variable
representing the _current streamer_. Step #2 above should take care of updating
its value to the next streamer when a striping cell boundary is reached.
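
To make the _current streamer_ idea concrete, here is a rough sketch of how the
switch at a cell boundary could look. It is only an illustration under
simplifying assumptions, not code from any patch: {{StripedWriterSketch}},
{{FakeStreamer}}, {{queuedBytes}} and the byte-at-a-time {{write}} are all
hypothetical names, and the real {{writeChunk}} works on chunks and packets
rather than single bytes.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical sketch; none of these names come from the HDFS code base. */
class StripedWriterSketch {
    /** Stand-in for a DataStreamer: it only records the bytes handed to it. */
    static class FakeStreamer {
        final ByteArrayOutputStream queued = new ByteArrayOutputStream();
        void enqueue(byte b) { queued.write(b); }
    }

    private final FakeStreamer[] streamers;
    private final int cellSize;          // assumed striping cell size, in bytes
    private int currentIndex = 0;
    private int bytesInCurrentCell = 0;
    private FakeStreamer streamer;       // the single "current streamer" variable

    StripedWriterSketch(int numStreamers, int cellSize) {
        if (numStreamers <= 0 || cellSize <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        this.streamers = new FakeStreamer[numStreamers];
        for (int i = 0; i < numStreamers; i++) {
            streamers[i] = new FakeStreamer();
        }
        this.cellSize = cellSize;
        this.streamer = streamers[0];
    }

    /** Sends data to the current streamer, rotating at each cell boundary. */
    void write(byte[] data) {
        for (byte b : data) {
            streamer.enqueue(b);
            if (++bytesInCurrentCell == cellSize) {
                // Cell boundary reached: point "streamer" at the next peer.
                bytesInCurrentCell = 0;
                currentIndex = (currentIndex + 1) % streamers.length;
                streamer = streamers[currentIndex];
            }
        }
    }

    /** Number of bytes queued so far on streamer i. */
    int queuedBytes(int i) { return streamers[i].queued.size(); }

    /** Index of the streamer that the next byte would go to. */
    int currentIndex() { return currentIndex; }
}
```

With 3 streamers and a cell size of 4, writing 10 bytes leaves 4 bytes on the
first streamer, 4 on the second and 2 on the third, with the third still being
the current one. The rest of the write path only ever sees {{streamer}}, which
is why a subclass could reuse most of the existing logic.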
> Add logic to DFSOutputStream to support writing a file in striping layout
> --------------------------------------------------------------------------
>
> Key: HDFS-7729
> URL: https://issues.apache.org/jira/browse/HDFS-7729
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Li Bo
> Assignee: Li Bo
> Attachments: Codec-tmp.patch, HDFS-7729-001.patch,
> HDFS-7729-002.patch, HDFS-7729-003.patch, HDFS-7729-004.patch,
> HDFS-7729-005.patch, HDFS-7729-006.patch, HDFS-7729-007.patch,
> HDFS-7729-008.patch
>
>
> If a client wants to write a file directly in striping layout, we need to add some
> logic to DFSOutputStream. DFSOutputStream needs multiple DataStreamers to
> write each cell of a stripe to a remote datanode.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)