[ 
https://issues.apache.org/jira/browse/HDFS-8287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14624356#comment-14624356
 ] 

Walter Su commented on HDFS-8287:
---------------------------------

The normal file:
The client reads from local disk while the streamer writes to the DN in parallel.
Time = Max(disk I/O time, network I/O time)
The network is slower than the disk, so the write is network I/O bound.

The EC file:
The client reads from local disk and encodes, then the streamer writes to the DN.
Time = Max(disk I/O time + coding time, network I/O time)
Coding is slow, so the write is CPU bound.

If coding time drops dramatically and becomes negligible, then writing an EC 
file costs the same as writing a normal file, and we can close this jira.
HADOOP-11540 can reduce coding time to 1/10, but I don't expect it to become 
negligible. So it's worth working on this jira.
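
To make the two formulas above concrete, here is a toy cost model in Java. 
All numbers are made-up units chosen for illustration, not measurements, and 
the class and method names are hypothetical. The third formula, 
Max(disk, coding, network), is my assumption of what an idealized, fully 
overlapped encoder thread would give; it is not stated anywhere in this jira.

```java
// Toy cost model for the write path. Units are arbitrary and hypothetical.
final class WriteTimeModel {
    // Normal file: disk read and network write overlap.
    static double normalTime(double disk, double network) {
        return Math.max(disk, network);
    }

    // EC file today: encoding runs on the same thread as the disk read.
    static double ecTime(double disk, double coding, double network) {
        return Math.max(disk + coding, network);
    }

    // Assumption: encode() on its own thread, overlapping perfectly.
    static double ecTimeWithEncoderThread(double disk, double coding, double network) {
        return Math.max(disk, Math.max(coding, network));
    }

    public static void main(String[] args) {
        double disk = 2, coding = 20, network = 3; // made-up values
        System.out.println(normalTime(disk, network));                           // 3.0
        System.out.println(ecTime(disk, coding, network));                       // 22.0
        System.out.println(ecTime(disk, coding / 10, network));                  // 4.0
        System.out.println(ecTimeWithEncoderThread(disk, coding / 10, network)); // 3.0
    }
}
```

With these example values, a 10x faster coder still leaves the EC write 
slower than the normal write (4.0 vs 3.0), and only overlapping the encode 
brings it back to the network bound.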

Hi, [~kaisasak]! I think it's worth doing. Please go ahead.
Hi, [~szetszwo]! Does "using an independent thread to do the encode()" sound 
reasonable to you?
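
A minimal sketch of the independent-thread idea, to show the shape of the 
proposal rather than a patch. Everything here is hypothetical: 
AsyncParitySketch, onStripeFull and finish are illustrative names, not the 
DFSStripedOutputStream API, and encode() uses a plain XOR as a stand-in for 
the real erasure coder. It assumes all cells in a stripe have equal length.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch; not the DFSStripedOutputStream API.
final class AsyncParitySketch {
    private final ExecutorService encoder = Executors.newSingleThreadExecutor();
    private final List<Future<byte[]>> pending = new ArrayList<>();

    // Stand-in for the real coder: XOR of all data cells.
    static byte[] encode(byte[][] cells) {
        byte[] parity = new byte[cells[0].length];
        for (byte[] cell : cells) {
            for (int i = 0; i < parity.length; i++) {
                parity[i] ^= cell[i];
            }
        }
        return parity;
    }

    // Called when a stripe is full. Returns right away so the caller keeps writing.
    void onStripeFull(byte[][] cells) {
        // Copy the cells so the caller can reuse its buffers immediately.
        byte[][] snapshot = new byte[cells.length][];
        for (int i = 0; i < cells.length; i++) {
            snapshot[i] = cells[i].clone();
        }
        pending.add(encoder.submit(() -> encode(snapshot)));
    }

    // Waits for outstanding parity in stripe order, then stops the encoder thread.
    List<byte[]> finish() throws InterruptedException, ExecutionException {
        List<byte[]> parities = new ArrayList<>();
        try {
            for (Future<byte[]> f : pending) {
                parities.add(f.get());
            }
        } finally {
            encoder.shutdown();
        }
        return parities;
    }

    public static void main(String[] args) throws Exception {
        AsyncParitySketch out = new AsyncParitySketch();
        byte[][] stripe = { {1, 2}, {4, 8} };
        out.onStripeFull(stripe);
        stripe[0][0] = 99; // caller reuses its buffer; the snapshot is unaffected
        byte[] parity = out.finish().get(0);
        System.out.println(parity[0] + " " + parity[1]); // prints "5 10"
    }
}
```

The buffer copy is the price of not blocking the writer; a real change would 
also need to bound the pending queue and hand the parity to the streamers.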

> DFSStripedOutputStream.writeChunk should not wait for writing parity 
> ---------------------------------------------------------------------
>
>                 Key: HDFS-8287
>                 URL: https://issues.apache.org/jira/browse/HDFS-8287
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: hdfs-client
>            Reporter: Tsz Wo Nicholas Sze
>            Assignee: Kai Sasaki
>
> When a striping cell is full, writeChunk computes and generates parity 
> packets.  It sequentially calls waitAndQueuePacket, so the user client cannot 
> continue to write data until it finishes.
> We should allow the user client to continue writing instead of blocking it 
> while parity is written.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
