[ https://issues.apache.org/jira/browse/HDFS-8287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14624356#comment-14624356 ]
Walter Su commented on HDFS-8287:
---------------------------------
The normal file:
The client reads from local disk while the streamer writes to the DNs in
parallel.
Time = Max(disk I/O time, network I/O time)
The network is slower than the disk, so the write is network I/O bound.
The EC file:
The client reads from local disk and encodes, then the streamer writes to the
DNs.
Time = Max(disk I/O time + coding time, network I/O time)
Coding is slow, so the write is CPU bound.
If coding time drops dramatically and becomes negligible, then writing an EC
file costs the same as writing a normal file, and we can close this jira.
HADOOP-11540 can reduce coding time to 1/10, but I don't expect it to become
negligible. So it's worth working on this jira.
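To make the two formulas above concrete, here is a small sketch of the timing model. The class name, method names, and all numbers are hypothetical and chosen only for illustration; they are not measurements from HDFS.

```java
// Illustrative model only: names and numbers are hypothetical, not measured.
class WriteTimeModel {
    // Normal file: disk read and network write overlap, so the slower one dominates.
    static double normalTime(double diskIo, double networkIo) {
        return Math.max(diskIo, networkIo);
    }

    // EC file: encoding happens on the same path as the disk read,
    // so coding time adds to disk time before overlapping with the network.
    static double ecTime(double diskIo, double coding, double networkIo) {
        return Math.max(diskIo + coding, networkIo);
    }

    public static void main(String[] args) {
        double disk = 4.0, network = 5.0, coding = 20.0; // arbitrary time units
        System.out.println(normalTime(disk, network));          // 5.0, network bound
        System.out.println(ecTime(disk, coding, network));      // 24.0, CPU bound
        System.out.println(ecTime(disk, coding / 10, network)); // 6.0, coding still counts
    }
}
```

With these made-up inputs, even a 10x faster coder leaves disk I/O + coding above the network time, which is the case where coding is not negligible.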
Hi, [~kaisasak]! I think it's worth doing. Please go ahead.
Hi, [~szetszwo]! Does "using an independent thread to do the encode()" sound
reasonable to you?
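A rough sketch of what "an independent thread to do the encode()" could look like, using a plain java.util.concurrent executor. Everything here is an assumption for illustration: AsyncParitySketch, onStripeFull, and finish are hypothetical names, and the XOR encode() is only a stand-in for the real erasure coder. This is not the DFSStripedOutputStream code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: hand parity encoding to a separate thread so the
// writer does not block while parity is computed.
class AsyncParitySketch {
    private final ExecutorService encoder = Executors.newSingleThreadExecutor();
    private final List<Future<byte[]>> pending = new ArrayList<>();

    // Stand-in for the real coder: XOR of all data cells (assumption).
    static byte[] encode(byte[][] dataCells) {
        byte[] parity = new byte[dataCells[0].length];
        for (byte[] cell : dataCells) {
            for (int i = 0; i < parity.length; i++) {
                parity[i] ^= cell[i];
            }
        }
        return parity;
    }

    // Called when the data cells of a stripe are full; returns immediately.
    void onStripeFull(byte[][] dataCells) {
        // Copy the cells so the writer can reuse its buffers right away.
        byte[][] snapshot = new byte[dataCells.length][];
        for (int i = 0; i < dataCells.length; i++) {
            snapshot[i] = dataCells[i].clone();
        }
        pending.add(encoder.submit(() -> encode(snapshot)));
    }

    // Called once at the end: wait for all outstanding parity, in stripe order.
    List<byte[]> finish() {
        List<byte[]> parities = new ArrayList<>();
        try {
            for (Future<byte[]> f : pending) {
                parities.add(f.get());
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            encoder.shutdown();
        }
        return parities;
    }

    public static void main(String[] args) {
        AsyncParitySketch s = new AsyncParitySketch();
        s.onStripeFull(new byte[][] {{1, 2}, {3, 4}});
        byte[] p = s.finish().get(0);
        System.out.println(p[0] + " " + p[1]); // 2 6
    }
}
```

The single-thread executor keeps parity in stripe order. The buffer copy is the price of letting the writer continue immediately; a real implementation would more likely use pooled buffers.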
> DFSStripedOutputStream.writeChunk should not wait for writing parity
> ---------------------------------------------------------------------
>
> Key: HDFS-8287
> URL: https://issues.apache.org/jira/browse/HDFS-8287
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: hdfs-client
> Reporter: Tsz Wo Nicholas Sze
> Assignee: Kai Sasaki
>
> When a striping cell is full, writeChunk computes and generates parity
> packets. It sequentially calls waitAndQueuePacket, so the user client cannot
> continue to write data until it finishes.
> We should allow the user client to continue writing instead of blocking it
> while parity is being written.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)