[ https://issues.apache.org/jira/browse/HADOOP-3288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12591212#action_12591212 ]
Raghu Angadi commented on HADOOP-3288:
--------------------------------------
> the argument that we can't have it both ways?
We certainly can have HDFS be close to optimal for many different loads.
Personally, given how early a stage HDFS is at, I would lean much more
towards simpler concepts and implementations that have a good macro benefit. I
do not yet know whether what is proposed here falls into that category. It is
certainly a good discussion.
> Serial streaming performance should be Math.min(ideal client performance,
> ideal serial hdfs performance)
> --------------------------------------------------------------------------------------------------------
>
> Key: HADOOP-3288
> URL: https://issues.apache.org/jira/browse/HADOOP-3288
> Project: Hadoop Core
> Issue Type: Improvement
> Components: dfs
> Affects Versions: 0.16.3, 0.18.0
> Environment: Mac OS X 10.5.2, Java 6
> Reporter: Sam Pullara
> Fix For: 0.18.0
>
>
> I looked at all the code long and hard and this was my analysis (could be
> wrong, I'm not an expert on this codebase):
> Current Serial HDFS performance = Average Datanode Performance
> Average Datanode Performance = Average Disk Performance (even if a datanode
> has more than one disk)
> We should have:
> Ideal Serial HDFS Performance = Sum of Ideal Datanode Performance
> Ideal Datanode Performance = Sum of disk performance
> When you read a single file serially from HDFS there are a number of
> limitations that come into play:
> 1) Blocks on multiple datanodes will be load balanced between them -
> averaging the performance of the datanodes
> 2) Blocks on multiple disks in a single datanode are load balanced between
> them - averaging the performance of the disks
> I think that all this could be fixed if we actually prefetched whole blocks
> on the client until the client can no longer keep up with the data or
> another bottleneck, such as network bandwidth, is reached.
> This seems like a reasonably common use case though not the typical MapReduce
> case.
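
The arithmetic in the description above can be made concrete with a small sketch. It only restates the reporter's model (current = average, ideal = sum, target = Math.min of client and HDFS). The class name, method names, and the throughput figures in the usage note are hypothetical and are not part of the Hadoop codebase.

```java
import java.util.List;

// Illustrative model only: names are hypothetical, not Hadoop APIs.
// Each inner list holds the per-disk throughput (MB/s) of one datanode.
final class SerialReadModel {

    // Current behaviour per the reporter's analysis: a datanode delivers
    // roughly the average of its disks, however many it has.
    static double currentDatanode(List<Double> disks) {
        return disks.stream().mapToDouble(Double::doubleValue).average().orElse(0.0);
    }

    // Proposed ideal: a datanode delivers the sum of its disks.
    static double idealDatanode(List<Double> disks) {
        return disks.stream().mapToDouble(Double::doubleValue).sum();
    }

    // Current serial read: the average over the datanodes holding the blocks.
    static double currentSerial(List<List<Double>> datanodes) {
        return datanodes.stream()
                .mapToDouble(SerialReadModel::currentDatanode)
                .average()
                .orElse(0.0);
    }

    // Ideal serial read: the sum over the datanodes holding the blocks.
    static double idealSerial(List<List<Double>> datanodes) {
        return datanodes.stream()
                .mapToDouble(SerialReadModel::idealDatanode)
                .sum();
    }

    // The issue title: Math.min(ideal client performance, ideal serial hdfs performance).
    static double target(double idealClient, List<List<Double>> datanodes) {
        return Math.min(idealClient, idealSerial(datanodes));
    }
}
```

With two datanodes of two disks each, every disk assumed to deliver 50 MB/s, the model gives a current serial rate of 50 MB/s and an ideal of 200 MB/s. A client assumed to consume at most 120 MB/s would then be the bottleneck, so the target is 120 MB/s.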