[ https://issues.apache.org/jira/browse/HDFS-4698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13640059#comment-13640059 ]
Aaron T. Myers commented on HDFS-4698:
--------------------------------------
Patch looks pretty good to me, though I do believe the findbugs warning is
legitimate.
A few little comments:
# It looks to me like the patch misses a place in DFSInputStream where it
should be adding to the statistics before closing a BlockReader. Currently the
patch only adds the stats in DFSInputStream#blockSeekTo, but I think they
should also be added in DFSInputStream#close.
# Recommend you add a comment to DFSInputStream#getReadStatistics about how to
use the API, i.e. that the stats will only be up-to-date after closing the
DFSInputStream.
# Recommend adding comments to DFSInputStream.ReadStatistics explaining the
meaning of the various fields, e.g. that SCR bytes count toward both the SCR
total and the "local bytes" total, that total >= local >= SCR, and that remote
bytes read can be determined by total - local.
# For that matter, you might want to add a getRemoteBytesRead method to
DFSInputStream.ReadStatistics to do the subtraction for the user.
# Any thoughts about how this new feature should interact with the existing
FileSystem#Statistics class? Valid answers include "not at all" and/or "this
will be helpful as-is, we can think about that later."
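To make items 1 through 4 concrete, here is a minimal sketch. All names in it (ReadStatistics#addBytes, FakeBlockReader, SketchInputStream, seekToBlock) are hypothetical illustrations, not code from HDFS-4698.001.patch. It counts a block reader's bytes whenever that reader is released, both on a block switch (the blockSeekTo analogue) and on close(). SCR bytes also count as local and local bytes also count toward the total, and remote bytes are derived as total - local.

```java
/**
 * Hypothetical per-stream read statistics (names are illustrative).
 * Short-circuit bytes also count as local bytes, and local bytes also
 * count toward the total, so total >= local >= shortCircuit always holds.
 */
class ReadStatistics {
    private long totalBytesRead;
    private long totalLocalBytesRead;
    private long totalShortCircuitBytesRead;

    /** Records the bytes one block reader served, classified by read path. */
    void addBytes(long bytes, boolean isLocal, boolean isShortCircuit) {
        if (bytes < 0) {
            throw new IllegalArgumentException("bytes must be non-negative");
        }
        if (isShortCircuit && !isLocal) {
            throw new IllegalArgumentException("a short-circuit read is always local");
        }
        totalBytesRead += bytes;
        if (isLocal) {
            totalLocalBytesRead += bytes;
        }
        if (isShortCircuit) {
            totalShortCircuitBytesRead += bytes;
        }
    }

    long getTotalBytesRead() { return totalBytesRead; }

    long getTotalLocalBytesRead() { return totalLocalBytesRead; }

    long getTotalShortCircuitBytesRead() { return totalShortCircuitBytesRead; }

    /** Convenience for callers: remote bytes are whatever was not read locally. */
    long getRemoteBytesRead() { return totalBytesRead - totalLocalBytesRead; }
}

/** Stand-in for a BlockReader; the byte count is fixed up front for simplicity. */
class FakeBlockReader {
    final long bytesRead;
    final boolean isLocal;
    final boolean isShortCircuit;

    FakeBlockReader(long bytesRead, boolean isLocal, boolean isShortCircuit) {
        this.bytesRead = bytesRead;
        this.isLocal = isLocal;
        this.isShortCircuit = isShortCircuit;
    }
}

/** Hypothetical stream showing where the statistics must be updated. */
class SketchInputStream {
    private final ReadStatistics readStatistics = new ReadStatistics();
    private FakeBlockReader blockReader;

    /** Analogue of blockSeekTo: count the old reader's bytes before replacing it. */
    void seekToBlock(FakeBlockReader next) {
        releaseBlockReader();
        blockReader = next;
    }

    /** Analogue of close: the last reader's bytes must be counted here as well. */
    void close() {
        releaseBlockReader();
    }

    private void releaseBlockReader() {
        if (blockReader != null) {
            readStatistics.addBytes(blockReader.bytesRead,
                    blockReader.isLocal, blockReader.isShortCircuit);
            blockReader = null;
        }
    }

    /** Fully up to date only after close(); the current reader is not yet counted. */
    ReadStatistics getReadStatistics() {
        return readStatistics;
    }
}
```

Reading a 100-byte short-circuit block and then a 50-byte remote block gives total=150, local=100, SCR=100, remote=50 after close(). Queried before close(), the total is still 100, which is exactly the gap item 1 describes.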
> provide client-side metrics for remote reads, local reads, and short-circuit
> reads
> ----------------------------------------------------------------------------------
>
> Key: HDFS-4698
> URL: https://issues.apache.org/jira/browse/HDFS-4698
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs-client
> Affects Versions: 2.0.3-alpha
> Reporter: Colin Patrick McCabe
> Assignee: Colin Patrick McCabe
> Priority: Minor
> Attachments: HDFS-4698.001.patch
>
>
> We should provide metrics to let clients know how many bytes of data they
> have read remotely, versus locally or via short-circuit local reads. This
> will allow clients to know how well they're doing at bringing the computation
> to the data, which will be useful in evaluating placement policies and
> cluster configurations.
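As an illustration of how a client might use such counters to judge how well it is bringing the computation to the data, the sketch below turns raw byte totals into fractions. LocalitySummary is a hypothetical helper, not part of HDFS, and it assumes the total >= local >= SCR invariant from the review comment.

```java
import java.util.Locale;

/**
 * Hypothetical helper that turns raw byte counters into fractions, which are
 * easier to compare across placement policies or cluster configurations.
 */
final class LocalitySummary {
    final double localFraction;
    final double shortCircuitFraction;
    final double remoteFraction;

    LocalitySummary(long totalBytes, long localBytes, long shortCircuitBytes) {
        // Reject counters that violate total >= local >= shortCircuit >= 0.
        if (shortCircuitBytes < 0 || localBytes < shortCircuitBytes || totalBytes < localBytes) {
            throw new IllegalArgumentException("expected total >= local >= shortCircuit >= 0");
        }
        if (totalBytes == 0) {
            // Nothing has been read yet; report zeros instead of dividing by zero.
            localFraction = 0.0;
            shortCircuitFraction = 0.0;
            remoteFraction = 0.0;
        } else {
            localFraction = (double) localBytes / totalBytes;
            shortCircuitFraction = (double) shortCircuitBytes / totalBytes;
            remoteFraction = (double) (totalBytes - localBytes) / totalBytes;
        }
    }

    @Override
    public String toString() {
        // Locale.ROOT keeps the decimal separator a '.' regardless of the JVM locale.
        return String.format(Locale.ROOT, "local=%.1f%% (short-circuit=%.1f%%), remote=%.1f%%",
                100 * localFraction, 100 * shortCircuitFraction, 100 * remoteFraction);
    }
}
```

For example, new LocalitySummary(200, 150, 100) prints local=75.0% (short-circuit=50.0%), remote=25.0%. Fractions rather than raw counts make jobs of different sizes comparable.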