[
https://issues.apache.org/jira/browse/HDFS-11535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15929722#comment-15929722
]
Yiqun Lin edited comment on HDFS-11535 at 3/17/17 10:44 AM:
------------------------------------------------------------
{quote}
I'm not sure I understood this scenario...also... the information on the inner
nodes already has nothing to do with the actual number of storages, it is the
number of datanodes with that storage type.
{quote}
Please ignore that comment, which caused the confusion. I found it was not correct.
:).
There is something I forgot to mention in my last comment. In the
threshold-based approach, a user who wants to always use the old method or the
new method can just set the threshold to 0 or 1, respectively. If the threshold
is set to 0, the old method is always used, since each storage type's
percentage will always be more than 0. So the threshold-based way seems more
flexible and can serve multiple users' scenarios well.
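A minimal sketch of how such a threshold switch might look. Everything here is hypothetical and is not taken from the patch: the class and method names are invented, and the rule "use the old method when the fraction of datanodes holding the requested storage type exceeds the threshold" is only one reading of the comment above.

```java
import java.util.Map;

/** Hypothetical illustration only; not the actual DFSNetworkTopology code. */
final class ThresholdChooser {
    enum Method { OLD, NEW }

    /**
     * Decides which chooseRandom variant to call.
     *
     * @param nodesWithType assumed bookkeeping: storage type name -> number of
     *                      datanodes that have that storage type
     * @param totalNodes    total number of datanodes in the cluster
     * @param type          the storage type being searched for
     * @param threshold     value in [0, 1]; 0 always selects OLD for any type
     *                      present in the cluster, 1 always selects NEW
     */
    static Method pick(Map<String, Integer> nodesWithType, int totalNodes,
                       String type, double threshold) {
        int count = nodesWithType.getOrDefault(type, 0);
        double fraction = totalNodes == 0 ? 0.0 : (double) count / totalNodes;
        // Strictly greater than: a fraction can never exceed 1, so a
        // threshold of 1 never selects the old method.
        return fraction > threshold ? Method.OLD : Method.NEW;
    }
}
```

With this reading, intermediate thresholds let an operator fall back to the old method only when the requested type is common enough that blind picking is likely to hit it.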
was (Author: linyiqun):
{quote}
{quote}
In some special case, one node will not just contain one storage type. ...
{quote}
I'm not sure I understood this scenario...also... the information on the inner
nodes already has nothing to do with the actual number of storages, it is the
number of datanodes with that storage type.
{quote}
Please ignore that comment, which caused the confusion. I found it was not correct.
:).
There is something I forgot to mention in my last comment. In the
threshold-based approach, a user who wants to always use the old method or the
new method can just set the threshold to 0 or 1, respectively. If the threshold
is set to 0, the old method is always used, since each storage type's
percentage will always be more than 0. So the threshold-based way seems more
flexible and can serve multiple users' scenarios well.
> Performance analysis of new DFSNetworkTopology#chooseRandom
> -----------------------------------------------------------
>
> Key: HDFS-11535
> URL: https://issues.apache.org/jira/browse/HDFS-11535
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: namenode
> Reporter: Chen Liang
> Assignee: Chen Liang
> Attachments: HDFS-11535.001.patch, PerfTest.pdf
>
>
> This JIRA is created to post the results of some performance experiments we
> did. For those who are interested, please see the attached .pdf file for more
> detail. The attached patch file includes the experiment code we ran.
> The key insight we got from these tests is that although *the new method
> outperforms the current one in most cases*, there is still *one case where
> the current one is better*: when there is only one storage type in
> the cluster and we always look for this storage type. In this case, it
> is simply a waste of time to perform storage-type-based pruning; blindly
> picking a random node (the current method) would suffice.
> Therefore, based on the analysis, we propose to use a *combination of both
> the old and the new methods*:
> say we search for a node of type X. Since inner nodes now all keep storage
> type info, we can *just check the root node to see if X is the only type it has*.
> If yes, blindly picking a random leaf will work, so we simply call the old
> method; otherwise we call the new method.
> There is still at least one missing piece in this performance test, which is
> garbage collection. The new method creates a few more objects when doing
> the search, which adds overhead to GC. I'm still thinking about potential
> optimizations, but this seems tricky, and I'm not sure whether this
> optimization is worth doing at all. Please feel free to leave any
> comments/suggestions.
> Thanks [~arpitagarwal] and [~szetszwo] for the offline discussion.
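The combined approach proposed in the issue description can be sketched roughly as below. This is a toy model under stated assumptions, not the code from HDFS-11535.001.patch: `Node`, `StorageType`, `chooseBlind`, and `choosePruned` are hypothetical stand-ins for the real topology classes and for the old and new `chooseRandom` methods.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Toy model; all names are hypothetical, not the real Hadoop classes. */
final class CombinedChooser {
    enum StorageType { DISK, SSD, ARCHIVE }

    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();
        // Per storage type: number of datanodes under this node that have it.
        final Map<StorageType, Integer> typeCounts = new EnumMap<>(StorageType.class);
        int leafCount;

        private Node(String name) { this.name = name; }

        static Node leaf(String name, StorageType... types) {
            Node n = new Node(name);
            n.leafCount = 1;
            for (StorageType t : types) n.typeCounts.put(t, 1);
            return n;
        }

        static Node inner(String name, Node... kids) {
            Node n = new Node(name);
            for (Node k : kids) {
                n.children.add(k);
                n.leafCount += k.leafCount;
                for (Map.Entry<StorageType, Integer> e : k.typeCounts.entrySet()) {
                    n.typeCounts.merge(e.getKey(), e.getValue(), Integer::sum);
                }
            }
            return n;
        }

        boolean isLeaf() { return children.isEmpty(); }
    }

    /** The root check: is X the only storage type anywhere in the tree? */
    static boolean onlyType(Node root, StorageType type) {
        return root.typeCounts.size() == 1 && root.typeCounts.containsKey(type);
    }

    /** Stand-in for the old method: uniform random leaf, no type checks. */
    static Node chooseBlind(Node node, Random rnd) {
        while (!node.isLeaf()) {
            int r = rnd.nextInt(node.leafCount);
            for (Node c : node.children) {
                if (r < c.leafCount) { node = c; break; }
                r -= c.leafCount;
            }
        }
        return node;
    }

    /** Stand-in for the new method: descend only into subtrees that hold the type. */
    static Node choosePruned(Node node, StorageType type, Random rnd) {
        if (node.typeCounts.getOrDefault(type, 0) == 0) return null;
        while (!node.isLeaf()) {
            int r = rnd.nextInt(node.typeCounts.get(type));
            for (Node c : node.children) {
                int n = c.typeCounts.getOrDefault(type, 0);
                if (r < n) { node = c; break; }
                r -= n;
            }
        }
        return node;
    }

    /** Combination: one cheap check at the root decides which path to take. */
    static Node chooseRandom(Node root, StorageType type, Random rnd) {
        return onlyType(root, type) ? chooseBlind(root, rnd)
                                    : choosePruned(root, type, rnd);
    }
}
```

The root check costs one map lookup, so the single-storage-type case avoids the per-level type bookkeeping entirely, which is the case where the experiments found the old method to be faster.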
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)