[ 
https://issues.apache.org/jira/browse/HDFS-11535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15928566#comment-15928566
 ] 

Chen Liang commented on HDFS-11535:
-----------------------------------

Thanks [~arpitagarwal] and [~linyiqun] for the comments!

Just to make sure we are on the same page:

bq. If there are 99% storage type X and only 1% for storage type Y, actually 
here we should use the old method.

If we only search for X, this is true, but if we search for Y, the old 
method will be exceptionally slow. So based on this, I think your point of 
choosing the method based on the percentage of the target storage type is 
actually a very good proposal. 
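
To make the percentage idea concrete, here is a minimal sketch of the decision, not the actual patch. The class name, method name, and the threshold value are all hypothetical; the only thing taken from the discussion is that the root keeps, per storage type, the number of datanodes that have it. The intuition: if a fraction p of the datanodes has the target type, blind picking needs roughly 1/p tries, so it is cheap at 99% and expensive at 1%.

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical helper for illustration; not part of DFSNetworkTopology. */
final class ChooseRandomStrategy {
  /** Stand-ins for the storage types X and Y used in this discussion. */
  enum StorageType { X, Y }

  /**
   * Decides whether blind random picking (the old method) is likely cheap enough.
   *
   * @param nodesPerType number of datanodes that have each storage type
   * @param totalNodes   total number of datanodes in the cluster
   * @param target       storage type we are searching for
   * @param threshold    assumed tunable fraction in (0, 1]
   */
  static boolean useOldMethod(Map<StorageType, Integer> nodesPerType,
                              int totalNodes,
                              StorageType target,
                              double threshold) {
    if (totalNodes <= 0) {
      return false; // empty cluster: nothing to pick blindly
    }
    int withTarget = nodesPerType.getOrDefault(target, 0);
    // Fraction of datanodes that have the target type.
    return (double) withTarget / totalNodes >= threshold;
  }

  /** Builds the "99% X, 1% Y" example: 100 datanodes, 99 with X and 1 with Y. */
  static Map<StorageType, Integer> exampleCounts() {
    Map<StorageType, Integer> counts = new EnumMap<>(StorageType.class);
    counts.put(StorageType.X, 99);
    counts.put(StorageType.Y, 1);
    return counts;
  }
}
```

With an assumed threshold of 0.9, searching for X in this example would go to the old method and searching for Y would go to the new one. What threshold is actually reasonable would have to come from the measurements.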

bq. In some special case, one node will not just contain one storage type. 
Maybe it will have two or two more different storage types. Based on this, the 
old method will also be better than the new method no matter how many the 
target storage has in cluster. As long as one node contain one target storage 
type then it can be quickly chosen.

I'm not sure I understood this scenario. Also, the information on the inner 
nodes has nothing to do with the actual number of storages; it is the 
number of datanodes with that storage type.

Additionally, an alternative approach I thought about is to do this "use 
old or new method?" check on every inner node: simply replace "..check root 
node to .." with "...check current inner node to ..." in my original proposal. 
For example, in your X and Y example, say we look for X and we decide to use 
the new method at the root, because there are two types, X and Y. Then we pick 
a random child node, check again, and find that this child node only has X. 
Then we simply call the old method and return. I think this is probably 
closest to optimal, but it adds more complexity to the already fairly complex 
code logic. I personally think your proposal of a threshold-based approach is 
good enough.
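
For reference, the per-inner-node check could look roughly like the sketch below. This is a toy model, not the DFSNetworkTopology code: the Node class, its fields, and both method names are hypothetical. It assumes each inner node keeps the number of datanodes under it and, per storage type, the number of datanodes that have that type. The check used here is "every datanode under this node has the target type", which holds in particular when the target is the only type under the node.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Random;

/** Toy topology for illustration; all names here are hypothetical. */
final class PerNodeCheckSketch {
  enum StorageType { X, Y }

  static final class Node {
    final String name;
    final List<Node> children = new ArrayList<>();
    // Number of datanodes in this subtree that have each storage type.
    final EnumMap<StorageType, Integer> nodesWithType = new EnumMap<>(StorageType.class);
    int leafCount; // number of datanodes in this subtree

    private Node(String name) { this.name = name; }

    static Node leaf(String name, StorageType... types) {
      Node n = new Node(name);
      n.leafCount = 1;
      for (StorageType t : types) n.nodesWithType.put(t, 1);
      return n;
    }

    static Node inner(String name, Node... kids) {
      Node n = new Node(name);
      for (Node k : kids) {
        n.children.add(k);
        n.leafCount += k.leafCount;
        k.nodesWithType.forEach((t, c) -> n.nodesWithType.merge(t, c, Integer::sum));
      }
      return n;
    }

    int count(StorageType t) { return nodesWithType.getOrDefault(t, 0); }
    boolean isLeaf() { return children.isEmpty(); }
  }

  /** Returns a random datanode that has the target type, or null if none exists. */
  static Node choose(Node node, StorageType target, Random rng) {
    int withTarget = node.count(target);
    if (withTarget == 0) return null;
    if (node.isLeaf()) return node;
    // Check at the current inner node: if every datanode below has the
    // target type, pruning is wasted work, so fall back to blind picking.
    if (withTarget == node.leafCount) return pickAnyLeaf(node, rng);
    // Otherwise take one step of the type-aware search: pick a child with
    // probability proportional to its number of matching datanodes.
    int r = rng.nextInt(withTarget);
    for (Node child : node.children) {
      int c = child.count(target);
      if (r < c) return choose(child, target, rng);
      r -= c;
    }
    throw new IllegalStateException("inconsistent counts under " + node.name);
  }

  /** Stand-in for the old method: a uniformly random datanode, type ignored. */
  static Node pickAnyLeaf(Node node, Random rng) {
    while (!node.isLeaf()) {
      int r = rng.nextInt(node.leafCount);
      Node next = null;
      for (Node child : node.children) {
        if (r < child.leafCount) { next = child; break; }
        r -= child.leafCount;
      }
      node = next;
    }
    return node;
  }
}
```

In a two-rack example where one rack holds only X datanodes and the other holds one X and one Y datanode, a search for X that descends into the first rack switches to blind picking immediately, while a search for Y is always steered to the single Y datanode.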

Will address the other comments about the unit test later on. (I will probably 
remove the writeToDisk calls because the data files themselves are barely 
useful without additional parsing.)


> Performance analysis of new DFSNetworkTopology#chooseRandom
> -----------------------------------------------------------
>
>                 Key: HDFS-11535
>                 URL: https://issues.apache.org/jira/browse/HDFS-11535
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>            Reporter: Chen Liang
>            Assignee: Chen Liang
>         Attachments: HDFS-11535.001.patch, PerfTest.pdf
>
>
> This JIRA is created to post the results of some performance experiments we 
> did.  For those who are interested, please see the attached .pdf file for 
> more detail. The attached patch file includes the experiment code we ran. 
> The key insight we got from these tests is that although *the new method 
> outperforms the current one in most cases*, there is still *one case where 
> the current one is better*: when there is only one storage type in 
> the cluster, and we also always look for this storage type. In this case, it 
> is simply a waste of time to perform storage-type-based pruning; blindly 
> picking a random node (the current method) would suffice.
> Therefore, based on the analysis, we propose to use a *combination of both 
> the old and the new methods*:
> say we search for a node of type X. Since inner nodes now all keep storage 
> type info, we can *just check the root node to see if X is the only type it 
> has*. If yes, blindly picking a random leaf will work, so we simply call the 
> old method; otherwise we call the new method.
> There is still at least one missing piece in this performance test, which is 
> garbage collection. The new method creates a few more objects when doing 
> the search, which adds overhead to GC. I'm still thinking about potential 
> optimizations, but this seems tricky, and I'm not sure whether this 
> optimization is worth doing at all. Please feel free to leave any 
> comments/suggestions.
> Thanks [~arpitagarwal] and [~szetszwo] for the offline discussion.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
