Chen Liang created HDFS-11535:
---------------------------------
Summary: Performance analysis of new DFSNetworkTopology#chooseRandom
Key: HDFS-11535
URL: https://issues.apache.org/jira/browse/HDFS-11535
Project: Hadoop HDFS
Issue Type: Sub-task
Components: namenode
Reporter: Chen Liang
Assignee: Chen Liang
Attachments: PerfTest.pdf
This JIRA is created to post the results of some performance experiments we
ran. For those who are interested, please see the attached .pdf file for more
detail. The attached patch file includes the experiment code we ran.
The key insight we got from these tests is that although *the new method
outperforms the current one in most cases*, there is still *one case where the
current one is better*: when there is only one storage type in the cluster
and we always look for that storage type. In this case, it is simply a waste
of time to perform storage-type-based pruning; blindly picking a random node
(the current method) would suffice.
Therefore, based on the analysis, we propose to use a *combination of both the
old and the new methods*:
Say we search for a node of type X. Since inner nodes now all keep storage type
info, we can *just check the root node to see if X is the only type it has*. If
yes, blindly picking a random leaf will work, so we simply call the old method;
otherwise we call the new method.
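
The dispatch described above can be sketched roughly as follows. This is only
an illustration under simplifying assumptions, not the code in the attached
patch: the class ChooseRandomSketch, its nested Root and Node types, and the
methods hasOnlyType, chooseRandomOld and chooseRandomNew are all hypothetical
names, and the tree is flattened to a root holding a list of leaves plus
per-storage-type counts.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

public class ChooseRandomSketch {
    // Hypothetical storage types, for illustration only.
    enum StorageType { DISK, SSD, ARCHIVE }

    /** Hypothetical leaf: a datanode with a single storage type. */
    static final class Node {
        final String name;
        final StorageType type;
        Node(String name, StorageType type) {
            this.name = name;
            this.type = type;
        }
    }

    /** Hypothetical root that tracks how many leaves of each type it holds. */
    static final class Root {
        final List<Node> leaves = new ArrayList<>();
        final Map<StorageType, Integer> typeCounts = new EnumMap<>(StorageType.class);

        void add(Node n) {
            leaves.add(n);
            typeCounts.merge(n.type, 1, Integer::sum);
        }

        /** True if every leaf under this root has storage type t. */
        boolean hasOnlyType(StorageType t) {
            return typeCounts.size() == 1 && typeCounts.containsKey(t);
        }
    }

    /** Stand-in for the old method: blindly pick any leaf. */
    static Node chooseRandomOld(Root root, Random rnd) {
        if (root.leaves.isEmpty()) {
            return null;
        }
        return root.leaves.get(rnd.nextInt(root.leaves.size()));
    }

    /** Stand-in for the new method: restrict the choice to leaves of type t. */
    static Node chooseRandomNew(Root root, StorageType t, Random rnd) {
        List<Node> candidates = new ArrayList<>();
        for (Node n : root.leaves) {
            if (n.type == t) {
                candidates.add(n);
            }
        }
        if (candidates.isEmpty()) {
            return null;
        }
        return candidates.get(rnd.nextInt(candidates.size()));
    }

    /** Combined approach: check the root once, then dispatch. */
    static Node chooseRandom(Root root, StorageType t, Random rnd) {
        if (root.hasOnlyType(t)) {
            // Every leaf already matches, so pruning would be wasted work.
            return chooseRandomOld(root, rnd);
        }
        return chooseRandomNew(root, t, rnd);
    }
}
```

The point of the sketch is that the extra check is a single lookup at the
root, so the single-storage-type case pays almost nothing before falling back
to the blind pick.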
There is still at least one missing piece in this performance test, which is
garbage collection. The new method creates a few more objects during the
search, which adds GC overhead. I'm still thinking about potential
optimizations, but this seems tricky, and I'm not sure whether such an
optimization is worth doing at all. Please feel free to leave any
comments/suggestions.