[
https://issues.apache.org/jira/browse/HDFS-10690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15433269#comment-15433269
]
Xiaoyu Yao edited comment on HDFS-10690 at 8/23/16 5:40 PM:
------------------------------------------------------------
[~fenghua_hu], I understand the tradeoff between memory footprint and the
insert/delete performance here. Note the ShortCircuitCache#replicaInfoMap is a
HashMap too, we are O( n) worst case and O(1) on average anyway for
ref()/unref() case. So I don't think that will change the time complexity of
ref()/unref() case.
After taking another look at the get() usage issue you mentioned earlier, I
don't think we need the extra indexOf(). A LinkedMap is extended from
AbstractLinkedMap, which extended from AbstractHashedMap. Because of that, we
are calling into the AbstractHashedMap#get(Object Key) here according to
https://commons.apache.org/proper/commonscollections/jacoco/org.apache.commons.collections4.map/AbstractHashedMap.java.html.
This will still be O(1) on average and O( n) worst case.
I would appreciate if you could post the result of the simpler approach that I
proposed under the same test environment. If the difference is significant, we
can look at alternatives as you suggested. Let me know your thoughts. Thanks!
was (Author: xyao):
[~fenghua_hu], I understand the tradeoff between memory footprint and the
insert/delete performance here. Note the ShortCircuitCache#replicaInfoMap is a
HashMap too, we are O( n) worst case and O(1) on average anyway for
ref()/unref() case. So I don't think that will change the time complexity of
ref()/unref() case.
After taking another look at the get() usage issue you mentioned earlier, I
don't think we need the extra indexOf(). A LinkedMap is extended from
AbstractLinkedMap, which extended from AbstractHashedMap. Because of that, we
are calling into the AbstractHashedMap#get(Object Key) here according to
https://commons.apache.org/proper/commonscollections/jacoco/org.apache.commons.collections4.map/AbstractHashedMap.java.html.
This will still be O(1) on average and O(n) worst case.
I would appreciate if you could post the result of the simpler approach that I
proposed under the same test environment. If the difference is significant, we
can look at alternatives as you suggested. Let me know your thoughts. Thanks!
> Optimize insertion/removal of replica in ShortCircuitCache.java
> ---------------------------------------------------------------
>
> Key: HDFS-10690
> URL: https://issues.apache.org/jira/browse/HDFS-10690
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs-client
> Affects Versions: 3.0.0-alpha2
> Reporter: Fenghua Hu
> Assignee: Fenghua Hu
> Attachments: HDFS-10690.001.patch, HDFS-10690.002.patch,
> ShortCircuitCache_LinkedMap.patch
>
> Original Estimate: 336h
> Remaining Estimate: 336h
>
> Currently in ShortCircuitCache, two TreeMap objects are used to track the
> cached replicas.
> private final TreeMap<Long, ShortCircuitReplica> evictable = new TreeMap<>();
> private final TreeMap<Long, ShortCircuitReplica> evictableMmapped = new
> TreeMap<>();
> TreeMap employs Red-Black tree for sorting. This isn't an issue when using
> traditional HDD. But when using high-performance SSD/PCIe Flash, the cost
> inserting/removing an entry becomes considerable.
> To mitigate it, we designed a new list-based for replica tracking.
> The list is a double-linked FIFO. FIFO is time-based, thus insertion is a
> very low cost operation. On the other hand, list is not lookup-friendly. To
> address this issue, we introduce two references into ShortCircuitReplica
> object.
> ShortCircuitReplica next = null;
> ShortCircuitReplica prev = null;
> In this way, lookup is not needed when removing a replica from the list. We
> only need to modify its predecessor's and successor's references in the lists.
> Our tests showed up to 15-50% performance improvement when using PCIe flash
> as storage media.
> The original patch is against 2.6.4, now I am porting to Hadoop trunk, and
> patch will be posted soon.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]