[
https://issues.apache.org/jira/browse/HDFS-10690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15433142#comment-15433142
]
Fenghua Hu edited comment on HDFS-10690 at 8/23/16 4:38 PM:
------------------------------------------------------------
I would like to explain the solution here.
Currently, TreeMap is used to track all the ShortCircuitReplica entries in the
order of being inserted. These entries could be removed from TreeMap in two
cases. The first is when the entry is accessed again, it will be removed from
TreeMap. Please note that the entry could be anyone in the TreeMap. The other
is when the entry is evicted due to treemap size limitation, in this case, only
the eldest entry will be removed.
Removal is a costly operation for the first case, because looking up
ShortCircuitReplica is needed, in TreeMap, it's O(log n) operation. To improve
it, we design a new data structure LruList, which entirely eliminates costly
look-up operation.
+----------------+ +-----------------+
+-----------------+
| | |
| | |
---->| Replica 1 |-----Next----> | Replica 2 |-----Next----> |
Replica 3 |
| | |
| | |
<-----| |<-----Prev---- |
|<-----Prev---- | |
+----------------+ +-----------------+
+-----------------+
We introduced two references in ShortCircuitReplica objects. Reference Next
points to the elder ShortCircuitReplica and Prev points to the younger one. All
the replica is doubly linked in order of insertion time. The youngest is always
at the head of the linked list, and the eldest is always at the tail. Removing
the entries between the head and the tail doesn't need any lookup, because the
replica knows its position in linked list by next and prev, thus remove is
simple: change it's precedessor's and its succssor's next and prev. The order
of operation is always O(1).
For insertion, the youngest entry is always be added to the head, thus the
operation is also O(1).
Existing classes, including LinkedHashMap, LinkedMap, can't provide O(1)
operation for insertion/lookup/removal.
Here comes a brief test result:
Run GET queries with 64 YCSB processes for 30 minutes, record the QPS for each
process。
Total QPS:
w/o patch: 95K
w/ patch: 135K
The performance gain is (135 - 95) / 95 = 42%.
Suggestions/comments are very welcomed.
was (Author: fenghua_hu):
I would like to explain the solution here.
Currently, TreeMap is used to track all the ShortCircuitReplica entries in the
order of being inserted. These entries could be removed from TreeMap in two
cases. The first is when the entry is accessed again, it will be removed from
TreeMap. Please note that the entry could be anyone in the TreeMap. The other
is when the entry is evicted due to treemap size limitation, in this case, only
the eldest entry will be removed.
Removal is a costly operation for the first case, because looking up
ShortCircuitReplica is needed, in TreeMap, it's O(log n) operation. To improve
it, we design a new data structure LruList, which entirely eliminates costly
look-up operation.
+----------------+ +-----------------+
|Replica 1 | |Replica 2 |
+----------------+ +-----------------+
---->| Next |-----> | Next |------->...
+----------------+ +-----------------+
<-----| Prev |<-----| Prev |<------...
+----------------+ +-----------------+
| | | |
+----------------+ +-----------------+
We introduced two references in ShortCircuitReplica objects. Reference Next
points to the elder ShortCircuitReplica and Prev points to the younger one. All
the replica is doubly linked in order of insertion time. The youngest is always
at the head of the linked list, and the eldest is always at the tail. Removing
the entries between the head and the tail doesn't need any lookup, because the
replica knows its position in linked list by next and prev, thus remove is
simple: change it's precedessor's and its succssor's next and prev. The order
of operation is always O(1).
For insertion, the youngest entry is always be added to the head, thus the
operation is also O(1).
Existing classes, including LinkedHashMap, LinkedMap, can't provide O(1)
operation for insertion/lookup/removal.
Here comes a brief test result:
Run GET queries with 64 YCSB processes for 30 minutes, record the QPS for each
process。
Total QPS:
w/o patch: 95K
w/ patch: 135K
The performance gain is (135 - 95) / 95 = 42%.
Suggestions/comments are very welcomed.
> Optimize insertion/removal of replica in ShortCircuitCache.java
> ---------------------------------------------------------------
>
> Key: HDFS-10690
> URL: https://issues.apache.org/jira/browse/HDFS-10690
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs-client
> Affects Versions: 3.0.0-alpha2
> Reporter: Fenghua Hu
> Assignee: Fenghua Hu
> Attachments: HDFS-10690.001.patch, HDFS-10690.002.patch,
> ShortCircuitCache_LinkedMap.patch
>
> Original Estimate: 336h
> Remaining Estimate: 336h
>
> Currently in ShortCircuitCache, two TreeMap objects are used to track the
> cached replicas.
> private final TreeMap<Long, ShortCircuitReplica> evictable = new TreeMap<>();
> private final TreeMap<Long, ShortCircuitReplica> evictableMmapped = new
> TreeMap<>();
> TreeMap employs Red-Black tree for sorting. This isn't an issue when using
> traditional HDD. But when using high-performance SSD/PCIe Flash, the cost
> inserting/removing an entry becomes considerable.
> To mitigate it, we designed a new list-based for replica tracking.
> The list is a double-linked FIFO. FIFO is time-based, thus insertion is a
> very low cost operation. On the other hand, list is not lookup-friendly. To
> address this issue, we introduce two references into ShortCircuitReplica
> object.
> ShortCircuitReplica next = null;
> ShortCircuitReplica prev = null;
> In this way, lookup is not needed when removing a replica from the list. We
> only need to modify its predecessor's and successor's references in the lists.
> Our tests showed up to 15-50% performance improvement when using PCIe flash
> as storage media.
> The original patch is against 2.6.4, now I am porting to Hadoop trunk, and
> patch will be posted soon.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]