Ivan Andika created HDDS-15577:
----------------------------------
Summary: Improve getBlockDNCache logic and cleanup
Key: HDDS-15577
URL: https://issues.apache.org/jira/browse/HDDS-15577
Project: Apache Ozone
Issue Type: Improvement
Reporter: Ivan Andika
Assignee: Ivan Andika
This is simply an observation in XceiverClientGrpc getBlockDNcache.
getBlockDNCache is used to cache the DN which returned the GetBlock command so
that the ReadChunk command can be sent to the same DN. The idea is that if the
GetBlock for BCSID x returns successfully, the subsequent ReadChunk will also
return successfully, whereas for other datanodes, the data might not have been
replicated yet.
There are some identified issues.
First, it seems that getBlockDNcache entries are not cleaned up regularly.
XceiverClientManager evicts an idle XceiverClientGrpc based on
scm.container.client.idle.threshold (default 10s), but a particular
XceiverClientGrpc that is accessed frequently (a hot client) may never be idle
long enough to be evicted, so its getBlockDNcache can keep growing and cause
real memory overhead. A possible solution is to use a Guava cache for
getBlockDNcache to ensure that entries are evicted. Additionally, we could
clear the map in close() so that the cached objects can be garbage collected
without waiting for the XceiverClientGrpc itself to be collected.
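A minimal sketch of the bounded-cache idea, not the actual XceiverClientGrpc
code. The issue proposes a Guava cache; to stay runnable with the JDK alone,
this sketch swaps in an access-ordered LinkedHashMap as an LRU stand-in. The
class name BoundedDnCache, the maxEntries parameter, and the generic key/value
types are assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical bounded stand-in for getBlockDNcache (block ID -> datanode). */
final class BoundedDnCache<K, V> implements AutoCloseable {
  private final Map<K, V> entries;

  BoundedDnCache(final int maxEntries) {
    // accessOrder=true keeps the least recently used entry first, so
    // removeEldestEntry drops it once the size limit is exceeded.
    this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
      }
    };
  }

  synchronized void put(K blockId, V datanode) {
    entries.put(blockId, datanode);
  }

  synchronized V get(K blockId) {
    return entries.get(blockId);
  }

  synchronized int size() {
    return entries.size();
  }

  /** Drop all entries on close so they can be collected right away. */
  @Override
  public synchronized void close() {
    entries.clear();
  }
}
```

With a limit of 2, inserting a third block ID evicts whichever of the first
two was used least recently, and close() empties the map regardless of whether
the owning client object is still referenced.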
Secondly, HDDS-10593 added another datanode sorting step that prioritizes
IN_SERVICE datanodes over datanodes in maintenance or decommission. However,
this reorders the datanodeList again, so the cached DN for that block ID might
no longer be the first datanode. We could set a flag hasCachedDN whenever there
is a hit in getBlockDNcache and skip the sorting in that case.
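The hasCachedDN idea could look roughly like the sketch below. It is an
illustration under assumptions, not the real XceiverClientGrpc logic: the Dn
class, its inService field, and the orderForRead method are hypothetical names
that model only what is needed to show the ordering.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class ReadOrderSketch {
  /** Hypothetical minimal datanode model: an ID and an IN_SERVICE flag. */
  static final class Dn {
    final String id;
    final boolean inService;

    Dn(String id, boolean inService) {
      this.id = id;
      this.inService = inService;
    }
  }

  /**
   * Returns the order in which datanodes would be tried.
   * cachedDn is the getBlockDNcache hit for this block, or null on a miss.
   */
  static List<Dn> orderForRead(List<Dn> pipelineNodes, Dn cachedDn) {
    List<Dn> ordered = new ArrayList<>(pipelineNodes);
    boolean hasCachedDN = cachedDn != null && ordered.remove(cachedDn);
    if (hasCachedDN) {
      // Cache hit: keep the cached DN first and skip the IN_SERVICE sort,
      // which could otherwise move it away from the front.
      ordered.add(0, cachedDn);
      return ordered;
    }
    // Cache miss: stable sort that puts IN_SERVICE nodes first
    // (the key is false for in-service nodes, and false sorts before true).
    ordered.sort(Comparator.comparing((Dn dn) -> !dn.inService));
    return ordered;
  }
}
```

In this sketch, a cached DN that is in maintenance still comes first on a hit,
while a miss falls back to the IN_SERVICE-first ordering.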