[ https://issues.apache.org/jira/browse/HDFS-12802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16314044#comment-16314044 ]

Íñigo Goiri commented on HDFS-12802:
------------------------------------

bq. How frequently are entries invalidated and how large is the map? If the 
cleanup is only necessary after running for several weeks but invalidation 
(including successors) is relatively common, then the Guava implementation may 
not match the workload.

In our case, the mount table edits (the ones that trigger these invalidations) are fairly infrequent, roughly once a day.
The real issue is the volume of proxied operations for unique files.
So I'd say invalidation is relatively uncommon.

bq. Apache Commons has a ReferenceMap implementation that can use soft 
references; since router memory utilization grows slowly, that would fit this 
case. Unfortunately, it's not sorted (IIRC) so invalidation would still be 
expensive.

Yep, not sorted.

bq. If this does use the Guava Cache, sorting the keyset to extract the submap 
is more expensive than removing entries by a full scan. It could use 
Collection::removeIf to match a prefix of the keyset.

I don't think I can use {{Collection::removeIf}} on top of the Guava cache; I would iterate over the keys and remove the ones that match the prefix.
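A minimal sketch of that iterate-and-remove approach. It operates on a {{ConcurrentMap}} view (Guava's {{Cache#asMap()}} returns one); the class and method names here are illustrative, not the actual {{MountTableResolver}} code, and the string values stand in for the cached {{PathLocation}} resolutions.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class PrefixInvalidation {

  /**
   * Remove every cached entry whose key starts with the given path prefix.
   * The cacheView parameter stands in for the map view a cache exposes,
   * e.g. Guava's Cache#asMap(); removal through a concurrent map's key
   * view is safe while iterating.
   */
  static void invalidatePrefix(ConcurrentMap<String, String> cacheView,
      String prefix) {
    for (String key : cacheView.keySet()) {
      if (key.startsWith(prefix)) {
        cacheView.remove(key);
      }
    }
  }

  public static void main(String[] args) {
    ConcurrentMap<String, String> cache = new ConcurrentHashMap<>();
    cache.put("/ns0/a", "loc1");
    cache.put("/ns0/b", "loc2");
    cache.put("/other/c", "loc3");
    invalidatePrefix(cache, "/ns0");
    System.out.println(cache.size()); // prints 1
  }
}
```

This is a full scan of the keyset rather than a submap extraction, which matches the observation above that sorting the keyset just to take a submap would cost more than scanning.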

bq. The Guava impl is also threadsafe, so concurrent invalidations would not 
form a convoy.

We already have a read/write lock synchronizing the {{tree}} and the {{locationCache}}, so we would not take much advantage of the thread safety.

Do we go ahead with the Guava cache and invalidating key by key?
Another thing: we may not even need the cache time limit; the size limit alone covers the use case.
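For the size-only option, Guava provides it directly via {{CacheBuilder.newBuilder().maximumSize(n).build()}}. As a hedged, stdlib-only sketch of the same policy (bounded size, no time-based expiration), the class below is illustrative and not the actual router code:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * A size-bounded LRU map: evicts the least-recently-accessed entry once
 * the configured maximum is exceeded. No time-based expiration at all.
 */
public class BoundedLocationCache<K, V> extends LinkedHashMap<K, V> {

  private final int maxEntries;

  public BoundedLocationCache(int maxEntries) {
    // access-order = true makes iteration order reflect recency of use,
    // which is what gives LRU eviction below
    super(16, 0.75f, true);
    this.maxEntries = maxEntries;
  }

  @Override
  protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
    // evict purely on size; entries never expire by age
    return size() > maxEntries;
  }
}
```

Note this sketch is not thread-safe on its own; under the existing read/write lock that may be acceptable, whereas the Guava cache handles concurrency internally.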

> RBF: Control MountTableResolver cache size
> ------------------------------------------
>
>                 Key: HDFS-12802
>                 URL: https://issues.apache.org/jira/browse/HDFS-12802
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Íñigo Goiri
>            Assignee: Íñigo Goiri
>         Attachments: HDFS-12802.000.patch, HDFS-12802.001.patch, 
> HDFS-12802.002.patch, HDFS-12802.003.patch
>
>
> Currently, the {{MountTableResolver}} caches the resolutions for the 
> {{PathLocation}}. However, this cache can grow with no limits if there are a 
> lot of unique paths. Some of these cached resolutions might not be used at 
> all.
> The {{MountTableResolver}} should clean the {{locationCache}} periodically.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
