[ 
https://issues.apache.org/jira/browse/HDFS-6093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942306#comment-13942306
 ] 

Colin Patrick McCabe commented on HDFS-6093:
--------------------------------------------

{code}
With the locking issues resolved, is it okay to just leave it with a single set 
of variables? I could switch it over to AtomicLongs or something, but I think 
it's all under the FSN lock anyway.
{code}

I think putting more things under the big lock is the wrong direction to go.  
In particular, we will eventually need to release the big lock from time to 
time while doing the CacheReplicationMonitor scan.  When we do that, having 
just one set of counters is not going to work.  It seems simple enough just to 
have a {{CacheManager#Counters}} object with its own lock, and set it at the 
end of the scan.  There are other ways to do this too (atomics, etc.).

This would also make it easier to modify the pending cache count in 
{{processCacheReportImpl}}.  It's easy to understand the concept of modifying a 
copy of the stats, harder to understand all the locking interactions of 
modifying the counter that the CRM is actually using.  At least for me.
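
As a rough sketch of the separate counters object suggested above: the scan accumulates into local variables and publishes one snapshot at the end, under a small lock that is independent of the FSN lock. All class, method, and field names below ({{CounterSnapshot}}, {{Counters}}, {{ScanSketch}}, {{publish}}, {{read}}) are hypothetical and not taken from the patch; the real scan logic is replaced by simple sums.

```java
import java.util.concurrent.locks.ReentrantLock;

/** Immutable result of one scan (hypothetical field names). */
final class CounterSnapshot {
  final long pendingCached;
  final long pendingUncached;

  CounterSnapshot(long pendingCached, long pendingUncached) {
    this.pendingCached = pendingCached;
    this.pendingUncached = pendingUncached;
  }
}

/** Hypothetical stand-in for a CacheManager#Counters object with its own lock. */
final class Counters {
  private final ReentrantLock lock = new ReentrantLock();
  private CounterSnapshot current = new CounterSnapshot(0, 0);

  /** Called once at the end of a scan to replace the published values. */
  void publish(CounterSnapshot snapshot) {
    lock.lock();
    try {
      current = snapshot;
    } finally {
      lock.unlock();
    }
  }

  /** Readers see either the previous scan's values or the new ones, never a mix. */
  CounterSnapshot read() {
    lock.lock();
    try {
      return current;
    } finally {
      lock.unlock();
    }
  }
}

/** Placeholder for the scan; the per-item arrays are an assumption for illustration. */
final class ScanSketch {
  static void rescan(Counters counters, long[] toCache, long[] toUncache) {
    // Accumulate in locals so the scan never touches the published counters.
    long pendingCached = 0;
    long pendingUncached = 0;
    for (long n : toCache) {
      pendingCached += n;
    }
    for (long n : toUncache) {
      pendingUncached += n;
    }
    // Publish the finished totals in a single step at the end of the scan.
    counters.publish(new CounterSnapshot(pendingCached, pendingUncached));
  }
}
```

Because the snapshot is immutable and only swapped at the end, the scan could release and reacquire the big lock partway through without readers ever observing half-updated totals.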

With regard to the {{processCacheReportImpl}} changes, I think there are still 
some issues here.  I don't like the fact that we are now potentially allocating 
a TreeMap of size NUM_PENDING_UNCACHED blocks in every cache report.  There are 
a few different ways to handle this without a huge memory blowup.  The simplest 
is probably to remove the "final" on {{DatanodeDescriptor#pendingUncached}}.  
Then you just create a new list in {{processCacheReportImpl}}, and selectively 
add the still-need-to-be-uncached blocks to that.  Then at the end, you throw 
away the old list and make {{DatanodeDescriptor}} use the new list.
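
A minimal sketch of the list-swap approach described above, using simplified stand-in types. {{DatanodeSketch}} and {{CacheReportSketch}} are hypothetical, blocks are represented as plain {{Long}} IDs rather than the real block objects, and the rule used here for "still needs to be uncached" (the datanode still reports the block as cached) is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Simplified, hypothetical stand-in for DatanodeDescriptor. */
final class DatanodeSketch {
  // Deliberately not final, so a cache report can swap in a freshly built list.
  private List<Long> pendingUncached = new ArrayList<>();

  List<Long> getPendingUncached() {
    return pendingUncached;
  }

  void setPendingUncached(List<Long> replacement) {
    pendingUncached = replacement;
  }
}

/** Hypothetical sketch of the rebuild step inside a cache report handler. */
final class CacheReportSketch {
  /**
   * @param reportedBlockIds block IDs the datanode reports as currently cached
   */
  static void processCacheReport(DatanodeSketch dn, Set<Long> reportedBlockIds) {
    // The new list can only shrink relative to the old one, so no
    // additional per-report map of pending blocks is needed.
    List<Long> stillPending = new ArrayList<>();
    for (Long blockId : dn.getPendingUncached()) {
      // Assumption: a block still needs uncaching only while the
      // datanode continues to report it as cached.
      if (reportedBlockIds.contains(blockId)) {
        stillPending.add(blockId);
      }
    }
    // Throw away the old list and make the descriptor use the new one.
    dn.setPendingUncached(stillPending);
  }
}
```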

+1 once all that is addressed

> Expose more caching information for debugging by users
> ------------------------------------------------------
>
>                 Key: HDFS-6093
>                 URL: https://issues.apache.org/jira/browse/HDFS-6093
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: caching
>    Affects Versions: 2.4.0
>            Reporter: Andrew Wang
>            Assignee: Andrew Wang
>         Attachments: hdfs-6093-1.patch, hdfs-6093-2.patch, hdfs-6093-3.patch, 
> hdfs-6093-4.patch
>
>
> When users submit a new cache directive, it's unclear if the NN has 
> recognized it and is actively trying to cache it, or if it's hung for some 
> other reason. It'd be nice to expose a "pending caching/uncaching" count the 
> same way we expose pending replication work.
> It'd also be nice to display the aggregate cache capacity and usage in 
> dfsadmin -report, since we already have it as a metric and expose it 
> per-DN in report output.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
