[ https://issues.apache.org/jira/browse/HDFS-5708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Colin Patrick McCabe updated HDFS-5708:
---------------------------------------

    Attachment: HDFS-5708.001.patch

I realized that we don't actually need to consult {{BlockManager}} at all in 
the cache report handler.  We can simply faithfully record the block IDs that 
the {{DataNode}} thinks are cached.  Then, the {{CacheReplicationMonitor}} can 
take care of preparing uncache requests for the ones that shouldn't be cached.
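As a rough illustration of the simplified handler, here is a minimal Java 
sketch. Everything in it (the class {{CacheReportSketch}}, the methods 
{{processCacheReport}} and {{cachedOn}}, and the string DataNode names) is 
hypothetical and is not the real HDFS {{CacheManager}} API. It also assumes 
that each cache report is a complete snapshot of one DataNode's cached blocks. 
The point is only that the handler stores the reported IDs without looking 
them up, so an ID the block manager has never seen cannot cause a null 
dereference.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical, simplified model of a NameNode-side cache report handler.
 * NOT the real HDFS CacheManager; names and structure are illustrative only.
 */
public class CacheReportSketch {
    // Block IDs each DataNode reports as cached, keyed by a made-up DataNode name.
    private final Map<String, Set<Long>> cachedByDatanode = new HashMap<>();

    /** Records the reported block IDs verbatim; no block-map lookup happens here. */
    public void processCacheReport(String datanode, long[] reportedBlockIds) {
        Set<Long> cached = new HashSet<>();
        for (long id : reportedBlockIds) {
            cached.add(id);
        }
        // Assumption of this sketch: a report is a full snapshot, so replace the old one.
        cachedByDatanode.put(datanode, cached);
    }

    /** Returns what a DataNode last reported as cached (empty if nothing yet). */
    public Set<Long> cachedOn(String datanode) {
        return cachedByDatanode.getOrDefault(datanode, Set.of());
    }

    public static void main(String[] args) {
        CacheReportSketch handler = new CacheReportSketch();
        // 999_999L stands in for a bogus block ID the block manager has never heard of.
        handler.processCacheReport("dn1", new long[] {1001L, 999_999L});
        System.out.println(handler.cachedOn("dn1").contains(999_999L)); // prints "true"
    }
}
```

Deciding what to do about such IDs is deferred entirely to a later rescan, 
which keeps the report path free of lookups that might return null.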

In general, we don't expect to see many cached blocks that are not in the 
BlockManager, not in the COMPLETE state, or corrupt.  The ones that we do see 
should be transitory.  Blocks not yet in the BlockManager should eventually 
become known to it through block reports.  Blocks not in the COMPLETE state 
should be taken care of by the DN itself; it starts the (potentially slow) 
uncaching process when it transitions a cached block away from COMPLETE.  
Corrupt blocks should be removed by the block manager itself (they are 
corrupt, after all).

With the current patch, the NN does eventually send UNCACHE requests for all of 
these offenders once the {{CacheReplicationMonitor}} identifies them.  However, 
we expect that in most cases the UNCACHE request won't be needed; it is there 
for completeness and as a backstop against bugs or corner cases.
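The backstop check can be pictured with a small Java sketch that flags the 
three kinds of offenders listed above. It is a hypothetical model, not the 
real {{CacheReplicationMonitor}}: the {{BlockInfo}} record, the 
{{BlockState}} enum, the plain {{Map}} standing in for the BlockManager, and 
the method {{findUncacheCandidates}} are all made up for illustration, and 
the real monitor considers more than these conditions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical model of a rescan that picks cached blocks needing UNCACHE. */
public class UncacheRescanSketch {
    enum BlockState { UNDER_CONSTRUCTION, COMPLETE }

    // Made-up stand-in for what the block manager knows about one block.
    record BlockInfo(long id, BlockState state, boolean corrupt) {}

    /**
     * Returns reported-cached block IDs that should get an UNCACHE request:
     * unknown to the block map, not COMPLETE, or corrupt.
     */
    static List<Long> findUncacheCandidates(Set<Long> reportedCached,
                                            Map<Long, BlockInfo> blockMap) {
        List<Long> uncache = new ArrayList<>();
        for (long id : new TreeSet<>(reportedCached)) { // sorted for stable output
            BlockInfo info = blockMap.get(id);          // null means "not known yet"
            if (info == null                            // checked, never dereferenced
                    || info.state() != BlockState.COMPLETE
                    || info.corrupt()) {
                uncache.add(id);
            }
        }
        return uncache;
    }

    public static void main(String[] args) {
        Map<Long, BlockInfo> blockMap = Map.of(
            1L, new BlockInfo(1L, BlockState.COMPLETE, false),
            2L, new BlockInfo(2L, BlockState.UNDER_CONSTRUCTION, false),
            3L, new BlockInfo(3L, BlockState.COMPLETE, true));
        // Block 4 was reported as cached but is unknown to the block map.
        System.out.println(findUncacheCandidates(Set.of(1L, 2L, 3L, 4L), blockMap));
        // prints "[2, 3, 4]"
    }
}
```

Because the lookup result is tested for null rather than dereferenced, an 
unknown block simply becomes an UNCACHE candidate instead of an exception.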

I modified a unit test to fire off a cache report with a bogus block ID at the 
beginning, as a regression test for this bug.

> The CacheManager throws a NPE in the DataNode logs when processing cache 
> reports that refer to a block not known to the BlockManager
> ------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-5708
>                 URL: https://issues.apache.org/jira/browse/HDFS-5708
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>    Affects Versions: 3.0.0
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>         Attachments: HDFS-5708.001.patch
>
>
> The CacheManager throws a NPE in the DataNode logs when processing cache 
> reports that refer to a block we haven't learned about yet.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
