[
https://issues.apache.org/jira/browse/HDFS-5708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Colin Patrick McCabe updated HDFS-5708:
---------------------------------------
Attachment: HDFS-5708.001.patch
I realized that we don't actually need to consult {{BlockManager}} at all in
the cache report handler. We can simply faithfully record the block IDs that
the {{DataNode}} thinks are cached. Then, the {{CacheReplicationMonitor}} can
take care of preparing uncache requests for the ones that shouldn't be cached.
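As a rough illustration of the handler behavior described above, here is a
minimal Java sketch. It is not the code in the attached patch:
{{CacheReportSketch}}, {{processCacheReport}} and {{getReportedCached}} are
hypothetical stand-ins, and the sketch assumes that each cache report is a full
report that replaces the previous one. The point is only that no
{{BlockManager}} lookup happens, so an unknown block ID cannot cause a null
dereference in this path.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, simplified stand-in for the NameNode-side handler.
// None of these names come from the actual HDFS source.
class CacheReportSketch {
    // Block IDs the DataNode most recently claimed to have cached.
    private final Set<Long> reportedCached = new HashSet<>();

    // Record the report as-is. No block lookup is performed, so IDs that
    // the block manager has never heard of are stored like any other.
    // Assumption: a report is complete and replaces the previous one.
    void processCacheReport(List<Long> blockIds) {
        reportedCached.clear();
        reportedCached.addAll(blockIds);
    }

    // Defensive copy, so callers cannot mutate the recorded state.
    Set<Long> getReportedCached() {
        return new HashSet<>(reportedCached);
    }

    public static void main(String[] args) {
        CacheReportSketch handler = new CacheReportSketch();
        // 999 plays the role of a bogus block ID the NameNode does not know.
        handler.processCacheReport(Arrays.asList(1L, 2L, 999L));
        System.out.println(new TreeSet<>(handler.getReportedCached()));
    }
}
```

Deciding which of the recorded IDs should not be cached is deliberately left
out of this class; in the design described here that is the job of a later scan.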
In general, we don't expect to see many not-in-BlockManager /
not-in-COMPLETE-state / corrupt cached blocks. The ones that we do see should
be transitory. Not-in-BlockManager blocks should eventually become known to
the {{BlockManager}} through block reports. Not-in-COMPLETE-state blocks should
be taken care of by the DN itself: it starts the (potentially slow) uncaching
process when it transitions a cached block away from COMPLETE. Corrupt blocks
should be removed by the {{BlockManager}} itself (they are corrupt, after all).
With the current patch, the NN does eventually send UNCACHE requests for all
of these offenders once {{CacheReplicationMonitor}} identifies them. However,
in most cases the UNCACHE requests are not expected to be needed; they are
sent just for completeness and as a backstop against bugs or corner cases.
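The backstop described above could be sketched roughly as follows. This is an
illustrative model only, not the logic of {{CacheReplicationMonitor}} in the
patch: {{RescanSketch}}, {{blocksToUncache}}, the {{State}} enum and the plain
collections standing in for NameNode state are all assumptions made for the
example. A reported block is kept only if it is known, COMPLETE, not corrupt,
and actually wanted in the cache; everything else gets an uncache request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of a periodic scan over reported cached blocks.
class RescanSketch {
    enum State { COMPLETE, UNDER_CONSTRUCTION }

    // reported: block IDs the DataNode says it has cached
    // known:    block ID -> state, for blocks the NameNode knows about
    // corrupt:  block IDs currently marked corrupt
    // wanted:   block IDs that should be cached
    static List<Long> blocksToUncache(Set<Long> reported,
                                      Map<Long, State> known,
                                      Set<Long> corrupt,
                                      Set<Long> wanted) {
        List<Long> toUncache = new ArrayList<>();
        // Iterate in ascending ID order so the result is deterministic.
        for (Long id : new TreeSet<>(reported)) {
            State state = known.get(id); // null means "unknown block"
            boolean keep = state == State.COMPLETE
                    && !corrupt.contains(id)
                    && wanted.contains(id);
            if (!keep) {
                toUncache.add(id);
            }
        }
        return toUncache;
    }
}
```

Because unknown blocks simply show up as a null state here, they are handled
as ordinary data in the scan rather than as an error in the report handler.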
I modified a unit test to fire off a cache report with a bogus block ID at the
beginning, as a regression test for this bug.
> The CacheManager throws a NPE in the DataNode logs when processing cache
> reports that refer to a block not known to the BlockManager
> ------------------------------------------------------------------------------------------------------------------------------------
>
> Key: HDFS-5708
> URL: https://issues.apache.org/jira/browse/HDFS-5708
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: namenode
> Affects Versions: 3.0.0
> Reporter: Colin Patrick McCabe
> Assignee: Colin Patrick McCabe
> Attachments: HDFS-5708.001.patch
>
>
> The CacheManager throws a NPE in the DataNode logs when processing cache
> reports that refer to a block we haven't learned about yet.