[
https://issues.apache.org/jira/browse/HDFS-5053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrew Wang updated HDFS-5053:
------------------------------
Attachment: hdfs-5053-1.patch
Here's a mondo patch that hooks everything up. At a high level, I added some
new classes that mimic functionality present in {{BlockManager}}:
{{CacheReplicationManager}} and the background {{CacheReplicationMonitor}}
thread (there's a rough sketch of the monitor loop after the list below). I
initially tried to refactor {{BlockManager}}, but ended up with very little
code reuse for the following reasons:
- Caching is really cheap compared to block replication.
-- There's no network traffic, we just need to spend a second reading a
block off disk.
-- There's no need to pick a source and targets, so a bunch of the code
around block placement (and the default block placement policy itself) isn't
necessary.
-- We don't really care about racks either since it's so cheap to re-cache.
-- No need to throttle cache work since (again) it's cheap, and DNs already
throttle themselves.
- The concepts of "under construction" and "corrupt" replicas don't apply to
cached replicas.
-- Right now, datanodes uncache as soon as a replica becomes under
construction, so UC replicas shouldn't even get reported.
-- We also don't need to keep corrupt replicas around until the replication
factor comes back up, so we can just invalidate them immediately and bring
them up somewhere else.
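For illustration, here's a minimal sketch of the general shape of that
background monitor loop: a periodic rescan that finds under-cached blocks and
queues cache commands for DataNodes that already hold the replica on disk. All
class and method names here are made up for the example, not the actual code
in the patch.
{code:java}
import java.util.Collections;
import java.util.List;

// Hypothetical sketch only; simplified stand-in for the monitor thread idea.
public class CacheMonitorSketch extends Thread {
  private final long scanIntervalMs;
  private volatile boolean stopped = false;

  public CacheMonitorSketch(long scanIntervalMs) {
    super("CacheMonitorSketch");
    setDaemon(true);
    this.scanIntervalMs = scanIntervalMs;
  }

  @Override
  public void run() {
    while (!stopped) {
      rescan();                        // one pass over blocks that need caching
      try {
        Thread.sleep(scanIntervalMs);
      } catch (InterruptedException e) {
        return;
      }
    }
  }

  // One scan: for each under-cached block, pick a DataNode that already has
  // the replica on disk and queue a cache command for it. There's no
  // source/target pair and no rack awareness, since re-caching is cheap.
  private void rescan() {
    for (long blockId : findUnderCachedBlocks()) {
      String datanode = chooseDatanodeWithReplica(blockId);
      if (datanode != null) {
        queueCacheCommand(datanode, blockId);
      }
    }
  }

  // Placeholders standing in for NameNode state lookups.
  private List<Long> findUnderCachedBlocks() { return Collections.emptyList(); }
  private String chooseDatanodeWithReplica(long blockId) { return null; }
  private void queueCacheCommand(String datanode, long blockId) { }

  public void shutdown() {
    stopped = true;
    interrupt();
  }
}
{code}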
Some caveats with the current patch; I also punted on some things I don't
fully understand yet:
- Needs more tests, obviously
- The replication target-choosing policy just chooses randomly, biased by free
cache space; this could perhaps be improved (there's a sketch of this
weighted-random choice after the list). Target-choosing for uncaching is also
just random.
- There's this business of queuing block work on the standby for later
processing, which I skipped. Cache reports are much more frequent (and
smaller) than block reports, so maybe it's okay to just wait for a new report.
- Didn't do excess replica tracking; we go right to invalidating excess
replicas when they're reported in.
- Didn't do the optimized initial cache report case
- Didn't do the "stale block contents" / "postpone misreplicated blocks"
handling for overreplicated blocks on a failover. The consequences are less
severe for caching, but we should probably fix this eventually.
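To make the free-space-biased choice from the second caveat concrete, here's a
hypothetical sketch of weighted-random target selection; the class name and
signature are invented for illustration and aren't from the patch.
{code:java}
import java.util.List;
import java.util.Random;

// Hypothetical sketch: pick a cache target at random, weighted by free cache
// space on each candidate DataNode.
public class WeightedCacheTargetChooser {
  private final Random random = new Random();

  // Returns an index into freeCacheBytes with probability proportional to
  // each node's free cache space, or -1 if no node has space.
  public int choose(List<Long> freeCacheBytes) {
    long total = 0;
    for (long free : freeCacheBytes) {
      total += Math.max(0L, free);
    }
    if (total <= 0) {
      return -1;
    }
    // Pick a uniform point in [0, total) and walk the cumulative weights.
    long point = (long) (random.nextDouble() * total);
    long cumulative = 0;
    for (int i = 0; i < freeCacheBytes.size(); i++) {
      cumulative += Math.max(0L, freeCacheBytes.get(i));
      if (point < cumulative) {
        return i;
      }
    }
    return freeCacheBytes.size() - 1; // guard against rounding at the edge
  }
}
{code}
Uncaching targets could use a similar weighting (e.g. by used cache space)
instead of a plain uniform pick, but that's just one option.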
> NameNode should invoke DataNode APIs to coordinate caching
> ----------------------------------------------------------
>
> Key: HDFS-5053
> URL: https://issues.apache.org/jira/browse/HDFS-5053
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: datanode, namenode
> Reporter: Colin Patrick McCabe
> Assignee: Andrew Wang
> Attachments: hdfs-5053-1.patch
>
>
> The NameNode should invoke the DataNode APIs to coordinate caching.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira