[ 
https://issues.apache.org/jira/browse/HDFS-5053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-5053:
------------------------------

    Attachment: hdfs-5053-1.patch

Here's a mondo patch that hooks everything up. At a high-level, I added some 
new classes that mimic functionality present in {{BlockManager}}: 
{{CacheReplicationManager}} and the background {{CacheReplicationMonitor}} 
thread. I initially tried to refactor {{BlockManager}} but ended up with very 
little code reuse for the following reasons:

- Caching is really cheap compared to block replication.
  -- There's no network traffic, we just need to spend a second reading a 
block off disk.
  -- There's no need for a source and targets, so a bunch of the code 
surrounding block placement and the default block placement policy itself isn't 
necessary.
  -- We don't really care about racks either, since it's so cheap to re-cache.
  -- No need to throttle cache work since (again) it's cheap, and DNs already 
throttle themselves.
- The concepts of "under construction" and "corrupt" replicas don't apply to 
cached replicas.
  -- Right now, datanodes uncache as soon as a replica becomes under 
construction, so UC replicas shouldn't even get reported.
  -- We also don't need to keep corrupt replicas around until the repl factor 
comes back up, so we can just invalidate them immediately and bring them up 
somewhere else.
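
The split described above can be sketched roughly as follows. The class name 
{{CacheReplicationMonitor}} is from the patch, but the internals here are my 
own simplification (the map-based state and {{computeWork}} method are 
illustrative, not the patch's actual API):

```java
import java.util.*;

// Hypothetical sketch of one scan pass of the CacheReplicationMonitor:
// compare the desired cache replication for each block against what the
// cache reports say is actually cached, and emit the difference as work.
// There's deliberately no throttling or placement logic, since caching
// is cheap and DNs throttle themselves.
public class CacheMonitorSketch {
    // blockId -> desired number of cached replicas (from cache directives)
    private final Map<Long, Integer> desired = new HashMap<>();
    // blockId -> current number of cached replicas (from cache reports)
    private final Map<Long, Integer> cached = new HashMap<>();

    public void setDirective(long blockId, int replication) {
        desired.put(blockId, replication);
    }

    public void reportCached(long blockId, int count) {
        cached.put(blockId, count);
    }

    // One monitor pass: returns (blockId, delta) pairs. delta > 0 means
    // "cache on delta more nodes"; delta < 0 means "uncache from -delta
    // nodes". A background thread would run this periodically.
    public List<long[]> computeWork() {
        List<long[]> work = new ArrayList<>();
        for (Map.Entry<Long, Integer> e : desired.entrySet()) {
            int have = cached.getOrDefault(e.getKey(), 0);
            int delta = e.getValue() - have;
            if (delta != 0) {
                work.add(new long[] { e.getKey(), delta });
            }
        }
        return work;
    }
}
```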

Some caveats with the current patch; I also punted on some things I don't fully 
understand yet:

- Needs more tests, obviously
- The replication target-choosing policy just chooses randomly, biased by free 
cache space; this could perhaps be improved. Same for target-choosing for 
uncaching: it's just random.
- There's this business with queuing block work on the standby for later 
processing, which I skipped. Cache reports are much more frequent (and smaller) 
than block reports, so maybe it's okay to just wait for a new report.
- Didn't do excess replica tracking; we go right to invalidating excess 
replicas when they're reported in.
- Didn't do the optimized initial cache report case
- Didn't do the "stale block contents" / "postpone misreplicated blocks" for 
handling overreplicated blocks on a failover. Consequences are less severe for 
caching, but we should probably eventually fix this.
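
For the first caveat, "chooses randomly biased by free cache space" amounts to 
weighted random selection. Here's a minimal sketch of that idea; the class and 
method names are illustrative, not from the patch:

```java
import java.util.Random;

// Hypothetical sketch of weighted-random target choosing: each datanode
// is selected with probability proportional to its free cache capacity,
// so emptier caches are more likely to receive new cached replicas.
public class CacheTargetChooser {
    private final Random random;

    public CacheTargetChooser(Random random) {
        this.random = random;
    }

    // freeCacheBytes: remaining cache capacity per datanode.
    // Returns the index of the chosen datanode, or -1 if none has space.
    public int chooseTarget(long[] freeCacheBytes) {
        long total = 0;
        for (long free : freeCacheBytes) {
            total += Math.max(0, free);
        }
        if (total == 0) {
            return -1;
        }
        // Pick a point in [0, total) and walk the nodes until we pass it;
        // nodes with more free cache cover a larger share of the range.
        long point = (long) (random.nextDouble() * total);
        for (int i = 0; i < freeCacheBytes.length; i++) {
            long free = Math.max(0, freeCacheBytes[i]);
            if (point < free) {
                return i;
            }
            point -= free;
        }
        return freeCacheBytes.length - 1; // guard against rounding
    }
}
```

One possible improvement over pure weighting would be to also prefer nodes 
that already hold an on-disk replica of the block, since they can cache it 
without any transfer at all.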
                
> NameNode should invoke DataNode APIs to coordinate caching
> ----------------------------------------------------------
>
>                 Key: HDFS-5053
>                 URL: https://issues.apache.org/jira/browse/HDFS-5053
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode, namenode
>            Reporter: Colin Patrick McCabe
>            Assignee: Andrew Wang
>         Attachments: hdfs-5053-1.patch
>
>
> The NameNode should invoke the DataNode APIs to coordinate caching.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
