[jira] [Commented] (HDFS-5096) Automatically cache new data added to a cached path

Chris Nauroth (JIRA) Mon, 14 Oct 2013 14:24:24 -0700

    [ https://issues.apache.org/jira/browse/HDFS-5096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13794501#comment-13794501 ]


Chris Nauroth commented on HDFS-5096:
-------------------------------------

A couple of quick comments looking at version 6 of the patch:

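{{CacheReplicationMonitor#rescanCachedBlockMap}}: Something seems off about the 
logic for manipulating pending-cached and pending-uncached.  Each comment 
describes the opposite case from its condition ("too few" on the {{<=}} check, 
"too many" on the {{>=}} check), and the second comment says "pending cached" 
where the loop clears pending-uncached.  Is it just that the comments are wrong 
and I'm getting confused?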

{code}
      if (neededReplication <= numCached) {
        // If we have all the replicas we need, or too few, drop all 
        // pending cached.
        for (DatanodeDescriptor datanode : pendingCached) {
          datanode.getPendingCached().removeElement(cblock);
        }
      }
      if (neededReplication >= numCached) {
        // If we have all the replicas we need, or too many, drop all
        // pending cached.
        for (DatanodeDescriptor datanode : pendingUncached) {
          datanode.getPendingUncached().removeElement(cblock);
        }
      }
{code}
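
For reference, here is how I read the two conditions, written as a stand-alone sketch rather than as the patch's code. The class and method names ({{PendingQueueSketch}}, {{dropPendingCached}}, {{dropPendingUncached}}) are made up for illustration, and I'm assuming the intent is that a pending request is dropped once it can no longer move the block toward its target replication. If that reading is right, the conditions are fine and only the comments need to change.

```java
/** Hypothetical stand-alone model of the two checks; not the patch's code. */
final class PendingQueueSketch {
    private PendingQueueSketch() {}

    /** Pending cache requests are redundant once enough replicas are cached. */
    static boolean dropPendingCached(int neededReplication, int numCached) {
        // True when we have all the replicas we need, or too many.
        return neededReplication <= numCached;
    }

    /** Pending uncache requests are redundant while at or below the target. */
    static boolean dropPendingUncached(int neededReplication, int numCached) {
        // True when we have all the replicas we need, or too few.
        return neededReplication >= numCached;
    }

    public static void main(String[] args) {
        int needed = 3;
        // Walk through the under-, exactly-, and over-replicated cases.
        for (int cached = 2; cached <= 4; cached++) {
            System.out.printf(
                "needed=%d cached=%d dropPendingCached=%b dropPendingUncached=%b%n",
                needed, cached,
                dropPendingCached(needed, cached),
                dropPendingUncached(needed, cached));
        }
    }
}
```

When the counts are equal both checks pass, so both pending lists are cleared, which matches the overlap between the two {{if}} statements in the patch.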

{{CacheReplicationMonitor#rescanFile}}: Can you explain the logic around mark 
in this method?  I understand the mark logic in {{rescanCachedBlockMap}}, but I 
didn't follow it here.

{code}
        if (mark != ocblock.getMark()) {
          ocblock.setReplicationAndMark(pce.getReplication(), mark);
        } else {
          ocblock.setReplicationAndMark((short)Math.max(
              pce.getReplication(), ocblock.getReplication()), mark);
        }
{code}
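
My best guess at the intent, which may well be wrong, is sketched below: if the block's mark differs from the current scan's mark, this is the first cache entry to touch it in this scan and the stored replication is stale, so it is overwritten; otherwise another entry already touched it in this scan, so the larger replication wins. {{MarkSketch}} and {{visit}} are hypothetical names, and I'm assuming the mark is a boolean that flips on each scan. None of this is taken from the patch beyond the branch quoted above.

```java
/** Hypothetical stand-in for the patch's cached-block object. */
final class MarkSketch {
    private short replication;
    private boolean mark;

    MarkSketch(short replication, boolean mark) {
        this.replication = replication;
        this.mark = mark;
    }

    boolean getMark() {
        return mark;
    }

    short getReplication() {
        return replication;
    }

    void setReplicationAndMark(short replication, boolean mark) {
        this.replication = replication;
        this.mark = mark;
    }

    /** Applies one cache entry's replication during a scan with the given mark. */
    void visit(short entryReplication, boolean scanMark) {
        if (scanMark != getMark()) {
            // First visit in this scan: the stored value is from an older scan.
            setReplicationAndMark(entryReplication, scanMark);
        } else {
            // Already visited in this scan by another entry: keep the larger value.
            setReplicationAndMark(
                (short) Math.max(entryReplication, getReplication()), scanMark);
        }
    }

    public static void main(String[] args) {
        MarkSketch block = new MarkSketch((short) 5, false);
        // Three cache entries cover the same block during one scan (mark = true).
        for (short r : new short[] {2, 3, 1}) {
            block.visit(r, true);
            System.out.println("after entry with replication " + r
                + ": stored replication = " + block.getReplication());
        }
    }
}
```

If that is the intent, a short comment on each branch in {{rescanFile}} would make it much easier to follow.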

{{NameNode}}: Is this HA change meant for this patch, or is it meant to be its 
own patch that can go to trunk?

Several tests are commented out in this version of the patch, so they are 
not running.


> Automatically cache new data added to a cached path
> ---------------------------------------------------
>
>                 Key: HDFS-5096
>                 URL: https://issues.apache.org/jira/browse/HDFS-5096
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode, namenode
>            Reporter: Andrew Wang
>            Assignee: Colin Patrick McCabe
>         Attachments: HDFS-5096-caching.005.patch, HDFS-5096-caching.006.patch
>
>
> For some applications, it's convenient to specify a path to cache, and have 
> HDFS automatically cache new data added to the path without sending a new 
> caching request or a manual refresh command.
> One example is new data appended to a cached file. It would be nice to 
> re-cache a block at the new appended length, and cache new blocks added to 
> the file.
> Another example is a cached Hive partition directory, where a user can drop 
> new files directly into the partition. It would be nice if these new files 
> were cached.
> In both cases, this automatic caching would happen after the file is closed, 
> i.e. block replica is finalized.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
