[jira] [Commented] (HDFS-5431) support cachepool-based quota management in path-based caching

Colin Patrick McCabe (JIRA) Fri, 06 Dec 2013 11:59:51 -0800

    [ 
https://issues.apache.org/jira/browse/HDFS-5431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841628#comment-13841628
 ]


Colin Patrick McCabe commented on HDFS-5431:
--------------------------------------------

{code}
    if (in.readBoolean()) {
      info.setOwnerName(Text.readString(in));
    }
    if (in.readBoolean())  {
      info.setGroupName(Text.readString(in));
    }
    if (in.readBoolean()) {
      info.setMode(FsPermission.read(in));
    }
    if (in.readBoolean()) {
      info.setReservation(in.readLong());
    }
    if (in.readBoolean()) {
      info.setQuota(in.readLong());
    }
    if (in.readBoolean()) {
      info.setWeight(in.readInt());
    }
{code}

I don't think the backwards-compatibility scheme here is really going to work.  
The problem is that if we add more booleans later, the old code won't know 
they're there and won't read them.  It will then interpret those bytes as 
whatever comes next in the stream, which could cause some really bad results.

I think the best way to do this is to start with a 32-bit word, which we can 
treat as a bitfield.  We can then load or not load field N according to whether 
bit N is set.  If there are bits set that we don't know how to interpret, we 
can bail out with a nice error message rather than trying to load garbage 
and possibly corrupting the fsimage.  We probably should use this approach for 
cache directives as well.
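
To make the bitfield idea concrete, here is a rough sketch, not a proposed patch.  {{PoolInfo}}, {{PoolInfoCodec}}, the flag names and the bit assignments are all made up for illustration, and {{writeUTF}} / {{readUTF}} stand in for {{Text.writeString}} / {{Text.readString}} so that the example runs without Hadoop on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical stand-in for the cache pool info; not the real Hadoop class.
class PoolInfo {
  String ownerName;  // null means "not set"
  String groupName;
  Short mode;
  Long maximum;
}

class PoolInfoCodec {
  // Hypothetical bit assignments: bit N guards optional field N.
  static final int FLAG_OWNER = 1 << 0;
  static final int FLAG_GROUP = 1 << 1;
  static final int FLAG_MODE = 1 << 2;
  static final int FLAG_MAXIMUM = 1 << 3;
  static final int KNOWN_FLAGS =
      FLAG_OWNER | FLAG_GROUP | FLAG_MODE | FLAG_MAXIMUM;

  static byte[] write(PoolInfo info) throws IOException {
    int flags = 0;
    if (info.ownerName != null) flags |= FLAG_OWNER;
    if (info.groupName != null) flags |= FLAG_GROUP;
    if (info.mode != null) flags |= FLAG_MODE;
    if (info.maximum != null) flags |= FLAG_MAXIMUM;

    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bytes);
    out.writeInt(flags);  // the 32-bit bitfield always comes first
    if ((flags & FLAG_OWNER) != 0) out.writeUTF(info.ownerName);
    if ((flags & FLAG_GROUP) != 0) out.writeUTF(info.groupName);
    if ((flags & FLAG_MODE) != 0) out.writeShort(info.mode);
    if ((flags & FLAG_MAXIMUM) != 0) out.writeLong(info.maximum);
    out.flush();
    return bytes.toByteArray();
  }

  static PoolInfo read(byte[] data) throws IOException {
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
    int flags = in.readInt();
    // Bail out on bits this version does not understand instead of
    // misreading the bytes that follow them.
    int unknown = flags & ~KNOWN_FLAGS;
    if (unknown != 0) {
      throw new IOException("Unknown cache pool field flags 0x"
          + Integer.toHexString(unknown) + "; refusing to load");
    }
    PoolInfo info = new PoolInfo();
    if ((flags & FLAG_OWNER) != 0) info.ownerName = in.readUTF();
    if ((flags & FLAG_GROUP) != 0) info.groupName = in.readUTF();
    if ((flags & FLAG_MODE) != 0) info.mode = in.readShort();
    if ((flags & FLAG_MAXIMUM) != 0) info.maximum = in.readLong();
    return info;
  }
}
```

The point of the {{KNOWN_FLAGS}} mask is that a reader built before a new field existed fails loudly on data that contains it, rather than silently treating the extra bytes as the next record.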

{code}
        int mode = Integer.parseInt(modeString, 8);
        info.setMode(new FsPermission((short)mode));
{code}
Hey, there's a {{Short.parseShort}} too, which takes a radix and avoids the cast :)
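
For example, something like the following would parse the octal string directly.  {{ModeParser}} and {{parseMode}} are hypothetical names used only for this sketch; the resulting {{short}} could then be handed to the {{FsPermission}} constructor as in the snippet above.

```java
// Hypothetical helper, shown only to illustrate Short.parseShort with a radix.
class ModeParser {
  // Parses an octal permission string such as "755" into a short.
  // Throws NumberFormatException if the string is not valid octal
  // or does not fit in a short.
  static short parseMode(String modeString) {
    return Short.parseShort(modeString, 8);
  }
}
```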

About terminology: isn't "maximum" a better name for what we're implementing 
here than "quota"?  If we implement something more sophisticated later, it 
could get confusing if we just use the term "quota" here.  I also think we 
should rip out weight completely if we're not going to support it any more.  I 
see a few places where "weight" is lingering now.  The feature flag stuff 
should allow us to add it forwards-compatibly (although not 
backwards-compatibly) in the future, if we want to.  I feel the same way about 
"reservation."

I'm not sure that we want a cache directive addition to fail when the maximum 
has been exceeded.  The problem is, there isn't any good way to implement this 
kind of simple check for more sophisticated quota methods like fair share or 
minimum share, etc.  Also, this is dependent on things like what we think the 
sizes are of files and directories in the cluster, which may change.  The 
result is very inconsistent behavior from the user's point of view.  For 
example, maybe he can add cache directives if a datanode has not come up, but 
can't add them once it comes up and we determine the full size of a certain 
file.  Or maybe he could add them by manually editing the edit log, but not 
from the command-line.  It just feels inconsistent.  I would rather we teach 
people to rely on looking at {{bytesNeeded}} versus {{bytesCached}} to 
determine if they had enough space.

I wonder if we should add another metric that somehow allows users to 
disambiguate between bytes not cached because of maximums / quotas / other 
"executive decision" and bytes not cached because the DN had an issue.  Right 
now all the user can do is subtract bytesCached from bytesNeeded and see that 
there is some gap, but he would have to check the logs to know why.
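
As a rough illustration of what such a breakdown might look like: the sketch below assumes a hypothetical third counter, {{bytesOverLimit}}, that the NameNode would have to track for bytes it deliberately declined to cache.  Nothing like it exists today, and {{CacheShortfall}} is an invented name.

```java
// Hypothetical breakdown of the gap between bytesNeeded and bytesCached.
// bytesOverLimit is an assumed metric, not something HDFS reports today.
class CacheShortfall {
  final long total;       // bytesNeeded - bytesCached, never negative
  final long dueToLimit;  // part of the gap explained by a maximum / quota
  final long dueToOther;  // remainder, e.g. a DataNode problem

  CacheShortfall(long bytesNeeded, long bytesCached, long bytesOverLimit) {
    total = Math.max(0L, bytesNeeded - bytesCached);
    dueToLimit = Math.min(total, Math.max(0L, bytesOverLimit));
    dueToOther = total - dueToLimit;
  }
}
```

With numbers like these, a user could see at a glance whether the shortfall is a policy decision or something worth digging through the logs for.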

> support cachepool-based quota management in path-based caching
> --------------------------------------------------------------
>
>                 Key: HDFS-5431
>                 URL: https://issues.apache.org/jira/browse/HDFS-5431
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode, namenode
>    Affects Versions: 3.0.0
>            Reporter: Colin Patrick McCabe
>            Assignee: Andrew Wang
>         Attachments: hdfs-5431-1.patch
>
>
> We should support cachepool-based quota management in path-based caching.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
