[jira] [Commented] (HDFS-6377) Unify xattr name and value limits into a single limit

Chris Nauroth (JIRA) Thu, 15 May 2014 13:09:07 -0700

    [ https://issues.apache.org/jira/browse/HDFS-6377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13996616#comment-13996616 ]


Chris Nauroth commented on HDFS-6377:
-------------------------------------

Thanks for changing the config property names.

bq. Do you think we need a minimum configuration limit? Let's say the user 
configured the size as 3; this is then always an invalid size, since the 
namespace itself occupies that space? [ I am not insisting, just to discuss this point ]

Alternatively, the other fs-limits configs have the semantics that setting them 
to 0 disables enforcement.  I suppose this might be helpful as an escape hatch 
if something causes really unexpectedly long data, but the admin still wants to 
keep the service running.  (Like Uma, I'm just discussing ideas, not insisting.)
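
To make the "0 disables enforcement" idea concrete, here is a minimal sketch. 
The class and method names ({{XAttrLimitSketch}}, {{exceedsLimit}}) are 
hypothetical and not taken from the patch, and rejecting negative values is my 
own assumption rather than something the existing fs-limits code is known to do.

```java
final class XAttrLimitSketch {
  /**
   * Returns true if sizeInBytes violates maxSize. Following the convention
   * described for the other fs-limits settings, maxSize == 0 means
   * "do not enforce". Hypothetical helper, not from the patch.
   */
  static boolean exceedsLimit(int sizeInBytes, int maxSize) {
    if (maxSize < 0) {
      // Assumption: a negative limit is a configuration error.
      throw new IllegalArgumentException("limit must be >= 0, got " + maxSize);
    }
    if (maxSize == 0) {
      return false; // escape hatch: enforcement disabled
    }
    return sizeInBytes > maxSize;
  }

  public static void main(String[] args) {
    System.out.println(exceedsLimit(100, 0));  // limit disabled
    System.out.println(exceedsLimit(100, 64)); // over the limit
    System.out.println(exceedsLimit(64, 64));  // exactly at the limit
  }
}
```

With this shape, a too-small value such as 3 would still reject every xattr, 
so it does not answer the minimum-limit question by itself.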

{code}
  private void checkXAttrSize(XAttr xAttr) throws UnsupportedEncodingException {
    int size = xAttr.getName().getBytes("UTF-8").length;
    if (xAttr.getValue() != null) {
      size += xAttr.getValue().length;
    }
    if (size > nnConf.xattrMaxSize) {
      throw new HadoopIllegalArgumentException(
          "XAttr is too big, maximum size = " + nnConf.xattrMaxSize
              + ", but the size is = " + xAttr.getName().length());
    }
  }
{code}

I believe the exception message will be incorrect in the presence of multi-byte 
characters.  The limit is enforced on the number of bytes in UTF-8 encoding 
(plus the length of the value), but the message reports 
{{xAttr.getName().length()}}, the name's length in {{char}}s, which can differ.  
This could confuse users if we reject an xattr and then report a size that 
appears to be under the configured limit.  Here is a quick Scala REPL session 
demonstrating the problem:

{code}
scala> val s = "single-byte-chars"
s: java.lang.String = single-byte-chars

scala> s.getBytes("UTF-8").length
res2: Int = 17

scala> s.length
res3: Int = 17

scala> val s2 = "multi-byte-\u0641-chars"
s2: java.lang.String = multi-byte-?-chars

scala> s2.getBytes("UTF-8").length
res4: Int = 19

scala> s2.length
res5: Int = 18
{code}

Also, here is a minor code cleanup suggestion on the above.  Guava defines a 
constant {{Charsets#UTF_8}}.  We can pass this to {{String#getBytes(Charset)}} 
(not using the overload that takes a {{String}} parameter).  Then, that 
eliminates the need to deal with {{UnsupportedEncodingException}}.  I've always 
found that exception irritating.  Of course we have UTF-8!  :-)
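
Putting the two points together, here is a rough sketch of what I have in mind, 
not a drop-in replacement for the patch.  To keep it self-contained, it takes 
the name and value directly instead of an {{XAttr}}, uses the JDK's 
{{StandardCharsets.UTF_8}} (Java 7+) where the patch would use Guava's 
{{Charsets.UTF_8}}, and throws a plain {{IllegalArgumentException}} as a 
stand-in for {{HadoopIllegalArgumentException}}.  The class and method names 
are hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class XAttrSizeCheckSketch {
  /** Combined size in bytes: UTF-8 encoded name plus value (value may be null). */
  static int sizeInBytes(String name, byte[] value) {
    // getBytes(Charset) declares no checked exception, unlike getBytes(String).
    int size = name.getBytes(StandardCharsets.UTF_8).length;
    if (value != null) {
      size += value.length;
    }
    return size;
  }

  /** Throws if the combined size exceeds maxSize; reports the size actually compared. */
  static void checkSize(String name, byte[] value, int maxSize) {
    int size = sizeInBytes(name, value);
    if (size > maxSize) {
      throw new IllegalArgumentException(
          "XAttr is too big, maximum size = " + maxSize
              + ", but the size is = " + size);
    }
  }
}
```

The message now reports the same number that was compared against the limit, 
so a rejected xattr can never appear to be under it.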


For {{dfs.namenode.fs-limits.max-directory-items}}, we log an error message if 
we encounter an existing inode that violates the limit during startup/applying 
edits.  This can be a helpful message if an admin down-tunes the setting and 
then wants to identify and clean up existing data that's in violation.  Can we 
log a message for the xattr limit violations too?  If it's easier, feel free to 
punt this part to a separate jira.  (I realize you're close to +1 on this patch 
already.)
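
For illustration only, the load-time check could look something like the 
sketch below: it logs and continues instead of throwing, so startup is not 
blocked by existing data.  Everything here is hypothetical (the class, the 
method, the message wording, and the use of {{java.util.logging}} in place of 
whatever logger the NameNode code uses).

```java
import java.nio.charset.StandardCharsets;
import java.util.logging.Logger;

final class XAttrLoadAuditSketch {
  private static final Logger LOG =
      Logger.getLogger(XAttrLoadAuditSketch.class.getName());

  /**
   * Logs an error and returns true if an already-stored xattr is over the
   * limit. Never throws for a violation, so loading can proceed.
   * Hypothetical helper, not from the patch.
   */
  static boolean logIfOverLimit(String path, String name, byte[] value,
      int maxSize) {
    int size = name.getBytes(StandardCharsets.UTF_8).length
        + (value == null ? 0 : value.length);
    if (size <= maxSize) {
      return false;
    }
    LOG.severe("XAttr " + name + " on " + path + " is " + size
        + " bytes, which exceeds the configured maximum of " + maxSize);
    return true;
  }
}
```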

> Unify xattr name and value limits into a single limit
> -----------------------------------------------------
>
>                 Key: HDFS-6377
>                 URL: https://issues.apache.org/jira/browse/HDFS-6377
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>    Affects Versions: HDFS XAttrs (HDFS-2006)
>            Reporter: Andrew Wang
>            Assignee: Andrew Wang
>         Attachments: hdfs-6377-1.patch
>
>
> Instead of having separate limits and config options for the size of an 
> xattr's name and value, let's use a single limit.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
