[ https://issues.apache.org/jira/browse/HDFS-6377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13996616#comment-13996616 ]
Chris Nauroth commented on HDFS-6377:
-------------------------------------
Thanks for changing the config property names.
bq. Do you think we need to have a minimum configuration limit? Let's say a user
configured the size as 3; then this is always an invalid size, as the namespace itself
occupies this space? [ I am not insisting, just to discuss this point ]
Alternatively, the other fs-limits configs have the semantics that setting them
to 0 disables enforcement. I suppose this might be helpful as an escape hatch
if something unexpectedly produces very long data but the admin still wants to
keep the service running. (Like Uma, I'm just discussing ideas, not insisting.)
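To make the "0 disables enforcement" idea concrete, here is a rough sketch. The class and method names are hypothetical, not the actual NameNode code, and it assumes the enforced size is the UTF-8 byte count of the name plus the value bytes, as in the patch's checkXAttrSize.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical sketch of "0 disables enforcement" semantics for a single
 * xattr size limit. Names are illustrative only.
 */
class XAttrLimitSketch {
    // Assumption: a configured value of 0 means "no limit".
    private final int xattrMaxSize;

    XAttrLimitSketch(int xattrMaxSize) {
        if (xattrMaxSize < 0) {
            throw new IllegalArgumentException(
                "xattr max size must be >= 0, but was " + xattrMaxSize);
        }
        this.xattrMaxSize = xattrMaxSize;
    }

    /** Returns true if the xattr fits, or if enforcement is disabled (limit 0). */
    boolean fits(String name, byte[] value) {
        if (xattrMaxSize == 0) {
            return true; // escape hatch: enforcement disabled
        }
        // Enforced size: UTF-8 bytes of the name plus the raw value bytes.
        int size = name.getBytes(StandardCharsets.UTF_8).length
            + (value == null ? 0 : value.length);
        return size <= xattrMaxSize;
    }

    public static void main(String[] args) {
        XAttrLimitSketch disabled = new XAttrLimitSketch(0);
        XAttrLimitSketch small = new XAttrLimitSketch(8);
        System.out.println(disabled.fits("a", new byte[100])); // true: limit disabled
        System.out.println(small.fits("a", new byte[100]));    // false: 1 + 100 > 8
        System.out.println(small.fits("abc", new byte[5]));    // true: 3 + 5 <= 8
    }
}
```

Rejecting negative values up front keeps 0 as the only special value, which matches how the other fs-limits settings are described here.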
{code}
  private void checkXAttrSize(XAttr xAttr) throws UnsupportedEncodingException {
    int size = xAttr.getName().getBytes("UTF-8").length;
    if (xAttr.getValue() != null) {
      size += xAttr.getValue().length;
    }
    if (size > nnConf.xattrMaxSize) {
      throw new HadoopIllegalArgumentException(
          "XAttr is too big, maximum size = " + nnConf.xattrMaxSize
          + ", but the size is = " + xAttr.getName().length());
    }
  }
{code}
I believe the error message will be incorrect in the presence of multi-byte
characters. The limit is enforced on the number of bytes in the UTF-8 encoding,
but the message reports the name's {{String#length}}, which counts UTF-16 chars
and can differ (it also leaves out the value's bytes). This could confuse users
if we reject an xattr and then report a size that appears to be under the
configured limit. Here is a quick Scala REPL session demonstrating the problem:
{code}
scala> val s = "single-byte-chars"
s: java.lang.String = single-byte-chars
scala> s.getBytes("UTF-8").length
res2: Int = 17
scala> s.length
res3: Int = 17
scala> val s2 = "multi-byte-\u0641-chars"
s2: java.lang.String = multi-byte-ف-chars
scala> s2.getBytes("UTF-8").length
res4: Int = 19
scala> s2.length
res5: Int = 18
{code}
Also, here is a minor code cleanup suggestion on the above. Guava defines a
constant {{Charsets#UTF_8}}. We can pass this to {{String#getBytes(Charset)}}
instead of using the overload that takes a {{String}} parameter. That
eliminates the need to deal with {{UnsupportedEncodingException}}, because the
{{Charset}} overload doesn't declare it. I've always found that exception
irritating. Of course we have UTF-8! :-)
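Putting the two suggestions together, checkXAttrSize might look roughly like this. It's a standalone sketch, not the patch itself: SimpleXAttr is a hypothetical stand-in for the real XAttr class, IllegalArgumentException stands in for HadoopIllegalArgumentException (which extends it), and the JDK's StandardCharsets.UTF_8 is used in place of Guava's Charsets.UTF_8 so it compiles without Guava; both are plain {{Charset}} constants.

```java
import java.nio.charset.StandardCharsets;

class XAttrSizeCheck {
    /** Hypothetical stand-in for XAttr; only the accessors used here are modeled. */
    static final class SimpleXAttr {
        private final String name;
        private final byte[] value;

        SimpleXAttr(String name, byte[] value) {
            this.name = name;
            this.value = value;
        }

        String getName() { return name; }
        byte[] getValue() { return value; }
    }

    private final int xattrMaxSize;

    XAttrSizeCheck(int xattrMaxSize) {
        this.xattrMaxSize = xattrMaxSize;
    }

    /** Enforced size: UTF-8 bytes of the name plus the value bytes. */
    static int sizeOf(SimpleXAttr xAttr) {
        // getBytes(Charset) does not declare UnsupportedEncodingException.
        int size = xAttr.getName().getBytes(StandardCharsets.UTF_8).length;
        if (xAttr.getValue() != null) {
            size += xAttr.getValue().length;
        }
        return size;
    }

    void checkXAttrSize(SimpleXAttr xAttr) {
        int size = sizeOf(xAttr);
        if (size > xattrMaxSize) {
            // Report the same byte count that was compared against the limit.
            throw new IllegalArgumentException(
                "XAttr is too big, maximum size = " + xattrMaxSize
                + ", but the size is = " + size);
        }
    }

    public static void main(String[] args) {
        XAttrSizeCheck check = new XAttrSizeCheck(18);
        // 18 chars, but 19 bytes in UTF-8 (U+0641 encodes as 2 bytes).
        SimpleXAttr attr = new SimpleXAttr("multi-byte-\u0641-chars", null);
        try {
            check.checkXAttrSize(attr);
        } catch (IllegalArgumentException e) {
            // prints "XAttr is too big, maximum size = 18, but the size is = 19"
            System.out.println(e.getMessage());
        }
    }
}
```

Computing the size once and reusing it in the message means the reported number can never drift from the number that was actually checked.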
For {{dfs.namenode.fs-limits.max-directory-items}}, we log an error message if
we encounter an existing inode that violates the limit during startup or while
applying edits. This can be a helpful message if an admin down-tunes the setting
and then wants to identify and clean up existing data that's in violation. Can we
log a message for the xattr limit violations too? If it's easier, feel free to
punt this part to a separate JIRA. (I realize you're close to +1 on this patch
already.)
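For the startup/edit-replay logging, something along these lines could work. This is only a sketch: the class XAttrLoadCheck, the hook logIfOverLimit, and its call site are hypothetical, and java.util.logging is used just to keep the example self-contained. The idea is to log existing xattrs that violate the limit instead of rejecting them, so the NameNode still starts after an admin down-tunes the setting.

```java
import java.nio.charset.StandardCharsets;
import java.util.logging.Logger;

class XAttrLoadCheck {
    private static final Logger LOG =
        Logger.getLogger(XAttrLoadCheck.class.getName());

    private final int xattrMaxSize;

    XAttrLoadCheck(int xattrMaxSize) {
        this.xattrMaxSize = xattrMaxSize;
    }

    /**
     * Hypothetical hook, called for each existing xattr while loading the
     * image or applying edits. Logs (but does not reject) xattrs over the
     * current limit. Returns true if a violation was logged.
     */
    boolean logIfOverLimit(String path, String name, byte[] value) {
        int size = name.getBytes(StandardCharsets.UTF_8).length
            + (value == null ? 0 : value.length);
        if (size > xattrMaxSize) {
            LOG.severe("XAttr " + name + " on " + path + " has size " + size
                + ", which exceeds the configured maximum of " + xattrMaxSize);
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        XAttrLoadCheck check = new XAttrLoadCheck(16);
        check.logIfOverLimit("/data/a", "small", new byte[4]);  // 5 + 4 = 9: not logged
        check.logIfOverLimit("/data/b", "large", new byte[64]); // 5 + 64 = 69: logged
    }
}
```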
> Unify xattr name and value limits into a single limit
> -----------------------------------------------------
>
> Key: HDFS-6377
> URL: https://issues.apache.org/jira/browse/HDFS-6377
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: namenode
> Affects Versions: HDFS XAttrs (HDFS-2006)
> Reporter: Andrew Wang
> Assignee: Andrew Wang
> Attachments: hdfs-6377-1.patch
>
>
> Instead of having separate limits and config options for the size of an
> xattr's name and value, let's use a single limit.
--
This message was sent by Atlassian JIRA
(v6.2#6252)