[
https://issues.apache.org/jira/browse/HDFS-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13976941#comment-13976941
]
Yi Liu commented on HDFS-2006:
------------------------------
[~cnauroth], thanks for so many valuable comments.
{quote}
Is there an intention to integrate with the ACLs work done in HDFS-4685? For
example, on Linux, the getfattr command can present POSIX ACLs as xattrs in the
system namespace. I haven't fully explored use of getfattr/setfattr in
combination with ACLs, so I don't know the details of how this works on Linux
yet. The most important piece of this would be to guarantee that setfattr
cannot circumvent any important security logic enforced by setfacl, and write
tests for it.
{quote}
This is good point. We will think to integrate later once this work is in?
{quote}
Can you elaborate on this? This sounded like using a bit somewhere as a flag to
indicate presence of xattrs, but I didn't spot anything like that in my initial
scan of the patch. I just saw the new inode feature, which was basically
implemented as I would have expected.
{quote}
This is same like if #hasAcl()
{quote}
Can you please clarify this part? Is the intention that resolution of the xattr
reference is somehow pluggable inside the NN, or is this just talking about a
general strategy of using an xattr to persist a foreign key to some external
system, with the client then driving lookup at that external system?
{quote}
Right now, all the Xattrs will be stored in NN memory only. Later we may
support to store the Xattrs large values in extrenal system to avoid NN memory,
it’s a general strategy of using an xattr to persist a foreign key to some
external system, But we are not focusing it in the current version
implementation. We will revise the doc to give clear understanding on this.
{quote}
Could you provide more details on the memory calculation? For example, what
numbers are assumed for current inode size, average number of xattrs per inode,
and average size of name and value for the individual xattrs?
{quote}
Next revision will cover this.
{quote}
I was confused by the getXAttrs API that accepts List<Xattr>. Looking at the
implementation in FSNamesystem#getXAttrs, it appears that the get operation
also may optionally mutate the xattrs. Do I understand correctly? If so, can we
remove this and make the get operation focus solely on returning the existing
xattrs?
{quote}
Agree with you, APIs are getting refined in HADOOP-10520.
{quote}
Right now, if a caller attempts to set multiple xattrs, then the authorized
ones are accepted, and the unauthorized ones get filtered out and ignored
silently. This could be problematic for error handling scenarios. For example,
if we imagine a member of the super-group modifying xattrs in the system
namespace successfully, and then that user gets removed from the super-group,
then their calls will start failing to apply the xattrs silently, but no error
will be propagated back to the caller to inform them that something is broken
now. Perhaps this scenario should throw AccessControlException instead.
{quote}
Agree with you. We have refined the Interfaces from HADOOP-10520. Mostly we
plan to support setting/removing single attr as first version. It’s more closer
to POSIX APIs and can cover most of use cases, and the handling is simplified.
{quote}
Do you intend to enforce any conservative upper limits to try to mitigate
against someone maliciously or accidentally bloating the NN memory footprint by
writing huge data into xattrs? Are there upper limits on number of xattrs on a
single inode, number of bytes in an xattr name and number of bytes in an xattr
value?
{quote}
Actually we planned it in first revision of doc. After that again thought that
the same malicious user can create more and more useless files and make NN
memory filled. I think, better we will have this control and no hrm on having
that.
Your point of accidental bloating is good one. Even through user is not a
malicious user but accidentally wrote huge data as part of Xattr and forgot to
remove that. This make NN to suffer on memory usage.
We will update in next version of doc.
{quote}
The current patch appears to include xattrs in the method signatures for the
internal mkdir methods, but I didn't notice a need for this. Was this modeled
after the code changes for ACLs? For POSIX ACLs, we needed to support the
notion of a default ACL, which is an ACL copied from the parent directory to
the child sub-directory or file at time of creation. In the implementation,
this required passing along the list of ACL entries to copy in some of the
internal methods. I'm not aware of general support for this kind of behavior
across all xattrs, so if it doesn't exist and there is no use case for it, then
I recommend removing this part of the code.
{quote}
Yes. We don't really need this part. We will remove while implementing in
subtask JIRA.
{quote}
The design states a desire to preserve xattrs using distcp, but it also states
that WebHDFS is not yet in scope. WebHDFS is very commonly used as a distcp
target though, so I wonder if we should reconsider.
{quote}
WebHDFS is in scope, just not implemented. We will update this. It is actually
on next revision.
{quote}
I assume that you will not support getting and setting xattrs via NFS mount.
(i.e. We won't implement the case of someone calling Linux getattr/setattr on a
path that is really under an NFS mount point to an HDFS cluster.) That's the
choice we made for initial development of getfacl/setfacl. If this is the plan
for xattrs too, then I just recommend stating that in the design doc.
{quote}
We will update in the doc revision about this explicitly.
We will cover all your design related comments in next revision of doc. Code
related comments will handled in subtasks. Thanks again for the Review.
> ability to support storing extended attributes per file
> -------------------------------------------------------
>
> Key: HDFS-2006
> URL: https://issues.apache.org/jira/browse/HDFS-2006
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Affects Versions: HDFS XAttrs (HDFS-2006)
> Reporter: dhruba borthakur
> Assignee: Yi Liu
> Attachments: HDFS-XAttrs-Design-1.pdf, xattrs.1.patch, xattrs.patch
>
>
> It would be nice if HDFS provides a feature to store extended attributes for
> files, similar to the one described here:
> http://en.wikipedia.org/wiki/Extended_file_attributes.
> The challenge is that it has to be done in such a way that a site not using
> this feature does not waste precious memory resources in the namenode.
--
This message was sent by Atlassian JIRA
(v6.2#6252)