[
https://issues.apache.org/jira/browse/HDFS-8575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14590042#comment-14590042
]
Allen Wittenauer commented on HDFS-8575:
----------------------------------------
bq. My point is to have a quota for the user irrespective of the folders.
Nope, I understand that's the use case you are shooting for. I'm just saying
that in practice, it's generally used for the wrong reasons and not really
needed. Let's take your example:
{code}
/hive/user1/table1
/hive/user2/table1
/hbase/user1/Htable1
/hbase/user2/Htable1
{code}
From a supportability perspective, this is kind of a mess. A user's content
is now scattered all across the file system, and it dictates that every
framework will need to have a directory at the root level. With just two this
seems reasonable, but in places where there are 5 or more different
frameworks, this gets nuts... and never mind if you want to run (say) Spark
against your HBase table...
Whereas...
{code}
/home/user1/hive/table1
/home/user1/hbase/Htable1
/home/user2/hive/table1
/home/user2/hbase/Htable1
{code}
...means all of that user's content is co-located. When it comes time to
archive that user because they've terminated employment or whatever, it's
*extremely* easy. Need to know how much space that one user is consuming?
It's *extremely* easy. Want to snapshot that user? Piece of cake. The list
goes on and on.
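For reference, every one of those per-directory operations maps onto a stock HDFS command already (paths taken from the example above; the quota values and snapshot name are just placeholders):
{code}
# Space and name (count) quotas on one user's home directory
hdfs dfsadmin -setSpaceQuota 100g /home/user1
hdfs dfsadmin -setQuota 1000000 /home/user1

# How much is user1 consuming? (quotas, remaining, dir/file counts, bytes)
hdfs dfs -count -q /home/user1

# Snapshot the user's tree before archiving it
hdfs dfsadmin -allowSnapshot /home/user1
hdfs dfs -createSnapshot /home/user1 pre-archive
{code}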
Most pool-based storage systems (which Hadoop effectively falls under)
generally recommend dir-based quotas, and some (e.g., ZFS) don't do
user-based at all. It's a non-trivial feature to support, and the cost
usually outweighs any benefit, especially since having a user's files
scattered all over a file system is such an operational anti-pattern.
Federation makes this situation even worse. Are users generally aware they are
hopping from namespace to namespace? Probably not. So they'll start sticking
files in random places and/or just be generally confused why they can write in
spot x but not in spot y.
There's also the problem of reporting. Currently Hadoop's quota reporting
facilities are "sub-par", to put it as nicely as I possibly can. I suspect
adding user-level quotas is going to make them much worse.
> Support User level Quota for space and Name (count)
> ---------------------------------------------------
>
> Key: HDFS-8575
> URL: https://issues.apache.org/jira/browse/HDFS-8575
> Project: Hadoop HDFS
> Issue Type: New Feature
> Reporter: nijel
> Assignee: nijel
>
> I would like to have a feature in HDFS for quota management at the user
> level.
> Background:
> When a customer uses a multi-tenant solution, it will have many Hadoop
> ecosystem components like Hive, HBase, YARN, etc. The base folders of these
> components are different, e.g. /hive for Hive and /hbase for HBase.
> Now if a user creates some file or table, it will be under the folder
> specific to that component. If the user name is taken into account, it looks
> like
> {code}
> /hive/user1/table1
> /hive/user2/table1
> /hbase/user1/Htable1
> /hbase/user2/Htable1
>
> Same for yarn/map-reduce data and logs
> {code}
>
> In this case, restricting a user to a certain amount of disk/files is very
> difficult, since the current quota management is at the folder level.
>
> Requirement: User level Quota for space and Name (count). Say user1 can have
> 100G irrespective of the folder or location used.
>
> Here the idea is to consider the file owner as the key and attribute the
> quota to it. So the current quota system can have an initial check for the
> user quota, if defined, before validating the folder quota.
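> A rough sketch of that check order (all names here are illustrative, not
> actual HDFS internals):
> {code}
> // Illustrative pseudocode: consult a user-level quota, if one is
> // defined for the file owner, before the existing folder-level check.
> void verifyQuota(String owner, long spaceDelta, long nsDelta) {
>   UserQuota uq = userQuotaMap.get(owner);   // hypothetical per-user lookup
>   if (uq != null) {
>     uq.verify(spaceDelta, nsDelta);         // throws QuotaExceededException
>   }
>   verifyFolderQuota(spaceDelta, nsDelta);   // existing directory quota check
> }
> {code}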
> Note:
> This needs a change in fsimage to store the user and quota information.
> Please have a look at this scenario. If it sounds good, I will create the
> tasks and update the design and prototype.
> Thanks
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)