[ https://issues.apache.org/jira/browse/HDFS-8575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14590042#comment-14590042 ]

Allen Wittenauer commented on HDFS-8575:
----------------------------------------

bq. My point is to have a quota for the user irrespective of the folders. 

Nope, I understand that's the use case you are shooting for.  I'm just saying 
that in practice, it's generally used for the wrong reasons and not really 
needed.  Let's take your example:

{code}
/hive/user1/table1
/hive/user2/table1
/hbase/user1/Htable1
/hbase/user2/Htable1
{code}

From a supportability perspective, this is kind of a mess.  A user's content
is now scattered all across the file system, and the layout dictates that every
framework needs a directory at the root level.  With just two frameworks this
seems reasonable, but in places with five or more different frameworks, it
gets nuts... and never mind if you want to run (say) Spark against your HBase
table...

Whereas...

{code}
/home/user1/hive/table1
/home/user1/hbase/Htable1
/home/user2/hive/table1
/home/user2/hbase/Htable1
{code}

...means all of that user's content is co-located.  When it comes time to 
archive that user because they've terminated employment or whatever, it's 
*extremely* easy.  Need to know how much space that one user is consuming?  It's
*extremely* easy.  Want to snapshot that user?  Piece of cake.  The list goes 
on and on.
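
To make the "how much space is one user consuming" point concrete, here is a minimal Python sketch over a hypothetical in-memory listing of path to size (the data and the function names `usage_colocated`/`usage_scattered` are illustrative, not an HDFS API). In the co-located layout a single prefix answers the question. In the scattered layout the caller has to know every framework root, and forgetting one silently undercounts.

```python
from typing import Dict, Iterable

def usage_colocated(listing: Dict[str, int], user: str) -> int:
    """Co-located layout (/home/<user>/...): a single prefix covers everything."""
    prefix = f"/home/{user}/"
    return sum(size for path, size in listing.items() if path.startswith(prefix))

def usage_scattered(listing: Dict[str, int], user: str,
                    framework_roots: Iterable[str]) -> int:
    """Scattered layout (/<framework>/<user>/...): every framework root must be known."""
    prefixes = tuple(f"/{root}/{user}/" for root in framework_roots)
    return sum(size for path, size in listing.items() if path.startswith(prefixes))

# Illustrative data only: path -> size in GB, the same files in both layouts.
scattered = {
    "/hive/user1/table1/part-0": 100,
    "/hbase/user1/Htable1/region-0": 50,
    "/hive/user2/table1/part-0": 70,
}
colocated = {
    "/home/user1/hive/table1/part-0": 100,
    "/home/user1/hbase/Htable1/region-0": 50,
    "/home/user2/hive/table1/part-0": 70,
}

print(usage_colocated(colocated, "user1"))                     # 150
print(usage_scattered(scattered, "user1", ["hive", "hbase"]))  # 150
# Forgetting one framework root silently undercounts:
print(usage_scattered(scattered, "user1", ["hive"]))           # 100
```

On a real cluster the co-located case is a single `hdfs dfs -du -s -h /home/user1`; the scattered case needs one such call per framework root, and someone has to keep that list of roots current.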

Most pool-based storage systems (a category Hadoop effectively falls under)
generally recommend directory-based quotas, and in some cases (e.g., ZFS) don't
do user-based quotas at all.  User-based quotas are a non-trivial feature to
support, and that cost usually outweighs any benefit, especially since having a
user's files scattered all over a file system is such an operational
anti-pattern.
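
Directory-based quotas are what make the co-located layout enforceable: a quota on /home/user1 bounds everything beneath it, whichever framework wrote it. Below is a minimal Python model of that check, assuming plain dicts for quotas and usage (`write`, `ancestors` and `QuotaExceeded` are hypothetical names, not NameNode code). It also ignores replication, whereas HDFS charges space quotas against replicated bytes.

```python
import posixpath
from typing import Dict, Iterator

GB = 1024 ** 3

class QuotaExceeded(Exception):
    """Raised when a write would push some directory past its space quota."""

def ancestors(path: str) -> Iterator[str]:
    """Yield the directory itself and every parent up to '/'."""
    current = posixpath.normpath(path)
    while True:
        yield current
        if current == "/":
            return
        current = posixpath.dirname(current)

def write(target_dir: str, nbytes: int,
          quotas: Dict[str, int], usage: Dict[str, int]) -> None:
    """Check every ancestor that has a quota, then charge all of them."""
    dirs = list(ancestors(target_dir))
    for d in dirs:
        limit = quotas.get(d)
        if limit is not None and usage.get(d, 0) + nbytes > limit:
            raise QuotaExceeded(f"{d}: {limit} byte quota would be exceeded")
    for d in dirs:
        usage[d] = usage.get(d, 0) + nbytes

# One quota on the user's home directory covers every framework below it.
quotas = {"/home/user1": 100 * GB}
usage: Dict[str, int] = {}
write("/home/user1/hive/table1", 60 * GB, quotas, usage)
write("/home/user1/hbase/Htable1", 30 * GB, quotas, usage)
try:
    write("/home/user1/hive/table2", 20 * GB, quotas, usage)  # 110 GB > 100 GB
    rejected = False
except QuotaExceeded:
    rejected = True

print(usage["/home/user1"] // GB, rejected)  # 90 True
```

The real-cluster equivalent is `hdfs dfsadmin -setSpaceQuota 100g /home/user1`, checked with `hdfs dfs -count -q /home/user1`. Because of replication, that 100g quota holds roughly a third as much data at replication factor 3.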

Federation makes this situation even worse.  Are users generally aware they are 
hopping from namespace to namespace? Probably not.  So they'll start sticking 
files in random places and/or just be generally confused about why they can write in
spot x but not in spot y.
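
With federation, each namespace is served by its own NameNode, so any per-user limit one NameNode enforces only sees that namespace's usage. The toy Python model below illustrates this, loosely in the spirit of a ViewFs client-side mount table. The mount points, namespace names and the 100 GB limit are all hypothetical. It shows both effects at once: user1 ends up holding 160 GB despite a "100 GB" limit, and the same user can write under /hbase but is refused under /user.

```python
from typing import Dict, List

def resolve(path: str, mount_table: Dict[str, str]) -> str:
    """Return the namespace owning path, using the longest matching mount point."""
    matches = [m for m in mount_table
               if path == m or path.startswith(m.rstrip("/") + "/")]
    if not matches:
        raise KeyError(f"no mount point for {path}")
    return mount_table[max(matches, key=len)]

class Namespace:
    """A NameNode that only sees its own per-user usage counters."""
    def __init__(self, name: str, per_user_limit: int) -> None:
        self.name = name
        self.per_user_limit = per_user_limit
        self.usage: Dict[str, int] = {}

    def try_write(self, user: str, nbytes: int) -> bool:
        used = self.usage.get(user, 0)
        if used + nbytes > self.per_user_limit:
            return False
        self.usage[user] = used + nbytes
        return True

# Hypothetical mount table, with a 100 GB per-user limit set in each namespace.
mount_table = {"/hive": "ns1", "/hbase": "ns2", "/user": "ns1"}
namespaces = {n: Namespace(n, per_user_limit=100) for n in ("ns1", "ns2")}

results: List[bool] = []
for path, gb in [("/hive/user1/t1", 80), ("/hbase/user1/h1", 80), ("/user/user1/x", 30)]:
    ns = namespaces[resolve(path, mount_table)]
    results.append(ns.try_write("user1", gb))

print(results)  # [True, True, False]
print(sum(n.usage.get("user1", 0) for n in namespaces.values()))  # 160
```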

There's also the problem of reporting.  Currently, Hadoop's quota reporting
facilities are "sub-par", to put it as nicely as I possibly can.  I suspect
adding user-level quotas is going to make that much worse.

> Support User level Quota for space and Name (count)
> ---------------------------------------------------
>
>                 Key: HDFS-8575
>                 URL: https://issues.apache.org/jira/browse/HDFS-8575
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: nijel
>            Assignee: nijel
>
> I would like to have a feature in HDFS for quota management at the user
> level.
> Background:
> When a customer uses a multi-tenant solution, it will have many Hadoop
> ecosystem components like Hive, HBase, YARN, etc. Each component has its own
> base folder, e.g. /hive for Hive and /hbase for HBase.
> Now if a user creates a file or table, it will be under the folder specific
> to that component. Taking the user name into account, the layout looks like:
> {code}
> /hive/user1/table1
> /hive/user2/table1
> /hbase/user1/Htable1
> /hbase/user2/Htable1
> {code}
> The same applies to YARN/MapReduce data and logs.
>  
> In this case, restricting a user to a certain amount of disk space or number
> of files is very difficult, since the current quota management is at the
> folder level.
>  
> Requirement: user-level quota for space and name (count). For example, user1
> could have 100G irrespective of the folder or location used.
>  
> The idea here is to use the file owner as the key and attribute the quota
> to it, so the current quota system can first check the user quota, if one is
> defined, before validating the folder quota.
> Note:
> This needs a change in the fsimage to store the user and quota information.
> Please have a look at this scenario. If it sounds good, I will create the
> tasks and then update the design and prototype.
> Thanks
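
The check ordering proposed in the description (the owner's quota first, if one is defined, then the existing folder quota) can be sketched in Python as below. This is a toy in-memory model: `QuotaState`, `check_and_charge` and the dict-based bookkeeping are hypothetical names, not NameNode code. The sketch ignores replication, deletes, `chown` (which would have to move usage between owners) and persistence, which the description notes would require an fsimage change.

```python
import posixpath
from dataclasses import dataclass, field
from typing import Dict, Iterator, List

class QuotaExceeded(Exception):
    """Raised when either the owner's quota or a directory quota would be exceeded."""

def ancestors(path: str) -> Iterator[str]:
    """Yield the directory itself and every parent up to '/'."""
    current = posixpath.normpath(path)
    while True:
        yield current
        if current == "/":
            return
        current = posixpath.dirname(current)

@dataclass
class QuotaState:
    user_quota: Dict[str, int] = field(default_factory=dict)  # owner -> limit (hypothetical)
    user_usage: Dict[str, int] = field(default_factory=dict)
    dir_quota: Dict[str, int] = field(default_factory=dict)   # directory -> limit
    dir_usage: Dict[str, int] = field(default_factory=dict)

def check_and_charge(state: QuotaState, owner: str, target_dir: str, nbytes: int) -> None:
    # 1. Proposed step: check the owner's quota first, only if one is defined.
    limit = state.user_quota.get(owner)
    if limit is not None and state.user_usage.get(owner, 0) + nbytes > limit:
        raise QuotaExceeded(f"user {owner}: quota {limit} would be exceeded")
    # 2. Existing behaviour: check every ancestor directory that has a quota.
    dirs: List[str] = list(ancestors(target_dir))
    for d in dirs:
        dir_limit = state.dir_quota.get(d)
        if dir_limit is not None and state.dir_usage.get(d, 0) + nbytes > dir_limit:
            raise QuotaExceeded(f"{d}: quota {dir_limit} would be exceeded")
    # 3. Charge usage only once both checks have passed.
    state.user_usage[owner] = state.user_usage.get(owner, 0) + nbytes
    for d in dirs:
        state.dir_usage[d] = state.dir_usage.get(d, 0) + nbytes

# user1 gets 100 (GB) irrespective of folder; no directory quotas are set.
state = QuotaState(user_quota={"user1": 100})
check_and_charge(state, "user1", "/hive/user1/table1", 60)
check_and_charge(state, "user1", "/hbase/user1/Htable1", 30)
try:
    check_and_charge(state, "user1", "/hive/user1/table2", 20)  # 110 > 100
    rejected = False
except QuotaExceeded:
    rejected = True

print(state.user_usage["user1"], rejected)  # 90 True
```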



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
