[ 
https://issues.apache.org/jira/browse/HDFS-11400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15861287#comment-15861287
 ] 

Hari Sekhon commented on HDFS-11400:
------------------------------------

[~aw] Good question. Where are such fake users coming from? Given that the NN 
resolves users from the OS / Kerberos, wouldn't fake users mean the OS / 
Kerberos systems had already been compromised?

Adding a configurable user/group filter, so that home directories are only 
created automatically for users/groups matching a whitelist regex, could form 
a layer of protection. For example, in a cluster integrated with Active 
Directory, which might have 20,000 users, you may only want 100 of those users 
actually using the Hadoop cluster. In practice, though, this filtering is 
usually already done at the OS level via SSSD etc.
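
To make the filter idea concrete, here is a rough Python sketch (not NN code). 
The two whitelist settings and the function name are made up for illustration 
and are not existing HDFS properties:

```python
import re

# Hypothetical whitelist settings; these are NOT existing HDFS properties.
USER_WHITELIST = re.compile(r"(alice|etl_[a-z0-9]+)")
GROUP_WHITELIST = re.compile(r"hadoop-(users|admins)")


def eligible_for_home_dir(user, groups):
    """Return True if the user, or any of the user's groups, is whitelisted."""
    # fullmatch() requires the whole name to match, so a name that merely
    # contains a whitelisted substring is rejected.
    if USER_WHITELIST.fullmatch(user):
        return True
    return any(GROUP_WHITELIST.fullmatch(group) for group in groups)
```

Anything that fails the check would simply be skipped for auto-creation and 
could still have a home directory created manually.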

Another layer of protection could be a limit on either the max number of 
enumerated users for whom home directories are about to be created 
automatically, or the max number of home directories already in existence. If 
either number is too high, e.g. 1000, then log it and disable auto-creation 
until resolved, to prevent said memory explosion. The second option, a limit 
on the number of existing home directories, would really be the better one, 
since the NN shouldn't be enumerating users at all but rather creating the 
home directory on the fly the first time a new user is seen on the cluster 
and no home directory exists for that user.
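
Roughly what I mean by the second option, again as a Python sketch rather than 
NN code. The class, its method and the in-memory dict standing in for the 
namespace are all hypothetical; the 1000 limit and the 027 umask are just the 
example values from this issue:

```python
import logging

log = logging.getLogger("home_dir_auto_create")


class HomeDirAutoCreator:
    """Hypothetical sketch: create a home directory on first use, with a cap."""

    def __init__(self, max_home_dirs=1000, umask=0o027):
        self.max_home_dirs = max_home_dirs
        self.mode = 0o777 & ~umask  # umask 027 gives directory mode 750
        self.home_dirs = {}         # user -> mode; stand-in for the namespace
        self.enabled = True

    def on_user_seen(self, user):
        """Return True if the user has a home directory after this call."""
        if user in self.home_dirs:
            return True   # already exists, nothing to do
        if not self.enabled:
            return False  # auto-creation was disabled earlier
        if len(self.home_dirs) >= self.max_home_dirs:
            # Too many home directories: log it and stop auto-creating
            # until an admin has looked into it.
            log.warning(
                "%d home directories exist (limit %d), disabling auto-creation",
                len(self.home_dirs), self.max_home_dirs)
            self.enabled = False
            return False
        self.home_dirs[user] = self.mode
        return True
```

No user enumeration happens anywhere: the only trigger is a single user being 
seen for the first time, and existing users keep working after the limit 
trips.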

How about these ideas?

Automatic creation would stop various jobs from breaking when they try to put 
staging files etc. in home directories that don't exist because they haven't 
yet been created manually or by a script (in retrospect it seems silly for 
admins to keep writing scripts to do this for every client when it could be 
solved once and for all via NN logic).

> Automatic HDFS Home Directory Creation
> --------------------------------------
>
>                 Key: HDFS-11400
>                 URL: https://issues.apache.org/jira/browse/HDFS-11400
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: hdfs, namenode
>    Affects Versions: 2.7.1
>         Environment: HDP 2.4.2
>            Reporter: Hari Sekhon
>
> Feature Request to add automatic home directory creation for HDFS users when 
> they are first resolved by the NameNode if their home directory does not 
> already exist, using configurable umask defaulting to 027.



