magnuma3 opened a new pull request, #8514:
URL: https://github.com/apache/hadoop/pull/8514

   
   ### Description of PR
   HDFS-17926
   
   Currently, HDFS does not automatically create a user's home directory (e.g., `/user/<user>`). Administrators must create home directories manually, which adds operational overhead and can cause failures in user-facing tools (e.g., MapReduce job submission, Hive, Spark) that assume the home directory exists.
   
   This PR adds automatic home directory creation: when a user's home directory does not yet exist, HDFS creates it with appropriate ownership (`<username>:<supergroup>`) and permissions (`drwx------`).
   
   **Motivation**:
      - Reduces administrative burden for large multi-tenant clusters
      - Prevents job failures caused by missing home directories
      - Aligns with the behavior expected by higher-level ecosystem tools
    
   
   **Behavior**
     
   When `dfs.namenode.auto.create.user.home.enabled=true`, the NameNode (NN)
   intercepts the following RPCs and creates the caller's `/user/<short-name>`
   if it does not yet exist:
     
       create, mkdirs, getListing, getFileInfo, getLocatedFileInfo
   
   `hdfs dfs -ls` and similar commands issue `getFileInfo` first, so a user's
   home directory is created the first time they touch the cluster.
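
As a rough illustration of the trigger condition described above, the following is a minimal sketch only, not the code in this patch. The class and method names (`HomeDirInterceptSketch`, `shouldCheckHome`, `homePath`) are hypothetical; only the RPC names and the `/user/<short-name>` layout come from the description.

```java
import java.util.Set;

/** Hypothetical sketch of the interception rule; not the patch's actual code. */
final class HomeDirInterceptSketch {
  /** RPC names listed in the PR description as triggers. */
  static final Set<String> TRIGGER_RPCS = Set.of(
      "create", "mkdirs", "getListing", "getFileInfo", "getLocatedFileInfo");

  /** Home path layout from the description: /user/<short-name>. */
  static String homePath(String shortName) {
    return "/user/" + shortName;
  }

  /** True only when the feature flag is on and the RPC is a listed trigger. */
  static boolean shouldCheckHome(boolean enabled, String rpcName) {
    return enabled && TRIGGER_RPCS.contains(rpcName);
  }

  public static void main(String[] args) {
    System.out.println(shouldCheckHome(true, "getFileInfo"));
    System.out.println(homePath("alice"));
  }
}
```

With the flag off, or for an RPC outside the list, the sketch skips the home directory check entirely.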
   
   The directory is created with:
      - owner: caller's short username
      - group: caller's primary group (or a configured group if set)
      - permission: configured octal (default 0700)
      - quota: matched against group/user rules from the new quota config
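
To make the attribute list above concrete, here is a small sketch showing how the default octal `0700` corresponds to `rwx------`, and how the group fallback could be expressed. All names (`HomeDirAttrsSketch`, `toSymbolic`, `chooseGroup`) are hypothetical and not taken from the patch. Treating an empty configured group as "not set" is an assumption.

```java
/** Hypothetical sketch of the home directory attributes; not the patch's actual code. */
final class HomeDirAttrsSketch {
  /** Renders the low 9 permission bits in "rwxrwxrwx" form. */
  static String toSymbolic(int mode) {
    String flags = "rwxrwxrwx";
    StringBuilder sb = new StringBuilder(9);
    for (int i = 0; i < 9; i++) {
      int bit = 1 << (8 - i); // owner-read is the highest of the 9 bits
      sb.append((mode & bit) != 0 ? flags.charAt(i) : '-');
    }
    return sb.toString();
  }

  /** Uses the configured group if set, else the caller's primary group. */
  static String chooseGroup(String primaryGroup, String configuredGroup) {
    // Assumption: null or empty means no group was configured.
    return (configuredGroup == null || configuredGroup.isEmpty())
        ? primaryGroup : configuredGroup;
  }

  public static void main(String[] args) {
    int mode = Integer.parseInt("0700", 8); // configured octal string, default 0700
    System.out.println("d" + toSymbolic(mode));
    System.out.println(chooseGroup("analysts", null));
  }
}
```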
   
   Creation is performed by the NN superuser (not the requesting user).
   
   Results (success and most failures) are cached by short username, so
   subsequent RPCs from the same user incur only a HashMap lookup (~0.001 ms)
   instead of an NN getFileInfo round trip (~0.1 ms).
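
The caching idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the patch's implementation: `HomeDirResultCacheSketch` and `ensure` are hypothetical names, a `ConcurrentHashMap` is assumed here for thread safety, and the sketch caches every outcome, whereas the description says only "most failures" are cached.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

/** Hypothetical per-user result cache; not the patch's actual code. */
final class HomeDirResultCacheSketch {
  /** Outcome of the check-and-create step, keyed by short username. */
  private final Map<String, Boolean> results = new ConcurrentHashMap<>();

  /** Stand-in for the expensive step (existence check plus creation). */
  private final Predicate<String> ensureHome;

  HomeDirResultCacheSketch(Predicate<String> ensureHome) {
    this.ensureHome = ensureHome;
  }

  /** Runs the expensive step at most once per user; later calls are map lookups. */
  boolean ensure(String shortName) {
    // Simplification: failures are cached too; the real rules for which
    // failures to skip are not modelled here.
    return results.computeIfAbsent(shortName, ensureHome::test);
  }

  public static void main(String[] args) {
    HomeDirResultCacheSketch cache = new HomeDirResultCacheSketch(user -> true);
    System.out.println(cache.ensure("alice")); // runs the expensive step
    System.out.println(cache.ensure("alice")); // served from the cache
  }
}
```

Keying on the short username keeps the fast path to a single lookup per RPC. A real implementation would also need a policy for retrying failures that should not be cached.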
   
   ### How was this patch tested?
   
   This feature was originally developed and added to an internal fork of 
Apache Hadoop 3.1.2, and has been running in production for over a year.
   
   ### For code changes:
   
   - [x] Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
   - [ ] Object storage: have the integration tests been executed and the 
endpoint declared according to the connector-specific documentation?
   - [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, 
`NOTICE-binary` files?
   
   ### AI Tooling
   
   If an AI tool was used:
   
   - [ ] The PR includes the phrase "Contains content generated by <tool>"
         where <tool> is the name of the AI tool used.
   - [ ] My use of AI contributions follows the ASF legal policy
         https://www.apache.org/legal/generative-tooling.html


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

