Author: rangadi
Date: Tue Feb 19 13:14:44 2008
New Revision: 629234

URL: http://svn.apache.org/viewvc?rev=629234&view=rev
Log:
HADOOP-2371. User guide for file permissions in HDFS. (Robert Chansler via 
rangadi)

Added:
    
hadoop/core/trunk/src/docs/src/documentation/content/xdocs/hdfs_permissions_guide.xml
Modified:
    hadoop/core/trunk/CHANGES.txt
    
hadoop/core/trunk/src/docs/src/documentation/content/xdocs/hdfs_user_guide.xml
    hadoop/core/trunk/src/docs/src/documentation/content/xdocs/site.xml

Modified: hadoop/core/trunk/CHANGES.txt
URL: 
http://svn.apache.org/viewvc/hadoop/core/trunk/CHANGES.txt?rev=629234&r1=629233&r2=629234&view=diff
==============================================================================
--- hadoop/core/trunk/CHANGES.txt (original)
+++ hadoop/core/trunk/CHANGES.txt Tue Feb 19 13:14:44 2008
@@ -58,6 +58,11 @@
 
 Release 0.16.1 - Unreleased
 
+  IMPROVEMENTS
+
+    HADOOP-2371. User guide for file permissions in HDFS.
+    (Robert Chansler via rangadi)
+    
   BUG FIXES
 
     HADOOP-2789. Race condition in IPC Server Responder that could close

Added: 
hadoop/core/trunk/src/docs/src/documentation/content/xdocs/hdfs_permissions_guide.xml
URL: 
http://svn.apache.org/viewvc/hadoop/core/trunk/src/docs/src/documentation/content/xdocs/hdfs_permissions_guide.xml?rev=629234&view=auto
==============================================================================
--- 
hadoop/core/trunk/src/docs/src/documentation/content/xdocs/hdfs_permissions_guide.xml
 (added)
+++ 
hadoop/core/trunk/src/docs/src/documentation/content/xdocs/hdfs_permissions_guide.xml
 Tue Feb 19 13:14:44 2008
@@ -0,0 +1,191 @@
+<?xml version="1.0"?>
+<!--
+  Copyright 2008 The Apache Software Foundation
+
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN"
+          "http://forrest.apache.org/dtd/document-v20.dtd">
+
+
+<document>
+
+  <header>
+    <title>
+      Permissions User and Administrator Guide
+    </title>
+  </header>
+
+  <body>
+    <section> <title>Overview</title>
+      <p>
+               The Hadoop Distributed File System implements a permissions 
model for files and directories that shares much of the POSIX model. Each file 
and directory is associated with an <em>owner</em> and a <em>group</em>. The 
file or directory has separate permissions for the user that is the owner, for 
other users that are members of the group, and for all other users. For files, 
the <em>r</em> permission is required to read the file, and the <em>w</em> 
permission is required to write or append to the file. For directories, the 
<em>r</em> permission is required to list the contents of the directory, the 
<em>w</em> permission is required to create or delete files or directories, and 
the <em>x</em> permission is required to access a child of the directory. In 
contrast to the POSIX model, there are no <em>sticky</em>, <em>setuid</em> or 
<em>setgid</em> bits for files as there is no notion of executable files. For 
directories, there are no <em>sticky</em>, <em>setuid</em> or <em>setgid</em> 
bits as a simplification. Collectively, the permissions of a file or directory 
are its <em>mode</em>. In general, Unix customs for representing and displaying 
modes will be used, including the use of octal numbers in this description. 
When a file or directory is created, its owner is the user identity of the 
client process, and its group is the group of the parent directory (the BSD 
rule).
+       </p>
+       <p>
+               Each client process that accesses HDFS has a two-part identity 
composed of the <em>user name</em> and the <em>groups list</em>. Whenever HDFS 
must do a permissions check for a file or directory <code>foo</code> accessed 
by a client process:
+       </p>
+       <ul>
+               <li>
+                  If the user name matches the owner of <code>foo</code>, then 
the owner permissions are tested;
+               </li>
+               <li>
+                  Else if the group of <code>foo</code> matches any member 
of the groups list, then the group permissions are tested;
+               </li>
+               <li>
+                  Otherwise the other permissions of <code>foo</code> are 
tested.
+               </li>
+       </ul>
+
+<p>
+               If a permissions check fails, the client operation fails.
+</p>
+     </section>
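The selection order above can be illustrated with a short, self-contained sketch. This is not Hadoop's implementation; the class and helper names are hypothetical. Note that exactly one permission class is tested: an owner denied by the owner bits is not granted access through the group or other bits.

```java
import java.util.List;

public class PermCheckSketch {
    // Hypothetical helper illustrating the check order described above.
    // 'mode' is an octal int such as 0644; 'want' is a 3-bit mask (4=r, 2=w, 1=x).
    static boolean permitted(String user, List<String> groups,
                             String owner, String group, int mode, int want) {
        int bits;
        if (user.equals(owner)) {
            bits = (mode >> 6) & 7;   // owner permissions
        } else if (groups.contains(group)) {
            bits = (mode >> 3) & 7;   // group permissions
        } else {
            bits = mode & 7;          // other permissions
        }
        return (bits & want) == want;
    }

    public static void main(String[] args) {
        // Owner "alice" may write a 0644 file; group member "bob" may not.
        System.out.println(permitted("alice", List.of("staff"),
                                     "alice", "staff", 0644, 2)); // true
        System.out.println(permitted("bob", List.of("staff"),
                                     "alice", "staff", 0644, 2)); // false
    }
}
```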
+
+<section><title>User Identity</title>
+<p>
+In this release of Hadoop the identity of a client process is just whatever 
the host operating system says it is. For Unix-like systems,
+</p>
+<ul>
+<li>
+   The user name is the equivalent of <code>`whoami`</code>;
+</li>
+<li>
+   The group list is the equivalent of <code>`bash -c groups`</code>.
+</li>
+</ul>
+
+<p>
+In the future there will be other ways of establishing user identity (think 
Kerberos, LDAP, and others). There is no expectation that this first method is 
secure in protecting one user from impersonating another. This user identity 
mechanism combined with the permissions model allows a cooperative community to 
share file system resources in an organized fashion.
+</p>
+<p>
+In any case, the user identity mechanism is extrinsic to HDFS itself. There is 
no provision within HDFS for creating user identities, establishing groups, or 
processing user credentials.
+</p>
+</section>
+
+<section> <title>Understanding the Implementation</title>
+<p>
+Each file or directory operation passes the full path name to the name node, 
and the permissions checks are applied along the path for each operation. The 
client framework will implicitly associate the user identity with the 
connection to the name node, reducing the need for changes to the existing 
client API. It has always been the case that when one operation on a file 
succeeds, the operation might fail when repeated because the file, or some 
directory on the path, no longer exists. For instance, when the client first 
begins reading a file, it makes a first request to the name node to discover 
the location of the first blocks of the file. A second request made to find 
additional blocks may fail. On the other hand, deleting a file does not revoke 
access by a client that already knows the blocks of the file. With the addition 
of permissions, a client's access to a file may be withdrawn between requests. 
Again, changing permissions does not revoke the access of a client that 
 already knows the file's blocks.
+</p>
+<p>
+The map-reduce framework delegates the user identity by passing strings 
without special concern for confidentiality. The owner and group of a file or 
directory are stored as strings; there is no conversion from user and group 
identity numbers as is conventional in Unix.
+</p>
+<p>
+The permissions features of this release did not require any changes to the 
behavior of data nodes. Blocks on the data nodes do not have any of the 
<em>Hadoop</em> ownership or permissions attributes associated with them.
+</p>
+</section>
+     
+<section> <title>Changes to the File System API</title>
+<p>
+       All methods that use a path parameter will throw 
<code>AccessControlException</code> if permission checking fails.
+</p>
+<p>New methods:</p>
+<ul>
+       <li>
+               <code>public FSDataOutputStream create(Path f, FsPermission 
permission, boolean overwrite, int bufferSize, short replication, long 
blockSize, Progressable progress) throws IOException;</code>
+       </li>
+       <li>
+               <code>public boolean mkdirs(Path f, FsPermission permission) 
throws IOException;</code>
+       </li>
+       <li>
+               <code>public void setPermission(Path p, FsPermission 
permission) throws IOException;</code>
+       </li>
+       <li>
+               <code>public void setOwner(Path p, String username, String 
groupname) throws IOException;</code>
+       </li>
+       <li>
+               <code>public FileStatus getFileStatus(Path f) throws 
IOException;</code> will additionally return the user, group and mode 
associated with the path.
+       </li>
+
+</ul>
+<p>
+The mode of a new file or directory is restricted by the <code>umask</code> 
set as a configuration parameter. When the existing <code>create(path, 
&hellip;)</code> method (<em>without</em> the permission parameter) is used, 
the mode of the new file is <code>666&thinsp;&amp;&thinsp;^umask</code>. When 
the new <code>create(path, </code><em>permission</em><code>, &hellip;)</code> 
method (<em>with</em> the permission parameter <em>P</em>) is used, the mode of 
the new file is 
<code>P&thinsp;&amp;&thinsp;^umask&thinsp;&amp;&thinsp;666</code>. When a new 
directory is created with the existing <code>mkdirs(path)</code> method 
(<em>without</em> the permission parameter), the mode of the new directory is 
<code>777&thinsp;&amp;&thinsp;^umask</code>. When the new <code>mkdirs(path, 
</code><em>permission</em> <code>)</code> method (<em>with</em> the permission 
parameter <em>P</em>) is used, the mode of the new directory is 
<code>P&thinsp;&amp;&thinsp;^umask&thinsp;&amp;&thinsp;777</code>. 
+</p>
+</section>
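In the formulas above, <code>^umask</code> denotes the bitwise complement of the umask (Java's <code>~</code> operator). Under that reading, the mode arithmetic can be checked with a small sketch; the helper names here are hypothetical and this is not the Hadoop API itself:

```java
public class ModeSketch {
    // Mode of a new file: 0666 & ~umask without a permission argument,
    // or P & ~umask & 0666 when permission P is supplied.
    static int fileMode(Integer p, int umask) {
        return (p == null ? 0666 : p & 0666) & ~umask;
    }

    // Mode of a new directory: 0777 & ~umask, or P & ~umask & 0777.
    static int dirMode(Integer p, int umask) {
        return (p == null ? 0777 : p & 0777) & ~umask;
    }

    public static void main(String[] args) {
        System.out.printf("%o%n", fileMode(null, 022)); // 644
        System.out.printf("%o%n", dirMode(null, 022));  // 755
        System.out.printf("%o%n", fileMode(0660, 022)); // 640
    }
}
```

With the default umask of 022, a file created without an explicit permission thus ends up with mode 644 and a directory with mode 755.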
+
+     
+<section> <title>Changes to the Application Shell</title>
+<p>New operations:</p>
+<dl>
+       <dt><code>chmod [-R]</code> <em>mode file &hellip;</em></dt>
+       <dd>
+               Only the owner of a file or the super-user is permitted to 
change the mode of a file.
+       </dd>
+       <dt><code>chgrp [-R]</code> <em>group file &hellip;</em></dt>
+       <dd>
+               The user invoking <code>chgrp</code> must belong to the 
specified group and be the owner of the file, or be the super-user.
+       </dd>
+       <dt><code>chown [-R]</code> <em>[owner][:[group]] file 
&hellip;</em></dt>
+       <dd>
+               The owner of a file may only be altered by a super-user.
+       </dd>
+       <dt><code>ls </code> <em>file &hellip;</em></dt><dd></dd>
+       <dt><code>lsr </code> <em>file &hellip;</em></dt>
+       <dd>
+               The output is reformatted to display the owner, group and mode.
+       </dd>
+</dl></section>
+
+     
+<section> <title>The Super-User</title>
+<p>
+       The super-user is the user with the same identity as the name node process 
itself. Loosely, if you started the name node, then you are the super-user. The 
super-user can do anything in that permissions checks never fail for the 
super-user. There is no persistent notion of who <em>was</em> the super-user; 
when the name node is started the process identity determines who is the 
super-user <em>for now</em>. The HDFS super-user does not have to be the 
super-user of the name node host, nor is it necessary that all clusters have 
the same super-user. Also, an experimenter running HDFS on a personal 
workstation conveniently becomes that installation's super-user without any 
configuration.
+       </p>
+       <p>
+       In addition, the administrator may identify a distinguished group using 
a configuration parameter. If set, members of this group are also super-users.
+</p>
+</section>
+
+<section> <title>The Web Server</title>
+<p>
+The identity of the web server is a configuration parameter. That is, the name 
node has no notion of the identity of the <em>real</em> user, but the web 
server behaves as if it has the identity (user and groups) of a user chosen by 
the administrator. Unless the chosen identity matches the super-user, parts of 
the name space may be invisible to the web server.</p>
+</section>
+
+<section> <title>On-line Upgrade</title>
+<p>
+If a cluster starts with a version 0.15 data set (<code>fsimage</code>), all 
files and directories will have owner <em>O</em>, group <em>G</em>, and mode 
<em>M</em>, where <em>O</em> and <em>G</em> are the user and group identity of 
the super-user, and <em>M</em> is a configuration parameter. </p>
+</section>
+
+<section> <title>Configuration Parameters</title>
+<dl>
+       <dt><code>dfs.permissions = true </code></dt>
+       <dd>
+               If <code>true</code>, use the permissions system as described 
here. If <code>false</code>, permission <em>checking</em> is turned off, but all 
other behavior is unchanged. Switching from one parameter value to the other 
does not change the mode, owner or group of files or directories.
+               <p>
+               </p>
+               Regardless of whether permissions are on or off, 
<code>chmod</code>, <code>chgrp</code> and <code>chown</code> <em>always</em> 
check permissions. These functions are only useful in the permissions context, 
and so there is no backwards compatibility issue. Furthermore, this allows 
administrators to reliably set owners and permissions in advance of turning on 
regular permissions checking.
+       </dd>
+       <dt><code>dfs.web.ugi = webuser,webgroup</code></dt>
+       <dd>
+               The user name to be used by the web server. Setting this to the 
name of the super-user allows any web client to see everything. Changing this 
to an otherwise unused identity allows web clients to see only those things 
visible using "other" permissions. Additional groups may be added to the 
comma-separated list.
+       </dd>
+       <dt><code>dfs.permissions.supergroup = supergroup</code></dt>
+       <dd>
+               The name of the group of super-users.
+       </dd>
+       <dt><code>dfs.upgrade.permission = 777</code></dt>
+       <dd>
+               The choice of initial mode during upgrade. The <em>x</em> 
permission is <em>never</em> set for files. For configuration files, the 
decimal value <em>511<sub>10</sub></em> may be used.
+       </dd>
+       <dt><code>dfs.umask = 022</code></dt>
+       <dd>
+               The <code>umask</code> used when creating files and 
directories. For configuration files, the decimal value 
<em>18<sub>10</sub></em> may be used.
+       </dd>
+</dl>
+</section>
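Taken together, the parameters above could appear in a site configuration file roughly as follows. This fragment is illustrative only; the values shown are the defaults quoted in this guide, and the file name (<code>hadoop-site.xml</code>) is the conventional place for site overrides in this release.

```xml
<!-- Illustrative hadoop-site.xml fragment using the parameters above. -->
<property>
  <name>dfs.permissions</name>
  <value>true</value>
</property>
<property>
  <name>dfs.web.ugi</name>
  <value>webuser,webgroup</value>
</property>
<property>
  <name>dfs.permissions.supergroup</name>
  <value>supergroup</value>
</property>
<property>
  <name>dfs.umask</name>
  <value>022</value>
</property>
```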
+
+     
+  </body>
+</document>
+       
+

Modified: 
hadoop/core/trunk/src/docs/src/documentation/content/xdocs/hdfs_user_guide.xml
URL: 
http://svn.apache.org/viewvc/hadoop/core/trunk/src/docs/src/documentation/content/xdocs/hdfs_user_guide.xml?rev=629234&r1=629233&r2=629234&view=diff
==============================================================================
--- 
hadoop/core/trunk/src/docs/src/documentation/content/xdocs/hdfs_user_guide.xml 
(original)
+++ 
hadoop/core/trunk/src/docs/src/documentation/content/xdocs/hdfs_user_guide.xml 
Tue Feb 19 13:14:44 2008
@@ -358,7 +358,8 @@
       to simple file permissions. The user that starts Namenode is
       treated as the <em>super user</em> for HDFS. Future versions of HDFS will
       support network authentication protocols like Kerberos for user
-      authentication and encryption of data transfers.
+      authentication and encryption of data transfers. The details are 
discussed in the 
+      <a href="hdfs_permissions_guide.html"><em>Permissions User and 
Administrator Guide</em></a>.
      </p>
      
    </section> <section> <title> Scalability </title>
@@ -415,5 +416,5 @@
      
   </body>
 </document>
-       
-
+       
+

Modified: hadoop/core/trunk/src/docs/src/documentation/content/xdocs/site.xml
URL: 
http://svn.apache.org/viewvc/hadoop/core/trunk/src/docs/src/documentation/content/xdocs/site.xml?rev=629234&r1=629233&r2=629234&view=diff
==============================================================================
--- hadoop/core/trunk/src/docs/src/documentation/content/xdocs/site.xml 
(original)
+++ hadoop/core/trunk/src/docs/src/documentation/content/xdocs/site.xml Tue Feb 
19 13:14:44 2008
@@ -37,6 +37,7 @@
     <setup     label="Cluster Setup"      href="cluster_setup.html" />
     <hdfs      label="HDFS Architecture"  href="hdfs_design.html" />
     <hdfs      label="HDFS User Guide"    href="hdfs_user_guide.html" />
+    <hdfs      label="HDFS Permissions Guide"    
href="hdfs_permissions_guide.html" />
     <mapred    label="Map-Reduce Tutorial" href="mapred_tutorial.html" />
     <mapred    label="Native Hadoop Libraries" href="native_libraries.html" />
     <streaming label="Streaming"          href="streaming.html" />

