[ https://issues.apache.org/jira/browse/HADOOP-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12642048#action_12642048 ]

Doug Cutting commented on HADOOP-4348:
--------------------------------------

Sanjay> The best way to represent that service access is when a service proxy object is created - e.g. when the connection is established.

A proxy is not bound to a single connection. Connections are retrieved from a cache each time a call is made. Different proxies may share the same connection, and a single proxy may use different connections for different calls.

Sanjay> We could share multiple service sessions in a single connection but that complexity is not worth it.

It would be simpler to implement this way, not more complex. In HADOOP-4049 it was considerably simpler to pass extra data by modifying the RPC code than the Client/Server code. That's my primary motivation here: to keep the code simple. So unless there's a reason why we must authorize per connection rather than per request, it would be easier to authorize requests, and that would better compartmentalize the code.

There are some performance implications. Authorizing per request will use fewer connections but perform more authorizations. I don't know whether this is significant. I expect that ACLs will be cached and that authorization will not be too expensive, but that remains to be seen. So performance may provide a motivation to authorize per connection. But let's not optimize prematurely.

Sanjay> I see your argument to be equivalent to arguing against service-level authorization and that method-level authorization is sufficient.

No, but we will probably eventually need method-level authorization too, and it would be nice if whatever support we add now also helps then. If we do this in RPC, then we can examine only the protocol name for now and subsequently add method-level authorization at the same place. So implementing service-level authorization this way better prepares us for method-level authorization.

Sanjay> Would you be happier if we created an intermediate layer, say rpc-session, in between?
I am not seriously suggesting we do that. We have two layers today. We could add this at either layer. It would be cleaner to add it at only one layer, not mixed between the two as in the current patch. It would be simpler to add it to the RPC layer, and I have yet to hear a strong reason why that would be wrong. That's all I'm saying.

> Adding service-level authorization to Hadoop
> --------------------------------------------
>
>                 Key: HADOOP-4348
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4348
>             Project: Hadoop Core
>          Issue Type: New Feature
>            Reporter: Kan Zhang
>            Assignee: Arun C Murthy
>             Fix For: 0.20.0
>
>         Attachments: HADOOP-4348_0_20081022.patch
>
>
> Service-level authorization is the initial check done by a Hadoop service to find out whether a connecting client is a pre-defined user of that service. If not, the connection or service request is declined. This feature allows services to limit access to a clearly defined group of users. For example, service-level authorization allows "world-readable" files on an HDFS cluster to be readable only by the pre-defined users of that cluster, not by anyone who can connect to the cluster. It also allows an M/R cluster to define its group of users so that only those users can submit jobs to it.
> Here is an initial list of requirements I came up with.
> 1. Users of a cluster are defined by a flat list of usernames and groups. A client is a user of the cluster if and only if her username is listed in the flat list or one of her groups is explicitly listed in the flat list. Nested groups are not supported.
> 2. The flat list is stored in a conf file and pushed to every cluster node so that services can access it.
> 3. Services will monitor the conf file for modification periodically (5-minute interval by default) and reload the list if needed.
> 4. Checking against the flat list is done as early as possible and before any other authorization checking. Both HDFS and M/R clusters will implement this feature.
> 5. This feature can be switched off and is off by default.
> I'm aware of interest in pulling user data from LDAP. For this JIRA, I suggest we implement it using a conf file. Additional data sources may be supported via new JIRAs.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
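To make the discussion above concrete, here is a minimal sketch of the kind of per-call check Doug describes: the RPC server consults a flat-list ACL keyed by protocol name on every incoming call, and the same hook could later also examine the method name. All class and method names here (ServiceAuthorizer, FlatListAcl, setAcl) are illustrative assumptions for this sketch, not actual Hadoop APIs, and the conf-file reload is stubbed out.

```java
import java.util.*;

// Hypothetical sketch of service-level authorization done per RPC request.
// Names are illustrative, not Hadoop APIs; the conf-file reload is a stub.
public class ServiceAuthorizer {
    // protocol name -> flat list of allowed users and groups
    private volatile Map<String, FlatListAcl> acls = new HashMap<>();
    private long lastLoaded = 0;
    private static final long RELOAD_INTERVAL_MS = 5 * 60 * 1000; // 5 min default

    public static final class FlatListAcl {
        private final Set<String> users;
        private final Set<String> groups;
        public FlatListAcl(Set<String> users, Set<String> groups) {
            this.users = users;
            this.groups = groups;
        }
        // A client is a user of the service iff her username is listed, or one
        // of her groups is explicitly listed (nested groups not supported).
        public boolean permits(String user, Collection<String> userGroups) {
            if (users.contains(user)) return true;
            for (String g : userGroups) {
                if (groups.contains(g)) return true;
            }
            return false;
        }
    }

    // Called by the RPC server for every incoming call, before dispatch.
    // Only the protocol name is examined for now; checking the method name
    // here later would add method-level authorization at the same place.
    public boolean authorize(String protocol, String user, Collection<String> groups) {
        maybeReload();
        FlatListAcl acl = acls.get(protocol);
        if (acl == null) return true; // no ACL configured: feature is off
        return acl.permits(user, groups);
    }

    // Illustrative hook for registering an ACL (in practice, loaded from conf).
    public void setAcl(String protocol, FlatListAcl acl) {
        Map<String, FlatListAcl> copy = new HashMap<>(acls);
        copy.put(protocol, acl);
        acls = copy;
    }

    private synchronized void maybeReload() {
        long now = System.currentTimeMillis();
        if (now - lastLoaded >= RELOAD_INTERVAL_MS) {
            // A real implementation would re-read the pushed conf file here
            // and rebuild the acls map if the file's modification time changed.
            lastLoaded = now;
        }
    }
}
```

Because the check runs once per request rather than per connection, different proxies can keep sharing pooled connections, at the cost of more (cheap, cached) authorization checks.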