[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13125274#comment-13125274
 ] 

Luke Lu commented on MAPREDUCE-2858:
------------------------------------

bq.  I feel I have made my objections clear in previous comments and in 
comments on MAPREDUCE-2863 and I will leave it at that.

Perhaps I missed something, but MAPREDUCE-2863 has nothing to do with security. 
And your objections on HADOOP-7532 seem to indicate a lack of understanding of 
the related threat model (see the comments below as well).

bq. Is the proxy going to try and rewrite URLs so that they always pass through 
the proxy or is it simply going to rely on the application master to only 
output relative URLs?

The proxy should only allow hosts on a (configurable) whitelist to be used in 
absolute URLs.
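
A minimal sketch of such a check, not the actual proxy code. The class and 
method names are hypothetical, and treating relative URLs as always allowed 
is my assumption here, not something settled in the design:

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: decides whether a URL found in AM output may be kept.
final class HostWhitelist {
    // Whitelist entries are expected to be lower-case host names.
    private final Set<String> allowedHosts;

    HostWhitelist(Set<String> allowedHosts) {
        this.allowedHosts = allowedHosts;
    }

    boolean isAllowed(String url) {
        URI uri;
        try {
            uri = new URI(url);
        } catch (URISyntaxException e) {
            return false; // unparsable URLs are rejected
        }
        String host = uri.getHost();
        if (host == null) {
            // No host: fine for a relative URL, rejected for scheme-only
            // URIs such as "javascript:...".
            return !uri.isAbsolute();
        }
        // Covers both "http://host/..." and scheme-relative "//host/...".
        return allowedHosts.contains(host.toLowerCase(Locale.ROOT));
    }
}
```

With a whitelist of {"cdn.example.com"}, "static/app.js" and 
"http://cdn.example.com/lib.js" pass, while "//evil.example.org/x.js" does not.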

bq. How is the RM going to generate the AM URL from the URL that the AM returns?

:am_uri is the original AM URI sans the scheme, e.g., the original URI could be 
http://am1:am1port/ and the RM generates 
http://appproxy:appproxyport/proxy/am1:am1port/ for links.
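
A rough sketch of that mapping, assuming the /proxy/ prefix from the example 
above. ProxyLinks and toProxyUrl are hypothetical names, and keeping the 
query string is my assumption:

```java
import java.net.URI;

// Hypothetical helper: builds the proxied link the RM would emit for an AM URL.
final class ProxyLinks {
    private ProxyLinks() {}

    static String toProxyUrl(String proxyBase, String amUrl) {
        URI am = URI.create(amUrl);
        if (am.getRawAuthority() == null) {
            throw new IllegalArgumentException("AM URL must be absolute: " + amUrl);
        }
        // The AM URI without its scheme: authority, path and optional query.
        StringBuilder amUri = new StringBuilder(am.getRawAuthority());
        String path = am.getRawPath();
        amUri.append(path == null || path.isEmpty() ? "/" : path);
        if (am.getRawQuery() != null) {
            amUri.append('?').append(am.getRawQuery());
        }
        String base = proxyBase.endsWith("/") ? proxyBase : proxyBase + "/";
        return base + "proxy/" + amUri;
    }
}
```

For instance, toProxyUrl("http://appproxy:9099", "http://am1:8088/") gives 
"http://appproxy:9099/proxy/am1:8088/" (the port numbers are made up).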

bq. How is the proxy going to pass the user name to the Application Master?

See #3 and #4. Specifically, via the cookie or params in the request created by 
the proxy. The param name could be something like proxy.user.name. We need a 
trusted IP whitelist (especially when the proxy uses a VIP for HA).
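
On the AM side this could look roughly like the sketch below. It is written 
against plain Java types instead of the servlet API to stay self-contained. 
ProxyUserResolver is a hypothetical name, and matching the remote address by 
exact string is a simplification:

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical AM-side helper: trusts the user name only from known proxies.
final class ProxyUserResolver {
    // Param name suggested above; the final name is not decided.
    static final String USER_PARAM = "proxy.user.name";

    private final Set<String> trustedProxyAddrs;

    ProxyUserResolver(Set<String> trustedProxyAddrs) {
        this.trustedProxyAddrs = trustedProxyAddrs;
    }

    // remoteAddr is the peer address of the request; params are its parameters.
    Optional<String> resolveUser(String remoteAddr, Map<String, String> params) {
        if (!trustedProxyAddrs.contains(remoteAddr)) {
            // Not from a trusted proxy: ignore whatever the request claims.
            return Optional.empty();
        }
        String user = params.get(USER_PARAM);
        return (user == null || user.isEmpty())
            ? Optional.empty()
            : Optional.of(user);
    }
}
```

A request that carries proxy.user.name but does not come from a whitelisted 
address yields no user at all, which is the point of the IP whitelist.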

bq. Is there any plans for VIP on the proxies for failover?

The proxy is http and stateless by design so it can use any standard load 
balancer solution for HA. 

bq. The white listing based off of crypto signatures seems very confusing to 
me, possibly slow/memory intensive, and very much not user friendly.

The filtering is only needed if the request user is different from the owner 
of the AM, which is a minor use case (I guess < 10% of the requests), like when 
an admin is looking through RM consoles or a bug ticket. I suspect the 
confusion comes from a lack of understanding of the threat model, which I'll 
elaborate on in the design doc.

bq. Is the proxy going to download the entire contents of a URL to try to 
compute the checksum of the javascript inside it before passing it on to the 
user

The entire content is downloaded and processed by a stream scanner, which 
computes checksums and writes the content to a temporary local file, so memory 
overhead is minimal. The filesystem cache works well enough for most cases. 
Again, this entire process is bypassed if the request user is the owner of the 
AM.
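
To illustrate why memory stays flat, here is a simplified sketch that digests 
the response while spooling it to a temp file through a fixed-size buffer. It 
checksums the whole body rather than individual script blocks. SHA-256 and 
the SpoolingScanner name are assumptions for illustration only:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical scanner: one pass over the stream, constant memory.
final class SpoolingScanner {
    static final class Result {
        final Path file;        // where the content was spooled
        final String hexDigest; // checksum of the full content

        Result(Path file, String hexDigest) {
            this.file = file;
            this.hexDigest = hexDigest;
        }
    }

    static Result spool(InputStream in) throws IOException {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("SHA-256"); // assumed algorithm
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
        Path tmp = Files.createTempFile("proxy-", ".tmp");
        byte[] buf = new byte[8192]; // the only buffer held in memory
        try (OutputStream out = Files.newOutputStream(tmp)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n); // feed the checksum
                out.write(buf, 0, n); // and spool to disk in the same pass
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return new Result(tmp, hex.toString());
    }
}
```

The caller would compare hexDigest against the known checksums and then 
serve the temp file, deleting it afterwards.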

bq. Is all this processing just so that the proxy can pop up a warning message 
saying this page looks a bit odd?

The warning is for admins or other users and not the user of the AM itself. 
Again, the question seems to indicate a lack of understanding of the threat 
model.

bq. Are we also going to download the complete contents of all of the JS files 
that the HTML points to?

Yes, unless the JS src URL is in the (configurable) whitelist as well.

bq. What about JSON/XML data or other static files are we going to do anything 
with it?

They'll be passed through if the proxy authentication succeeds.

bq. How are we going to generate these signatures at compile time?

As you've noticed, it's relatively easy for Hamlet-based webapps. The scanner 
can scan the project for any HtmlPage descendant classes and use a Guice module 
to inject an HttpServletResponse to capture the script content checksums. If 
people don't want to use Hamlet, they have to find a comparable solution (lack 
of a comparable solution indicates an incentive to switch to a better framework 
:)

Anyway, the design doc will contain more details as well as rationales.

                
> MRv2 WebApp Security
> --------------------
>
>                 Key: MAPREDUCE-2858
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2858
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: applicationmaster, mrv2, security
>    Affects Versions: 0.23.0
>            Reporter: Luke Lu
>            Assignee: Luke Lu
>            Priority: Blocker
>             Fix For: 0.23.0
>
>
> In MRv2, while the system servers (ResourceManager (RM), NodeManager (NM) and 
> NameNode (NN)) run as "trusted"
> system users, the application masters (AM) run as users who submit the 
> application. While this offers great flexibility
> to run multiple versions of mapreduce frameworks (including their UI) on the 
> same Hadoop cluster, it has significant
> implications for the security of webapps (Please do not discuss company 
> specific vulnerabilities here).
> Requirements:
> # Secure authentication for AM (for app/job level ACLs).
> # Webapp security should be optional via site configuration.
> # Support existing pluggable single sign on mechanisms.
> # Should not require per app/user configuration for deployment.
> # Should not require special site-wide DNS configuration for deployment.
> This is the top JIRA for webapp security. A design doc/notes on threat 
> modeling and countermeasures will be posted on the wiki.


        
