[ https://issues.apache.org/jira/browse/MAPREDUCE-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13125198#comment-13125198 ]

Robert Joseph Evans commented on MAPREDUCE-2858:
------------------------------------------------

I am happy to help out however I can, so long as this design is what the
community decides it really wants. I still have strong objections to using a
proxy to a different web server. I feel I have made my objections clear in
previous comments and in comments on MAPREDUCE-2863, and I will leave it at
that.

I would, however, like more detail about exactly how this proxy will behave,
or perhaps how it currently behaves, since Luke has indicated that he has
working patches that are making their way through IBM legal. We have a
high-level concept, but the details are a bit sketchy to me.

Is the proxy going to try to rewrite URLs so that they always pass through the
proxy, or is it simply going to rely on the application master to output only
relative URLs?

How is the RM going to generate the AM URL from the URL that the AM returns?  
That is, what is :am_uri in
http://app-proxy1.cluster1.company.com:8181/yarn/:app_id/:am_uri?
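
To make the question concrete, here is one possible reading, purely as a
sketch: assume :am_uri is just the path and query of the tracking URL the AM
reports, and the AM's own host:port is dropped because only the proxy needs to
know where the AM lives. The class and method names (ProxyUrlSketch,
toProxyUrl) are hypothetical and not part of any YARN API.

```java
import java.net.URI;

/** Hypothetical helper illustrating one possible :am_uri mapping; not a YARN class. */
public final class ProxyUrlSketch {
  private ProxyUrlSketch() {}

  /**
   * Assumed mapping: drop the scheme, host and port of the URL the AM reported,
   * and graft its path and query under /yarn/<appId> on the proxy.
   */
  public static String toProxyUrl(String proxyBase, String appId, String amUrl) {
    URI am = URI.create(amUrl); // throws IllegalArgumentException on a malformed URL
    String path = am.getRawPath();
    if (path == null || path.isEmpty()) {
      path = "/"; // an AM URL with no path maps to the application's root
    }
    StringBuilder sb = new StringBuilder(proxyBase)
        .append("/yarn/").append(appId).append(path);
    if (am.getRawQuery() != null) {
      sb.append('?').append(am.getRawQuery()); // keep the query string untouched
    }
    return sb.toString();
  }
}
```

Under this reading, an AM that emits absolute links to its own host:port would
bypass the proxy entirely unless the proxy rewrites them, which is why the
relative-URL question above matters.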

How is the proxy going to pass the user name to the Application Master?

Are there any plans for a VIP in front of the proxies for failover? If the RM is
inserting a proxy based only on the config, what happens if that proxy goes
down? We probably want to use a VIP in front of the proxies and have the App
Master verify that requests come from them using InetAddress.getAllByName. If
there is more than one proxy in the config, is the RM going to ping the proxies
on an ongoing basis so that it can return a URL that is valid?
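
Roughly what I have in mind for the AM-side check is sketched below. It assumes
the configured proxy host name resolves to the real addresses of every proxy
(for example a round-robin DNS name), since that is what getAllByName returns;
the class and method names are hypothetical and not existing Hadoop code.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Hypothetical AM-side check; not an existing Hadoop class. */
public final class ProxyVerifierSketch {
  private ProxyVerifierSketch() {}

  /**
   * Returns true if remoteAddr (e.g. HttpServletRequest.getRemoteAddr()) is one
   * of the addresses the configured proxy host name currently resolves to.
   * The JVM caches lookups (see networkaddress.cache.ttl), so changes to the
   * name are only picked up after the cache entry expires.
   */
  public static boolean isFromProxy(String proxyHost, String remoteAddr) {
    try {
      InetAddress remote = InetAddress.getByName(remoteAddr);
      for (InetAddress candidate : InetAddress.getAllByName(proxyHost)) {
        if (candidate.equals(remote)) { // compares the raw address bytes
          return true;
        }
      }
    } catch (UnknownHostException e) {
      // Fail closed: if either name cannot be resolved, treat the caller as untrusted.
    }
    return false;
  }
}
```

Failing closed means a DNS outage locks users out of the AM UI rather than
letting unproxied requests through, which seems like the right trade-off for a
security check.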

The whitelisting based on crypto signatures seems very confusing to me,
possibly slow/memory intensive, and not at all user friendly.

 * Is the proxy going to download the entire contents of a URL to compute the
checksum of the JavaScript inside it before passing it on to the user? A
malicious app master could crash the proxy by sending huge amounts of data to
it, unless we can spill it to disk at some point or set a maximum size limit on
the amount of data that we cache.
 * Is all this processing just so that the proxy can pop up a warning message
saying this page looks a bit odd? I thought the point of having a
user-changeable API was so that users could modify it to better fit their
needs. Now, if they change it in any significant way that involves JavaScript,
users will have to click through a warning message on every page they view on
this new application master, or the proxy will have to store a cookie or
something similar recording that this user has accepted the risks for this
page/this app master (I really don't know how the proxy can definitively say
what the user has opted into).
 * Are we also going to download the complete contents of all of the JS files
that the HTML points to? We would have to if we really wanted the signature to
be accurate; otherwise they could hide something inside a JS file.
 * What about JSON/XML data or other static files? Are we going to do anything
with them?
 * How are we going to generate these signatures at compile time? All of the
HTML pages are dynamically generated. Are we going to run a unit-test-like
script and generate the signatures? What about other non-MRv2 projects that
might not want to use Hamlet? Are we going to bring up a web server and scrape
the pages?
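
To put the memory concern in concrete terms, here is a rough sketch of what the
proxy would have to do for each response, assuming a whitelist of SHA-256
digests in hex. The digest choice, the maxBytes cap, and the class and method
names are my assumptions for illustration, not anything in the design.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Set;

/** Hypothetical proxy-side check; not an existing Hadoop class. */
public final class WhitelistCheckSketch {
  private WhitelistCheckSketch() {}

  /**
   * Buffers the whole AM response while hashing it, because nothing can be
   * forwarded to the browser until the digest is known. Returns the body if its
   * digest is whitelisted, null if not, and throws once more than maxBytes have
   * been read so a malicious AM cannot exhaust the proxy's memory.
   */
  public static byte[] bufferIfWhitelisted(InputStream in, long maxBytes,
      Set<String> allowedSha256Hex) throws IOException {
    MessageDigest md;
    try {
      md = MessageDigest.getInstance("SHA-256");
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("SHA-256 is required on every Java platform", e);
    }
    ByteArrayOutputStream body = new ByteArrayOutputStream();
    byte[] buf = new byte[8192];
    long total = 0;
    int n;
    while ((n = in.read(buf)) != -1) {
      total += n;
      if (total > maxBytes) {
        throw new IOException("AM response exceeds " + maxBytes + " bytes");
      }
      md.update(buf, 0, n);
      body.write(buf, 0, n);
    }
    return allowedSha256Hex.contains(toHex(md.digest())) ? body.toByteArray() : null;
  }

  /** Lower-case hex encoding of a digest. */
  static String toHex(byte[] bytes) {
    StringBuilder sb = new StringBuilder(bytes.length * 2);
    for (byte b : bytes) {
      sb.append(String.format("%02x", b));
    }
    return sb.toString();
  }
}
```

And this covers a single response only; for the signature to mean anything,
every JS file the page references would have to be fetched and checked the same
way, multiplying both the latency and the buffered memory per page view.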


                
> MRv2 WebApp Security
> --------------------
>
>                 Key: MAPREDUCE-2858
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2858
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: applicationmaster, mrv2, security
>    Affects Versions: 0.23.0
>            Reporter: Luke Lu
>            Assignee: Luke Lu
>            Priority: Blocker
>             Fix For: 0.23.0
>
>
> In MRv2, while the system servers (ResourceManager (RM), NodeManager (NM) and
> NameNode (NN)) run as "trusted" system users, the application masters (AM) run
> as the users who submit the application. While this offers great flexibility
> to run multiple versions of MapReduce frameworks (including their UI) on the
> same Hadoop cluster, it has significant implications for the security of
> webapps (please do not discuss company-specific vulnerabilities here).
> Requirements:
> # Secure authentication for AM (for app/job level ACLs).
> # Webapp security should be optional via site configuration.
> # Support existing pluggable single sign on mechanisms.
> # Should not require per app/user configuration for deployment.
> # Should not require special site-wide DNS configuration for deployment.
> This is the top-level JIRA for webapp security. A design doc/notes on threat
> modeling and countermeasures will be posted on the wiki.
