For such sizable merges in Hadoop, I would like to start doing security audits 
in order to have an initial idea of the attack surface, the protections 
available for known threats, what sort of configuration is being used to launch 
processes, etc.

I dug into the architecture documents partway through this list. Nice docs!
I intend to turn this into a generic checklist for future security audits, so
much of it is generic, but I have also tried to ask specific questions drawn
from those docs.

1. UIs
I see there are at least two UIs: Storage Container Manager and Key Space
Manager. There are a number of vulnerabilities that we typically find in UIs:

1.1. What sort of validation is being done on any accepted user input? 
(pointers to code would be appreciated)
1.2. What explicit protections have been built in for the following (pointers
to code would be appreciated):
   1.2.1. cross-site scripting (XSS)
   1.2.2. cross-site request forgery (CSRF)
   1.2.3. clickjacking (X-Frame-Options)
1.3. What sort of authentication is required for access to the UIs?
1.4. What authorization is available for determining who can access which
capabilities of the UIs, whether viewing or modifying data or affecting object
stores and related processes?
1.5. Are the UIs built with proxying in mind by leveraging X-Forwarded headers?
1.6. Is there any input that will ultimately be persisted in configuration for 
executing shell commands or processes?
1.7. Do the UIs support the trusted proxy pattern with doas impersonation?
1.8. Is there TLS/SSL support?
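To make 1.2.1 and 1.2.3 concrete, this is the kind of thing I would hope to
find in the UI code. It is a minimal sketch of my own, not anything taken from
the Ozone branch: the class name UiSecurityHeaders is hypothetical, and the
header values are common defaults rather than Ozone settings.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper (not from Ozone) illustrating two UI protections:
 * output encoding against XSS and anti-clickjacking response headers.
 */
final class UiSecurityHeaders {

    private UiSecurityHeaders() {}

    /** Escapes HTML-significant characters before user input is echoed into a page. */
    static String escapeHtml(String input) {
        if (input == null) {
            return "";
        }
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&#x27;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    /** Response headers a UI might set; values are common defaults, not Ozone configuration. */
    static Map<String, String> defaultHeaders() {
        Map<String, String> headers = new LinkedHashMap<>();
        // Legacy clickjacking protection: only same-origin pages may frame the UI.
        headers.put("X-Frame-Options", "SAMEORIGIN");
        // Modern CSP equivalent of the header above.
        headers.put("Content-Security-Policy", "frame-ancestors 'self'");
        return headers;
    }
}
```

Escaping on output (rather than only validating on input) is what keeps a
stored bucket or key name from turning into script when the UI renders it.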

2. REST APIs

2.1. Do the REST APIs support the trusted proxy pattern with doas impersonation 
capabilities?
2.2. What explicit protections have been built in for:
   2.2.1. cross-site scripting (XSS)
   2.2.2. cross-site request forgery (CSRF)
   2.2.3. XML External Entity (XXE)
2.3. What is being used for authentication - Hadoop Auth Module?
2.4. Are there separate processes for the HTTP resources (UIs and REST
endpoints), or are they part of existing HDFS processes?
2.5. Is there TLS/SSL support?
2.6. Are there new CLI commands and/or clients for accessing the REST APIs?
2.7. The Bucket Level API allows setting ACLs on a bucket. What authorization
is required to do so, and is a restrictive ACL set on creation?
2.8. The Bucket Level API allows deleting a bucket. I assume this depends on
ACL-based access control?
2.9. The Bucket Level API call to list a bucket returns up to 1000 keys. Is
paging available?
2.10. The Storage Level APIs indicate “Signed with User Authorization”. What
does this refer to exactly?
2.11. The Object Level APIs indicate that there is no ACL support and that only
bucket owners can read and write, but there are ACL APIs at the Bucket Level.
Are those meaningless for now?
2.12. How does a REST client know which Ozone Handler to connect to? Or am I
missing some well-known NameNode-style endpoint described somewhere in the
architecture doc?

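For 2.2.3, the usual remedy is to harden every XML parser that touches request
bodies. A minimal sketch, assuming the REST layer parses XML with the JDK's
DocumentBuilderFactory (I have not checked which parser, if any, Ozone actually
uses); HardenedXmlParser is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

/** Hypothetical helper (not from Ozone): an XML parser hardened against XXE. */
final class HardenedXmlParser {

    private HardenedXmlParser() {}

    static DocumentBuilder newBuilder() throws ParserConfigurationException {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        // Refuse any DOCTYPE: blocks external entities and entity-expansion payloads.
        dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        // Additional hardening commonly applied alongside the DOCTYPE ban.
        dbf.setFeature("http://xml.org/sax/features/external-general-entities", false);
        dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
        dbf.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        dbf.setXIncludeAware(false);
        dbf.setExpandEntityReferences(false);
        return dbf.newDocumentBuilder();
    }

    /** Parses an untrusted request body; throws SAXException if it declares a DOCTYPE. */
    static Document parse(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        return newBuilder().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }
}
```

Banning DOCTYPE outright is the bluntest option, but REST payloads rarely need
one, so it tends to be the easiest setting to audit for.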
3. Encryption

3.1. Is there any support for encryption of persisted data?
3.2. If so, are KMS and the hadoop key command used for key management?

4. Configuration

4.1. Are there any passwords or secrets being added to configuration?
4.2. If so, are they accessed via Configuration.getPassword() to allow for 
provisioning in credential providers?
4.3. Are there any settings that are used to launch docker containers or
shell out to commands, etc.?

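Regarding 4.3 (and 1.6), the pattern I would look for when configuration
drives process launches is an allowlisted executable plus arguments passed as
a list, never a single string handed to a shell. A minimal sketch;
SafeCommandBuilder, the allowlist contents and the argument pattern are all
hypothetical, not Ozone code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

/** Hypothetical helper (not from Ozone) for launching configured commands safely. */
final class SafeCommandBuilder {

    // Hypothetical allowlist of binaries the service is permitted to launch.
    private static final Set<String> ALLOWED_EXECUTABLES = Set.of("/usr/bin/docker");
    // Conservative pattern for arguments that come from configuration or user input.
    private static final Pattern SAFE_ARG = Pattern.compile("[A-Za-z0-9._:/=-]+");

    private SafeCommandBuilder() {}

    static ProcessBuilder build(String executable, List<String> args) {
        if (!ALLOWED_EXECUTABLES.contains(executable)) {
            throw new IllegalArgumentException("executable not allowed: " + executable);
        }
        List<String> command = new ArrayList<>();
        command.add(executable);
        for (String arg : args) {
            if (!SAFE_ARG.matcher(arg).matches()) {
                throw new IllegalArgumentException("unsafe argument: " + arg);
            }
            command.add(arg);
        }
        // Each element becomes a separate argv entry; no "sh -c" is involved,
        // so characters such as ';' or '$(...)' are never interpreted by a shell.
        return new ProcessBuilder(command);
    }
}
```

Character filtering only stops shell injection. An attacker-controlled option
such as --privileged still matches the pattern, so the launch settings
themselves need review as well.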
5. HA

5.1. Are there provisions for HA?
5.2. Are we leveraging the existing HA capabilities in HDFS?
5.3. Is Storage Container Manager a SPOF?
5.4. I see HA listed under future work in the architecture doc. Is this still
an open issue?

> On Oct 20, 2017, at 7:49 AM, Steve Loughran <ste...@hortonworks.com> wrote:
> 
> 
> Wow, big piece of work
> 
> 1. Where is a PR/branch on github with rendered docs for us to look at?
> 2. Have you made any public API changes related to object stores? That's 
> probably something I'll have opinions on more than implementation details.
> 
> thanks
> 
>> On 19 Oct 2017, at 02:54, Yang Weiwei <cheersy...@hotmail.com> wrote:
>> 
>> Hello everyone,
>> 
>> 
>> I would like to start this thread to discuss merging Ozone (HDFS-7240) to 
>> trunk. This feature implements an object store which can co-exist with HDFS. 
>> Ozone is disabled by default. We have tested Ozone with cluster sizes 
>> varying from 1 to 100 data nodes.
>> 
>> 
>> 
>> The merge payload includes the following:
>> 
>> 1.  All services, management scripts
>> 2.  Object store APIs, exposed via both REST and RPC
>> 3.  Master service UIs, command line interfaces
>> 4.  Pluggable pipeline integration
>> 5.  Ozone File System (Hadoop compatible file system implementation, passes 
>> all FileSystem contract tests)
>> 6.  Corona - a load generator for Ozone.
>> 7.  Essential documentation added to Hadoop site.
>> 8.  Version specific Ozone Documentation, accessible via service UI.
>> 9.  Docker support for ozone, which enables faster development cycles.
>> 
>> 
>> To build Ozone and run Ozone using Docker, please follow the instructions on 
>> this wiki page: 
>> https://cwiki.apache.org/confluence/display/HADOOP/Dev+cluster+with+docker.
>> 
>> 
>> We have built a passionate and diverse community to drive this feature 
>> development. As a team, we have achieved significant progress in the past 3 
>> years since the first JIRA for HDFS-7240 was opened in Oct 2014. So far, we 
>> have resolved almost 400 JIRAs by 20+ contributors/committers from different 
>> countries and affiliations. We also want to thank the large number of 
>> community members who were supportive of our efforts and contributed ideas 
>> and participated in the design of ozone.
>> 
>> 
>> Please share your thoughts, thanks!
>> 
>> 
>> -- Weiwei Yang
> 
