[ 
https://issues.apache.org/jira/browse/HDFS-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13838667#comment-13838667
 ] 

Adam Faris commented on HDFS-5569:
----------------------------------

Hi Colin, just so you know, I'm not trying to push any agenda, and I know you 
guys do great work for the Hadoop ecosystem at Cloudera.  But there are flaws 
in your statements that need to be corrected, as others reading this at a 
later date will be confused. 

{quote}
I don't understand why you jump immediately to the assumption that IP spoofing 
is necessary to break IP-based authentication. There are plenty of networks in 
the world where you can join without any trouble. One example is a class B 
network such as 172.16.X.Y. If the administrator tries to filter addresses 
172.16.1.X but allow 172.16.2.X, it would be easy for an attacker to 
reconfigure his IP from a 172.16.1.X to a 172.16.2.X using just ifconfig. 
{quote}

You didn't mention the netmask in your example, and the only netmask that 
makes sense for it is a "/16" (255.255.0.0).  If so, then it's fine to assign 
yourself an address from 172.16.1.x or 172.16.2.x because both are in the same 
network space.  This is not IP spoofing but merely assigning yourself an 
approved IP from the same network, therefore you are not bypassing any 
security.  
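
To make the netmask arithmetic concrete, here's a minimal, self-contained 
sketch (plain Java, nothing WebHDFS-specific; the addresses are just the ones 
from your example):

{code:java}
import java.net.InetAddress;
import java.nio.ByteBuffer;

public class SubnetCheck {
    // True if both IPv4 addresses fall inside the same network for the
    // given prefix length (16 corresponds to a 255.255.0.0 netmask).
    static boolean sameNetwork(String a, String b, int prefixLen) throws Exception {
        int ia = ByteBuffer.wrap(InetAddress.getByName(a).getAddress()).getInt();
        int ib = ByteBuffer.wrap(InetAddress.getByName(b).getAddress()).getInt();
        int mask = prefixLen == 0 ? 0 : -1 << (32 - prefixLen);
        return (ia & mask) == (ib & mask);
    }

    public static void main(String[] args) throws Exception {
        // Same /16: moving from 172.16.1.x to 172.16.2.x is not spoofing.
        System.out.println(sameNetwork("172.16.1.10", "172.16.2.10", 16)); // true
        // Under /24 netmasks these would be two distinct networks.
        System.out.println(sameNetwork("172.16.1.10", "172.16.2.10", 24)); // false
    }
}
{code}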

{quote}
In many cases, we can also use "source routing" to get around IP-based 
restrictions. Keep in mind, this does not require spoofing! Source routing 
allows the packet to specify its own route through the network. This 
potentially allows you to reach destinations that you would not otherwise be 
able to get to. Many routers now disable source-routed packets, why open a hole 
for those that do not?
{quote}

This is handled by router configs, and "loose source record route" is blocked 
on modern networks.  This comment is not relevant to the changes I'm 
requesting to WebHDFS, as source routing is controlled by network gear, which 
you acknowledged in your statement.

{quote}
Now, let's turn to considering spoofing itself. Successful IP spoofing often 
does not allow the attacker to get back a response to his packets. However, 
that isn't necessarily needed in this case, because there are many webHDFS 
operations that delete files, damage data, etc.

... 

There's more information on how to defeat IP-based filtering here: 
http://technet.microsoft.com/library/cc723706.aspx It calls SNMP "a security 
disaster" partly because it often relies on IP-based filtering for security. I 
don't think we should be trying to reproduce a security scheme that everyone 
agrees is a disaster.
{quote}

The TechNet document you referenced calls SNMP a disaster for a reason: it's 
UDP based.  As UDP doesn't have a connection handshake, the receiving system 
is of course going to trust the source IP.  With TCP, which is what WebHDFS 
uses, host A can send a SYN packet with a spoofed IP, but the SYN-ACK reply is 
going to go to the real IP of host G, the host that actually owns the address 
host A spoofed.  The three-way handshake will never complete, and because the 
TCP connection fails it's not possible to send a 'delete' or any other request 
to WebHDFS.  The referenced TechNet document is not relevant to this JIRA, as 
it's poking holes in UDP, not TCP. 
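
To make the UDP/TCP asymmetry concrete, a minimal sketch (192.0.2.10 is a 
placeholder documentation address and the ports are only illustrative): a UDP 
datagram is delivered with no reply required, while TCP's connect() cannot 
return until the handshake completes.

{code:java}
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;

public class HandshakeContrast {
    public static void main(String[] args) throws Exception {
        InetAddress target = InetAddress.getByName("192.0.2.10");

        // UDP: the datagram goes out with no handshake; a receiver that
        // trusts the packet's source IP is trusting an unverified field.
        byte[] payload = "request".getBytes();
        try (DatagramSocket udp = new DatagramSocket()) {
            udp.send(new DatagramPacket(payload, payload.length, target, 161));
        }

        // TCP: connect() blocks until the three-way handshake completes.
        // A spoofer never receives the SYN-ACK (it goes to the address's
        // real owner), so no application data -- a 'delete', say -- is
        // ever sent.
        try (Socket tcp = new Socket()) {
            tcp.connect(new InetSocketAddress(target, 50070), 5000);
        }
    }
}
{code}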

{quote}
...
Physical security of networks is often an issue. Many times open ethernet jacks 
are available in an office or data center and you can get an IP address. Maybe 
even one that is inside the various firewalls. This is why people use real 
security systems like Kerberos, Active Directory, etc.
...
{quote}

We are in agreement that physical network jacks should be secured, but they 
are not always guaranteed to be.  But please understand, Kerberos/Active 
Directory is not authorization; Kerberos only gives us authentication.  (See 
my bank teller example above.)  Due to the common practice of cross-realm 
trusts, we cannot just rely on limiting what networks can talk to the KDC, as 
my corp TGT is going to be trusted by the "hadoop" realm and work just fine.  
This limitation in Kerberos is why I'm requesting an 'allow/deny' list based 
on IP.  
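
For those unfamiliar with cross-realm trusts, a hypothetical krb5.conf 
fragment (the realm names are made up for illustration) showing why limiting 
who can reach the Hadoop KDC doesn't limit who can authenticate:

{code}
# With a cross-realm trust in place, a TGT issued by the corporate
# realm is accepted by the Hadoop realm's services; the Hadoop KDC
# never has to talk to the client at authentication time.
[capaths]
    CORP.EXAMPLE.COM = {
        HADOOP.EXAMPLE.COM = .
    }
{code}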

{quote}
Regarding DNS: I've dealt with many clusters for whom DNS lookup was a 
bottleneck. You may argue that they should have configured DNS better. But 
regardless, a security scheme that requires contacting DNS all the time would 
still cause significant regressions for those users. See Daryn Sharp's patch 
for https://issues.apache.org/jira/browse/HDFS-3990, which was designed partly 
to avoid unnecessary DNS lookups. There have been many other such patches, from 
people at Yahoo and other companies.
{quote}

True, and I recognize that as Cloudera is a consulting company, you guys see a 
lot of weird stuff.  I'm not saying DNS lookups are never a bottleneck, just 
that there are already ways of preventing the bottleneck with caching.  I did 
find HDFS-3990 an interesting read, and if I'm reading the patch correctly, 
Daryn is building a 'cache' of hostnames and IPs in RAM for the namenode.  
Additionally, the first comment in HDFS-3990 states that the NN's page load 
time was reduced to seconds with the name server caching daemon.  I've already 
mentioned that the JVM itself can be configured to never release a hostname/IP 
mapping.  My point is that it doesn't matter where the actual cache lives, 
just that having one helps, and now we have one more cache available to use.  
I think we are in agreement that while supporting hostname lookups isn't cost 
free, it isn't the end of the world.  
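
For reference, this is the JVM knob I mean; a minimal sketch using the 
standard JDK security properties (nothing Hadoop-specific, and they must be 
set before the first lookup, since the policy is read once):

{code:java}
import java.net.InetAddress;
import java.security.Security;

public class PinDnsCache {
    public static void main(String[] args) throws Exception {
        // Cache successful lookups forever; retry failed lookups after 10s.
        Security.setProperty("networkaddress.cache.ttl", "-1");
        Security.setProperty("networkaddress.cache.negative.ttl", "10");

        InetAddress first = InetAddress.getByName("example.org");
        // This second lookup is served from the JVM's own cache -- no DNS traffic.
        InetAddress again = InetAddress.getByName("example.org");
        System.out.println(first.getHostAddress().equals(again.getHostAddress()));
    }
}
{code}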

{quote}
I think you should explain why the various alternatives people have offered 
here don't solve your problem. It seems really easy to use httpfs (plus perhaps 
a proxy) to get filtering as fine-grained as you want. The whole point of 
implementing httpfs was that the HTTP protocol could easily be filtered, 
proxied, etc. by third-party tools. If there are use cases that httpfs does not 
address, let's fix them rather than creating another parallel security system 
that does not follow best practices.
{quote}

Look above and you'll see I already explained why bolting nginx on top of the 
datanode jetty port or using httpfs doesn't solve this problem.  To summarize:
# Proxies add complexity to troubleshooting client/server issues.
# Tomcat/nginx is not part of the normal Hadoop ecosystem; it would be one 
more software service to securely configure, deploy, monitor, support, and 
integrate with WebHDFS.
# Why replace jetty with something else when jetty already offers the features 
I'm requesting?  One just needs to add the jetty hooks into WebHDFS (see the 
sketch below).
# Using a proxy for access control when no other proxy features are needed is 
like using Hadoop to process a 100kB text file: it's overkill.

FYI: For those not familiar with what I'm requesting, it's adding an access 
control feature like mod_authz_host in httpd. 
http://httpd.apache.org/docs/2.2/mod/mod_authz_host.html 
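
A rough sketch of what those jetty hooks could look like, assuming a jetty 
version that ships org.eclipse.jetty.server.handler.IPAccessHandler (the port, 
the address ranges, and the wrapped handler are illustrative stand-ins, not 
actual WebHDFS code):

{code:java}
import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.server.handler.DefaultHandler;
import org.eclipse.jetty.server.handler.IPAccessHandler;

public class WebHdfsAllowDenySketch {
    public static void main(String[] args) throws Exception {
        Server server = new Server(50075); // e.g. the datanode HTTP port

        // The jetty analogue of httpd's Allow/Deny directives:
        IPAccessHandler access = new IPAccessHandler();
        access.addWhite("172.16.2.0-255"); // allow this range
        access.addBlack("172.16.1.0-255"); // deny this range
        access.setHandler(new DefaultHandler()); // stand-in for the WebHDFS servlets

        server.setHandler(access);
        server.start();
        server.join();
    }
}
{code}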

> WebHDFS should support a deny/allow list for data access
> --------------------------------------------------------
>
>                 Key: HDFS-5569
>                 URL: https://issues.apache.org/jira/browse/HDFS-5569
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: webhdfs
>            Reporter: Adam Faris
>              Labels: features
>
> Currently we can't restrict what networks are allowed to transfer data using 
> WebHDFS.  Obviously we can use firewalls to block ports, but this can be 
> complicated and problematic to maintain.  Additionally, because all the jetty 
> servlets run inside the same container, blocking access to jetty to prevent 
> WebHDFS transfers also blocks the other servlets running inside that same 
> jetty container.
> I am requesting a deny/allow feature be added to WebHDFS.  This is already 
> done with the Apache HTTPD server, and is what I'd like to see the deny/allow 
> list modeled after.   Thanks.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
