[ https://issues.apache.org/jira/browse/FLUME-1020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233890#comment-13233890 ]

[email protected] commented on FLUME-1020:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4360/
-----------------------------------------------------------

(Updated 2012-03-20 22:33:43.076249)


Review request for Flume.


Changes
-------

No java code has changed.

Updated the build environment and the runtime environment as follows:

1. At runtime, if the hadoop binary can be found on the system, Flume 
interrogates it to get the CLASSPATH and JAVA_LIBRARY_PATH variables, using 
tricks kindly shared by Roman in Bigtop. This lets us find the hadoop 
configuration files and the appropriate JARs for the system being accessed. 
It's basically the state of the art for compatibility right now, if you want 
to call it that.
2. To allow the tricks above to work at runtime, the hadoop artifacts have been 
marked as optional in the POM, which means that they will not be included in 
the binary distribution. That's fine, because they are only needed if the HDFS 
Sink is used, and we jump through hoops to find those artifacts if they're on 
the system.
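To make point 1 concrete, here is a hedged sketch of that runtime lookup (the function and variable names are illustrative, not the actual flume-ng code, and it assumes the installed hadoop supports the `classpath` subcommand):

```shell
# Sketch of the runtime interrogation described above: if a hadoop
# binary is on the PATH, ask it for its classpath instead of bundling
# Hadoop jars with Flume. Names here are illustrative only.
hadoop_classpath() {
  # "hadoop classpath" prints the jar and conf directories of the local
  # Hadoop install, so Flume also picks up the site configuration files.
  if HADOOP_BIN=$(command -v hadoop 2>/dev/null); then
    "$HADOOP_BIN" classpath
  fi
  # (JAVA_LIBRARY_PATH is pulled out of the hadoop launcher in a
  # similar way; omitted here.)
}
```

If no hadoop binary is found, the function simply yields an empty classpath and Flume runs without the HDFS sink's dependencies.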

As a result, I am able to compile Flume against the default hadoop version 
(0.20.205.0) and run against versions of Hadoop that I didn't explicitly build 
against, like 0.23.x, without a problem. This is a huge improvement over last 
week, when I was adding/fixing profiles just to get anything to work at all, 
since it's well known that different hadoop versions generally refuse to talk 
to each other.
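The optional marking from point 2 above is a one-line Maven change; a sketch (the artifactId and version here are illustrative of the 0.20.205.0-era default, not copied from the patch):

```xml
<!-- Sketch only: marks the hadoop dependency as compile-time-only so it
     is excluded from the binary distribution and found at runtime via
     the hadoop binary instead. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-core</artifactId>
  <version>0.20.205.0</version>
  <optional>true</optional>
</dependency>
```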

I also refactored the flume-ng script to duplicate less code and be a bit 
friendlier, since I was doing surgery in there anyway.

This is ready for review now.


Summary
-------

This is an initial pass at an implementation of HDFS security. I think it will 
work. I'm currently trying to get Kerberos to play nice with the cluster on my 
VM, though, so I haven't successfully tested it yet. It still works when used 
on HDFS with security disabled. :)

The only thing I don't like is that in configure(), when authentication fails, 
I throw a FlumeException. I'll trace up and see how bad that would be, but it 
seems likely to break something. Just logging the error is kind of a bummer as 
well, though ... I need to ensure process() doesn't fill up the disk while 
spewing copious error messages into the logs. Maybe this is a use case for 
some kind of FatalException type thing.
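For context, wiring the secure sink into an agent would look something like the fragment below (the property names are assumptions based on this patch, and the principal/keytab values are placeholders):

```
# Hypothetical agent config for the secure HDFS sink (names assumed)
agent.sinks.k1.type = hdfs
agent.sinks.k1.hdfs.path = hdfs://namenode/flume/events
agent.sinks.k1.hdfs.kerberosPrincipal = flume/_HOST@EXAMPLE.COM
agent.sinks.k1.hdfs.kerberosKeytab = /etc/flume/flume.keytab
```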


This addresses bug FLUME-1020.
    https://issues.apache.org/jira/browse/FLUME-1020


Diffs (updated)
-----

  bin/flume-ng 0796a5b 
  flume-ng-sinks/flume-hdfs-sink/pom.xml 1a35baf 
  flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/HDFSEventSink.java da82f7e 

Diff: https://reviews.apache.org/r/4360/diff


Testing
-------


Thanks,

Mike

> Implement Kerberos security for HDFS Sink
> -----------------------------------------
>
>                 Key: FLUME-1020
>                 URL: https://issues.apache.org/jira/browse/FLUME-1020
>             Project: Flume
>          Issue Type: Improvement
>          Components: Sinks+Sources
>    Affects Versions: v1.0.0
>            Reporter: Arvind Prabhakar
>            Assignee: Mike Percy
>
> Make flume HDFS sink work with secure clusters.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira