Re: Question about Hadoop 's Feature(s)

Steve Loughran Thu, 25 Sep 2008 04:50:35 -0700

Owen O'Malley wrote:

On Sep 24, 2008, at 1:50 AM, Trinh Tuan Cuong wrote:
We are developing a project and we intend to use Hadoop to handle the processing of vast amounts of data. But to convince our customers about using Hadoop in our project, we must show them the advantages (and maybe the disadvantages?) of deploying the project with Hadoop compared to the Oracle Database Platform.
The primary advantage of Hadoop is scalability. On an equivalent hardware budget, Hadoop can handle much, much larger databases. We had a process that was run once a week on Oracle that is now run once an hour on Hadoop. Additionally, Hadoop scales out much, much farther. We can store petabytes of data in a single Hadoop cluster and have jobs that read and generate 100s of terabytes.

That said, what a database gives you, on the right hardware, is very fast responses, especially if the indices are set up right and the data denormalised when appropriate. There is also really good integration with tools and application servers, with things like Java EE designed to make running code against a database easy.

Not using Oracle means you don't have to work with an Oracle DBA, which, in my experience, can only be a good thing. DBAs and developers never seem to see eye-to-eye.

Hadoop only has very primitive security at the moment, although I expect that to change in the next 6 months.

Right now you need to trust everyone else on the network where you run Hadoop to not be malicious; the filesystem and job tracker interfaces are insecure. The forthcoming 0.19 release will ask who you are, but the far end trusts you to be who you say you are. In that respect, it's as secure as NFS over UDP.


To secure Hadoop you'd probably need to:

 - sign every IPC request, with a CPU time cost at both ends.

 - require some form of authentication for the HTTP-exported parts of the system, such as digest authentication, or issue lots of HTTPS private keys and use those instead, giving everyone a key management problem as well as extra communications overhead.
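
To make the first point concrete, here is a minimal sketch of what signing a request could look like, using an HMAC over the request bytes. It is an illustration only, not how Hadoop's IPC layer works: the shared key, the request format and the function names (sign_request, verify_request) are all assumptions made up for this example.

```python
import hashlib
import hmac
import time

# Hypothetical shared secret; a real deployment would need per-user keys
# and a way to distribute them (the key management problem noted above).
SECRET_KEY = b"example-shared-secret"


def sign_request(payload: bytes, key: bytes = SECRET_KEY) -> str:
    """Client side: compute an HMAC-SHA256 tag over the request payload."""
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_request(payload: bytes, tag: str, key: bytes = SECRET_KEY) -> bool:
    """Server side: recompute the tag and compare it in constant time."""
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, tag)


# Made-up request format, purely for illustration.
request = b"user=alice;op=submitJob;job=wordcount"
tag = sign_request(request)

print(verify_request(request, tag))  # True
print(verify_request(b"user=root;op=submitJob;job=wordcount", tag))  # False

# Rough feel for the CPU cost of signing on one end of the connection.
start = time.perf_counter()
for _ in range(10_000):
    sign_request(request)
print(f"10,000 signatures took {time.perf_counter() - start:.3f}s")
```

The verifier has to repeat the same hash computation as the signer, which is where the cost at both ends comes from.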

What would be easier is to lock down remote access to the filesystem and job submission so that only authenticated users would be able to upload jobs and data. The cluster would continue to trust everything else on its network, but the system wouldn't trust people to submit work unless they could prove who they were.
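
The perimeter approach described above can be sketched as a single authenticated entry point in front of an otherwise trusting cluster. Everything here is hypothetical: the USERS table, the password scheme and the submit_job gateway are assumptions for illustration, not part of Hadoop.

```python
import hashlib
import hmac

# Hypothetical salt and credential store (user name -> derived key).
_SALT = b"example-salt"


def _hash_password(password: str) -> bytes:
    """Derive a key from the password with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), _SALT, 100_000)


USERS = {"alice": _hash_password("correct horse")}


def authenticate(user: str, password: str) -> bool:
    """Return True only if the user exists and the password matches."""
    stored = USERS.get(user)
    if stored is None:
        return False
    return hmac.compare_digest(stored, _hash_password(password))


def submit_job(user: str, password: str, job_name: str, queue: list) -> None:
    """Gateway: the only path by which outside users reach the cluster.

    Once a job is on the queue, the cluster trusts it like any other
    internal traffic; nothing past this point is checked again.
    """
    if not authenticate(user, password):
        raise PermissionError(f"user {user!r} could not be authenticated")
    queue.append((user, job_name))


cluster_queue = []
submit_job("alice", "correct horse", "wordcount", cluster_queue)
print(cluster_queue)  # [('alice', 'wordcount')]
```

The trade-off is the one stated in the text: identity is checked once at the edge, so anything already on the cluster network is still trusted.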
