I'm tasked with creating a guide that instructs on how to choose a Hadoop 
distribution from the handful of common options.  I'm finding this rather 
perplexing.  While some of the vendors offer additional management software 
(Cloudera Manager is an example), I'm unclear whether those packages can be 
installed and run regardless of the underlying Hadoop distribution or if they 
are exclusively compatible with their vendor's distribution (or if there's some 
crossover).  I'm also unclear on any other basis for comparison.  For example, 
HortonWorks originated HCatalog (to the best of my understanding), but that 
doesn't necessarily mean one needs to use the HW Hadoop dist. to use HCatalog 
since it's just a public Apache project anyway at this point.  I'm sure similar 
statements could be made about MapR or Greenplum (although I think Greenplum's 
Hadoop uses MapR's M5 anyway, so again the decision-making process in such a 
case seems baffling).

And then there's the option of installing the Apache version directly, which 
is always on the table, I suppose.

Does anyone have any thoughts on what criteria might govern such a decision?  
I'm not trying to get into an argument about which distribution is best; I'm 
not even looking for defenses of or arguments for one distribution or another, 
but rather a notion of what the criteria for such a decision might be.

Thanks.

Cheers!

________________________________________________________________________________
Keith Wiley     kwi...@keithwiley.com     keithwiley.com    music.keithwiley.com

"It's a fine line between meticulous and obsessive-compulsive and a slippery
rope between obsessive-compulsive and debilitatingly slow."
                                           --  Keith Wiley
________________________________________________________________________________
