As an aside, if you ask for the white paper you get a PDF that
exaggerates the limits of Hadoop.
http://info.platform.com/rs/platform/images/Whitepaper_Top5ChallengesforHadoopMapReduceintheEnterprise.pdf
Mostly focusing on a critique of the scheduler -which MR-279 will fix in
Hadoop 0.23- they say
"It is designed to be used by IT departments that
have an army of developers to help fix any issues they en-
counter"
I don't believe this. Cloudera and Hortonworks will do this for a fee
-as will Platform. In most organisations the R&D effort doesn't go into
the Hadoop codebase, it goes into writing the analysis code, which is
why things like Pig and Hive help -they make it easier.
"Their (Clouderas) distribution is based on open source
which is still an unproven large-scale enterprise full stack
solution. There are many shortcomings in the open source
distribution, including the workload management capa-
bilities.
Other open source commercial distributions are
emerging, with IBM and EMC entering the marketplace.
However, all of these offerings are based on open source
code and inevitably inherit the strengths and weaknesses
of that code base and architectural design. "
Ted will point out that MapR's MR engine isn't limited, as will Brisk,
while Arun will view that statement in the past tense. Doug and Tom will
pick up on the word "unproven" too. Which enterprises plan to have
Hadoop clusters bigger than Yahoo or Facebook?
Furthermore, as Platform only puts in their own scheduler, leaving the
filesystem alone, it's a bit weak to critique the architecture of the
open source distro. Not a way to make friends -or get your bug fixes in.
Or indeed, promise better scalability.
"Therefore they cannot meet the enterprise–class requirements for ”big
data” problems as already mentioned."
This is daft. The only thing Platform brings to the table is a scheduler
that works with "legacy" grid workloads and a console to see what's
going on. I don't see that being tangibly more enterprise-class than the
existing JT -which does persist after an outage. With HDFS underneath, a
new scheduler doesn't even remove the filesystem SPOFs, so the only way
to get an HA cluster is to swap in a premium filesystem.
The other thing the marketing blurb gets wrong is its claim that Hadoop
only works with one distributed file system. Not so. You can read in and
out of any filesystem, file:// being a handy one that works with NFS
mount points too.
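As a rough illustration of that (the namenode address and paths below
are invented, not from the paper): the FileSystem API resolves whichever
scheme a URI names, so the same code talks to HDFS or to a local or
NFS-mounted directory via file://; job input and output paths work the
same way.

// Minimal sketch; hostnames and the /mnt/nfs path are made up.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.net.URI;

public class SchemeDemo {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // HDFS, assuming a namenode at this (made-up) address
    FileSystem hdfs = FileSystem.get(URI.create("hdfs://namenode:8020/"), conf);
    System.out.println("hdfs:// resolves to " + hdfs.getClass().getName());

    // The local filesystem; an NFS mount point works the same way
    FileSystem local = FileSystem.get(URI.create("file:///"), conf);
    FSDataOutputStream out = local.create(new Path("file:///mnt/nfs/results/demo.txt"));
    out.writeBytes("written through the file:// scheme\n");
    out.close();
  }
}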
Overall, a disappointing white paper, as all it can do to criticise open
source Hadoop is spread fear about the number of developers you need to
maintain it, and the limitations of the Hadoop scheduler versus their
scheduler -that being the only thing that differs between the Platform
product and the full OSS release.
I missed a talk at the local university by a Platform sales rep last
month, though I did get to offend one of the Condor team instead [1], by
pointing out that all grid schedulers contain a major assumption: that
storage access times are constant across your cluster. They are if you
can pay for something like GPFS, but you don't get 50TB of GPFS storage
for $2500, which is what adding 25*2TB SATA drives (roughly $100 apiece)
would cost if you stuck them on your compute nodes; with Hadoop's
default 3x replication that's $7500 for a fully replicated 50TB. That's
why I'm not a fan of grid systems -the costs of storage and networking
aren't taken into account. Then there are the availability issues with
the larger filesystems, which are a topic for another day.
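To ground that locality point (a rough sketch, not Platform's or
Condor's code; the file path and hostnames are invented): a Hadoop
InputSplit reports which hosts hold the data blocks, and the scheduler
uses those hints to run each map task on or near a node that already
has the data, rather than assuming uniform storage access times.

// Rough sketch with invented names: a FileSplit carries the hosts that
// store the block replicas, and the scheduler reads getLocations() to
// try to place the map task on one of them.
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class LocalityDemo {
  public static void main(String[] args) throws Exception {
    FileSplit split = new FileSplit(
        new Path("hdfs://namenode:8020/data/logs/part-00000"),
        0,                   // start offset within the file
        64L * 1024 * 1024,   // split length: one 64MB block
        new String[] {"node12", "node47", "node81"});  // hosts with replicas

    for (String host : split.getLocations()) {
      System.out.println("preferred host: " + host);
    }
  }
}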
I look forward to them giving a talk at any forthcoming London HUG event
and will try to do a follow-on talk introducing MR-279 and arguing in
favour of an OSS solution because the turnaround time on defects is faster.
-Steve
[1] Miron Livny, facing the camera, two to the left of Sergey Melnik
(with the camera) -the author of Dremel: http://flic.kr/p/akUzE7