Re: Cloudera announces Oryx

Sean Owen Tue, 12 Nov 2013 08:28:53 -0800

On Tue, Nov 12, 2013 at 4:02 PM, Manuel Blechschmidt
<[email protected]> wrote:
> It would be nice if Cloudera could publish some benchmarks. Cloudera vs. 
> Mahout vs. SAP HANA PAL vs. SPSS to give somebody the chances to enhance 
> Mahout in a way that it can catch up.


Does this need to be a "versus" thing? I and other engs here did a
fair bit of work to keep the Mahout code working in CDH5 / Hadoop 2.2,
and contributed that back. For a company apparently trying to
undermine Mahout we're not very good at it...

I like the benchmark sentiment. The two projects actually have little
overlap in functionality, which is essentially why it's a separate
project. Oryx has nothing but RDF, kmeans++, and ALS. No
visualization, no text processing tools. No library-like interfaces.

On the other hand the piece of the puzzle Oryx is trying to add (model
serving) has no counterpart in this project, with possible exception
of Taste. So there's not much to compare with a benchmark.

In-memory pretty well always beats Hadoop. I think the ALS in Mahout
is *faster*, and I'm pretty sure that's mostly because it loads a
bunch of data into memory. But the in-memory ALS in Oryx is, of
course, an order of magnitude faster than both. How do you want to
benchmark that?

I have never used SPSS or HANA's offering here, but am willing to bet
it's wicked fast without even bothering to measure.

I'm not even sure speed is the only or main point. Things like
out-of-the-box usability top my list, along with being open source and
working with data in HDFS.
