On 16/06/11 14:49, David Scott Williams wrote:
http://gigaom.com/cloud/lexisnexis-open-sources-its-hadoop-killer/
It's interesting that they decided they had to go open source to
survive. It's the Linux effect: it's not that Linux was better than
Solaris, it's that it built up the momentum.
A strength of Hadoop is that it has layers, and a lot of the
interesting stuff lives above the base layer: Mahout, Pig, Hive, Hama,
etc. You can also plug things in: filesystems, schedulers, HDFS block
placement policies.
While we can debate what "compatible" means, by implementing the APIs
that the higher layers use, MapR, and hence EMC's products, can run
those higher layers. HPCC looks to be a completely new ecosystem.
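To illustrate what "implementing the APIs" buys you, here's a minimal
sketch (my own example, not from either project) of how higher-layer
code talks to storage through Hadoop's FileSystem abstraction rather
than to HDFS directly; swap the URL scheme and the same code runs
against any compatible implementation:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataInputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class FsApiExample {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      // The URL scheme picks the implementation: hdfs://, file://,
      // maprfs://, ... all resolve through the same FileSystem API.
      Path path = new Path(args[0]);
      FileSystem fs = path.getFileSystem(conf);
      try (FSDataInputStream in = fs.open(path)) {
        byte[] buf = new byte[4096];
        int read = in.read(buf);
        System.out.println("Read " + read + " bytes from " + path);
      }
    }
  }

That's why Pig, Hive and friends can run unmodified on anything that
implements the API, and why HPCC, which doesn't, can't pick up that
ecosystem for free.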
Oh, and the license is AGPL, which complicates any external-facing web
app far more than even the GPL does. Good for business models (you can
pay for the alternate license), but not ideal for uptake.
HPCC have a good comparison page here, which seems quite unbiased:
http://hpccsystems.com/why-HPCC/HPCC-vs-hadoop
Regarding performance, I haven't seen any new terasort numbers for a
while. Whoever next brings up a 1000+ node cluster should publish them.
As HPCC say: "In practice, HPCC configurations require significantly
fewer nodes to provide the same processing performance as a Hadoop
cluster. Sizing of clusters may depend however on the overall storage
requirements for the distributed file system."
That means that if your cluster is driven by storage demands, storage
fixes the node count more than CPU requirements do (though if you need
fewer CPUs, that's capex and opex savings, or the opportunity to do
other things with the CPU time).
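To make that concrete, here's a back-of-envelope sizing sketch; all
the numbers in it (1 PB of raw data, 3x replication, 12 x 2 TB disks
per node, 25% headroom) are illustrative assumptions, not anyone's
published figures:

  // Back-of-envelope cluster sizing: if storage drives the design,
  // node count is fixed by capacity, not CPU speed.
  public class ClusterSizing {
    public static void main(String[] args) {
      double rawDataTB = 1000.0;       // 1 PB of user data (assumed)
      int replication = 3;             // HDFS default replication factor
      double diskTBPerNode = 12 * 2.0; // 12 x 2 TB disks per node (assumed)
      double usableFraction = 0.75;    // headroom for temp/shuffle (assumed)

      double neededTB = rawDataTB * replication;
      double usablePerNode = diskTBPerNode * usableFraction;
      int nodes = (int) Math.ceil(neededTB / usablePerNode);

      System.out.printf("Need %.0f TB of cluster storage -> %d nodes%n",
          neededTB, nodes);
      // With these assumptions you need ~167 nodes, whether or not a
      // faster engine could do the compute on fewer CPUs.
    }
  }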