Re: Weka.IO in the news... but not mentioning Dlang... why?

Shachar Shemesh via Digitalmars-d Sat, 23 Sep 2017 09:11:15 -0700

On 23/09/17 11:57, Suliman wrote:

>> One is a linear database and the other is a filesystem?
>> If that doesn't satisfy you, please describe to me the difference between D and Microsoft Word, so I know what kind of answer you're expecting.
> But Hadoop looks more like a file system than a database...

Hadoop Distributed File System is, sort of, a file system. I don't know much about it (I just read the Wikipedia page), so I'll try to answer as best I understand. Corrections welcome:


Performance:

I have no idea what HDFS's per-node performance numbers are, but there are several indications that make me suspect they are not as good as Weka's.

First of all, I don't think a tool written in Java, designed to run over another file system and the kernel's networking stack, has any chance of outperforming a tool written in D that directly uses the NVMe devices and the network interface.

Second, the file system seems oriented toward large read-only blobs. As a file system, I don't think it has any chance against a dedicated POSIX-compliant file system, but I'm guessing you're mostly interested in using HDFS as a basis for running Hadoop itself, so that might not matter.



Cost:

Here I don't think there is any way for HDFS to compete. That might sound strange to some, as HDFS is open source while Weka charges licensing fees. The reason I'm saying this is that HDFS uses mirroring to achieve fault tolerance, while Weka uses RAID (I should know - I wrote it).
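The post doesn't describe how Weka's RAID works internally. As a generic illustration of how a stripe with two parity blocks survives any two data-block failures, here is a minimal sketch of the textbook RAID-6 "P+Q" construction (the one Linux md uses): P is the plain XOR of the data blocks, and Q is an XOR of each block multiplied by g^i in GF(2^8), using polynomial 0x11D and generator g = 2. This is not Weka's code; the names `syndromes` and `recover_two` are hypothetical.

```python
import os

# GF(2^8) log/antilog tables for x^8+x^4+x^3+x^2+1 (0x11D), generator g = 2,
# the field conventionally used for RAID-6 Q parity (generic, not Weka-specific).
EXP = [0] * 512
LOG = [0] * 256
v = 1
for i in range(255):
    EXP[i] = v
    LOG[v] = i
    v <<= 1
    if v & 0x100:
        v ^= 0x11D
for i in range(255, 512):
    EXP[i] = EXP[i - 255]  # duplicated so gf_mul can skip a modulo


def gf_mul(a, b):
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def gf_div(a, b):
    # Caller guarantees b != 0.
    if a == 0:
        return 0
    return EXP[(LOG[a] - LOG[b]) % 255]


def syndromes(stripe):
    """P = XOR of all data blocks; Q = XOR of g^i * D_i (byte-wise in GF(2^8))."""
    n = len(stripe[0])
    p, q = bytearray(n), bytearray(n)
    for i, block in enumerate(stripe):
        coeff = EXP[i]  # g^i, distinct for each of up to 255 data blocks
        for j, byte in enumerate(block):
            p[j] ^= byte
            q[j] ^= gf_mul(coeff, byte)
    return bytes(p), bytes(q)


def recover_two(stripe, x, y, p, q):
    """Rebuild lost data blocks x != y (their slots in `stripe` may be None)."""
    n = len(p)
    # Treating the lost blocks as all-zero removes their contribution.
    survivors = [bytes(n) if i in (x, y) else b for i, b in enumerate(stripe)]
    p_s, q_s = syndromes(survivors)
    denom = EXP[x] ^ EXP[y]  # g^x + g^y, nonzero because x != y
    dx, dy = bytearray(n), bytearray(n)
    for j in range(n):
        pxy = p[j] ^ p_s[j]  # Dx + Dy
        qxy = q[j] ^ q_s[j]  # g^x*Dx + g^y*Dy
        dx[j] = gf_div(qxy ^ gf_mul(EXP[y], pxy), denom)
        dy[j] = pxy ^ dx[j]
    return bytes(dx), bytes(dy)


if __name__ == "__main__":
    stripe = [os.urandom(4096) for _ in range(16)]  # 16 data blocks
    p, q = syndromes(stripe)                          # + 2 parity blocks
    damaged = list(stripe)
    damaged[3] = damaged[11] = None                   # lose any two data blocks
    assert recover_two(damaged, 3, 11, p, q) == (stripe[3], stripe[11])
    print("recovered both lost blocks")
```

Only the two-lost-data-blocks case is shown. Losing a parity block just means recomputing it, and losing one data block plus P can be solved from Q alone. Mirroring reaches the same two-fault tolerance by keeping three full copies instead of two extra blocks per stripe.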

In short, to get 1PB of usable capacity while tolerating 2 faults, you'll need 3PB of raw capacity with Hadoop (200% overhead). At 16+2, you'll only need around 1.3PB with Weka. Whatever you're paying for the licenses is, in all likelihood, going to be less than the cost of the extra hardware.
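A quick check of the arithmetic above, assuming the simplest model: mirroring keeps faults + 1 full copies, and a 16+2 stripe stores 18 blocks for every 16 blocks of data. The function names are made up for illustration. The bare stripe ratio comes to 1.125PB raw. The post's rounder "around 1.3PB" presumably leaves headroom for spares or other overhead it doesn't itemize; that is an assumption, not something the post states.

```python
from fractions import Fraction


def raw_for_mirroring(usable, faults):
    # Tolerating f failures by mirroring means keeping f + 1 full copies.
    return usable * (faults + 1)


def raw_for_striping(usable, data, parity):
    # A data+parity stripe stores `data` blocks of payload in data+parity blocks.
    # Ignores spares, metadata and free-space headroom (simplifying assumption).
    return usable * Fraction(data + parity, data)


usable_pb = 1
mirror = raw_for_mirroring(usable_pb, faults=2)
stripe = raw_for_striping(usable_pb, data=16, parity=2)
print(f"3 copies: {mirror} PB raw ({float(mirror / usable_pb - 1):.0%} overhead)")
print(f"16+2:     {float(stripe)} PB raw ({float(stripe / usable_pb - 1):.1%} overhead)")
# 3 copies: 3 PB raw (200% overhead)
# 16+2:     1.125 PB raw (12.5% overhead)
```

Wider stripes push the parity overhead down further (parity / data), while mirroring's overhead is fixed by the number of faults tolerated.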

Like I said, corrections are welcome, as I'm not familiar with HDFS or Hadoop.


Shachar
