On 23/09/17 11:57, Suliman wrote:
One is a linear database and the other is a filesystem?

If that doesn't satisfy you, please describe to me the difference between D and Microsoft Word, so I know what kind of answer you're expecting.


But Hadoop is more look like file system that DataBase...

Hadoop Distributed File System is, sort of, a file system. I don't know much about it (just read the Wikipedia page), so I'll try to answer as best I understand. Corrections welcome:

Performance:
I have no idea what HDFS's per-node performance numbers are, but there are several indications that make me suspect they are not as good as Weka's.

First of all, I don't think a tool written in Java, designed to run on top of another file system and the kernel's networking stack, has any chance of outperforming a tool written in D that directly uses the NVMe devices and the network interface.

Second, the file system seems oriented toward large read-only blobs. As a general-purpose file system, I don't think it has any chance against any dedicated POSIX-compliant file system, but I'm guessing you're mostly interested in using HDFS as a basis for running Hadoop itself, so that might not matter.


Cost:
Here I don't think there is any way for HDFS to compete. That might sound strange to some, as HDFS is open source while Weka charges licensing fees. The reason is that HDFS uses mirroring to achieve fault tolerance, while Weka uses RAID (I should know - I wrote it).

In short, to get 1PB of usable capacity while tolerating 2 faults you'll need 3PB of raw capacity with Hadoop (200% overhead). At 16+2, you'll only need around 1.3PB with Weka. Whatever you're paying for the licenses is, in all likelihood, going to be less than the cost of the hardware.
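The arithmetic behind those numbers can be sketched roughly as follows. This is only an idealized illustration: the function names are made up for this example, and the parity formula counts nothing but the parity stripes themselves, so it gives a lower bound rather than the "around 1.3PB" quoted above, which presumably leaves room for overhead that this sketch ignores.

```python
def raw_for_replication(usable_pb, faults):
    """Raw capacity when every block is stored faults + 1 times (mirroring)."""
    return usable_pb * (faults + 1)


def raw_for_parity(usable_pb, data, parity):
    """Raw capacity for a data+parity stripe layout (parity overhead only)."""
    return usable_pb * (data + parity) / data


# 1PB usable, tolerating 2 faults
print(raw_for_replication(1.0, 2))  # 3.0   (200% overhead)
print(raw_for_parity(1.0, 16, 2))   # 1.125 (12.5% overhead, before anything else)
```

Even with generous allowances on top of the parity figure, the gap to 3PB of raw hardware stays large.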


Like I said, corrections are welcome, as I'm not familiar with HDFS or Hadoop.

Shachar
