Re: Straw poll re: H2O ?

Cliff Click Thu, 01 May 2014 08:41:25 -0700

H2O will launch an internal Task in the single-digit microsecond range. Because of this, we can launch 100,000s (millions?) a second... leading to fine-grained data parallelism, and high CPU utilization. This is a big piece of our single-node speed. Some other distributed Task-launching solutions I've seen tend to require a network-hop per-task... leading to your 10ms-to-launch-a-task requirement, leading to a limit of a few 1000 Tasks/sec, requiring tasks that are much larger and coarser than H2O's... leading to much lower CPU utilization.
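As a rough back-of-envelope check on those rates, here is a minimal sketch in Python. The function name and the `in_flight` parameter are hypothetical, introduced only for illustration; the 5 microsecond figure is just one example value from the single-digit range, and the idea that a few thousand launches per second at 10ms each implies a few dozen launches in flight at once is an assumption, not something measured.

```python
def launches_per_second(launch_cost_us, in_flight=1):
    """Launch throughput if each launch costs launch_cost_us microseconds
    and in_flight launches overlap (hypothetical, simplified model)."""
    return in_flight * 1_000_000 / launch_cost_us


# A 5 microsecond in-process launch (example value in the single-digit range).
print(launches_per_second(5))                      # 200000.0

# A 10ms network-hop launch, issued one at a time.
print(launches_per_second(10_000))                 # 100.0

# The same 10ms launch with 30 overlapping launches (assumed concurrency).
print(launches_per_second(10_000, in_flight=30))   # 3000.0
```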

Also, I'm getting 200-microsecond pings between my datacenter machines... down from 10msec. It's decent commodity hardware, nothing special. Meaning: H2O can launch a task on an entire 32-node cluster in about 1msec, starting from a single driving node (log-tree fanout, depth 5, 200-microsecond single UDP packet launch, 1-microsecond internal task launch).
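The 1msec figure can be reproduced with a small sketch. It assumes the fanout doubles the number of nodes reached on every hop (which is what gives depth 5 for 32 nodes) and that each hop costs one UDP packet plus one internal task launch. The function and parameter names are hypothetical, not H2O APIs.

```python
def cluster_launch_latency_us(nodes, hop_us=200.0, local_launch_us=1.0, fanout=2):
    """Estimate how long a launch takes to reach every node.

    Assumes the set of nodes that have the task multiplies by `fanout`
    on each hop, starting from a single driving node.
    Returns (tree depth, total latency in microseconds).
    """
    depth = 0
    reached = 1  # the driving node already has the task
    while reached < nodes:
        reached *= fanout
        depth += 1
    return depth, depth * (hop_us + local_launch_us)


depth, total_us = cluster_launch_latency_us(32)
print(depth, total_us)  # 5 1005.0
```

With the numbers above, 5 hops at 201 microseconds each come to about 1005 microseconds, i.e. roughly 1msec. Plugging in a 10msec hop instead gives about 50msec for the same cluster.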

And this latency matters when the work itself is lots and lots of "small" jobs, as is common when a DSL such as Mahout or Spark/Scala or R is driving simple operators over bulk data.


Cliff


On 4/30/2014 3:35 PM, Dmitriy Lyubimov wrote:

This is kind of old news. They all do, for years now. I've been building a system that does real time distributed pipelines (~30 ms to start all steps in pipeline + in-core complexity) for years. Note that node-to-node hops in clouds usually average about 10ms, so microseconds are kind of out of the question for network performance reasons in real life, except for private racks. The only thing that doesn't do this is the MR variety of Hadoop.
