"Detailed description of H2O's programming and execution model."

No *formal* documentation for this exists; there has been no time to write such a thing.
There are easy-to-find SlideShare decks and video talks. Here are two:
 - http://www.infoq.com/presentations/api-memory-analytics
 - http://www.infoq.com/interviews/click-0xdata

Summary:
- A high-performance in-memory K/V store (cache hits take about 150 ns; misses depend on network transfer times). Supports full, exact Java Memory Model (JMM) semantics and transactions. It holds the Big Data and controls computations. Big Data support is via Frames/Vecs/Chunks; see the slides above for a graphical overview. Compression "is an implementation feature": it is not visible in the execution model except as speed or size constraints.
- A well-tuned data-ingestion system
- Map/Reduce coding style: uses Java 1.7's Fork/Join on a single node, but distributed across nodes. Maps are fine-grained F/J tasks and can produce both a Big output (distributed parallel writes to Frames/Vecs) and a Small output (anything in a POJO). Reductions are also fine-grained and happen whenever 2 maps are done, so there is no separate "reduction" phase. This is not Hadoop M/R: there are no sort or shuffle steps, and everything stays in DRAM.
- REST/JSON access to most algorithms and coding, with a web-browser/HTML UI layered over that.
- Internal DSL: a work in progress. Right now it converts a subset of the R language to ASTs, then executes the ASTs. It covers a fairly large subset of R's bulk/array operators and expressions built from them, including first-class functions and e.g. GroupBy (ddply in R lingo). An expression like "apply(someFrame,2,function(x){ ifelse(is.na(x),mean(x),x)})" will replace the NAs in "someFrame" with the mean of each column. It's R syntax (or very close to R), not Scala.
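To make the Map/Reduce bullet concrete, here is a minimal single-JVM sketch using plain java.util.concurrent Fork/Join, not H2O's own task classes. All names are hypothetical and the layout is an assumption: a column is modeled as a double[][] of chunks, NaN stands in for NA, and the job is a column mean. Each leaf task maps one chunk to a small (sum, count) result, and each parent reduces its two children as soon as both are done, so no separate reduction phase is needed. Distribution across nodes is not shown.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical sketch, not H2O's API: a column stored as chunks, mean computed with Fork/Join.
class ChunkedMeanTask extends RecursiveTask<ChunkedMeanTask.Partial> {

    /** "Small" per-task output: sum and count of non-NA values. */
    static final class Partial {
        final double sum;
        final long count;

        Partial(double sum, long count) { this.sum = sum; this.count = count; }

        /** Reduce step: combine two partial results. */
        Partial reduce(Partial other) {
            return new Partial(sum + other.sum, count + other.count);
        }
    }

    private final double[][] chunks; // hypothetical column layout: one double[] per chunk
    private final int lo, hi;        // half-open range of chunk indices for this task

    ChunkedMeanTask(double[][] chunks, int lo, int hi) {
        this.chunks = chunks;
        this.lo = lo;
        this.hi = hi;
    }

    @Override
    protected Partial compute() {
        if (hi - lo <= 1) {
            // Map step: scan at most one chunk; NaN stands in for NA and is skipped.
            double sum = 0;
            long count = 0;
            for (int c = lo; c < hi; c++) {
                for (double v : chunks[c]) {
                    if (!Double.isNaN(v)) { sum += v; count++; }
                }
            }
            return new Partial(sum, count);
        }
        int mid = (lo + hi) >>> 1;
        ChunkedMeanTask left = new ChunkedMeanTask(chunks, lo, mid);
        ChunkedMeanTask right = new ChunkedMeanTask(chunks, mid, hi);
        left.fork();                  // left half runs asynchronously
        Partial r = right.compute();  // right half runs in this thread
        return left.join().reduce(r); // reduce as soon as both halves are done
    }

    static double mean(double[][] chunks) {
        Partial p = ForkJoinPool.commonPool().invoke(new ChunkedMeanTask(chunks, 0, chunks.length));
        return p.sum / p.count; // NaN if there are no non-NA values
    }

    public static void main(String[] args) {
        double[][] column = { {1, 2, Double.NaN}, {3, 4}, {Double.NaN, 5} };
        System.out.println(mean(column)); // prints 3.0
    }
}
```

Forking one half and computing the other in the current thread is the usual Fork/Join idiom; it keeps the worker thread busy instead of blocking on two joins.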
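A rough illustration of the "convert to ASTs, then execute the ASTs" idea from the DSL bullet, applied to its apply(...) example. The Node interface, the helper names, and the column-as-double[] model are hypothetical, not H2O's classes. The tree is built by hand rather than parsed, NaN stands in for NA, and mean() here skips NAs (like R's mean(x, na.rm=TRUE); plain R mean(x) returns NA for a column that contains NAs).

```java
import java.util.Arrays;

/** Hypothetical mini-AST for column expressions; not H2O's actual classes. */
class ColumnAst {
    /** Every node evaluates to a column (double[]); NaN stands in for R's NA. */
    interface Node { double[] eval(double[] x); }

    /** The function argument x itself. */
    static Node var() { return x -> x.clone(); }

    /** is.na(e): 1.0 where e is NA, else 0.0. */
    static Node isNa(Node e) {
        return x -> {
            double[] v = e.eval(x), out = new double[v.length];
            for (int i = 0; i < v.length; i++) out[i] = Double.isNaN(v[i]) ? 1.0 : 0.0;
            return out;
        };
    }

    /** mean(e), broadcast to the column length; NAs are skipped (assumption). */
    static Node mean(Node e) {
        return x -> {
            double[] v = e.eval(x);
            double sum = 0;
            int n = 0;
            for (double d : v) if (!Double.isNaN(d)) { sum += d; n++; }
            double[] out = new double[v.length];
            Arrays.fill(out, n == 0 ? Double.NaN : sum / n);
            return out;
        };
    }

    /** ifelse(test, yes, no), evaluated elementwise. */
    static Node ifElse(Node test, Node yes, Node no) {
        return x -> {
            double[] t = test.eval(x), y = yes.eval(x), n = no.eval(x);
            double[] out = new double[t.length];
            for (int i = 0; i < t.length; i++) out[i] = t[i] != 0 ? y[i] : n[i];
            return out;
        };
    }

    /** apply(frame, 2, f): evaluate f once per column (MARGIN = 2). */
    static double[][] applyColumns(double[][] columns, Node f) {
        double[][] out = new double[columns.length][];
        for (int c = 0; c < columns.length; c++) out[c] = f.eval(columns[c]);
        return out;
    }

    public static void main(String[] args) {
        // Hand-built AST for: function(x){ ifelse(is.na(x), mean(x), x) }
        Node fillNa = ifElse(isNa(var()), mean(var()), var());
        double[][] someFrame = { {1, Double.NaN, 3}, {Double.NaN, 10, 20} };
        for (double[] col : applyColumns(someFrame, fillNa)) {
            System.out.println(Arrays.toString(col));
        }
        // prints [1.0, 2.0, 3.0]
        // prints [15.0, 10.0, 20.0]
    }
}
```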
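For the "transactions" point in the K/V-store bullet, a single-JVM stand-in: an atomic read-modify-write on one key, built on ConcurrentHashMap.compute. The AtomicKv class and its methods are hypothetical and not H2O's API; how H2O makes such updates atomic across a cluster is not shown.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.UnaryOperator;

/** Hypothetical single-JVM K/V store with atomic per-key updates; not H2O's API. */
class AtomicKv<K, V> {
    private final ConcurrentHashMap<K, V> store = new ConcurrentHashMap<>();

    V get(K key) { return store.get(key); }

    void put(K key, V value) { store.put(key, value); }

    /** Applies update to the current value atomically; concurrent updates to the same key cannot interleave. */
    V atomicUpdate(K key, UnaryOperator<V> update) {
        return store.compute(key, (k, old) -> update.apply(old));
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicKv<String, Long> kv = new AtomicKv<>();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 1000; i++) {
            // Each task increments the counter; a missing key (null) starts at 1.
            pool.execute(() -> kv.atomicUpdate("counter", old -> old == null ? 1L : old + 1));
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println(kv.get("counter")); // prints 1000
    }
}
```

Because the whole read-modify-write runs inside compute, no increment is lost even with four threads racing on the same key.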

Cliff



On 5/1/2014 10:13 AM, Dmitriy Lyubimov wrote:

I'd be happy to see a concept of how to bring the operations of the DSL
onto H2O, as well as a detailed description of H2O's programming and
execution model.
+1.


--sebastian

