Hello,

See: 
http://www.cakesolutions.net/teamblogs/comparison-of-apache-stream-processing-frameworks-part-1

Note that the DSLs these systems use are analogous to Gremlin: they all use the
"functional-fluent" style. We need to stress to people that if they use
Spark/Storm/Flink/Samza/Scala/Java8/Clojure, then Gremlin fits their existing
mental model of data flows and aggregations. When people say a query language
needs to be "like SQL," point them to the fact that most modern data processing
frameworks don't use that style. When people say that SQL is declarative and
thus can be optimized, tell them that these functional-fluent languages also
build a query plan that is optimized for the underlying execution engine. By
making an "SQL language," all you are doing is adding another layer of
indirection: now you have to compile a String down to the underlying execution
language (e.g. Java). Modern data processing languages don't waste that effort,
because the constructs in modern programming languages provide enough
expressivity. Moreover, these languages lead (I believe) to execution engine
designs that naturally support both single-machine and compute-cluster
execution (they have a map/reduce foundation inherent in their representation).
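
To make the query-plan point concrete, here is a minimal sketch in Java 8 of a
fluent API that records steps as a plan instead of running them, and then
rewrites that plan before execution. Every name in it (FluentPlan, Step, Kind,
optimize) is hypothetical and is not taken from Gremlin or any of the
frameworks named here. The single rewrite rule is an assumption chosen for
illustration: a one-to-one, side-effect-free map followed by a limit can be
swapped so the map touches fewer elements.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

// Hypothetical mini-DSL: fluent calls only record steps; nothing runs until run().
final class FluentPlan<T> {
    enum Kind { MAP, FILTER, LIMIT }

    // One recorded step: its kind plus a function over the materialized list.
    static final class Step<T> {
        final Kind kind;
        final UnaryOperator<List<T>> apply;
        Step(Kind kind, UnaryOperator<List<T>> apply) {
            this.kind = kind;
            this.apply = apply;
        }
    }

    private final List<Step<T>> steps = new ArrayList<>();

    FluentPlan<T> map(UnaryOperator<T> f) {
        steps.add(new Step<T>(Kind.MAP,
                xs -> xs.stream().map(f).collect(Collectors.toList())));
        return this;
    }

    FluentPlan<T> filter(Predicate<T> p) {
        steps.add(new Step<T>(Kind.FILTER,
                xs -> xs.stream().filter(p).collect(Collectors.toList())));
        return this;
    }

    FluentPlan<T> limit(long n) {
        steps.add(new Step<T>(Kind.LIMIT,
                xs -> xs.stream().limit(n).collect(Collectors.toList())));
        return this;
    }

    // Returns a rewritten copy of the plan: every MAP directly followed by a
    // LIMIT is swapped, repeated until no such pair remains. The result is the
    // same as long as the map function is one-to-one and side-effect free.
    FluentPlan<T> optimize() {
        FluentPlan<T> out = new FluentPlan<>();
        out.steps.addAll(steps);
        boolean changed;
        do {
            changed = false;
            for (int i = 0; i + 1 < out.steps.size(); i++) {
                if (out.steps.get(i).kind == Kind.MAP
                        && out.steps.get(i + 1).kind == Kind.LIMIT) {
                    Collections.swap(out.steps, i, i + 1);
                    changed = true;
                }
            }
        } while (changed);
        return out;
    }

    // Executes the recorded steps in order, materializing a list after each one.
    List<T> run(List<T> input) {
        List<T> current = input;
        for (Step<T> step : steps) {
            current = step.apply.apply(current);
        }
        return current;
    }

    // Exposes the plan shape so it can be inspected before and after rewriting.
    List<Kind> kinds() {
        return steps.stream().map(s -> s.kind).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        FluentPlan<Integer> plan = new FluentPlan<Integer>()
                .filter(x -> x % 2 == 1)
                .map(x -> x * 10)
                .limit(2);
        System.out.println(plan.kinds());            // [FILTER, MAP, LIMIT]
        System.out.println(plan.optimize().kinds()); // [FILTER, LIMIT, MAP]
        System.out.println(plan.optimize().run(Arrays.asList(1, 2, 3, 4, 5))); // [10, 30]
    }
}
```

The user writes the same fluent chain either way; the rewrite happens on the
recorded plan, with no String parsing step in between.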

GREMLIN

text.map(line -> line.split(" "))
    .unfold()
    .groupCount()

STORM

topology.newStream("spout1", spout)
    .each(new Fields("sentence"), new Split(), new Fields("word"))
    .groupBy(new Fields("word"))
    .persistentAggregate(new MemoryMapState.Factory(), new Count(), new Fields("count"));

SPARK

text.flatMap(line => line.split(" "))
    .map(word => (word, 1))
    .reduceByKey(_ + _)


SAMZA

text.split(" ").foldLeft(Map.empty[String, Int]) {
    (count, word) => count + (word -> (count.getOrElse(word, 0) + 1))
}


FLINK

text.flatMap(_.split(" "))
    .map((_, 1))
    .groupBy(0)
    .sum(1)
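
For comparison, the same word count can be written with plain Java 8 Streams.
This is a sketch that is not taken from the linked post: the class and method
names are hypothetical, and only the java.util.stream calls are standard.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical class name, used only for illustration.
class StreamWordCount {

    // Split each line on spaces, flatten to words, then group equal words and count them.
    static Map<String, Long> wordCount(String... lines) {
        return Arrays.stream(lines)
                .flatMap(line -> Arrays.stream(line.split(" ")))
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        // Map iteration order is unspecified, so the printed order may vary.
        System.out.println(wordCount("to be or not to be", "to do"));
    }
}
```

The chain has the same shape as the snippets above: split, flatten, then
group and count.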

Take care,
Marko.

http://markorodriguez.com
