Here is a slightly different take on the high-level API question: there is a common notation for textually specifying graphs, called DOT. It is described at https://en.wikipedia.org/wiki/DOT_(graph_description_language) and a specification is available at http://www.graphviz.org/content/dot-language
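To make the DOT-based approach a little more concrete: the DOT text needed for an Apex DAG (see the *populateDAG()* example in this message) uses only a tiny subset of the language, namely node statements carrying a class attribute and edge statements carrying id, src and tgt attributes. The Java sketch below parses that subset into plain objects. It is a minimal illustration under several assumptions, not the proof-of-concept Builder: it expects one statement per line, ignores the digraph header and closing brace, does not handle subgraphs, comments or escaped quotes, and reads src/tgt as output/input port names. The names DotDagSketch, OperatorSpec, StreamSpec and DagSpec are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: reads the small DOT subset used for DAG specification
// (one statement per line) into plain Java objects. It does not touch the Apex API.
public class DotDagSketch {
    /** A DOT node: the operator name and the value of its "class" attribute. */
    public static final class OperatorSpec {
        public final String name, className;

        OperatorSpec(String name, String className) {
            this.name = name;
            this.className = className;
        }
    }

    /** A DOT edge: stream id, source operator/port and target operator/port (assumed meaning). */
    public static final class StreamSpec {
        public final String id, fromOperator, outputPort, toOperator, inputPort;

        StreamSpec(String id, String fromOperator, String outputPort,
                   String toOperator, String inputPort) {
            this.id = id;
            this.fromOperator = fromOperator;
            this.outputPort = outputPort;
            this.toOperator = toOperator;
            this.inputPort = inputPort;
        }
    }

    /** Everything a builder would need to assemble the DAG. */
    public static final class DagSpec {
        public final List<OperatorSpec> operators = new ArrayList<>();
        public final List<StreamSpec> streams = new ArrayList<>();
    }

    // Edge statement: a -> b [attrs], with an optional trailing ';'
    private static final Pattern EDGE =
        Pattern.compile("^\\s*(\\w+)\\s*->\\s*(\\w+)\\s*\\[(.*)\\]\\s*;?\\s*$");
    // Node statement: name [attrs], with an optional trailing ';'
    private static final Pattern NODE =
        Pattern.compile("^\\s*(\\w+)\\s*\\[(.*)\\]\\s*;?\\s*$");
    // Attribute: key=value or key="quoted value" (escaped quotes not supported)
    private static final Pattern ATTR =
        Pattern.compile("(\\w+)\\s*=\\s*(?:\"([^\"]*)\"|([\\w.]+))");

    static Map<String, String> parseAttrs(String text) {
        Map<String, String> attrs = new LinkedHashMap<>();
        Matcher m = ATTR.matcher(text);
        while (m.find()) {
            // group 2 holds a quoted value, group 3 an unquoted one
            attrs.put(m.group(1), m.group(2) != null ? m.group(2) : m.group(3));
        }
        return attrs;
    }

    public static DagSpec parse(String dot) {
        DagSpec spec = new DagSpec();
        for (String line : dot.split("\n")) {
            Matcher e = EDGE.matcher(line);
            if (e.matches()) {
                Map<String, String> a = parseAttrs(e.group(3));
                spec.streams.add(new StreamSpec(a.get("id"), e.group(1), a.get("src"),
                                                e.group(2), a.get("tgt")));
                continue;
            }
            Matcher n = NODE.matcher(line);
            if (n.matches()) {
                spec.operators.add(new OperatorSpec(n.group(1), parseAttrs(n.group(2)).get("class")));
            }
            // Anything else ("digraph G {", "}", blank lines) is ignored here.
        }
        return spec;
    }

    public static void main(String[] args) {
        String s =
            "digraph G {\n" +
            "randGen [class=\"com.example.myapexapp.RandomNumberGenerator\"];\n" +
            "console [class=\"com.datatorrent.lib.io.ConsoleOutputOperator\"];\n" +
            "randGen -> console [id=randomData, src=out, tgt=input]\n" +
            "}";
        DagSpec spec = parse(s);
        for (OperatorSpec op : spec.operators) {
            System.out.println("operator " + op.name + " : " + op.className);
        }
        for (StreamSpec st : spec.streams) {
            System.out.println("stream " + st.id + " : " + st.fromOperator + "." + st.outputPort
                               + " -> " + st.toOperator + "." + st.inputPort);
        }
    }
}
```

From a DagSpec, a real builder would presumably instantiate each operator class reflectively and register it with the Apex DAG (e.g. via addOperator), then look up the named port fields and connect them with addStream; that step needs the Apex API and is not shown. Keeping parsing separate from DAG construction is what would let other language bindings reuse the same DOT text.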
DOT is used by a variety of graph visualization tools, most notably GraphViz: http://www.graphviz.org/. It is also used by Pig in its testing framework, e.g. http://www.programcreek.com/java-api-examples/index.php?api=org.apache.pig.test.utils.dotGraph.parser.DOTParser

We can use this notation for DAG specification. Doing so for the archetype-generated application results in the body of the *populateDAG()* method looking like this:

--------------------------------------------
// Nodes name the operators and their classes; the edge is the stream
// "randomData" from port "out" of randGen to port "input" of console.
final String s =
    "digraph G {\n" +
    "randGen [class=\"com.example.myapexapp.RandomNumberGenerator\"];\n" +
    "console [class=\"com.datatorrent.lib.io.ConsoleOutputOperator\"];\n" +
    "randGen -> console [id=randomData, src=out, tgt=input]\n" +
    "}";
StringBuffer sb = new StringBuffer(s);
Builder.build(sb, dag);
--------------------------------------------

For more complex DAGs, the DOT string defining the DAG will obviously get longer, but the code stays the same: just 2 lines. The benefits are that the DAG specification is decoupled from the mechanics of language bindings, and that we would be using a notation that is already widely used for graph specification.

A sample proof-of-concept implementation is available at: https://github.com/amberarrow/samples
The file *hilevel/README.md* describes how to build the application.

Ram

On Wed, Dec 23, 2015 at 1:42 PM, David Yan <[email protected]> wrote:

> Hi fellow Apex developers:
>
> Apex has a comprehensive API for constructing DAG topologies for streaming
> applications, using operators, ports and streams. But this may seem too
> much for folks who just want to build simple applications, or just to learn
> about Apex. For example, when you compare the code to do word count in
> Apex with Spark Streaming or Flink, Apex requires much more code.
>
> Apex:
>
> https://github.com/apache/incubator-apex-malhar/tree/master/demos/wordcount/src/main
>
> Spark Streaming:
>
> https://github.com/apache/spark/blob/master/examples/src/main/java/org/apache/spark/examples/streaming/JavaNetworkWordCount.java
>
> Flink:
>
> https://github.com/apache/flink/blob/master/flink-examples/flink-java-examples/src/main/java/org/apache/flink/examples/java/wordcount/WordCount.java
>
> Note that their Scala versions are even simpler to use.
>
> The high-level requirements I have in mind are as follows:
>
> 1. A simple-to-use high-level API similar to what Spark Streaming and Flink
> have. From the high-level API, the Apex engine will construct the
> actual DAG topology at launch time.
>
> 2. The first language we will support is Java, but we will also want to
> support Scala and possibly Python at some point, so the high-level API
> should make it easy to implement bindings for at least these two
> languages.
>
> 3. We should be able to use the high-level API in an Apex App Package (apa)
> file, so that dtcli can launch it just like a regular apa today.
>
> Please provide your ideas and thoughts on this topic.
>
> Thanks,
>
> David
