Here is a slightly different take on the high level API question:

There is a common notation for specifying graphs textually, called DOT. It is
described here:
https://en.wikipedia.org/wiki/DOT_(graph_description_language)
and a specification is available here:
http://www.graphviz.org/content/dot-language

It is used by a variety of graph visualization tools but most notably by
GraphViz: http://www.graphviz.org/. It is also used by Pig in their testing
framework, e.g.
http://www.programcreek.com/java-api-examples/index.php?api=org.apache.pig.test.utils.dotGraph.parser.DOTParser

We can use this notation for DAG specification; doing that for the archetype
generated application results in the body of the *populateDAG()* method
looking like this:
--------------------------------------------
    final String s =
      "digraph G {\n" +
      "randGen [class=\"com.example.myapexapp.RandomNumberGenerator\"];\n" +
      "console [class=\"com.datatorrent.lib.io.ConsoleOutputOperator\"];\n" +
      "randGen -> console [id=randomData, src=out, tgt=input]\n" +
      "}";

    StringBuffer sb = new StringBuffer(s);
    Builder.build(sb, dag);
--------------------------------------------

For more complex DAGs, the DOT string defining the DAG will get longer, but
the code remains the same: just two lines. The benefits are that the DAG
specification is decoupled from the mechanics of language bindings, and that
we would be using a notation that is already widely used for graph
specification.
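
To make the mapping from DOT to a DAG more concrete, here is a minimal sketch
in Java of how such a string could be read into operators and streams. It is
not the Builder from the proof-of-concept: the names DotDagSketch, Spec and
Stream are hypothetical, and it only handles the small DOT subset used above
(node and edge statements followed by an attribute list), not comments,
subgraphs, quoted node names or chained edges.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: reads a small DOT subset into operators and streams. */
public class DotDagSketch {

  /** One edge of the graph, i.e. a stream between two operator ports. */
  public static final class Stream {
    public final String id, source, sourcePort, target, targetPort;

    Stream(String id, String source, String sourcePort, String target, String targetPort) {
      this.id = id;
      this.source = source;
      this.sourcePort = sourcePort;
      this.target = target;
      this.targetPort = targetPort;
    }
  }

  /** Parsed result: operator name -> class name, plus the list of streams. */
  public static final class Spec {
    public final Map<String, String> operators = new LinkedHashMap<>();
    public final List<Stream> streams = new ArrayList<>();
  }

  // Matches "name [attrs]" (node) or "a -> b [attrs]" (edge).
  private static final Pattern STATEMENT =
      Pattern.compile("(\\w+)\\s*(?:->\\s*(\\w+)\\s*)?\\[([^\\]]*)\\]");

  // Matches key=value or key="value" inside an attribute list.
  private static final Pattern ATTRIBUTE =
      Pattern.compile("(\\w+)\\s*=\\s*(?:\"([^\"]*)\"|([\\w.]+))");

  private static Map<String, String> attributes(String text) {
    Map<String, String> attrs = new LinkedHashMap<>();
    Matcher m = ATTRIBUTE.matcher(text);
    while (m.find()) {
      attrs.put(m.group(1), m.group(2) != null ? m.group(2) : m.group(3));
    }
    return attrs;
  }

  public static Spec parse(String dot) {
    Spec spec = new Spec();
    Matcher m = STATEMENT.matcher(dot);
    while (m.find()) {
      Map<String, String> attrs = attributes(m.group(3));
      if (m.group(2) == null) {
        // Node statement: the "class" attribute names the operator class.
        spec.operators.put(m.group(1), attrs.get("class"));
      } else {
        // Edge statement: id, src and tgt follow the attribute names used above.
        spec.streams.add(new Stream(attrs.get("id"),
            m.group(1), attrs.get("src"), m.group(2), attrs.get("tgt")));
      }
    }
    return spec;
  }

  public static void main(String[] args) {
    String dot =
        "digraph G {\n"
        + "randGen [class=\"com.example.myapexapp.RandomNumberGenerator\"];\n"
        + "console [class=\"com.datatorrent.lib.io.ConsoleOutputOperator\"];\n"
        + "randGen -> console [id=randomData, src=out, tgt=input]\n"
        + "}";
    Spec spec = parse(dot);
    spec.operators.forEach((name, cls) -> System.out.println(name + " : " + cls));
    for (Stream s : spec.streams) {
      System.out.println(s.id + " : " + s.source + "." + s.sourcePort
          + " -> " + s.target + "." + s.targetPort);
    }
  }
}
```

A real builder would presumably go one step further and, for each entry,
instantiate the operator class and connect the named ports on the DAG; that
part depends on the Apex API and is left out of this sketch.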

A sample proof-of-concept implementation is available at:
https://github.com/amberarrow/samples
The file *hilevel/README.md* describes how to build the application.

Ram


On Wed, Dec 23, 2015 at 1:42 PM, David Yan <[email protected]> wrote:

> Hi fellow Apex developers:
>
> Apex has a comprehensive API for constructing DAG topologies for streaming
> applications, using operators, ports and streams.  But this may seem too
> much for folks who just want to build simple applications, or just to learn
> about Apex.  For example, when you compare the code to do word count in
> Apex with Spark Streaming or Flink, Apex requires much more code.
>
> Apex:
>
> https://github.com/apache/incubator-apex-malhar/tree/master/demos/wordcount/src/main
>
> Spark Streaming:
>
> https://github.com/apache/spark/blob/master/examples/src/main/java/org/apache/spark/examples/streaming/JavaNetworkWordCount.java
>
> Flink:
>
> https://github.com/apache/flink/blob/master/flink-examples/flink-java-examples/src/main/java/org/apache/flink/examples/java/wordcount/WordCount.java
>
> Note that their Scala versions are even simpler to use.
>
> The high-level requirements I have in mind are as follows:
>
> 1. A simple-to-use high-level API similar to what Spark Streaming and Flink
> have. And from the high-level API, the Apex engine will construct the
> actual DAG topology at launch time.
>
> 2. The first language we will support is Java, but we will also want to
> support Scala and possibly Python at some point, so the high-level API
> should make it easy to implement bindings for at least these two
> languages.
>
> 3. We should be able to use the high-level API in Apex App Package (apa)
> file, so that dtcli can launch it just like a regular apa today.
>
> Please provide your ideas and thoughts on this topic.
>
> Thanks,
>
> David
>
