David,
I can imagine that it boils down to something like these function calls.
DtString lines = readLines(new LineReader());
DtString words = lines.split(new LineSplitter());
DtNumber count = words.count(new Counter());
count.print(new ConoleOutputOperator());
Or
readLines(new LineReader())
.split(new LineSplitter())
.count(new Counter())
.print(new ConoleOutputOperator());
which translates to
Reader reader = dag.addOperator("ReadLines", new LineReader());
LineSplitter splitter = dag.addOperator("Split", new LineSplitter());
WordCounter counter = dag.addOperator("Count", new Counter());
ConsoleOutputOperator console = dag.addOperator("Print", new
ConsoleOutputOperator());
dag.addStream("lines", reader.output, splitter.input);
dag.addStream("words", splitter.output, counter.input);
dag.addStream("count", counter.output, console.input);
Here are my initial thoughts:
For the higher level api to work, we need the following support at least.
1. The operators used in the higher level api should have concrete
implementations with all available input and output ports defined at the
abstract level. Have to think more about how multiple output ports will
play out.
2. We need to define the objects that have method calls available on them
that take operators as parameters.
Eg: DtString can have method split and takes Splitter operator. And
Splitter operator should be abstract with input port type DtString and
output port type DtString. LineSplitter will be a concrete implementation
of this operator.
Regards,
Ashwin.
On Wed, Dec 23, 2015 at 1:42 PM, David Yan <[email protected]> wrote:
> Hi fellow Apex developers:
>
> Apex has a comprehensive API for constructing DAG topologies for streaming
> applications, using operators, ports and streams. But this may seem too
> much for folks who just want to build simple applications, or just to learn
> about Apex. For example, when you compare the code to do word count in
> Apex with Spark Streaming or Flink, Apex requires much more code.
>
> Apex:
>
> https://github.com/apache/incubator-apex-malhar/tree/master/demos/wordcount/src/main
>
> Spark Streaming:
>
> https://github.com/apache/spark/blob/master/examples/src/main/java/org/apache/spark/examples/streaming/JavaNetworkWordCount.java
>
> Flink:
>
> https://github.com/apache/flink/blob/master/flink-examples/flink-java-examples/src/main/java/org/apache/flink/examples/java/wordcount/WordCount.java
>
> Note that their Scala versions are even simpler to use.
>
> The high-level requirements I have in mind is as follow:
>
> 1. A simple-to-use high-level API similar to what Spark Streaming and Flink
> have. And from the high-level API, the Apex engine will construct the
> actual DAG topology at launch time.
>
> 2. The first language we will support is Java, but we will also want to
> support Scala and possibly Python at some point, so the high-level API
> should make it easy for implementing bindings for at least these two
> languages.
>
> 3. We should be able to use the high-level API in Apex App Package (apa)
> file, so that dtcli can launch it just like a regular apa today.
>
> Please provide your ideas and thoughts on this topic.
>
> Thanks,
>
> David
>
--
Regards,
Ashwin.