+1 on not coming up with our own APIs. We should adapt to existing ones so that there is no additional learning curve for Apex users.
It is more of a functional way of specifying the DAG, and a subset of Scala could be a good starting point.

Regards,
Sandeep

On Thu, Dec 24, 2015 at 5:21 AM, Siyuan Hua <[email protected]> wrote:

> My first suggestion is that we should focus on the Stream API (or change the
> name we call it) for now. "High-level API" is a confusing name and could mean
> anything that helps.
>
> Stream is in fact a better-known concept than operators, ports,
> connectors, etc. I think the idea originates from the Scala sequence API
> (http://www.scala-lang.org/api/current/#scala.collection.Seq).
> And the term "Stream" already implies some minimal functions we need, e.g.
> "map" (t1 -> f -> t1'), "reduce" (t1, t2, ... -> f -> t1'), "filter" (t1 -> f (if
> true) -> t1).
> We shouldn't come up with arbitrary things that would make the API
> cumbersome and hard to learn.
>
> On Wed, Dec 23, 2015 at 3:12 PM, Siyuan Hua <[email protected]> wrote:
>
> > Another API that could be a reference is
> > http://storm.apache.org/documentation/Trident-API-Overview.html
> >
> > On Wed, Dec 23, 2015 at 3:09 PM, Ashwin Chandra Putta <[email protected]> wrote:
> >
> >> // Made few edits, ignore previous mail. Read this instead.
> >>
> >> David,
> >>
> >> I can imagine that it boils down to something like these function calls.
> >>
> >> DtString lines = readLines(new LineReader());
> >> DtString words = lines.split(new LineSplitter());
> >> DtNumber count = words.count(new WordCounter());
> >> count.print(new ConsoleOutputOperator());
> >>
> >> Or
> >>
> >> readLines(new LineReader())
> >>     .split(new LineSplitter())
> >>     .count(new WordCounter())
> >>     .print(new ConsoleOutputOperator());
> >>
> >> which translates to
> >>
> >> Reader reader = dag.addOperator("ReadLines", new LineReader());
> >> Splitter splitter = dag.addOperator("Split", new LineSplitter());
> >> Counter counter = dag.addOperator("Count", new WordCounter());
> >> ConsoleOutputOperator console = dag.addOperator("Print", new ConsoleOutputOperator());
> >>
> >> dag.addStream("lines", reader.output, splitter.input);
> >> dag.addStream("words", splitter.output, counter.input);
> >> dag.addStream("count", counter.output, console.input);
> >>
> >> Here are my initial thoughts:
> >>
> >> For the higher-level API to work, we need at least the following support.
> >>
> >> 1. The operators used in the higher-level API should have concrete
> >> implementations with all available input and output ports defined at the
> >> abstract level. Have to think more about how multiple output ports will
> >> play out.
> >> 2. We need to define the objects that have method calls available on them
> >> that take operators as parameters.
> >>
> >> E.g.: DtString can have a method split that takes a Splitter operator. The
> >> Splitter operator should be abstract, with input port type DtString and
> >> output port type DtString. LineSplitter will be a concrete implementation
> >> of this operator.
> >>
> >> Regards,
> >> Ashwin.
> >>
> >> On Wed, Dec 23, 2015 at 3:07 PM, Ashwin Chandra Putta <[email protected]> wrote:
> >>
> >> > David,
> >> >
> >> > I can imagine that it boils down to something like these function calls.
> >> >
> >> > DtString lines = readLines(new LineReader());
> >> > DtString words = lines.split(new LineSplitter());
> >> > DtNumber count = words.count(new WordCounter());
> >> > count.print(new ConsoleOutputOperator());
> >> >
> >> > Or
> >> >
> >> > readLines(new LineReader())
> >> >     .split(new LineSplitter())
> >> >     .count(new WordCounter())
> >> >     .print(new ConsoleOutputOperator());
> >> >
> >> > which translates to
> >> >
> >> > Reader reader = dag.addOperator("ReadLines", new LineReader());
> >> > LineSplitter splitter = dag.addOperator("Split", new LineSplitter());
> >> > WordCounter counter = dag.addOperator("Count", new WordCounter());
> >> > ConsoleOutputOperator console = dag.addOperator("Print", new ConsoleOutputOperator());
> >> >
> >> > dag.addStream("lines", reader.output, splitter.input);
> >> > dag.addStream("words", splitter.output, counter.input);
> >> > dag.addStream("count", counter.output, console.input);
> >> >
> >> > Here are my initial thoughts:
> >> >
> >> > For the higher-level API to work, we need at least the following support.
> >> >
> >> > 1. The operators used in the higher-level API should have concrete
> >> > implementations with all available input and output ports defined at the
> >> > abstract level. Have to think more about how multiple output ports will
> >> > play out.
> >> > 2. We need to define the objects that have method calls available on them
> >> > that take operators as parameters.
> >> >
> >> > E.g.: DtString can have a method split that takes a Splitter operator. The
> >> > Splitter operator should be abstract, with input port type DtString and
> >> > output port type DtString. LineSplitter will be a concrete implementation
> >> > of this operator.
> >> >
> >> > Regards,
> >> > Ashwin.
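
One way to picture the translation Ashwin describes, from a fluent chain to `addOperator`/`addStream` calls, is a small builder that records each operator and wires the previous operator's output to the next one's input. The following is only a sketch under stated assumptions: `FluentDagSketch`, `LogicalPlan`, `Stage`, `source`, and `then` are hypothetical names invented for illustration, operators are represented by their names only, and nothing here is an Apex class or API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these types exist in Apex.
final class FluentDagSketch {

    /** Records the operators and streams that a fluent chain would produce. */
    static final class LogicalPlan {
        final List<String> operators = new ArrayList<>();
        final List<String> streams = new ArrayList<>();
    }

    /** A handle on the output of the most recently added operator. */
    static final class Stage {
        private final LogicalPlan plan;
        private final String upstream;

        private Stage(LogicalPlan plan, String upstream) {
            this.plan = plan;
            this.upstream = upstream;
        }

        /** Starts a chain with a source operator (the readLines step in the thread). */
        static Stage source(LogicalPlan plan, String operatorName) {
            plan.operators.add(operatorName);
            return new Stage(plan, operatorName);
        }

        /** Adds an operator and connects the previous output port to its input port. */
        Stage then(String streamName, String operatorName) {
            plan.operators.add(operatorName);
            plan.streams.add(streamName + ": " + upstream + ".output -> " + operatorName + ".input");
            return new Stage(plan, operatorName);
        }
    }

    /** Builds the word-count chain using the operator and stream names from the thread. */
    static LogicalPlan wordCountPlan() {
        LogicalPlan plan = new LogicalPlan();
        Stage.source(plan, "ReadLines")
            .then("lines", "Split")
            .then("words", "Count")
            .then("count", "Print");
        return plan;
    }

    public static void main(String[] args) {
        LogicalPlan plan = wordCountPlan();
        plan.operators.forEach(name -> System.out.println("operator " + name));
        plan.streams.forEach(System.out::println);
    }
}
```

Each `then` call corresponds to one `dag.addOperator` plus one `dag.addStream` in the hand-written version. The sketch assumes every operator has exactly one input and one output port, so it does not address the open question about multiple output ports.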
> >> >
> >> > On Wed, Dec 23, 2015 at 1:42 PM, David Yan <[email protected]> wrote:
> >> >
> >> >> Hi fellow Apex developers:
> >> >>
> >> >> Apex has a comprehensive API for constructing DAG topologies for streaming
> >> >> applications, using operators, ports and streams. But this may seem too
> >> >> much for folks who just want to build simple applications, or just to learn
> >> >> about Apex. For example, when you compare the code to do word count in
> >> >> Apex with Spark Streaming or Flink, Apex requires much more code.
> >> >>
> >> >> Apex:
> >> >>
> >> >> https://github.com/apache/incubator-apex-malhar/tree/master/demos/wordcount/src/main
> >> >>
> >> >> Spark Streaming:
> >> >>
> >> >> https://github.com/apache/spark/blob/master/examples/src/main/java/org/apache/spark/examples/streaming/JavaNetworkWordCount.java
> >> >>
> >> >> Flink:
> >> >>
> >> >> https://github.com/apache/flink/blob/master/flink-examples/flink-java-examples/src/main/java/org/apache/flink/examples/java/wordcount/WordCount.java
> >> >>
> >> >> Note that their Scala versions are even simpler to use.
> >> >>
> >> >> The high-level requirements I have in mind are as follows:
> >> >>
> >> >> 1. A simple-to-use high-level API similar to what Spark Streaming and Flink
> >> >> have. From the high-level API, the Apex engine will construct the
> >> >> actual DAG topology at launch time.
> >> >>
> >> >> 2. The first language we will support is Java, but we will also want to
> >> >> support Scala and possibly Python at some point, so the high-level API
> >> >> should make it easy to implement bindings for at least these two
> >> >> languages.
> >> >>
> >> >> 3. We should be able to use the high-level API in an Apex App Package (apa)
> >> >> file, so that dtcli can launch it just like a regular apa today.
> >> >>
> >> >> Please provide your ideas and thoughts on this topic.
> >> >>
> >> >> Thanks,
> >> >>
> >> >> David
> >> >
> >> > --
> >> >
> >> > Regards,
> >> > Ashwin.
> >>
> >> --
> >>
> >> Regards,
> >> Ashwin.
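
For a sense of how compact word count becomes with the map/filter/reduce vocabulary discussed in this thread, here is a minimal sketch using the JDK's `java.util.stream` as a stand-in. It is an assumption-laden illustration only: `java.util.stream` is a collection-processing API over bounded input, not a streaming engine and not an Apex API, and the class and method names `StreamWordCount` and `countWords` are made up for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Illustration only: uses the JDK stream API, not an Apex API.
final class StreamWordCount {

    /** Splits lines into words and counts the occurrences of each word. */
    static Map<String, Long> countWords(List<String> lines) {
        return lines.stream()
            // one line -> many words
            .flatMap(line -> Arrays.stream(line.split("\\s+")))
            // drop the empty token that a leading space produces
            .filter(word -> !word.isEmpty())
            // normalize case so "To" and "to" are counted together
            .map(String::toLowerCase)
            // the "reduce" step: group equal words and count them, sorted by word
            .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        // prints {be=2, not=1, or=1, to=2}
        System.out.println(countWords(List.of("to be or not", "To be")));
    }
}
```

A high-level Apex API along these lines would still have to translate each step into operators and streams at launch time, which this sketch does not attempt.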
