[
https://issues.apache.org/jira/browse/APEXMALHAR-2099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15310987#comment-15310987
]
Ilya Ganelin edited comment on APEXMALHAR-2099 at 6/1/16 7:53 PM:
------------------------------------------------------------------
The current implementation of the Apex Stream API (ApexStreamImpl.java)
supports the following functions:
- map
- flatMap
- filter
- reduce
- fold
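To make the semantics of these five functions concrete, here is a minimal sketch that uses java.util.stream as a stand-in. It is not the Apex Stream API: the class and method names (StreamOpsSketch, lengths, words, and so on) are hypothetical, and the lists stand in for streams of tuples.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration only: java.util.stream stands in for the Apex Stream API.
class StreamOpsSketch {
    // map: exactly one output element per input element.
    static List<Integer> lengths(List<String> lines) {
        return lines.stream().map(String::length).collect(Collectors.toList());
    }

    // flatMap: zero or more output elements per input element.
    static List<String> words(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.split(" ")))
                .collect(Collectors.toList());
    }

    // filter: keep only the elements that satisfy a predicate.
    static List<String> longWords(List<String> words, int minLength) {
        return words.stream()
                .filter(w -> w.length() >= minLength)
                .collect(Collectors.toList());
    }

    // reduce: combine elements pairwise into a result of the same type.
    static int total(List<Integer> values) {
        return values.stream().reduce(0, Integer::sum);
    }

    // fold: accumulate into a result of a different type, starting from an initial value.
    static int totalChars(List<String> words) {
        return words.stream().reduce(0, (acc, w) -> acc + w.length(), Integer::sum);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("apex stream api", "beam");
        List<String> ws = words(lines);
        System.out.println(ws);
        System.out.println(longWords(ws, 4));
        System.out.println(total(lengths(ws)));
        System.out.println(totalChars(ws));
    }
}
```

The distinction between reduce and fold drawn here (same result type versus a different accumulator type) is the usual functional-programming one and is assumed, not taken from ApexStreamImpl.java.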
On the Beam side, there is no fixed "API" of stream methods for applying a
transformation. Instead, Beam defines a PTransform class which implements an
"apply" function; a given PTransform is applied to incoming data represented
as a PCollection. There are presently on the order of 40 different
transformations implemented for Beam:
https://github.com/apache/incubator-beam/tree/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms
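The pattern described above, where transformations are objects handed to the collection's "apply" instead of methods on the stream, can be sketched in plain Java. This is a simplified illustration and not Beam code: MiniCollection, MiniTransform, Uppercase and Lengths are hypothetical stand-ins for PCollection and PTransform implementations.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a transform: a reusable object that turns one collection into another.
interface MiniTransform<I, O> {
    MiniCollection<O> apply(MiniCollection<I> input);
}

// Hypothetical stand-in for a collection of data flowing through a pipeline.
final class MiniCollection<T> {
    final List<T> elements;

    MiniCollection(List<T> elements) {
        this.elements = elements;
    }

    // The collection exposes a single generic entry point; all behavior lives in the transform.
    <O> MiniCollection<O> apply(MiniTransform<T, O> transform) {
        return transform.apply(this);
    }
}

// A per-element transform from String to String.
final class Uppercase implements MiniTransform<String, String> {
    public MiniCollection<String> apply(MiniCollection<String> input) {
        List<String> out = new ArrayList<>();
        for (String s : input.elements) {
            out.add(s.toUpperCase());
        }
        return new MiniCollection<>(out);
    }
}

// A per-element transform that changes the element type.
final class Lengths implements MiniTransform<String, Integer> {
    public MiniCollection<Integer> apply(MiniCollection<String> input) {
        List<Integer> out = new ArrayList<>();
        for (String s : input.elements) {
            out.add(s.length());
        }
        return new MiniCollection<>(out);
    }
}

class TransformPatternSketch {
    public static void main(String[] args) {
        MiniCollection<String> names = new MiniCollection<>(Arrays.asList("apex", "beam"));
        MiniCollection<Integer> result = names.apply(new Uppercase()).apply(new Lengths());
        System.out.println(result.elements); // prints [4, 4]
    }
}
```

With this design, adding a new transformation means adding a new class, not a new method on the collection type, which is one plausible reason the set of available transformations can grow so large.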
The analogs to the Stream API are:
Apex => Beam
map => ParDo
flatMap => FlatMapElements
filter => Filter
reduce => Combine (sort of)
In general, Beam presently supports a much greater variety of transformations.
It also supports different classes of transformation. For example, some
transformations are applied over a window, while others are applied on a
per-tuple basis. The windowing behavior can be explicitly specified by defining
a windowing strategy. A key limitation of the current Apex Stream API is that
it has no support for cross-stream interaction. Specifically, operations like
groupByKey or join are not currently defined within the scope of the Apex
Stream API. This is a serious limitation, since it restricts the applications
that can be built.
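As a rough illustration of what the missing operations would compute, here is a minimal plain-Java sketch of groupByKey and an inner join over in-memory lists of key/value pairs. It uses neither the Apex nor the Beam API; CrossStreamSketch, kv, groupByKey and join are hypothetical names. It also ignores windowing, which an unbounded stream would need in order to scope the grouping.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: bounded lists stand in for keyed streams.
class CrossStreamSketch {
    // Small helper to build a key/value pair.
    static <K, V> Map.Entry<K, V> kv(K key, V value) {
        return new SimpleEntry<>(key, value);
    }

    // groupByKey: collect all values that share a key, preserving first-seen key order.
    static <K, V> Map<K, List<V>> groupByKey(List<Map.Entry<K, V>> pairs) {
        Map<K, List<V>> grouped = new LinkedHashMap<>();
        for (Map.Entry<K, V> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Inner join: pair every left value with every right value that has the same key.
    static <K, L, R> List<String> join(List<Map.Entry<K, L>> left, List<Map.Entry<K, R>> right) {
        Map<K, List<R>> rightByKey = groupByKey(right);
        List<String> joined = new ArrayList<>();
        for (Map.Entry<K, L> l : left) {
            List<R> matches = rightByKey.getOrDefault(l.getKey(), Collections.<R>emptyList());
            for (R r : matches) {
                joined.add(l.getKey() + ":" + l.getValue() + "," + r);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> clicks =
                Arrays.asList(kv("a", 1), kv("b", 2), kv("a", 3));
        List<Map.Entry<String, String>> users =
                Arrays.asList(kv("a", "x"), kv("c", "y"));
        System.out.println(groupByKey(clicks)); // prints {a=[1, 3], b=[2]}
        System.out.println(join(clicks, users)); // prints [a:1,x, a:3,x]
    }
}
```

Both operations need to see data from more than one tuple, and join needs data from more than one stream, which is why they cannot be expressed with the per-tuple functions listed earlier.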
> Identify overlap between Beam API and existing Apex Stream API
> --------------------------------------------------------------
>
> Key: APEXMALHAR-2099
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2099
> Project: Apache Apex Malhar
> Issue Type: Sub-task
> Reporter: Ilya Ganelin
>
> There should be some overlap between the Beam API and the recently released
> Apex Stream API. This task captures the need to understand and document this
> overlap.
> AC:
> * A document or JIRA comment identifying which components of the Beam API are
> implemented, similar, or absent within the Apex Stream API.