lindong28 opened a new pull request, #230:
URL: https://github.com/apache/flink-ml/pull/230
## What is the purpose of the change
Add util methods that allow algorithm developers to co-group two DataStreams
with the same semantics and similar performance as `DataSet#coGroup(...)`.
Here are the results of running the benchmark specified in FLINK-31753's
JIRA description:
- DataSet#coGroup takes 27.6 seconds.
- DataStreamUtils#coGroup takes 31.5 seconds.
The DataStream version takes roughly 14% longer than the DataSet version (3.9 seconds more on a 27.6-second baseline). The performance
difference should be negligible for real-world applications whose co-group
function is non-trivial.
## Brief change log
Added the static method `DataStreamUtils#coGroup(...)`.
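For readers unfamiliar with co-group semantics, the following is a minimal plain-Java sketch of what "the same semantics as `DataSet#coGroup(...)`" means: records from both inputs are grouped by key, and the user function is called once per key with both groups, where one side is empty if the key appears in only one input. It is not the implementation in this PR and does not use Flink. The class `CoGroupSketch`, the interface `CoGroupFn`, and the method signature are hypothetical names chosen for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

public class CoGroupSketch {

    /** Hypothetical callback: invoked once per key with all records of both inputs for that key. */
    public interface CoGroupFn<A, B, O> {
        void coGroup(List<A> left, List<B> right, List<O> out);
    }

    /** In-memory illustration of co-group semantics (assumption: both inputs are bounded). */
    public static <A, B, K, O> List<O> coGroup(
            List<A> input1,
            List<B> input2,
            Function<A, K> keySelector1,
            Function<B, K> keySelector2,
            CoGroupFn<A, B, O> fn) {
        // Group each input by its key, preserving first-seen key order.
        Map<K, List<A>> leftGroups = new LinkedHashMap<>();
        for (A a : input1) {
            leftGroups.computeIfAbsent(keySelector1.apply(a), k -> new ArrayList<>()).add(a);
        }
        Map<K, List<B>> rightGroups = new LinkedHashMap<>();
        for (B b : input2) {
            rightGroups.computeIfAbsent(keySelector2.apply(b), k -> new ArrayList<>()).add(b);
        }

        // Visit the union of keys; a key missing on one side yields an empty group there.
        Set<K> keys = new LinkedHashSet<>(leftGroups.keySet());
        keys.addAll(rightGroups.keySet());

        List<O> out = new ArrayList<>();
        for (K key : keys) {
            fn.coGroup(
                    leftGroups.getOrDefault(key, Collections.<A>emptyList()),
                    rightGroups.getOrDefault(key, Collections.<B>emptyList()),
                    out);
        }
        return out;
    }

    public static void main(String[] args) {
        Function<String, Character> firstChar = s -> s.charAt(0);
        // Emit "<left count>:<right count>" for each key.
        CoGroupFn<String, String, String> countBoth =
                (left, right, out) -> out.add(left.size() + ":" + right.size());

        List<String> result = coGroup(
                Arrays.asList("a1", "b1", "a2"),
                Arrays.asList("a9", "c9"),
                firstChar,
                firstChar,
                countBoth);
        System.out.println(result); // prints [2:1, 1:0, 0:1]
    }
}
```

The sketch visits keys in first-seen order only to make its output deterministic; a distributed co-group gives no such ordering guarantee across keys.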
## Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): no
- The public API, i.e., is any changed class annotated with
`@Public(Evolving)`: no
## Documentation
- Does this pull request introduce a new feature? yes
- If yes, how is the feature documented? JavaDocs