[ https://issues.apache.org/jira/browse/FLINK-11818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16792451#comment-16792451 ]
Fabian Hueske commented on FLINK-11818:
---------------------------------------
I can see that such a function is valuable. However, starting external
processes is performance-sensitive and can also depend on the scheduling and
availability of software on the cluster. Hence, I would not make it a
first-class API on DataSet, but would rather add it to DataSetUtils.
Once the feature is stable, we can check whether the function is popular
enough to move it to DataSet.
> Provide pipe transformation function for DataSet API
> ----------------------------------------------------
>
> Key: FLINK-11818
> URL: https://issues.apache.org/jira/browse/FLINK-11818
> Project: Flink
> Issue Type: Improvement
> Components: API / DataSet
> Reporter: vinoyang
> Assignee: vinoyang
> Priority: Major
>
> We have some business requirements that need the data handled by Flink to
> interact with external programs (such as Python/Perl/shell scripts).
> The existing DataSet API has no such function. It can be implemented with a
> map function, but that is not concise. It would be helpful if we could
> provide a pipe[1] function like Spark's.
> [1]: https://spark.apache.org/docs/latest/rdd-programming-guide.html#transformations
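The issue says a pipe can already be built from existing operators, and the comment worries about the cost of starting external processes. What follows is a minimal, hypothetical sketch in plain Java (no Flink dependency) of Spark-style pipe semantics for a single partition. It starts one external process, writes each element to the process's stdin as a line, and turns each line the process prints to stdout into an output element. `PipeSketch` and `pipePartition` are illustrative names, not Flink or Spark APIs. The command `tr a-z A-Z` only stands in for a user's Python/Perl/shell script, and the sketch assumes a Unix-like system with `tr` on the PATH.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

/**
 * Hypothetical helper (not part of Flink): pipes one partition of String
 * elements through a single external command, one element per line.
 */
public final class PipeSketch {

    private PipeSketch() {}

    public static List<String> pipePartition(Iterable<String> partition, List<String> command)
            throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        // Let the script's error output go to this JVM's stderr (e.g. the worker log).
        builder.redirectError(ProcessBuilder.Redirect.INHERIT);
        Process process = builder.start();

        // Feed stdin on a separate thread. If we wrote all input first, a command that
        // prints a lot could fill its stdout pipe and block, and both sides would hang.
        AtomicReference<IOException> writeError = new AtomicReference<>();
        Thread writer = new Thread(() -> {
            try (BufferedWriter out = new BufferedWriter(
                    new OutputStreamWriter(process.getOutputStream(), StandardCharsets.UTF_8))) {
                for (String element : partition) {
                    out.write(element);
                    out.write('\n');
                }
            } catch (IOException e) {
                // For example a broken pipe if the command exits before reading all input.
                writeError.set(e);
            }
        }, "pipe-stdin-writer");
        writer.start();

        // Every stdout line becomes one output element.
        List<String> result = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                result.add(line);
            }
        }

        writer.join();
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IllegalStateException("Command " + command + " exited with code " + exitCode);
        }
        if (writeError.get() != null) {
            throw writeError.get();
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        List<String> out = pipePartition(List.of("flink", "spark"), List.of("tr", "a-z", "A-Z"));
        System.out.println(out); // prints [FLINK, SPARK]
    }
}
```

In a DataSet program, a helper like this would presumably be called from a `MapPartitionFunction` (via `DataSet.mapPartition`). That function receives a whole partition as an `Iterable`, so the process start-up cost is paid once per partition rather than once per record, as it would be with a plain `map`. Separating the stdin writer from the stdout reader is what keeps the sketch from deadlocking when the command produces output before it has consumed all of its input.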