[
https://issues.apache.org/jira/browse/BEAM-6683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16788382#comment-16788382
]
Heejong Lee commented on BEAM-6683:
-----------------------------------
It's hard to create useful cross-language IO transforms unless we define a
common data type that can be used in any SDK. Most Java IO transforms
produce a PCollection of their own data types (such as `ReadableFile` from
FileIO, `GenericRecord` from AvroIO, `ParseResult` from TikaIO, `RowResult`
from Kudu, `Result` from HBase, and so on). We could possibly use Schema as a
common codec across SDKs, but it looks like only the Java SDK supports it at
the moment (correct me if I'm wrong). This may not be a problem for creating
test cases, since I can manually convert those custom data types to a common
type such as a byte array.
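As a rough illustration of the manual conversion mentioned above, the sketch
below round-trips a record through a byte array. It is plain Python with no
Beam dependency. `FileRecord`, `to_bytes`, and `from_bytes` are hypothetical
names, and JSON over UTF-8 is only an assumed encoding, not one that Beam
prescribes.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class FileRecord:
    """Hypothetical stand-in for an SDK-specific element type."""
    path: str
    size_bytes: int


def to_bytes(record: FileRecord) -> bytes:
    # Assumed encoding: JSON with sorted keys, so the bytes are deterministic
    # and can be decoded by any SDK that has a JSON parser.
    return json.dumps(asdict(record), sort_keys=True).encode("utf-8")


def from_bytes(payload: bytes) -> FileRecord:
    # Inverse of to_bytes: decode UTF-8, parse JSON, rebuild the record.
    return FileRecord(**json.loads(payload.decode("utf-8")))
```

A test case could then compare the decoded records on the receiving side
against the expected ones, with only byte arrays crossing the SDK boundary.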
I will work on adding more test cases for (1) and then (3) in the coming weeks.
(2) has a dependency on BEAM-6747.
> Add an integration test suite for cross-language transforms for Flink runner
> ----------------------------------------------------------------------------
>
> Key: BEAM-6683
> URL: https://issues.apache.org/jira/browse/BEAM-6683
> Project: Beam
> Issue Type: Test
> Components: testing
> Reporter: Chamikara Jayalath
> Assignee: Heejong Lee
> Priority: Major
>
> We should add an integration test suite that covers the following.
> (1) Currently available Java IO connectors that do not use UDFs work for
> Python SDK on Flink runner.
> (2) Currently available Python IO connectors that do not use UDFs work for
> Java SDK on Flink runner.
> (3) Currently available Java/Python pipelines work in a scalable manner for
> cross-language pipelines (for example, try 10GB, 100GB input for
> textio/avroio for Java and Python).
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)