[ 
https://issues.apache.org/jira/browse/BEAM-6683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16788382#comment-16788382
 ] 

Heejong Lee commented on BEAM-6683:
-----------------------------------

It's hard to create useful cross-language IO transforms unless we define a 
common data type that can be used in any SDK. Most Java IO transforms 
produce a PCollection of their own data types (such as `ReadableFile` from 
FileIO, `GenericRecord` from AvroIO, `ParseResult` from TikaIO, `RowResult` 
from Kudu, `Result` from HBase, and so on). We could possibly use Schema as a 
common codec across SDKs, but it looks like it is only supported in the Java 
SDK at the moment (correct me if I'm wrong). Maybe it's not a problem for 
creating test cases, since I can manually convert those custom data types to 
a common type such as a byte array.
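
As a rough illustration of the manual conversion mentioned above, here is a 
minimal sketch in plain Python with no Beam dependency. It assumes each custom 
value can first be flattened into a dict of simple fields; the function names 
`record_to_bytes` and `bytes_to_record`, the field names, and the choice of 
JSON as the wire format are all hypothetical and only for illustration, not 
something Beam provides.

```python
import json
from typing import Any, Dict


def record_to_bytes(record: Dict[str, Any]) -> bytes:
    """Encode a flat record as UTF-8 JSON (hypothetical common format).

    Keys are sorted so that equal records always produce equal bytes.
    """
    text = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return text.encode("utf-8")


def bytes_to_record(payload: bytes) -> Dict[str, Any]:
    """Decode bytes produced by record_to_bytes back into a dict."""
    return json.loads(payload.decode("utf-8"))


# Hypothetical record standing in for a value produced by an IO transform.
record = {"name": "example", "count": 3}
payload = record_to_bytes(record)
restored = bytes_to_record(payload)
```

In a test, the SDK that owns the custom type would apply the encoding step 
before the language boundary, and the other SDK would only ever see bytes.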

I will work on adding more test cases for (1) and then (3) in the coming 
weeks. (2) has a dependency on BEAM-6747.

> Add an integration test suite for cross-language transforms for Flink runner
> ----------------------------------------------------------------------------
>
>                 Key: BEAM-6683
>                 URL: https://issues.apache.org/jira/browse/BEAM-6683
>             Project: Beam
>          Issue Type: Test
>          Components: testing
>            Reporter: Chamikara Jayalath
>            Assignee: Heejong Lee
>            Priority: Major
>
> We should add an integration test suite that covers the following.
> (1) Currently available Java IO connectors that do not use UDFs work for 
> Python SDK on Flink runner.
> (2) Currently available Python IO connectors that do not use UDFs work for 
> Java SDK on Flink runner.
> (3) Currently available Java/Python pipelines work in a scalable manner for 
> cross-language pipelines (for example, try 10GB, 100GB input for 
> textio/avroio for Java and Python). 
>  



