[ 
https://issues.apache.org/jira/browse/BEAM-7029?focusedWorklogId=288065&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-288065
 ]

ASF GitHub Bot logged work on BEAM-7029:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 02/Aug/19 16:51
            Start Date: 02/Aug/19 16:51
    Worklog Time Spent: 10m 
      Work Description: manuelaguilar commented on issue #8251: [BEAM-7029] Add 
KafkaIO.Read as external transform
URL: https://github.com/apache/beam/pull/8251#issuecomment-517771688
 
 
   @mxm The issue was related to invalid docker image names. 
   
   My first issue (invalid docker image name) was solved by creating a plain 
Unix account and building all the images there. 
   
   A small issue I noticed after was when deploying the portable runner in 
Linux for a pipeline that runs multiple languages (Java + Python).  The flink 
server does not have the USER environment variable defined, so it can't find 
the right payload image to use on the second transform and probably onwards. I 
hardcoded the USER environment variable in the flink server Dockerfile and it 
now finds the right image for the right payload.
   
   The patch works and I'm able to consume Kafka payload using the portable 
runner. 
   
   Another thing I noticed is that although the deserializer I specified in the 
ReadFromKafka transform is for ByteArray (both for key and value), the Python 
pipeline is consuming it as a string without further conversion.
   
   Would it make sense to have the KafkaRecord metadata as an option in the 
portable pipeline? (to be able to access partitions, offsets, timestamps, and 
other KafkaRecord properties).
   
   I'll continue running some performance tests to see how Flink scales up with 
a few million records in the portability framework.
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 288065)
    Time Spent: 16h  (was: 15h 50m)

> Support KafkaIO to be configured externally for use with other SDKs
> -------------------------------------------------------------------
>
>                 Key: BEAM-7029
>                 URL: https://issues.apache.org/jira/browse/BEAM-7029
>             Project: Beam
>          Issue Type: New Feature
>          Components: io-java-kafka, runner-flink, sdk-py-core
>            Reporter: Maximilian Michels
>            Assignee: Maximilian Michels
>            Priority: Major
>             Fix For: 2.13.0
>
>          Time Spent: 16h
>  Remaining Estimate: 0h
>
> As of BEAM-6730, we can externally configure existing transforms from SDKs. 
> We should add more useful transforms then just {{GenerateSequence}}. 
> {{KafkaIO}} is a good candidate.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

Reply via email to