[ 
https://issues.apache.org/jira/browse/BEAM-7029?focusedWorklogId=286552&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-286552
 ]

ASF GitHub Bot logged work on BEAM-7029:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 01/Aug/19 09:03
            Start Date: 01/Aug/19 09:03
    Worklog Time Spent: 10m 
      Work Description: mxm commented on issue #8251: [BEAM-7029] Add 
KafkaIO.Read as external transform
URL: https://github.com/apache/beam/pull/8251#issuecomment-517196805
 
 
   Thanks for trying out the Kafka consumer. There are limitations for the 
consumer to work correctly. The biggest issue is the structure of KafkaIO 
itself, which uses a combination of the source interface and DoFns to generate 
the desired output. The problem is that the source interface is natively 
translated by Flink to support unbounded sources in portability, while the DoFn 
runs in a Java environment. To transfer data between the two a coder needs to 
be involved. It happens to be that the initial read does not immediately drop 
the KafakRecord structure which does not work together well with our current 
assumption of only supporting "standard coders" present in all SDKs.
   
   There are several possible solutions:
   1. Make the DoFn which drops the KafkaRecordCoder a native Java transform in 
the Flink Runner
   2. Modify KafkaIO to immediately drop the KafkaRecord structure
   3. Add the KafkaRecordCoder to all SDKs
   4. Add a generic coder, e.g. AvroCoder to all SDKs
   
   For a workaround which uses (3), please see this patch which is not a proper 
fix but adds KafkaRecordCoder to the SDK such that it can be used encode/decode 
records: 
https://github.com/mxm/beam/commit/b31cf99c75b3972018180d8ccc7e73d311f4cfed 
   
   I'm planning to fix the issue in the coming weeks.
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 286552)
    Time Spent: 14h 10m  (was: 14h)

> Support KafkaIO to be configured externally for use with other SDKs
> -------------------------------------------------------------------
>
>                 Key: BEAM-7029
>                 URL: https://issues.apache.org/jira/browse/BEAM-7029
>             Project: Beam
>          Issue Type: New Feature
>          Components: io-java-kafka, runner-flink, sdk-py-core
>            Reporter: Maximilian Michels
>            Assignee: Maximilian Michels
>            Priority: Major
>             Fix For: 2.13.0
>
>          Time Spent: 14h 10m
>  Remaining Estimate: 0h
>
> As of BEAM-6730, we can externally configure existing transforms from SDKs. 
> We should add more useful transforms then just {{GenerateSequence}}. 
> {{KafkaIO}} is a good candidate.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

Reply via email to