[ https://issues.apache.org/jira/browse/BEAM-7029?focusedWorklogId=225339&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-225339 ]

ASF GitHub Bot logged work on BEAM-7029:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 09/Apr/19 23:34
            Start Date: 09/Apr/19 23:34
    Worklog Time Spent: 10m 
      Work Description: lukecwik commented on issue #8251: [BEAM-7029] Add 
KafkaIO.Read as external transform
URL: https://github.com/apache/beam/pull/8251#issuecomment-481478094
 
 
   Thanks @mxm and @chamikaramj for the links. I read through them and have a 
better understanding of the details.
   
   Using the decoded PCollection<KV<keyType, valueType>> for KafkaIO.Read will
cover many user use cases. However, from experience with the GCP Pub/Sub native
source for Dataflow and the Python SDK, users always wanted to get additional
attributes from the Pub/Sub message. Passing through only the "data" field did
not satisfy many of them, and exposing a few additional attributes at a time
wasn't great. Exposing the full message was easy for Pub/Sub because it uses
proto as its canonical wire format.
   
   As a future follow-up, it may be useful to also expose a Java KafkaIO.Read
that produces org.apache.kafka.common.record.Record encoded as a byte[]. The
output type of KafkaIO.Read for the cross-language transform would then be
PCollection<byte[]>, pushing all the decoding logic into the Python SDK. Users
could then apply any "deserializer" they want, but this puts a greater burden on
the consuming language, since it must be able to decode such messages. Often
this additional logic in the downstream SDK isn't difficult to implement,
because you can rely on a language-specific library for the source format to do
the parsing. Alternatively, if few language-specific libraries exist for the
source format, it may be wise to produce an intermediate format such as proto or
JSON (which can generate language-specific bindings or is well supported in
almost all languages) that carries all of the data across.
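   
   To make the "intermediate format" option concrete, here is a minimal sketch
of the consuming side in Python. It assumes, hypothetically, that the Java
transform emits each Kafka record as one byte[] element containing JSON, with
the key, value, and header values base64-encoded. The names KafkaRecord,
encode_record, and decode_record and the JSON field layout are illustrative
only; they are not part of KafkaIO, Beam, or Kafka's own Record encoding.
encode_record merely stands in for whatever the Java side would produce.

```python
import base64
import json
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class KafkaRecord:
    """Hypothetical downstream view of a Kafka record plus its metadata."""
    topic: str
    partition: int
    offset: int
    timestamp_ms: int
    key: Optional[bytes]
    value: Optional[bytes]
    # Simplification: Kafka allows repeated header keys; a dict keeps only one.
    headers: Dict[str, Optional[bytes]] = field(default_factory=dict)


def _b64(data: Optional[bytes]) -> Optional[str]:
    # JSON cannot hold raw bytes, so binary fields travel as base64 text.
    return None if data is None else base64.b64encode(data).decode("ascii")


def _unb64(text: Optional[str]) -> Optional[bytes]:
    return None if text is None else base64.b64decode(text)


def encode_record(record: KafkaRecord) -> bytes:
    """Stand-in for the producing (Java) side: serialize all fields to JSON."""
    payload = {
        "topic": record.topic,
        "partition": record.partition,
        "offset": record.offset,
        "timestamp_ms": record.timestamp_ms,
        "key": _b64(record.key),
        "value": _b64(record.value),
        "headers": {k: _b64(v) for k, v in record.headers.items()},
    }
    return json.dumps(payload).encode("utf-8")


def decode_record(element: bytes) -> KafkaRecord:
    """Consuming SDK side: rebuild the full record from one byte[] element."""
    payload = json.loads(element.decode("utf-8"))
    return KafkaRecord(
        topic=payload["topic"],
        partition=payload["partition"],
        offset=payload["offset"],
        timestamp_ms=payload["timestamp_ms"],
        key=_unb64(payload["key"]),
        value=_unb64(payload["value"]),
        headers={k: _unb64(v) for k, v in payload["headers"].items()},
    )


if __name__ == "__main__":
    original = KafkaRecord(
        topic="events", partition=3, offset=42, timestamp_ms=1554852840000,
        key=b"user-1", value=b'{"clicks": 7}', headers={"trace-id": b"abc"},
    )
    element = encode_record(original)  # what would cross the language boundary
    restored = decode_record(element)
    assert restored == original
    print(restored.topic, restored.offset, restored.headers["trace-id"])
    # prints: events 42 b'abc'
```

   In a Python pipeline, a function like decode_record could be applied with
beam.Map(decode_record) on the PCollection of bytes. The key and value remain
raw bytes, so the user still picks their own deserializer, while metadata such
as topic, offset, and headers is no longer lost. JSON plus base64 is used here
only because it needs nothing beyond the standard library; proto would be more
compact and is closer to what the Pub/Sub example above relies on.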
 


Issue Time Tracking
-------------------

    Worklog Id:     (was: 225339)
    Time Spent: 5h 10m  (was: 5h)

> Support KafkaIO to be configured externally for use with other SDKs
> -------------------------------------------------------------------
>
>                 Key: BEAM-7029
>                 URL: https://issues.apache.org/jira/browse/BEAM-7029
>             Project: Beam
>          Issue Type: New Feature
>          Components: io-java-kafka, runner-flink, sdk-py-core
>            Reporter: Maximilian Michels
>            Assignee: Maximilian Michels
>            Priority: Major
>          Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> As of BEAM-6730, we can externally configure existing transforms from SDKs. 
> We should add more useful transforms than just {{GenerateSequence}}.
> {{KafkaIO}} is a good candidate.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
