[ https://issues.apache.org/jira/browse/BEAM-7029?focusedWorklogId=225835&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-225835 ]

ASF GitHub Bot logged work on BEAM-7029:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 10/Apr/19 20:38
            Start Date: 10/Apr/19 20:38
    Worklog Time Spent: 10m 
      Work Description: mxm commented on issue #8251: [BEAM-7029] Add 
KafkaIO.Read as external transform
URL: https://github.com/apache/beam/pull/8251#issuecomment-481852724
 
 
   @lukecwik I think your experience with Pubsub on Dataflow is a good reminder 
of the requirements users have when reading from a message queue. For simple 
use cases, having only the data might be fine, but ultimately users will ask 
for the metadata (partition ID, partition offset, timestamp). Defining a 
standard coder for this seems inevitable.
   
   For now, the current approach of shipping only the data seems fine. The 
coder-agnostic `KV<byte[], byte[]>` approach makes sense to me because it 
gives users the freedom to implement their own encoding. Note that this is 
already possible if users configure the `ByteArrayDeserializer`.
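
   As an illustration of that freedom, the sketch below decodes a raw key/value byte pair on the consuming side. It assumes the producer used Kafka's `LongSerializer` (8 bytes, big-endian) for keys and `StringSerializer` (UTF-8 by default) for values. `RawKafkaBytesSketch` is a hypothetical helper, and `Map.Entry` stands in for Beam's `KV` to keep the example dependency-free.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap;
import java.util.Map;

/** Hypothetical helper that decodes raw Kafka bytes on the consumer side. */
final class RawKafkaBytesSketch {
  // Assumes keys were written by Kafka's LongSerializer: 8 bytes, big-endian.
  static long decodeLongKey(byte[] key) {
    if (key == null || key.length != 8) {
      throw new IllegalArgumentException("expected an 8-byte key");
    }
    return ByteBuffer.wrap(key).getLong(); // ByteBuffer is big-endian by default
  }

  // Assumes values were written by Kafka's StringSerializer with its default UTF-8 encoding.
  static String decodeUtf8Value(byte[] value) {
    return new String(value, StandardCharsets.UTF_8);
  }

  // Map.Entry stands in for Beam's KV<byte[], byte[]> to keep the sketch dependency-free.
  static Map.Entry<Long, String> decode(Map.Entry<byte[], byte[]> raw) {
    return new AbstractMap.SimpleEntry<Long, String>(
        decodeLongKey(raw.getKey()), decodeUtf8Value(raw.getValue()));
  }
}
```

   Because the transform only hands over bytes, the choice of decoding lives entirely with the user, which is the flexibility described above.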
   
   I'm wondering whether we should keep the mapping from Kafka Deserializers to 
standard coders. Most Kafka users are familiar with Java and are used to Kafka 
Deserializers. The obvious drawback of this approach is that we need to 
maintain this mapping and we can't support all Deserializers. We could make 
`KV<byte[], byte[]>` the default and infer the types from the Deserializers if 
they are provided by the user. This might be a more flexible approach.
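
   The proposed default can be sketched as a small lookup: with no Deserializer configured, the transform ships raw bytes; with a known one, it uses the mapped standard coder; otherwise it fails fast. `DeserializerCoderMapping`, `coderUrnFor` and the three table entries are hypothetical, picked only to illustrate the idea, and the URNs are assumed to be Beam's standard coder URNs for bytes, variable-length longs and UTF-8 strings.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical lookup from Kafka Deserializer class names to Beam standard coder URNs. */
final class DeserializerCoderMapping {
  static final String BYTES_CODER_URN = "beam:coder:bytes:v1";

  private static final Map<String, String> KNOWN_DESERIALIZERS;

  static {
    Map<String, String> known = new HashMap<>();
    known.put("org.apache.kafka.common.serialization.ByteArrayDeserializer", BYTES_CODER_URN);
    known.put("org.apache.kafka.common.serialization.LongDeserializer", "beam:coder:varint:v1");
    known.put("org.apache.kafka.common.serialization.StringDeserializer", "beam:coder:string_utf8:v1");
    KNOWN_DESERIALIZERS = Collections.unmodifiableMap(known);
  }

  /**
   * Returns the coder URN for an optionally configured Deserializer: raw bytes when none is
   * given, the mapped standard coder when it is known, and an error otherwise.
   */
  static String coderUrnFor(Optional<String> deserializerClassName) {
    if (!deserializerClassName.isPresent()) {
      return BYTES_CODER_URN; // coder-agnostic default: KV<byte[], byte[]>
    }
    String urn = KNOWN_DESERIALIZERS.get(deserializerClassName.get());
    if (urn == null) {
      throw new IllegalArgumentException(
          "No standard coder known for Deserializer " + deserializerClassName.get());
    }
    return urn;
  }
}
```

   Failing on unknown Deserializers, rather than silently falling back to bytes, is a design choice of this sketch; it makes the maintenance cost of the mapping visible instead of hiding unsupported types.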
   
   @lukecwik @chamikaramj WDYT?
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 


Issue Time Tracking
-------------------

    Worklog Id:     (was: 225835)
    Time Spent: 7.5h  (was: 7h 20m)

> Support KafkaIO to be configured externally for use with other SDKs
> -------------------------------------------------------------------
>
>                 Key: BEAM-7029
>                 URL: https://issues.apache.org/jira/browse/BEAM-7029
>             Project: Beam
>          Issue Type: New Feature
>          Components: io-java-kafka, runner-flink, sdk-py-core
>            Reporter: Maximilian Michels
>            Assignee: Maximilian Michels
>            Priority: Major
>          Time Spent: 7.5h
>  Remaining Estimate: 0h
>
> As of BEAM-6730, we can externally configure existing transforms from SDKs. 
> We should add more useful transforms than just {{GenerateSequence}}. 
> {{KafkaIO}} is a good candidate.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
