[ https://issues.apache.org/jira/browse/BEAM-1573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15896948#comment-15896948 ]
Raghu Angadi commented on BEAM-1573:
------------------------------------
@peay,
There are two levels of solutions for deserializers (and serializers):
# Reasonable ways to use custom Kafka deserializers & serializers
* This is very feasible now, including the case where you are reading from
multiple topics.
# Updating the KafkaIO API to pass Kafka (de)serializers directly to the Kafka
consumer and producer.
* We might end up doing this, though not exactly as you proposed; rather by
replacing coders with Kafka (de)serializers. There is no need to include both, I
think (see the sketch after this list).
* There is a discussion on the Beam mailing lists about removing the use of
coders directly in sources and other places, and that might be the right time
to add this support. (cc [~jkff])
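To make (2) concrete, here is one possible shape. This is purely a sketch:
withKeyDeserializer() and withValueDeserializer() are hypothetical methods, not
current API:
{code}
KafkaIO.<String, MyAvroRecord>read()
    .withBootstrapServers("broker_1:9092,broker_2:9092")
    .withTopics(ImmutableList.of("topic_a"))
    // Hypothetical API: pass Kafka deserializer classes instead of coders.
    .withKeyDeserializer(StringDeserializer.class)  // Kafka's own StringDeserializer
    .withValueDeserializer(MyDeserializer.class)    // user's custom deserializer
{code}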
Are you more interested in (1) or (2)?
One way to use any Kafka deserializer (for (1)):
{code}
// Read raw bytes; note that KafkaRecord includes the topic name, partition, etc.
PCollection<KafkaRecord<byte[], byte[]>> kafkaRecords =
    pipeline
        .apply(KafkaIO.<byte[], byte[]>read()
            .withBootstrapServers("broker_1:9092,broker_2:9092")
            .withTopics(ImmutableList.of("topic_a")));

kafkaRecords.apply(ParDo.of(
    new DoFn<KafkaRecord<byte[], byte[]>, MyAvroRecord>() {
      // Kafka config for the deserializer (a serializable map; populate as needed).
      private final Map<String, Object> config = new HashMap<>();
      private transient Deserializer<MyAvroRecord> kafkaDeserializer;

      @Setup
      public void setup() {
        kafkaDeserializer = new MyDeserializer();
        kafkaDeserializer.configure(config, false); // false: this is a value deserializer
      }

      @ProcessElement
      public void processElement(ProcessContext context) {
        // Kafka deserializers take the topic name, which KafkaRecord carries.
        MyAvroRecord record = kafkaDeserializer.deserialize(
            context.element().getTopic(), context.element().getKV().getValue());
        context.output(record);
      }

      @TearDown
      public void tearDown() {
        kafkaDeserializer.close();
      }
    }));
{code}
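Since the deserializer is created in @Setup and closed in @TearDown, it is
instantiated and configured once per DoFn instance (roughly once per worker
thread) rather than once per element, which is comparable to how the Kafka
consumer itself uses deserializers.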
> KafkaIO does not allow using Kafka serializers and deserializers
> ----------------------------------------------------------------
>
> Key: BEAM-1573
> URL: https://issues.apache.org/jira/browse/BEAM-1573
> Project: Beam
> Issue Type: Bug
> Components: sdk-java-extensions
> Affects Versions: 0.4.0, 0.5.0
> Reporter: peay
> Assignee: Raghu Angadi
> Priority: Minor
>
> KafkaIO does not allow overriding the serializer and deserializer settings
> of the Kafka consumers and producers it uses internally. Instead, it allows
> setting a `Coder`, and has a simple Kafka serializer/deserializer wrapper
> class that calls the coder.
> I appreciate that allowing the use of Beam coders is good and consistent
> with the rest of the system. However, is there a reason to completely
> disallow using custom Kafka serializers instead?
> This is a limitation when working with an Avro schema registry, for
> instance, which requires custom serializers. One can write a `Coder` that
> wraps a custom Kafka serializer, but that means two levels of unnecessary
> wrapping.
> In addition, the `Coder` abstraction is not equivalent to Kafka's
> `Serializer`, which gets the topic name as input. Using a `Coder` wrapper
> would require duplicating the output topic setting in the argument to
> `KafkaIO` and when building the wrapper, which is inelegant and error-prone.
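For concreteness, here is a minimal sketch of the wrapping described above,
assuming the pre-2.0 CustomCoder API and hypothetical MyRegistrySerializer /
MyRegistryDeserializer classes. Note how the topic has to be baked into the
coder, duplicating the topic already configured on KafkaIO:
{code}
// Sketch only: a Beam Coder delegating to hypothetical Kafka (de)serializers.
class RegistryValueCoder extends CustomCoder<MyAvroRecord> {
  private final String topic; // duplicated from the KafkaIO topic setting

  RegistryValueCoder(String topic) {
    this.topic = topic;
  }

  @Override
  public void encode(MyAvroRecord value, OutputStream out, Context context)
      throws IOException {
    // MyRegistrySerializer is a hypothetical Kafka Serializer<MyAvroRecord>.
    byte[] bytes = new MyRegistrySerializer().serialize(topic, value);
    new DataOutputStream(out).writeInt(bytes.length); // length-prefix for nested contexts
    out.write(bytes);
  }

  @Override
  public MyAvroRecord decode(InputStream in, Context context) throws IOException {
    DataInputStream dataIn = new DataInputStream(in);
    byte[] bytes = new byte[dataIn.readInt()];
    dataIn.readFully(bytes);
    // MyRegistryDeserializer is a hypothetical Kafka Deserializer<MyAvroRecord>.
    return new MyRegistryDeserializer().deserialize(topic, bytes);
  }
}
{code}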