[
https://issues.apache.org/jira/browse/FLINK-33058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17766109#comment-17766109
]
Dale Lane commented on FLINK-33058:
-----------------------------------
hi [~rskraba] - thanks very much, a review would be much appreciated.
As for use cases, I should give some context. I work for IBM - we sell a Kafka
distribution with a schema registry that comes with serdes clients offering
both binary and JSON-encoded Avro support. As a part of this, I've worked with
many customers who use and value JSON-encoding.
As you suggest, sometimes this is a temporary thing, related to the phase of a
project - I've seen some customers who will use JSON-encoding during
development, and when they're ready to go into test/prod phases they flip the
switch to binary-encoding.
However, there have also been times where I've seen customers use JSON-encoding
even in production - generally where the topic throughput is low enough that
any performance issues are outweighed by the benefits of greater readability
and compatibility that JSON-encoding offers.
Don't get me wrong, I don't dispute at all that binary-encoding is the more
common choice, and comes with major network and disk usage improvements - so it
makes sense that Flink would've started with that. But I would love to enable
my customers to use Flink with their JSON-encoded Avro topics in the same way
that they're able to use other tools, which is what prompted me to offer the
pull request.
> Support for JSON-encoded Avro
> -----------------------------
>
> Key: FLINK-33058
> URL: https://issues.apache.org/jira/browse/FLINK-33058
> Project: Flink
> Issue Type: Improvement
> Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
> Reporter: Dale Lane
> Priority: Minor
> Labels: avro, flink, flink-formats, pull-request-available
>
> Avro supports two serialization encoding methods: binary and JSON
> cf. [https://avro.apache.org/docs/1.11.1/specification/#encodings]
> flink-avro currently has a hard-coded assumption that Avro data is
> binary-encoded (and cannot process Avro data that has been JSON-encoded).
> I propose adding a new optional format option to flink-avro: *avro.encoding*
> It will support two options: 'binary' and 'json'.
> If unset, it will default to 'binary' to maintain compatibility/consistency
> with current behaviour.
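
To illustrate the two encodings the description refers to, here is a minimal sketch in plain Python (no Avro library, and not flink-avro code). It assumes a hypothetical record schema with a string field `name` and an int field `age`; the helper names are made up for illustration. Per the Avro specification, the binary encoding writes ints as zig-zag varints and strings as a length prefix followed by UTF-8 bytes, while the JSON encoding of such a record is an ordinary JSON object.

```python
import json


def zigzag_varint(n: int) -> bytes:
    """Encode an integer as Avro binary does: zig-zag, then base-128 varint."""
    z = (n << 1) ^ (n >> 63)  # zig-zag maps small negatives to small positives
    out = bytearray()
    while True:
        byte = z & 0x7F
        z >>= 7
        if z:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_binary(record: dict) -> bytes:
    """Binary-encode the hypothetical {name: string, age: int} record."""
    name = record["name"].encode("utf-8")
    # Fields are written in schema order with no field names or separators.
    return zigzag_varint(len(name)) + name + zigzag_varint(record["age"])


def encode_json(record: dict) -> bytes:
    """JSON-encode the same record; field names travel with every message."""
    return json.dumps({"name": record["name"], "age": record["age"]}).encode("utf-8")


record = {"name": "Ada", "age": 36}
binary_payload = encode_binary(record)
json_payload = encode_json(record)

print(binary_payload.hex())
print(json_payload.decode("utf-8"))
```

The binary payload is shorter but cannot be read without the schema, whereas the JSON payload is larger and readable as-is, which is the trade-off discussed in the comment above.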