[
https://issues.apache.org/jira/browse/FLINK-6022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16245577#comment-16245577
]
Stephan Ewen edited comment on FLINK-6022 at 11/9/17 12:48 PM:
---------------------------------------------------------------
We are not serializing the schema in the Avro Serializer. If the Avro
Serializer is chosen, this is fixed.
I am wondering whether the case in question is one where a "generic record"
from Avro is used explicitly as the exchange data type. In my opinion, that is
not a good idea in the first place. In that case, isn't it possible that each
generic record is different, so that you always need a schema anyway?
I would honestly close this, because I assume the intent was to cover Avro's
specific record mechanism and the "generic" mechanism (where we use the
ReflectDatumReader/Writer). Both should work well now.
> Don't serialise Schema when serialising Avro GenericRecord
> ----------------------------------------------------------
>
> Key: FLINK-6022
> URL: https://issues.apache.org/jira/browse/FLINK-6022
> Project: Flink
> Issue Type: Improvement
> Components: Type Serialization System
> Reporter: Robert Metzger
> Assignee: Stephan Ewen
> Fix For: 1.5.0
>
>
> Currently, Flink is serializing the schema for each Avro GenericRecord in the
> stream.
> This leads to a lot of overhead over the wire/disk + high serialization costs.
> Therefore, I'm proposing to improve the support for GenericRecord in Flink by
> shipping the schema to each serializer through the AvroTypeInformation.
> Then, we can only support GenericRecords with the same type per stream, but
> the performance will be much better.
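The overhead described in the issue can be illustrated with a small, self-contained sketch. It uses a toy byte encoding, not Avro's binary format or Flink's actual serializer; the class name, the schema text, and the record layout are all hypothetical. It only compares writing the schema text before every record against writing it once up front.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class SchemaOverheadSketch {

    // Hypothetical schema text, standing in for an Avro schema in JSON form.
    static final String SCHEMA =
        "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
        + "{\"name\":\"name\",\"type\":\"string\"},"
        + "{\"name\":\"age\",\"type\":\"int\"}]}";

    // Toy encoding: a 4-byte big-endian int.
    static void writeInt(ByteArrayOutputStream out, int value) {
        out.write(ByteBuffer.allocate(4).putInt(value).array(), 0, 4);
    }

    // Toy encoding: length prefix followed by UTF-8 bytes.
    static void writeString(ByteArrayOutputStream out, String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        writeInt(out, bytes.length);
        out.write(bytes, 0, bytes.length);
    }

    // Writes only the field values of one record, without any schema.
    static void writeRecord(ByteArrayOutputStream out, String name, int age) {
        writeString(out, name);
        writeInt(out, age);
    }

    // The situation the issue describes: the schema travels with every record.
    static int sizeWithSchemaPerRecord(int records) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < records; i++) {
            writeString(out, SCHEMA);
            writeRecord(out, "user" + i, 20 + i);
        }
        return out.size();
    }

    // The proposal: the schema is shipped once, then only field values follow.
    // This only works if every record in the stream has the same schema.
    static int sizeWithSchemaOnce(int records) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, SCHEMA);
        for (int i = 0; i < records; i++) {
            writeRecord(out, "user" + i, 20 + i);
        }
        return out.size();
    }

    public static void main(String[] args) {
        int n = 1000;
        System.out.println("schema per record: " + sizeWithSchemaPerRecord(n) + " bytes");
        System.out.println("schema once:       " + sizeWithSchemaOnce(n) + " bytes");
    }
}
```

In this sketch the two variants differ by exactly one encoded schema per additional record, which is the cost the issue proposes to remove. It also shows the trade-off raised in the comment: the second variant cannot represent a stream in which each generic record carries a different schema.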