[ 
https://issues.apache.org/jira/browse/FLINK-6022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15924180#comment-15924180
 ] 

Robert Metzger commented on FLINK-6022:
---------------------------------------

There is actually a way to register anything serializable with the execution 
config: "setGlobalJobParameters(GlobalJobParameters params)". The main use 
case for that is showing the job parameters in the web frontend (the 
ParameterTool has support for that as well).
Also, the GlobalJobParameters are accessible everywhere in the user code (when 
using the Rich* function variants, e.g. RichMapFunction).
Having said all this, I would NOT recommend using the GlobalJobParameters for 
the Avro serializer.

A much more appropriate place for shipping serialized data that is specific 
to a serializer from the user APIs to the cluster is the TypeInformation.

By putting the schema of the generic records into the {{AvroTypeInfo}} (or 
something similar for GenericAvroRecords), you'll have the schema available on 
all serializers.
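
To illustrate the idea, here is a minimal sketch in plain Java. It is not Flink 
or Avro code: {{SchemaCarryingSerializer}} is a hypothetical stand-in for a 
serializer created from a schema-carrying type information, and the "schema" 
is assumed to be just an ordered list of field names. The point it shows is 
that the serializer holds the schema, so each record is written as values only.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not a Flink TypeSerializer: the serializer is
// Serializable and carries the schema, so it could be shipped once per job.
class SchemaCarryingSerializer implements Serializable {
    private static final long serialVersionUID = 1L;

    // Assumption: the schema is modelled as an ordered list of field names.
    private final ArrayList<String> fieldNames;

    SchemaCarryingSerializer(List<String> fieldNames) {
        this.fieldNames = new ArrayList<>(fieldNames);
    }

    // Writes only the field values, in schema order; no schema on the wire.
    byte[] serialize(Map<String, String> record) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (String name : fieldNames) {
                String value = record.get(name);
                if (value == null) {
                    throw new IllegalArgumentException(
                        "Record does not match schema, missing field: " + name);
                }
                out.writeUTF(value);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Rebuilds the record by pairing the values with the locally held schema.
    Map<String, String> deserialize(byte[] data) {
        Map<String, String> record = new LinkedHashMap<>();
        try (DataInputStream in =
                 new DataInputStream(new ByteArrayInputStream(data))) {
            for (String name : fieldNames) {
                record.put(name, in.readUTF());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return record;
    }

    public static void main(String[] args) {
        SchemaCarryingSerializer serializer =
            new SchemaCarryingSerializer(Arrays.asList("id", "name"));
        Map<String, String> record = new LinkedHashMap<>();
        record.put("id", "42");
        record.put("name", "flink");

        byte[] data = serializer.serialize(record);
        System.out.println("bytes on the wire: " + data.length);
        System.out.println("round trip: " + serializer.deserialize(data));
    }
}
```

The trade-off is visible in {{serialize}}: a record that does not match the 
schema held by the serializer is rejected, which corresponds to supporting 
only one record type per stream.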

> Improve support for Avro GenericRecord
> --------------------------------------
>
>                 Key: FLINK-6022
>                 URL: https://issues.apache.org/jira/browse/FLINK-6022
>             Project: Flink
>          Issue Type: Improvement
>          Components: Type Serialization System
>            Reporter: Robert Metzger
>
> Currently, Flink is serializing the schema for each Avro GenericRecord in the 
> stream.
> This leads to a lot of overhead over the wire/disk + high serialization costs.
> Therefore, I'm proposing to improve the support for GenericRecord in Flink by 
> shipping the schema to each serializer through the AvroTypeInformation.
> Then, we can only support GenericRecords with the same type per stream, but 
> the performance will be much better.
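
The overhead described in the issue can be made concrete with a rough 
plain-Java sketch. It does not use Avro's real encoding: {{SchemaOverheadSketch}} 
and its methods are hypothetical, the schema is assumed to be a plain string, 
and "schema once" stands in for shipping the schema with the serializer 
instead of with every record.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch comparing encoded sizes; not Avro's wire format.
class SchemaOverheadSketch {

    // Size when the schema string is written in front of every record.
    static int sizeWithSchemaPerRecord(String schema, String[] values, int count) {
        return encode(schema, values, count, true);
    }

    // Size when the schema string is written a single time up front.
    static int sizeWithSchemaOnce(String schema, String[] values, int count) {
        return encode(schema, values, count, false);
    }

    private static int encode(String schema, String[] values, int count,
                              boolean schemaPerRecord) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            if (!schemaPerRecord) {
                out.writeUTF(schema);
            }
            for (int i = 0; i < count; i++) {
                if (schemaPerRecord) {
                    out.writeUTF(schema);
                }
                for (String value : values) {
                    out.writeUTF(value);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        // Assumed example schema, written in Avro's JSON style.
        String schema = "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
            + "{\"name\":\"id\",\"type\":\"string\"},"
            + "{\"name\":\"name\",\"type\":\"string\"}]}";
        String[] values = {"42", "flink"};
        int count = 1000;

        System.out.println("schema per record: "
            + sizeWithSchemaPerRecord(schema, values, count) + " bytes");
        System.out.println("schema once:       "
            + sizeWithSchemaOnce(schema, values, count) + " bytes");
    }
}
```

With small records the schema dominates the per-record size, so the gap 
between the two numbers grows linearly with the number of records.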



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
