[ 
https://issues.apache.org/jira/browse/FLINK-11347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16745008#comment-16745008
 ] 

Stephan Ewen commented on FLINK-11347:
--------------------------------------

Forwarding my comment from the pull request:

The schema must be serializable, hence we convert it to a string and back.
The schema is in the closure of the factory, which itself is part of the user 
code that is shipped for distributed execution, hence the requirement to be 
serializable.

The parsing also happens just once when the writer is created, so my assumption 
is that the cost is acceptable.
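
A minimal sketch of the pattern described above, for illustration only. It is
not the actual ParquetAvroWriters code: PlainSchema is a hypothetical stand-in
for a schema class that is not java.io.Serializable, and SchemaFactory,
viaString, direct and shipAndRestore are made-up names, so the example runs
with the JDK alone. It shows that a closure capturing only the schema string
survives serialization, while a closure capturing the schema object does not.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SchemaClosureSketch {

    /** Hypothetical stand-in for a schema class that does not implement Serializable. */
    static final class PlainSchema {
        private final String json;

        private PlainSchema(String json) {
            this.json = json;
        }

        /** Stands in for parsing a schema from its string form. */
        static PlainSchema parse(String json) {
            return new PlainSchema(json);
        }

        @Override
        public String toString() {
            return json;
        }
    }

    /** Hypothetical factory interface; Serializable because it ships with the user code. */
    interface SchemaFactory extends Serializable {
        PlainSchema create();
    }

    /** Captures only the string form, so the closure stays serializable. */
    static SchemaFactory viaString(PlainSchema schema) {
        final String schemaString = schema.toString();
        // The string is parsed each time create() is called, i.e. once per writer.
        return () -> PlainSchema.parse(schemaString);
    }

    /** Captures the schema object itself; serializing this closure fails. */
    static SchemaFactory direct(PlainSchema schema) {
        return () -> schema;
    }

    /** Simulates shipping an object to a worker: serialize, then deserialize. */
    @SuppressWarnings("unchecked")
    static <T> T shipAndRestore(T obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (T) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        PlainSchema schema = PlainSchema.parse("{\"type\":\"string\"}");

        SchemaFactory restored = shipAndRestore(viaString(schema));
        System.out.println("restored schema: " + restored.create());

        try {
            shipAndRestore(direct(schema));
        } catch (NotSerializableException e) {
            System.out.println("direct capture is not serializable: " + e.getMessage());
        }
    }
}
```

In this sketch the extra cost of the string round trip is one toString() call
when the factory is built and one parse() call per created writer.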

I would close this issue because the proposed solution is not possible.
Please reopen the issue if you disagree and would like to pursue this further.

> Optimize the ParquetAvroWriters factory
> ---------------------------------------
>
>                 Key: FLINK-11347
>                 URL: https://issues.apache.org/jira/browse/FLINK-11347
>             Project: Flink
>          Issue Type: Improvement
>          Components: Formats
>    Affects Versions: 1.7.1
>            Reporter: Fokko Driesprong
>            Assignee: Fokko Driesprong
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.8.0
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> In the ParquetAvroWriters the schema is first serialized to a string, and 
> then back to a Schema, which is quite expensive to do. Therefore it makes 
> sense to pass the schema to the writer directly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
