[ https://issues.apache.org/jira/browse/FLINK-9411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487241#comment-16487241 ]

Stephan Ewen commented on FLINK-9411:
-------------------------------------

I would strongly suggest writing a comprehensive design for this up front; 
otherwise there is a significant chance that the contribution cannot be accepted.

Parquet and similar formats require writing larger batches (for efficient 
columnar encoding), which collides with the bucketing sink's assumption that it 
can flush()/persist at any checkpoint. We first need a plan/design to handle 
these conflicting requirements: for example, will the compression happen only 
on rolling, or continuously during writing?
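To make the conflict concrete, here is a minimal sketch (hypothetical names, not Flink's or Parquet's actual API): a columnar writer buffers rows until a row group is full, so a checkpoint-time flush() can only persist completed row groups, while rolling the file finalizes the partial group.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a columnar (Parquet-like) writer. Rows are buffered
// until a "row group" is full; only closed row groups are durable on flush.
class ColumnarBatchWriter {
    private final int rowGroupSize;              // rows per row group (assumption)
    private final List<String> buffer = new ArrayList<>();
    private int durableRows = 0;                 // rows persisted in closed row groups

    ColumnarBatchWriter(int rowGroupSize) { this.rowGroupSize = rowGroupSize; }

    void write(String row) {
        buffer.add(row);
        if (buffer.size() == rowGroupSize) {
            durableRows += buffer.size();        // a full group can be written out
            buffer.clear();
        }
    }

    // A checkpoint-time flush can only persist completed row groups;
    // buffered rows would be lost unless the sink also snapshots them.
    int flush() { return durableRows; }

    // Rolling the file closes the final (possibly partial) row group,
    // making everything durable.
    int roll() {
        durableRows += buffer.size();
        buffer.clear();
        return durableRows;
    }
}

public class Demo {
    public static void main(String[] args) {
        ColumnarBatchWriter w = new ColumnarBatchWriter(100);
        for (int i = 0; i < 250; i++) w.write("row-" + i);
        System.out.println(w.flush()); // 200: two full groups durable, 50 still buffered
        System.out.println(w.roll());  // 250: rolling finalizes the partial group
    }
}
```

This is why "flush at any checkpoint" and "write large columnar batches" need an explicit design decision: either roll on every checkpoint (small, inefficient row groups) or snapshot the in-memory buffer as part of checkpoint state.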

[~aljoscha], [~kkl0u], and I are also working on a new version of the 
Bucketing Sink that fixes a number of shortcomings: making it work with 
Flink's file systems, making it work properly with S3 (eventual consistency), 
and supporting non-row-wise formats. A design doc for that will probably be 
coming in the next few weeks.

> Support parquet rolling sink writer
> -----------------------------------
>
>                 Key: FLINK-9411
>                 URL: https://issues.apache.org/jira/browse/FLINK-9411
>             Project: Flink
>          Issue Type: New Feature
>          Components: filesystem-connector
>            Reporter: mingleizhang
>            Assignee: Triones Deng
>            Priority: Major
>
> Like the ORC rolling sink writer supported in FLINK-9407, we should also 
> support a Parquet rolling sink writer.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
