[ https://issues.apache.org/jira/browse/FLINK-9411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487241#comment-16487241 ]
Stephan Ewen commented on FLINK-9411:
-------------------------------------
I would strongly suggest writing a comprehensive design for this up front;
otherwise there is a significant chance that the contribution cannot be added.
Parquet and similar formats require writing larger batches (for efficient
columnar encoding), which collides with the bucketing sink's assumption that it
can flush()/persist at any checkpoint. We first need a plan/design to handle
these conflicting requirements - for example, will the encoding/compression
happen only on rolling, or continuously during writing?
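To illustrate the conflict in the abstract (this is a conceptual sketch, not Flink code - the class and method names are hypothetical): a columnar writer must buffer a whole batch before it can finish a file, so at checkpoint time only already-rolled files are durable, while buffered rows are not.

```python
# Conceptual sketch of the batch-format vs. checkpoint conflict.
# Not a Flink API: BatchFormatWriter and its methods are hypothetical.

class BatchFormatWriter:
    """Buffers rows until a roll; only rolled files are durable."""

    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.buffer = []          # rows not yet written into a finished file
        self.finished_files = []  # files that are safe to commit at a checkpoint

    def write(self, row):
        self.buffer.append(row)
        if len(self.buffer) >= self.batch_size:
            self.roll()

    def roll(self):
        # Encode the whole batch at once: columnar formats like Parquet
        # need the complete row group before the file footer can be written,
        # so there is no meaningful "flush" mid-batch.
        if self.buffer:
            self.finished_files.append(list(self.buffer))
            self.buffer.clear()

    def checkpoint(self):
        # Only finished files can be persisted at a checkpoint; buffered
        # rows are not yet durable and would have to be replayed on recovery.
        return len(self.finished_files), len(self.buffer)

w = BatchFormatWriter(batch_size=3)
for i in range(5):
    w.write(i)
durable_files, pending_rows = w.checkpoint()
print(durable_files, pending_rows)  # -> 1 2 (one rolled file, two rows still buffered)
```

The sketch makes the design question concrete: either the sink rolls (and encodes) at every checkpoint, giving up efficient batch sizes, or checkpoints must track in-progress batches for replay.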
[~aljoscha], [~kkl0u], and I are also working on a new version of the
Bucketing Sink that fixes a number of shortcomings, such as making it work with
Flink's file systems, making it work properly with S3 (eventual consistency),
and supporting non-row-wise formats - a design doc for that will probably
follow in the next few weeks.
> Support parquet rolling sink writer
> -----------------------------------
>
> Key: FLINK-9411
> URL: https://issues.apache.org/jira/browse/FLINK-9411
> Project: Flink
> Issue Type: New Feature
> Components: filesystem-connector
> Reporter: mingleizhang
> Assignee: Triones Deng
> Priority: Major
>
> As with the ORC rolling sink writer support in FLINK-9407, we should also
> support a Parquet rolling sink writer.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)