[ https://issues.apache.org/jira/browse/FLINK-11401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16749744#comment-16749744 ]
Fokko Driesprong commented on FLINK-11401:
------------------------------------------
Thanks for the comment [~StephanEwen]
The RollOnCheckpoint behavior works very well for our use case, which is just
ETL'ing the data from Kafka to a bucket. Since we're using an object store as
the FS backend (GCS), the constant renaming of the files from `.in-progress` to
`.pending` to `.avro` is far from optimal, because renaming is very expensive
there. On HDFS a rename is a constant-time, atomic logical operation, whereas
on an object store it implies copying the whole file.
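To make the cost difference concrete, here is a toy in-memory model, not GCS or any real client. It assumes, as described above, that a rename on an object store amounts to copy-then-delete. `ToyObjectStore` and its methods are hypothetical names made up for this illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of a flat object store without a native rename.
final class ToyObjectStore {
    private final Map<String, byte[]> objects = new HashMap<>();
    private long bytesCopied = 0;

    void put(String key, byte[] data) {
        objects.put(key, data.clone());
    }

    boolean exists(String key) {
        return objects.containsKey(key);
    }

    // "Rename" modeled as copy-then-delete, so its cost grows with object size.
    void rename(String from, String to) {
        byte[] src = objects.get(from);
        if (src == null) {
            throw new IllegalArgumentException("no such object: " + from);
        }
        if (from.equals(to)) {
            return; // nothing to do
        }
        objects.put(to, src.clone()); // full copy of the payload
        bytesCopied += src.length;
        objects.remove(from);
    }

    // Total payload bytes copied by all renames so far.
    long bytesCopied() {
        return bytesCopied;
    }
}
```

In this model, moving one part file through `.in-progress`, `.pending` and `.avro` copies its payload twice, whereas a metadata-only rename would copy nothing.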
In the near future, we'll open a PR for an Avro writer that implements the
BulkWriter interface. Since Avro is still a container format (we want to
include the schema in the header of the file), we still need to write a header
before writing the actual rows. Writing this header first would require
changing some interfaces.
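A minimal sketch of the header-before-rows ordering, assuming a writer with the three operations `addElement`, `flush` and `finish`. `SketchBulkWriter` and `HeaderFirstWriter` are hypothetical stand-ins, not Flink or Avro classes. The header is a plain text line instead of Avro's binary container header. The sketch does not settle whether this fits the existing interfaces unchanged.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in with the shape addElement / flush / finish;
// it is not the Flink interface itself.
interface SketchBulkWriter<T> {
    void addElement(T element) throws IOException;
    void flush() throws IOException;
    void finish() throws IOException;
}

// Hypothetical writer: emits a schema header once, then one record per line.
// Real Avro container files use a binary header; text keeps the sketch short.
final class HeaderFirstWriter implements SketchBulkWriter<String> {
    private final OutputStream out;

    HeaderFirstWriter(OutputStream out, String schemaJson) throws IOException {
        this.out = out;
        // The header is written at construction time, before any row.
        writeLine("#schema=" + schemaJson);
    }

    @Override
    public void addElement(String element) throws IOException {
        writeLine(element);
    }

    @Override
    public void flush() throws IOException {
        out.flush();
    }

    @Override
    public void finish() throws IOException {
        // Flush only; in this sketch the caller owns and closes the stream.
        out.flush();
    }

    private void writeLine(String line) throws IOException {
        out.write((line + "\n").getBytes(StandardCharsets.UTF_8));
    }
}
```

Because the header is written in the constructor, every file starts with the schema even if no rows follow. It also means the writer needs the output stream at creation time.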
> Allow compression on ParquetBulkWriter
> --------------------------------------
>
> Key: FLINK-11401
> URL: https://issues.apache.org/jira/browse/FLINK-11401
> Project: Flink
> Issue Type: Improvement
> Components: Batch Connectors and Input/Output Formats
> Affects Versions: 1.7.1
> Reporter: Fokko Driesprong
> Assignee: Fokko Driesprong
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.8.0
>
> Time Spent: 10m
> Remaining Estimate: 0h
>