[ https://issues.apache.org/jira/browse/BEAM-55?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15531516#comment-15531516 ]
Jeffrey Payne commented on BEAM-55:
-----------------------------------
We too prefer to use binary file formats like Avro or Parquet, for many
reasons, including automatic compression handling. Unfortunately, we have
several existing SLAs with clients that require compressed CSV output, and some
even require a *single compressed CSV file*, ugh. What they do with the file
once it's out of our hands is their problem :)
I'll read through the contribution guide, fork Beam, and submit a PR. Thanks
again for the direction!
> Allow users to compress FileBasedSink output files
> --------------------------------------------------
>
> Key: BEAM-55
> URL: https://issues.apache.org/jira/browse/BEAM-55
> Project: Beam
> Issue Type: New Feature
> Components: sdk-java-core
> Reporter: Daniel Halperin
> Priority: Minor
>
> FileBasedSink (also TextIO.Write, AvroIO.Write, etc.) does not have an option
> for compressing its output.
> In general, we discourage compression because it limits or blocks scalably
> reading from a file in parallel. However, users may want it -- so we should
> support the option (with appropriate warnings).
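
For illustration only, here is a minimal sketch of the "single compressed CSV
file" case in plain Java, assuming gzip as the codec. It uses only
java.util.zip from the JDK and no Beam API; the class and method names
(GzipCsvSketch, writeGzipCsv, readGzipCsv) are hypothetical. It also shows why
such a file is hard to read in parallel: a gzip stream has to be decompressed
from the beginning, so a reader cannot jump to an arbitrary offset.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of Beam: writes CSV rows as one gzip stream.
public class GzipCsvSketch {

    // Compress all rows into a single gzip member, one row per line.
    static byte[] writeGzipCsv(List<String> rows) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Closing the writer also closes the GZIPOutputStream, which
        // writes the gzip trailer; the buffer is only complete after that.
        try (Writer out = new OutputStreamWriter(
                new GZIPOutputStream(buffer), StandardCharsets.UTF_8)) {
            for (String row : rows) {
                out.write(row);
                out.write('\n');
            }
        }
        return buffer.toByteArray();
    }

    // Decompress sequentially from the first byte; there is no way to
    // start in the middle of the stream.
    static List<String> readGzipCsv(byte[] compressed) throws IOException {
        List<String> rows = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(new ByteArrayInputStream(compressed)),
                StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                rows.add(line);
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        List<String> rows = Arrays.asList("id,name", "1,alpha", "2,beta");
        byte[] compressed = writeGzipCsv(rows);
        System.out.println(readGzipCsv(compressed));
    }
}
```

A real sink would stream to a file rather than buffer in memory; the in-memory
buffer is used here only to keep the sketch self-contained.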