[ 
https://issues.apache.org/jira/browse/BEAM-55?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15531516#comment-15531516
 ] 

Jeffrey Payne commented on BEAM-55:
-----------------------------------

We too prefer to use binary file formats like Avro or Parquet, for many 
reasons, including automatic compression handling.  Unfortunately, we have 
several existing SLAs with clients that necessitate compressed CSV output, and 
some even require a *single compressed CSV file*, ugh.  What they do with the 
file once it's out of our hands is their problem :)

I'll read through the contribution guide, fork Beam, and submit a PR.  Thanks 
again for the direction!

> Allow users to compress FileBasedSink output files
> --------------------------------------------------
>
>                 Key: BEAM-55
>                 URL: https://issues.apache.org/jira/browse/BEAM-55
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-java-core
>            Reporter: Daniel Halperin
>            Priority: Minor
>
> FileBasedSink (also TextIO.Write, AvroIO.Write, etc.) does not have an option 
> for compressing its output.
> In general, we discourage compression because it limits or blocks scalable 
> parallel reads of a file. However, users may want it -- so we should 
> support the option (with appropriate warnings).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
