[ https://issues.apache.org/jira/browse/BEAM-55?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15531516#comment-15531516 ]
Jeffrey Payne commented on BEAM-55:
-----------------------------------

We too prefer to use binary file formats like Avro or Parquet, for many reasons, including automatic compression handling. Unfortunately, we have several existing SLAs with clients that necessitate compressed CSV output, some even require a *single compressed CSV file*, ugh. What they do with the file once it's out of our hands is their problem :)

I'll read through the contribution guide, fork beam, and submit a PR. Thanks again for the direction!

> Allow users to compress FileBasedSink output files
> --------------------------------------------------
>
> Key: BEAM-55
> URL: https://issues.apache.org/jira/browse/BEAM-55
> Project: Beam
> Issue Type: New Feature
> Components: sdk-java-core
> Reporter: Daniel Halperin
> Priority: Minor
>
> FileBasedSink (also TextIO.Write, AvroIO.Write, etc). does not have an option
> for compressing its output.
> In general, we discourage compression because it limits or blocks scalably
> reading from a file in parallel. However, users may want it -- so we should
> support the option (with appropriate warnings).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
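
To make the quoted concern about parallel reads concrete, here is a minimal Python sketch. It is not Beam code and does not use any Beam API: the sample rows, the gzip format, and the helper names `read_plain_split` and `can_decompress_from` are all assumptions chosen for illustration. It shows that an uncompressed CSV can be divided into byte ranges that workers read independently, while a single gzip stream can only be decoded from its first byte.

```python
import gzip
import zlib

# Hypothetical CSV payload standing in for a sink's output file.
rows = [f"{i},user{i},{i * 3}" for i in range(10_000)]
plain = ("\n".join(rows) + "\n").encode("utf-8")
compressed = gzip.compress(plain)


def read_plain_split(data, start, end):
    """Return the lines whose first byte lies in [start, end).

    A worker may begin at any byte offset: it skips the partial line it
    landed in and reads through the end of the last line it started.
    """
    pos = 0
    if start > 0:
        # The first owned line begins right after the first newline
        # found at or after start - 1.
        nl = data.find(b"\n", start - 1)
        if nl == -1:
            return []
        pos = nl + 1
    lines = []
    while pos < end and pos < len(data):
        nl = data.find(b"\n", pos)
        if nl == -1:
            nl = len(data)
        lines.append(data[pos:nl])
        pos = nl + 1
    return lines


def can_decompress_from(blob, offset):
    """Report whether gzip decoding succeeds when started at `offset`."""
    try:
        gzip.decompress(blob[offset:])
        return True
    except (OSError, EOFError, zlib.error):
        return False


# Plain text: four independent byte ranges together recover every row once.
n = len(plain)
splits = [(i * n // 4, (i + 1) * n // 4) for i in range(4)]
recovered = [line for s, e in splits for line in read_plain_split(plain, s, e)]

# Gzip: decoding works from the start of the stream but not from the middle.
from_start = can_decompress_from(compressed, 0)
from_middle = can_decompress_from(compressed, len(compressed) // 2)
```

In this sketch a reader of the gzip file has no way to begin at an arbitrary offset, so one worker has to process the whole file. That is the trade-off behind a requirement such as a single compressed CSV file.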