[ https://issues.apache.org/jira/browse/BEAM-55?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15531312#comment-15531312 ]
Daniel Halperin commented on BEAM-55:
-------------------------------------

Note that there is a good reason this was not originally supported: compressing output files is generally terrible for downstream processing. Most consumers perform very poorly when reading compressed files (for example, Dataflow and Google BigQuery are both unable to parallelize reads from them). At Google, we strongly discourage compressed files and instead prefer block-compressed formats like Avro, which combine compression with the ability to seek, split, and parallelize reads. AvroIO DOES support compression.

> Allow users to compress FileBasedSink output files
> --------------------------------------------------
>
>                 Key: BEAM-55
>                 URL: https://issues.apache.org/jira/browse/BEAM-55
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-java-core
>            Reporter: Daniel Halperin
>            Priority: Minor
>
> FileBasedSink (also TextIO.Write, AvroIO.Write, etc.) does not have an option
> for compressing its output.
> In general, we discourage compression because it limits or blocks scalably
> reading from a file in parallel. However, users may want it -- so we should
> support the option (with appropriate warnings).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
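
To illustrate the splittability point made in the comment, here is a minimal sketch using only the Python standard library. It is not Beam code and it does not reproduce Avro's actual file format; the block layout, the offset index, and the helper names (compress_blocks, read_block, readable_from) are all hypothetical and exist only for this example. It contrasts a single gzip stream, which a reader must consume from byte 0, with independently compressed blocks, which a reader can pick up at any block boundary.

```python
import gzip
import zlib


def compress_blocks(records, block_size):
    """Compress records in independent blocks (toy layout, not Avro's).

    Returns the concatenated compressed bytes plus an index of
    (offset, length) pairs, one per block.
    """
    data = bytearray()
    index = []
    for start in range(0, len(records), block_size):
        block = zlib.compress(b"".join(records[start:start + block_size]))
        index.append((len(data), len(block)))
        data += block
    return bytes(data), index


def read_block(data, index, i):
    """Decompress block i alone, without touching any other block."""
    offset, length = index[i]
    return zlib.decompress(data[offset:offset + length])


def readable_from(gz_bytes, offset):
    """Return True if a gzip reader can start decoding at this offset."""
    try:
        gzip.decompress(gz_bytes[offset:])
        return True
    except (OSError, EOFError, zlib.error):
        return False


records = [f"record-{n}\n".encode() for n in range(1000)]

# One gzip stream over the whole file: only decodable from the start.
whole = gzip.compress(b"".join(records))

# Four independent blocks: each one could go to a different worker.
blocked, index = compress_blocks(records, block_size=250)
third_block = read_block(blocked, index, 2)
```

With this layout, four workers could each decompress one block in parallel, whereas a worker handed the second half of the gzip stream has nothing it can decode. Real block-compressed formats differ in how they locate block boundaries, so treat the index here as an illustration only.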