[ 
https://issues.apache.org/jira/browse/BEAM-55?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15531312#comment-15531312
 ] 

Daniel Halperin commented on BEAM-55:
-------------------------------------

Note that there is a good reason this was not originally supported: compressing 
output files is generally terrible for downstream processing. Most consumers 
perform very poorly when reading compressed files (for example, Dataflow and 
Google BigQuery are both unable to parallelize reads from them).

At Google, we strongly discourage whole-file compression and instead prefer 
block-compressed formats such as Avro, which combine compression with the 
ability to seek, split, and parallelize reads. AvroIO DOES support compression.
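
To illustrate the difference, here is a minimal sketch using only Python's 
standard zlib module. It is not Beam, Dataflow, or Avro code: the function 
names and BLOCK_SIZE are hypothetical, and the "block" layout is a deliberate 
simplification of what a block-compressed container does. A single compressed 
stream cannot be decoded from an arbitrary offset, whereas independently 
compressed blocks can each be read on their own and therefore in parallel.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 64 * 1024  # hypothetical block size, chosen only for illustration


def compress_whole(data: bytes) -> bytes:
    """Compress the entire payload as one stream (whole-file compression)."""
    return zlib.compress(data)


def compress_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Compress each fixed-size slice independently (simplified block compression)."""
    return [
        zlib.compress(data[i:i + block_size])
        for i in range(0, len(data), block_size)
    ]


def can_decode_from(stream: bytes, offset: int) -> bool:
    """Return True if decompression succeeds when started at `offset`."""
    try:
        zlib.decompress(stream[offset:])
        return True
    except zlib.error:
        return False


def read_block(blocks: list, index: int) -> bytes:
    """Decode one block without touching any of the others."""
    return zlib.decompress(blocks[index])


def read_all_parallel(blocks: list, workers: int = 4) -> bytes:
    """Decode all blocks concurrently; map() preserves block order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return b"".join(pool.map(zlib.decompress, blocks))
```

With the whole-file variant, a reader that wants the second half of the data 
still has to decompress everything before it. With the block variant, each 
worker can be handed its own range of blocks.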

> Allow users to compress FileBasedSink output files
> --------------------------------------------------
>
>                 Key: BEAM-55
>                 URL: https://issues.apache.org/jira/browse/BEAM-55
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-java-core
>            Reporter: Daniel Halperin
>            Priority: Minor
>
> FileBasedSink (also TextIO.Write, AvroIO.Write, etc.) does not have an option 
> for compressing its output.
> In general, we discourage compression because it limits or prevents scalable, 
> parallel reads from a file. However, users may want it, so we should 
> support the option (with appropriate warnings).



