Marian Dvorsky created BEAM-3874:
------------------------------------

             Summary: Switch AvroIO sink default codec to Snappy
                 Key: BEAM-3874
                 URL: https://issues.apache.org/jira/browse/BEAM-3874
             Project: Beam
          Issue Type: Improvement
          Components: io-java-avro
            Reporter: Marian Dvorsky
            Assignee: Eugene Kirpichov


AvroIO currently uses 
[CodecFactory|https://cs.corp.google.com/piper///depot/google3/third_party/java_src/apache_beam/project_root/sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java?l=851&gs=kythe%253A%252F%252Fgoogle3%253Flang%253Djava%253Fpath%253Dorg.apache.avro.file.CodecFactory%2523b8636ed8a0357a3a3806fb8ad152a1e38d3b4fa39a6a66d189c040aee9687823&gsn=CodecFactory&ct=xref_usages].[deflateCodec|https://cs.corp.google.com/piper///depot/google3/third_party/java_src/apache_beam/project_root/sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java?l=851&gs=kythe%253A%252F%252Fgoogle3%253Flang%253Djava%253Fpath%253Dorg.apache.avro.file.CodecFactory%25239fc62def2276bb77cc0f71b21660540e246046da139bfed9b0f33c7f8dbb4550&gsn=deflateCodec&ct=xref_usages](6)
 as the default codec for writes.

Deflate at that level compresses well, but is quite CPU-intensive.

The Snappy codec offers a lower compression ratio but much faster 
compression, and is typically a better CPU/storage tradeoff except for very 
long-lived files. 

We should consider switching the default to Snappy.
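
To get a rough feel for the CPU/storage tradeoff described above, here is a minimal sketch using only the JDK. Snappy is not part of the JDK, so this compares DEFLATE level 6 (the level of the current default) against DEFLATE level 1 as a stand-in for a faster, lower-ratio codec. It is not a Snappy benchmark. The class name, helper methods, and sample data are hypothetical, and real numbers depend heavily on the actual records.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

public class CodecTradeoff {
    /** Compresses input with DEFLATE at the given level; returns the compressed size in bytes. */
    public static int deflatedSize(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        try {
            deflater.setInput(input);
            deflater.finish();
            byte[] buf = new byte[8192];
            int total = 0;
            // Drain the compressor; only the output size matters here.
            while (!deflater.finished()) {
                total += deflater.deflate(buf);
            }
            return total;
        } finally {
            deflater.end(); // release native zlib resources
        }
    }

    /** Builds hypothetical, repetitive record-like data (an assumption, not real Avro output). */
    public static byte[] sampleRecords(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append("{\"id\":").append(i)
              .append(",\"name\":\"user").append(i % 100).append("\"}\n");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] data = sampleRecords(200_000);
        // Level 1 = fastest DEFLATE setting (stand-in only); level 6 = current default level.
        for (int level : new int[] {1, 6}) {
            long start = System.nanoTime();
            int size = deflatedSize(data, level);
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.printf("level=%d input=%d compressed=%d time=%dus%n",
                    level, data.length, size, micros);
        }
    }
}
```

Independently of the default, a pipeline can already choose its codec per write with AvroIO's withCodec(CodecFactory) setter, for example by passing CodecFactory.snappyCodec().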



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
