Marian Dvorsky created BEAM-3874:
------------------------------------

             Summary: Switch AvroIO sink default codec to Snappy
                 Key: BEAM-3874
                 URL: https://issues.apache.org/jira/browse/BEAM-3874
             Project: Beam
          Issue Type: Improvement
          Components: io-java-avro
            Reporter: Marian Dvorsky
            Assignee: Eugene Kirpichov

AvroIO currently uses [CodecFactory|https://cs.corp.google.com/piper///depot/google3/third_party/java_src/apache_beam/project_root/sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java?l=851&gs=kythe%253A%252F%252Fgoogle3%253Flang%253Djava%253Fpath%253Dorg.apache.avro.file.CodecFactory%2523b8636ed8a0357a3a3806fb8ad152a1e38d3b4fa39a6a66d189c040aee9687823&gsn=CodecFactory&ct=xref_usages].[deflateCodec|https://cs.corp.google.com/piper///depot/google3/third_party/java_src/apache_beam/project_root/sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java?l=851&gs=kythe%253A%252F%252Fgoogle3%253Flang%253Djava%253Fpath%253Dorg.apache.avro.file.CodecFactory%25239fc62def2276bb77cc0f71b21660540e246046da139bfed9b0f33c7f8dbb4550&gsn=deflateCodec&ct=xref_usages](6) as the default codec for writes. Deflate at level 6 compresses well, but it is quite expensive in CPU time. The Snappy codec compresses less tightly but much faster, and is typically a better CPU/storage tradeoff except for very long-lived files.

We should consider switching the default to Snappy.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
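
For anyone who wants to get a feel for the ratio-versus-CPU tradeoff described in this issue, here is a rough sketch using only the JDK. Snappy is not part of the JDK, so this does not measure Snappy itself. As a stand-in it compares DEFLATE at level 6 (the level the issue says AvroIO uses) against `Deflater.BEST_SPEED`. The class name `CodecTradeoffSketch`, the sample payload, and the record count are all made up for illustration. The timings are not a real benchmark and will vary by machine and data.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class CodecTradeoffSketch {

    // Builds a repetitive, record-like payload (hypothetical sample data).
    static byte[] sampleRecords(int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++) {
            sb.append("{\"id\":").append(i)
              .append(",\"name\":\"user-").append(i % 50).append("\"}\n");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Compresses the input with zlib/DEFLATE at the given level (1..9).
    static byte[] deflate(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Decompresses back to a buffer of the known original length.
    static byte[] inflate(byte[] compressed, int originalLength) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        byte[] result = new byte[originalLength];
        int off = 0;
        try {
            while (off < originalLength && !inflater.finished()) {
                int n = inflater.inflate(result, off, originalLength - off);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break; // truncated or unexpected input; stop rather than spin
                }
                off += n;
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt compressed data", e);
        } finally {
            inflater.end();
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] data = sampleRecords(200_000);
        for (int level : new int[] {Deflater.BEST_SPEED, 6}) {
            long start = System.nanoTime();
            byte[] compressed = deflate(data, level);
            long micros = (System.nanoTime() - start) / 1_000;
            boolean roundTrips = Arrays.equals(inflate(compressed, data.length), data);
            System.out.printf("level=%d in=%d out=%d time=%dus roundTrip=%b%n",
                    level, data.length, compressed.length, micros, roundTrips);
        }
    }
}
```

Comparing the `out` and `time` columns across the two levels gives a crude picture of the kind of tradeoff at stake. A real evaluation for this issue would need to run actual Avro files through both the deflate and Snappy codecs.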