[
https://issues.apache.org/jira/browse/AVRO-1862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17338559#comment-17338559
]
Michael Olschimke commented on AVRO-1862:
-----------------------------------------
Hi guys, I cannot agree with Niels :(
Many similar tools support Snappy-compressed (or otherwise compressed) Avro
files, for example:
* Impala:
[https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/impala_avro.html]
* Snowflake DB:
[https://community.snowflake.com/s/question/0D50Z00006w5ksOSAQ/how-about-support-for-snappy-compression-in-avro-files]
* AWS Athena:
[https://stackoverflow.com/questions/58217715/is-snappy-compressed-avro-files-queryable-in-athena]
* SparkSQL:
[https://discuss.itversity.com/t/how-to-save-avro-file-with-snappy-compression/5241]
To be fair, some counterexamples that do not support this:
* Presto
* Polybase:
[https://docs.microsoft.com/en-us/sql/relational-databases/system-catalog-views/sys-external-file-formats-transact-sql?redirectedfrom=MSDN&view=sql-server-ver15]
So it would be great to reconsider this. We see compressed Avro files used
more and more in data lakes, supporting them makes sense, and the current
behavior limits the usage of Apache Drill in such scenarios.
Thank you kindly,
Mike
> AvroOutputFormat saves compressed Avro files without respecting codec's
> default extension
> -----------------------------------------------------------------------------------------
>
> Key: AVRO-1862
> URL: https://issues.apache.org/jira/browse/AVRO-1862
> Project: Apache Avro
> Issue Type: Improvement
> Components: java
> Affects Versions: 1.8.1
> Reporter: Piotr Wikieł
> Priority: Minor
> Labels: patch
> Fix For: 1.9.0
>
> Attachments: AVRO-1862-1.patch, AVRO-1862.patch
>
>
> A common pattern when naming compressed files is to give them an extension
> derived from the compression codec, for example: {{.gz}}, {{.zip}}, {{.bz2}}.
> {{AvroOutputFormat}} currently does not respect this convention.
> I've adapted some code from Hadoop's {{TextOutputFormat}} in a
> backward-compatible manner, adding the following {{JobConf}} property:
> {{avro.mapred.output.extension.from-codec}} ({{boolean}}, default: {{false}})
> - when set to {{true}}, the extension will be changed according to the above
> rule.
> EDIT: Please take a look at the first comment for an update. {{.gz.avro}} or
> {{.snappy.avro}} will be the file extension when the above property is set
> to true.
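
The naming rule quoted above can be sketched roughly as follows. This is only an illustration of the rule as the issue text states it, not the code from the attached patches: the class name, method name, and parameters are hypothetical, and the codec's default extension is passed in as a plain string instead of being read from a Hadoop codec or a {{JobConf}}.

```java
public class CodecExtensionSketch {

    // Plain Avro extension used when the opt-in flag is off.
    static final String AVRO_EXT = ".avro";

    /**
     * Hypothetical helper, not part of Avro or Hadoop.
     *
     * @param fromCodec stands in for the proposed boolean property
     *                  avro.mapred.output.extension.from-codec
     * @param codecDefaultExtension the codec's default extension,
     *                  e.g. ".gz" or ".snappy"; null or empty if no codec
     */
    static String outputExtension(boolean fromCodec, String codecDefaultExtension) {
        // Flag off (the default) or no codec: keep the old behavior.
        if (!fromCodec || codecDefaultExtension == null || codecDefaultExtension.isEmpty()) {
            return AVRO_EXT;
        }
        // Flag on: put the codec extension in front of ".avro".
        return codecDefaultExtension + AVRO_EXT;
    }

    public static void main(String[] args) {
        System.out.println("part-00000" + outputExtension(true, ".gz"));
        System.out.println("part-00000" + outputExtension(true, ".snappy"));
        System.out.println("part-00000" + outputExtension(false, ".snappy"));
    }
}
```

Because the flag defaults to false in this sketch, existing jobs would keep producing plain {{.avro}} names, which matches the backward-compatible intent described in the issue.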