[ 
https://issues.apache.org/jira/browse/BEAM-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15484712#comment-15484712
 ] 

Konstantinos Katsiapis commented on BEAM-570:
---------------------------------------------

According to the Avro specification, the required codecs are 'null' and 
'deflate', and the only optional codec is 'snappy'.
See: https://avro.apache.org/docs/1.8.1/spec.html

Python _AvroSource already supports 'null' and 'deflate'.
The following PR adds support for 'snappy': 
https://github.com/apache/incubator-beam/pull/946
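
For background, here is a rough sketch of what per-block codec handling looks 
like under the spec. This is not the actual _AvroSource code; the function 
name decompress_block is made up for illustration, and the 'snappy' branch 
assumes the third-party python-snappy package is installed.

```python
import struct
import zlib


def decompress_block(codec, data):
    """Return the uncompressed bytes of one Avro data block.

    Hypothetical helper for illustration only; not part of Beam or Avro.
    """
    if codec == 'null':
        # 'null' means the block is stored uncompressed.
        return data
    if codec == 'deflate':
        # Avro 'deflate' is raw DEFLATE (RFC 1951) with no zlib header or
        # checksum, so a negative wbits value is needed.
        return zlib.decompress(data, -zlib.MAX_WBITS)
    if codec == 'snappy':
        # Assumption: python-snappy is available. Imported lazily so the
        # required codecs work without it.
        import snappy
        # Each snappy block is followed by a 4-byte big-endian CRC32 of
        # the uncompressed data.
        result = snappy.decompress(data[:-4])
        expected_crc = struct.unpack('>I', data[-4:])[0]
        if zlib.crc32(result) & 0xffffffff != expected_crc:
            raise ValueError('CRC32 mismatch in snappy block')
        return result
    raise ValueError('Unsupported codec: %r' % codec)
```

Anything outside the three codecs in the spec (e.g. 'bzip2') falls through to 
the ValueError here, which is the case the question below is about.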

[~altay], [~chamikara] You also mention that bzip2 should be supported (similar 
to how it's done for Dataflow Java?), but bzip2 doesn't seem to be part of the 
specification linked above.

Should we limit the scope of this bug to just adding 'snappy', or is there 
precedent for supporting 'bzip2'?
Could you point me to the Java code that supports 'bzip2' so that we can get 
more background there?

Thanks,
Gus

> Update AvroSource to support more compression types
> ---------------------------------------------------
>
>                 Key: BEAM-570
>                 URL: https://issues.apache.org/jira/browse/BEAM-570
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-py
>            Reporter: Chamikara Jayalath
>            Assignee: Chamikara Jayalath
>
> Python AvroSource [1] currently only supports 'deflate' compression. We should 
> update it to support other compression types supported by the Avro library 
> (e.g., snappy, bzip2).
> [1] 
> https://github.com/apache/incubator-beam/blob/python-sdk/sdks/python/apache_beam/io/avroio.py



