[ 
https://issues.apache.org/jira/browse/BEAM-11211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17231819#comment-17231819
 ] 

Brian Hulette commented on BEAM-11211:
--------------------------------------

Let's hold off on making pyarrow 2.x the default in the Docker image. There's a 
potential data loss bug in Arrow 2.0.0, and it looks like a 2.0.1 release is 
coming soon: 
https://lists.apache.org/thread.html/r747596de9b4b3c1cd12624aa9d0827becd5da5e716369e7a44b6b626%40%3Cdev.arrow.apache.org%3E

> Support multiple major pyarrow versions
> ---------------------------------------
>
>                 Key: BEAM-11211
>                 URL: https://issues.apache.org/jira/browse/BEAM-11211
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-py-core
>            Reporter: Brian Hulette
>            Assignee: Brian Hulette
>            Priority: P2
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> We should support using ParquetIO with multiple pyarrow versions, up to 2.x.
> Specific actions:
> [X] Change pyarrow requirement from >=0.15.1,<0.18.0 to >=0.15.1,<3.0.0
> [X] There's a limitation in pyarrow 1.x where it can't write LZ4 
> compression. We should catch attempts to do this at construction time and 
> raise a useful error (see ARROW-9424).
> [  ] Add the ability to test with different pyarrow versions and run in 
> PostCommit to verify.
> [  ] Update pyarrow in base_image_requirements.txt to 2.x
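
For illustration, the construction-time LZ4 check described above could look 
roughly like the sketch below. This is not the actual ParquetIO code: the 
names check_codec_supported and ParquetSinkSketch are hypothetical, and the 
pyarrow version is passed in as a string (in real use it would presumably come 
from pyarrow.__version__) so the example runs without pyarrow installed.

```python
def check_codec_supported(codec, pyarrow_version):
    """Raise ValueError if this codec cannot be written by this pyarrow version.

    Hypothetical helper: only encodes the one limitation named in the issue
    (pyarrow 1.x cannot write LZ4, see ARROW-9424).
    """
    major = int(pyarrow_version.split(".")[0])
    if major == 1 and codec.lower() == "lz4":
        raise ValueError(
            "LZ4 compression cannot be written with pyarrow 1.x "
            "(see ARROW-9424); got pyarrow %s. Use a different codec or "
            "a different pyarrow major version." % pyarrow_version)


class ParquetSinkSketch(object):
    """Hypothetical stand-in for a sink; validates arguments in __init__."""

    def __init__(self, codec="none", pyarrow_version="1.0.1"):
        # Fail at pipeline construction time rather than when writing.
        check_codec_supported(codec, pyarrow_version)
        self.codec = codec
```

Doing the check in the constructor means a bad codec/version combination 
fails when the pipeline is built, not partway through a write.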



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
