[ https://issues.apache.org/jira/browse/BEAM-11211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17231819#comment-17231819 ]
Brian Hulette commented on BEAM-11211:
--------------------------------------
Let's hold off on making pyarrow 2.x the default in the Docker image. There's a
potential data-loss bug in Arrow 2.0.0, and it looks like a 2.0.1 release is
coming soon:
https://lists.apache.org/thread.html/r747596de9b4b3c1cd12624aa9d0827becd5da5e716369e7a44b6b626%40%3Cdev.arrow.apache.org%3E
> Support multiple major pyarrow versions
> ---------------------------------------
>
> Key: BEAM-11211
> URL: https://issues.apache.org/jira/browse/BEAM-11211
> Project: Beam
> Issue Type: Improvement
> Components: sdk-py-core
> Reporter: Brian Hulette
> Assignee: Brian Hulette
> Priority: P2
> Time Spent: 1h
> Remaining Estimate: 0h
>
> We should support using ParquetIO with multiple pyarrow versions, up to 2.x.
> Specific actions:
> [X] Change pyarrow requirement from >=0.15.1,<0.18.0 to >=0.15.1,<3.0.0
> [X] There's a limitation in 1.x where it can't write LZ4 compression; we
> should catch attempts to do this at construction time and raise a useful
> error (See ARROW-9424).
> [ ] Add the ability to test with different pyarrow versions and run in
> PostCommit to verify.
> [ ] Update pyarrow in base_image_requirements.txt to 2.x
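The first two checked items (the widened >=0.15.1,<3.0.0 requirement and the
construction-time LZ4 check for pyarrow 1.x) can be sketched roughly as below.
This is an illustration under assumptions, not Beam's actual ParquetIO code:
`parse_version`, `is_supported_pyarrow`, `validate_codec` and
`ParquetSinkSketch` are hypothetical names, and the pyarrow version is passed
in as a string (in practice it would come from `pyarrow.__version__`) so the
sketch runs without pyarrow installed.

```python
import re

# Hypothetical sketch, not Beam's implementation.
# Requirement range taken from the issue: >=0.15.1,<3.0.0
MIN_VERSION = (0, 15, 1)
MAX_VERSION_EXCLUSIVE = (3, 0, 0)


def parse_version(version):
    """Return the leading (major, minor, patch) integers of a version string."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("Unrecognized version string: %r" % version)
    return tuple(int(part) for part in match.groups())


def is_supported_pyarrow(version):
    """True if the version falls in the >=0.15.1,<3.0.0 range."""
    return MIN_VERSION <= parse_version(version) < MAX_VERSION_EXCLUSIVE


def validate_codec(codec, pyarrow_version):
    """Reject LZ4 up front when running on pyarrow 1.x (see ARROW-9424)."""
    major = parse_version(pyarrow_version)[0]
    if major == 1 and codec.lower() == "lz4":
        raise ValueError(
            "pyarrow %s cannot write LZ4-compressed Parquet (see ARROW-9424); "
            "choose another codec or pyarrow version." % pyarrow_version)


class ParquetSinkSketch(object):
    """Hypothetical sink that validates its options in __init__,
    i.e. at pipeline construction time, before any data is written."""

    def __init__(self, path, codec="none", pyarrow_version="2.0.0"):
        validate_codec(codec, pyarrow_version)
        self.path = path
        self.codec = codec
```

Validating in the constructor means a bad codec/version combination fails
while the pipeline is being built, rather than partway through a write on a
worker.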
--
This message was sent by Atlassian Jira
(v8.3.4#803005)