[ https://issues.apache.org/jira/browse/SPARK-27276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16810024#comment-16810024 ]
Bryan Cutler commented on SPARK-27276:
--------------------------------------
I think we should use 0.12.1, since it includes a bug fix (ARROW-4582) that
might affect usage in Spark.
> Increase the minimum pyarrow version to 0.12.1
> ----------------------------------------------
>
> Key: SPARK-27276
> URL: https://issues.apache.org/jira/browse/SPARK-27276
> Project: Spark
> Issue Type: Improvement
> Components: PySpark, SQL
> Affects Versions: 3.0.0
> Reporter: Bryan Cutler
> Priority: Major
>
> The current minimum version is 0.8.0, which is quite old: Arrow has been
> moving fast and a lot has changed since that release. There are currently
> many workarounds that check for different versions or disable specific
> functionality, and the code is getting ugly and difficult to maintain.
> Increasing the minimum version will allow this cleanup and an upgrade of the
> testing environment.
> This involves changing the pyarrow version in setup.py (currently at 0.8.0),
> updating Jenkins to test against the new version, and cleaning up the code to
> remove workarounds for older versions. Newer versions of pyarrow have dropped
> support for Python 3.4, so it might be necessary to update to Python 3.5+ in
> Jenkins as well. Users would then need to ensure that at least this version
> of pyarrow is installed on the cluster.
> There is also a 0.12.1 release, so I will need to check what bugs that fixed
> to see if that will be a better version.
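
For illustration only, a single minimum-version check of the kind that could
stand in for scattered per-version workarounds might look like the sketch
below. The names parse_version, require_minimum_version, and
MINIMUM_PYARROW_VERSION are hypothetical and are not taken from the Spark
code base; the value 0.12.1 is simply the version proposed in this thread.

```python
import re

# Assumed minimum, taken from the version proposed in this thread.
MINIMUM_PYARROW_VERSION = "0.12.1"


def parse_version(version):
    """Turn a release string such as '0.12.1' into a tuple of ints, (0, 12, 1).

    Only the leading dotted-number part is used, so a suffix such as
    '.dev3' is ignored in this simplified sketch.
    """
    match = re.match(r"\d+(\.\d+)*", version)
    if match is None:
        raise ValueError("Unrecognized version string: %r" % version)
    return tuple(int(part) for part in match.group(0).split("."))


def require_minimum_version(installed, minimum=MINIMUM_PYARROW_VERSION):
    """Raise ImportError if `installed` is older than `minimum`."""
    if parse_version(installed) < parse_version(minimum):
        raise ImportError(
            "pyarrow >= %s must be installed; however, your version was %s."
            % (minimum, installed)
        )
```

On a cluster node this would be called as
require_minimum_version(pyarrow.__version__) after importing pyarrow, so that
an outdated installation fails early with a clear message.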