[
https://issues.apache.org/jira/browse/SPARK-34771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Darcy Shen updated SPARK-34771:
-------------------------------
Description:
(spark) ➜ spark git:(SPARK_34771) ✗ bin/pyspark
Python 3.8.8 (default, Feb 24 2021, 13:46:16)
[Clang 10.0.0 ] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use
setLogLevel(newLevel).
21/03/17 23:13:27 WARN NativeCodeLoader: Unable to load native-hadoop library
for your platform... using builtin-java classes where applicable
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 3.2.0-SNAPSHOT
      /_/
Using Python version 3.8.8 (default, Feb 24 2021 13:46:16)
Spark context Web UI available at http://172.30.0.226:4040
Spark context available as 'sc' (master = local[*], app id =
local-1615994008526).
SparkSession available as 'spark'.
>>> spark.conf.set("spark.sql.execution.arrow.enabled", "true")
21/03/17 23:13:31 WARN SQLConf: The SQL config
'spark.sql.execution.arrow.enabled' has been deprecated in Spark v3.0 and may
be removed in the future. Use 'spark.sql.execution.arrow.pyspark.enabled'
instead of it.
>>> from pyspark.testing.sqlutils import ExamplePoint
>>> import pandas as pd
>>> pdf = pd.DataFrame({'point': pd.Series([ExamplePoint(1, 1), ExamplePoint(2, 2)])})
>>> df = spark.createDataFrame(pdf)
/Users/da/github/apache/spark/python/pyspark/sql/pandas/conversion.py:332:
UserWarning: createDataFrame attempted Arrow optimization because
'spark.sql.execution.arrow.pyspark.enabled' is set to true; however, failed by
the reason below:
Could not convert (1,1) with type ExamplePoint: did not recognize Python
value type when inferring an Arrow data type
Attempting non-optimization as
'spark.sql.execution.arrow.pyspark.fallback.enabled' is set to true.
warnings.warn(msg)
>>>
With `spark.sql.execution.arrow.enabled` set to false, the above snippet works fine.
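
The warning in the transcript says Arrow could not infer a data type for an ExamplePoint object. A PySpark UDT class defines `serialize`/`deserialize` methods and a `sqlType` describing its storage form, so one plausible direction is to serialize UDT values into that storage form before the column reaches Arrow. The following is only a minimal sketch of that idea in plain Python, with no Spark or Arrow involved: `Point`, `serialize`, `deserialize`, and `serialize_column` are hypothetical stand-ins written for illustration, not PySpark APIs and not the fix adopted for this issue.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Point:
    """Hypothetical stand-in for a user-defined point type (not a PySpark class)."""
    x: float
    y: float


def serialize(point: Point) -> List[float]:
    """Convert a point to a list of floats, a value whose type is easy to infer."""
    return [float(point.x), float(point.y)]


def deserialize(datum: Sequence[float]) -> Point:
    """Rebuild a point from its serialized list form."""
    return Point(datum[0], datum[1])


def serialize_column(values: Sequence[Optional[Point]]) -> List[Optional[List[float]]]:
    """Serialize every non-null value in a column, leaving nulls untouched."""
    return [None if v is None else serialize(v) for v in values]


# A column of custom objects, like the 'point' column in the report.
column = [Point(1, 1), Point(2, 2), None]

# After serialization the column holds only lists of floats and nulls.
storage = serialize_column(column)
print(storage)  # [[1.0, 1.0], [2.0, 2.0], None]

# The round trip recovers the original objects.
restored = [None if d is None else deserialize(d) for d in storage]
```

In this sketch the serialized column contains only lists of floats, which is the kind of value a columnar format can type without knowing anything about the original class. Whether the conversion should happen inside `createDataFrame` or in user code is left open here.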
> Support UDT for Pandas
> ----------------------
>
> Key: SPARK-34771
> URL: https://issues.apache.org/jira/browse/SPARK-34771
> Project: Spark
> Issue Type: Sub-task
> Components: PySpark
> Affects Versions: 3.0.2, 3.1.1
> Reporter: Darcy Shen
> Priority: Major