Ofer Fridman created SPARK-25491:
------------------------------------

             Summary: pandas_udf fails when using ArrayType(ArrayType(DoubleType()))
                 Key: SPARK-25491
                 URL: https://issues.apache.org/jira/browse/SPARK-25491
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 2.3.1
         Environment: Linux
python 2.7.9
pyspark 2.3.1 (also reproduces on pyspark 2.3.0)
pyarrow 0.9.0 (works correctly with pyarrow 0.8.0)
            Reporter: Ofer Fridman


After upgrading from pyarrow-0.8.0 to pyarrow-0.9.0, using pandas_udf (with PandasUDFType.GROUPED_MAP) results in an error:
{quote}Caused by: java.io.EOFException
 at java.io.DataInputStream.readInt(DataInputStream.java:392)
 at org.apache.spark.sql.execution.python.ArrowPythonRunner$$anon$1.read(ArrowPythonRunner.scala:158)
 ... 24 more
{quote}
The problem occurs only when the return schema uses a nested complex type such as ArrayType(ArrayType(DoubleType())); using ArrayType(DoubleType()) did not reproduce the issue.

Here is a simple example that reproduces the issue:
{quote}import pandas as pd
import numpy as np
from pyspark.sql import SparkSession
from pyspark.context import SparkContext, SparkConf
from pyspark.sql.types import *
import pyspark.sql.functions as sprk_func

sp_conf = SparkConf().setAppName("stam").setMaster("local[1]").set('spark.driver.memory','4g')
sc = SparkContext(conf=sp_conf)
spark = SparkSession(sc)

pd_data = pd.DataFrame({'id':(np.random.rand(20)*10).astype(int)})
data_df = spark.createDataFrame(pd_data,StructType([StructField('id', IntegerType(), True)]))

@sprk_func.pandas_udf(StructType([StructField('mat', ArrayType(ArrayType(DoubleType())), True)]),
                      sprk_func.PandasUDFType.GROUPED_MAP)
def return_mat_group(group):
    pd_data = pd.DataFrame({'mat': np.random.rand(7, 4, 4).tolist()})
    return pd_data

data_df.groupby(data_df.id).apply(return_mat_group).show(){quote}
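
Since ArrayType(DoubleType()) did not reproduce the error, one possible way to sidestep it while the issue is investigated might be to return each matrix as a flat list plus its dimensions, and rebuild the nested form afterwards. The sketch below shows only that flatten/rebuild step in plain Python. The helper names flatten_matrix and unflatten_matrix are hypothetical (they are not part of PySpark or pyarrow), and whether this actually avoids the EOFException on pyarrow 0.9.0 is an assumption that has not been verified here.

```python
# Hypothetical helpers, not part of PySpark or pyarrow.
# Assumption: matrices are rectangular (every row has the same length).

def flatten_matrix(matrix):
    """Turn a list of rows into (n_rows, n_cols, flat_values)."""
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if n_rows else 0
    if any(len(row) != n_cols for row in matrix):
        raise ValueError("all rows must have the same length")
    # Row-major order: row 0 first, then row 1, and so on.
    flat = [float(value) for row in matrix for value in row]
    return n_rows, n_cols, flat


def unflatten_matrix(n_rows, n_cols, flat):
    """Rebuild the list of rows from a row-major flat list."""
    if len(flat) != n_rows * n_cols:
        raise ValueError("flat list length does not match n_rows * n_cols")
    return [flat[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]


if __name__ == "__main__":
    rows, cols, values = flatten_matrix([[1, 2], [3, 4], [5, 6]])
    print(rows, cols, values)
    print(unflatten_matrix(rows, cols, values))
```

In the reproduction above, the UDF's return schema would then carry two IntegerType columns for the dimensions and one ArrayType(DoubleType()) column for the values instead of the single 'mat' column. This is a suggested shape for the schema, not something confirmed in the report.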