Markus Tretzmüller created SPARK-31704:
------------------------------------------
             Summary: PandasUDFType.GROUPED_AGG with Java 11
                 Key: SPARK-31704
                 URL: https://issues.apache.org/jira/browse/SPARK-31704
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 3.0.0
         Environment: java jdk: 11
python: 3.7
            Reporter: Markus Tretzmüller


Running the example from the [docs|https://spark.apache.org/docs/3.0.0-preview2/api/python/pyspark.sql.html#module-pyspark.sql.functions] raises an error under Java 11; the same code works under Java 8.

{code:python}
import findspark
findspark.init('/usr/local/lib/spark-3.0.0-preview2-bin-hadoop2.7')

from pyspark.sql.functions import pandas_udf, PandasUDFType
from pyspark.sql import Window
from pyspark.sql import SparkSession

if __name__ == '__main__':
    spark = SparkSession \
        .builder \
        .appName('test') \
        .getOrCreate()

    df = spark.createDataFrame(
        [(1, 1.0), (1, 2.0), (2, 3.0), (2, 5.0), (2, 10.0)],
        ("id", "v"))

    @pandas_udf("double", PandasUDFType.GROUPED_AGG)
    def mean_udf(v):
        return v.mean()

    w = (Window.partitionBy('id')
         .orderBy('v')
         .rowsBetween(-1, 0))

    df.withColumn('mean_v', mean_udf(df['v']).over(w)).show()
{code}

{noformat}
File "/usr/local/lib/spark-3.0.0-preview2-bin-hadoop2.7/python/lib/py4j-0.10.8.1-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o81.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 44 in stage 7.0 failed 1 times, most recent failure: Lost task 44.0 in stage 7.0 (TID 37, 131.130.32.15, executor driver): java.lang.UnsupportedOperationException: sun.misc.Unsafe or java.nio.DirectByteBuffer.<init>(long, int) not available
	at io.netty.util.internal.PlatformDependent.directBuffer(PlatformDependent.java:473)
	at io.netty.buffer.NettyArrowBuf.getDirectBuffer(NettyArrowBuf.java:243)
	at io.netty.buffer.NettyArrowBuf.nioBuffer(NettyArrowBuf.java:233)
	at io.netty.buffer.ArrowBuf.nioBuffer(ArrowBuf.java:245)
	at org.apache.arrow.vector.ipc.message.ArrowRecordBatch.computeBodyLength(ArrowRecordBatch.java:222)
	at org.apache.arrow.vector.ipc.message.MessageSerializer.serialize(MessageSerializer.java:240)
	at org.apache.arrow.vector.ipc.ArrowWriter.writeRecordBatch(ArrowWriter.java:132)
	at org.apache.arrow.vector.ipc.ArrowWriter.writeBatch(ArrowWriter.java:120)
	at org.apache.spark.sql.execution.python.ArrowPythonRunner$$anon$1.$anonfun$writeIteratorToStream$1(ArrowPythonRunner.scala:94)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
	at org.apache.spark.sql.execution.python.ArrowPythonRunner$$anon$1.writeIteratorToStream(ArrowPythonRunner.scala:101)
	at org.apache.spark.api.python.BasePythonRunner$WriterThread.$anonfun$run$1(PythonRunner.scala:373)
	at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1932)
	at org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:213)
{noformat}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
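A possible workaround, not part of the original report: the Java 11 notes in the Spark 3.0 documentation state that the Apache Arrow library additionally requires {{-Dio.netty.tryReflectionSetAccessible=true}} on the JVM, which matches the {{sun.misc.Unsafe or java.nio.DirectByteBuffer.<init>(long, int) not available}} failure above. A sketch of the corresponding configuration (unverified against this exact setup):

{noformat}
# spark-defaults.conf (sketch): let Netty access sun.misc.Unsafe /
# the DirectByteBuffer constructor via reflection on Java 9+,
# which the Arrow IPC write path used by pandas UDFs relies on.
spark.driver.extraJavaOptions    -Dio.netty.tryReflectionSetAccessible=true
spark.executor.extraJavaOptions  -Dio.netty.tryReflectionSetAccessible=true
{noformat}

The same options can also be set per session via {{SparkSession.builder.config(...)}} before the session is created.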