[
https://issues.apache.org/jira/browse/SPARK-41193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hyukjin Kwon resolved SPARK-41193.
----------------------------------
Fix Version/s: 3.4.0
Resolution: Fixed
Issue resolved by pull request 38704
[https://github.com/apache/spark/pull/38704]
> Ignore `collect data with single partition larger than 2GB bytes array limit`
> in `DatasetLargeResultCollectingSuite` as default
> -------------------------------------------------------------------------------------------------------------------------------
>
> Key: SPARK-41193
> URL: https://issues.apache.org/jira/browse/SPARK-41193
> Project: Spark
> Issue Type: Bug
> Components: Tests
> Affects Versions: 3.4.0
> Reporter: Yang Jie
> Assignee: Yang Jie
> Priority: Major
> Fix For: 3.4.0
>
>
> Test this suite with **Java 8/11/17** on Linux and on macOS on Apple Silicon with
> the following commands:
> - Maven:
> ```
> build/mvn clean install -DskipTests -pl sql/core -am
> build/mvn clean test -pl sql/core -Dtest=none -DwildcardSuites=org.apache.spark.sql.DatasetLargeResultCollectingSuite
> ```
> and
>
> ```
> dev/change-scala-version.sh 2.13
> build/mvn clean install -DskipTests -pl sql/core -am -Pscala-2.13
> build/mvn clean test -pl sql/core -Pscala-2.13 -Dtest=none -DwildcardSuites=org.apache.spark.sql.DatasetLargeResultCollectingSuite
> ```
> - SBT:
> ```
> build/sbt clean "sql/testOnly org.apache.spark.sql.DatasetLargeResultCollectingSuite"
> ```
> and
>
> ```
> dev/change-scala-version.sh 2.13
> build/sbt clean "sql/testOnly org.apache.spark.sql.DatasetLargeResultCollectingSuite" -Pscala-2.13
> ```
> All tests failed with `java.lang.OutOfMemoryError: Java heap space` as follows:
> ```
> 10:19:56.910 ERROR org.apache.spark.executor.Executor: Exception in task 0.0 in stage 0.0 (TID 0)
> java.lang.OutOfMemoryError: Java heap space
>     at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
>     at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
>     at org.apache.spark.serializer.SerializerHelper$.$anonfun$serializeToChunkedBuffer$1(SerializerHelper.scala:40)
>     at org.apache.spark.serializer.SerializerHelper$.$anonfun$serializeToChunkedBuffer$1$adapted(SerializerHelper.scala:40)
>     at org.apache.spark.serializer.SerializerHelper$$$Lambda$2321/1995130077.apply(Unknown Source)
>     at org.apache.spark.util.io.ChunkedByteBufferOutputStream.allocateNewChunkIfNeeded(ChunkedByteBufferOutputStream.scala:87)
>     at org.apache.spark.util.io.ChunkedByteBufferOutputStream.write(ChunkedByteBufferOutputStream.scala:75)
>     at java.io.ObjectOutputStream$BlockDataOutputStream.write(ObjectOutputStream.java:1853)
>     at java.io.ObjectOutputStream.write(ObjectOutputStream.java:709)
>     at org.apache.spark.util.Utils$.$anonfun$writeByteBuffer$1(Utils.scala:271)
>     at org.apache.spark.util.Utils$.$anonfun$writeByteBuffer$1$adapted(Utils.scala:271)
>     at org.apache.spark.util.Utils$$$Lambda$2324/69671223.apply(Unknown Source)
>     at org.apache.spark.util.Utils$.writeByteBufferImpl(Utils.scala:249)
>     at org.apache.spark.util.Utils$.writeByteBuffer(Utils.scala:271)
>     at org.apache.spark.util.io.ChunkedByteBuffer.$anonfun$writeExternal$2(ChunkedByteBuffer.scala:103)
>     at org.apache.spark.util.io.ChunkedByteBuffer.$anonfun$writeExternal$2$adapted(ChunkedByteBuffer.scala:103)
>     at org.apache.spark.util.io.ChunkedByteBuffer$$Lambda$2323/1073743200.apply(Unknown Source)
>     at scala.collection.ArrayOps$.foreach$extension(ArrayOps.scala:1328)
>     at org.apache.spark.util.io.ChunkedByteBuffer.writeExternal(ChunkedByteBuffer.scala:103)
>     at java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1459)
>     at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1430)
>     at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>     at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
>     at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
>     at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
>     at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>     at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1378)
>     at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174)
>     at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
>     at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:46)
>     at org.apache.spark.serializer.SerializerHelper$.serializeToChunkedBuffer(SerializerHelper.scala:42)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:599)
> ```
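The resolution leaves this memory-hungry test skipped unless it is explicitly enabled. A minimal sketch of one common way to gate such a test behind an opt-in JVM property (the property name `spark.test.enableLargeResultTests` is an illustrative assumption, not the mechanism used by pull request 38704):

```scala
// Sketch only: gate an expensive test behind an opt-in JVM property so it is
// skipped by default but can still be run on demand.
// The property name below is illustrative, not taken from SPARK-41193.
object LargeResultTestGate {
  // True only when the caller opts in explicitly, e.g.
  //   build/sbt -Dspark.test.enableLargeResultTests=true "sql/testOnly ..."
  def enabled: Boolean =
    sys.props.get("spark.test.enableLargeResultTests").contains("true")
}

// Inside a ScalaTest suite this would typically be used as:
//   test("collect data with single partition larger than 2GB bytes array limit") {
//     assume(LargeResultTestGate.enabled,
//       "disabled by default; opt in with -Dspark.test.enableLargeResultTests=true")
//     ...
//   }
```

With `assume`, a skipped run is reported as canceled rather than failed, so default CI runs stay green while the test remains one flag away for anyone with enough heap to exercise it.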
--
This message was sent by Atlassian Jira
(v8.20.10#820010)