[ https://issues.apache.org/jira/browse/SPARK-22038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

wuhaibo updated SPARK-22038:
----------------------------
    Description: 
I am trying to train a large model.
I have 40 million instances and 50 million features, and the data is sparse.
I am using 40 executors with 20 GB each, plus a driver with 40 GB. The number of
data partitions is 5000, the treeAggregate depth is 4,
spark.kryoserializer.buffer.max is 2016m, and spark.driver.maxResultSize is 40G.

The execution fails with the following messages:

WARN TaskSetManager: Lost task 2.1 in stage 25.0 (TID 1415, Blackstone064183, executor 15): org.apache.spark.SparkException: Kryo serialization failed: Buffer overflow. Available: 3, required: 8
Serialization trace:
currMin (org.apache.spark.mllib.stat.MultivariateOnlineSummarizer). To avoid this, increase spark.kryoserializer.buffer.max value.
        at org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:315)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:364)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:748)
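A rough size estimate may help explain the overflow. The sketch below assumes that Spark 2.x's `MultivariateOnlineSummarizer` keeps about eight dense per-feature arrays (`currMean`, `currM2n`, `currM2`, `currL1`, `weightSum`, `nnz`, `currMax`, `currMin`), each of length numFeatures with 8-byte elements; that field count is an assumption, and serializer overhead is ignored. Under that assumption a single summarizer for 50 million features is about 3.2 GB, which is larger than the 2016m Kryo buffer regardless of how the rows are partitioned.

```java
public class SummarizerSizeEstimate {
    // ASSUMPTION: number of dense per-feature arrays held by one summarizer.
    static final int DENSE_ARRAYS = 8;
    // ASSUMPTION: every element is a double or a long, i.e. 8 bytes.
    static final long BYTES_PER_ELEMENT = 8L;

    // Estimated payload in bytes of one summarizer, ignoring serializer overhead.
    static long estimateBytes(long numFeatures) {
        return DENSE_ARRAYS * BYTES_PER_ELEMENT * numFeatures;
    }

    // Convert a size given in MiB (as in "2016m") to bytes.
    static long mibToBytes(long mib) {
        return mib * 1024L * 1024L;
    }

    public static void main(String[] args) {
        long numFeatures = 50_000_000L;  // feature count from the report
        long kryoMax = mibToBytes(2016); // spark.kryoserializer.buffer.max=2016m
        long estimate = estimateBytes(numFeatures);
        System.out.printf("estimated summarizer size: %d bytes (%.2f GiB)%n",
                estimate, estimate / (double) (1L << 30));
        System.out.printf("Kryo buffer max:           %d bytes%n", kryoMax);
        System.out.println("fits in buffer: " + (estimate <= kryoMax));
    }
}
```

Because the estimate depends only on numFeatures, changing the partition count does not change it.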

I know that spark.kryoserializer.buffer.max is capped at 2g and cannot be increased further.
I have already tried increasing the number of partitions to 10000 and the treeAggregate depth to
200, but it still fails with the same error message.
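Raising the depth adds aggregation levels, but it may not make any single partial result smaller. The sketch below is modeled on the partition-shrinking loop in Spark 2.x's `RDD.treeAggregate`; the `scale` formula and the stopping condition are reproduced from memory, so treat them as an assumption. It lists how many partial aggregates exist at each level. If, as the Kryo trace suggests, each partial aggregate is a `MultivariateOnlineSummarizer` with dense per-feature arrays, then every entry at every level is full-size, and the last level is what gets serialized as task results.

```java
import java.util.ArrayList;
import java.util.List;

public class TreeAggregateLevels {
    // Number of partial aggregates per level, starting with the input partitions
    // and ending with the count that is finally reduced on the driver.
    // ASSUMPTION: mirrors Spark 2.x treeAggregate's level computation.
    static List<Integer> levels(int numPartitions, int depth) {
        if (depth < 1) {
            throw new IllegalArgumentException("depth must be >= 1");
        }
        List<Integer> result = new ArrayList<>();
        result.add(numPartitions);
        int scale = Math.max((int) Math.ceil(Math.pow(numPartitions, 1.0 / depth)), 2);
        int n = numPartitions;
        // Add another level only while it still shrinks the work left for the final reduce.
        while (n > scale + Math.ceil((double) n / scale)) {
            n /= scale;
            result.add(n);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("5000 partitions, depth 4:    " + levels(5000, 4));
        System.out.println("10000 partitions, depth 200: " + levels(10000, 200));
    }
}
```

With depth 200 the scale drops to 2, so there are many more levels, yet each partial result would still carry arrays sized by the feature count.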
I also tried using the Java serializer instead of Kryo, and it failed with an OOM:

WARN TaskSetManager: Lost task 5.0 in stage 32.0 (TID 15701, Blackstone065188, executor 4): java.lang.OutOfMemoryError
        at java.io.ByteArrayOutputStream.hugeCapacity(ByteArrayOutputStream.java:123)
        at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:117)
        at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
        at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
        at org.apache.spark.util.ByteBufferOutputStream.write(ByteBufferOutputStream.scala:41)
        at java.io.ObjectOutputStream$BlockDataOutputStream.drain(ObjectOutputStream.java:1877)
        at java.io.ObjectOutputStream$BlockDataOutputStream.setBlockDataMode(ObjectOutputStream.java:1786)
        at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1189)
        at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
        at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:43)
        at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:364)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:748)


Any advice?


> spark 2.1.1 ml.LogisticRegression with large feature set cause Kryo 
> serialization failed: Buffer overflow
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-22038
>                 URL: https://issues.apache.org/jira/browse/SPARK-22038
>             Project: Spark
>          Issue Type: Question
>          Components: ML
>    Affects Versions: 2.1.1
>            Reporter: wuhaibo
>



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
