[ https://issues.apache.org/jira/browse/SPARK-17801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
samkit updated SPARK-17801:
---------------------------
Description:
Random Forest Regression
Data: https://www.kaggle.com/c/grupo-bimbo-inventory-demand/download/train.csv.zip
Parameters:
NumTrees: 500
Maximum Bins: 7477383
MaxDepth: 27
MinInstancesPerNode: 8648
SamplingRate: 1.0
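For a rough sense of scale, the sketch below computes two upper bounds from the parameters above in plain Python: the maximum number of nodes in a binary tree of depth 27, and the size of one per-feature split histogram if all 7477383 bins were in use. The three statistics per bin (count, sum, sum of squares) and the 8-byte doubles are assumptions about how variance-based regression histograms are commonly stored; they are not figures taken from this report. MinInstancesPerNode and the real number of distinct feature values would lower both bounds in practice.

```python
# Back-of-the-envelope upper bounds for the reported tree parameters.
# Assumptions (not from the report): 3 statistics per bin, 8 bytes each.

MAX_DEPTH = 27
MAX_BINS = 7_477_383
STATS_PER_BIN = 3       # assumed: count, sum, sum of squares
BYTES_PER_STAT = 8      # assumed: one double per statistic


def max_nodes(depth: int) -> int:
    """Maximum node count of a binary tree whose root is at depth 0."""
    return 2 ** (depth + 1) - 1


def histogram_bytes(num_bins: int,
                    stats_per_bin: int = STATS_PER_BIN,
                    bytes_per_stat: int = BYTES_PER_STAT) -> int:
    """Bytes for one feature's histogram at one node, all bins in use."""
    return num_bins * stats_per_bin * bytes_per_stat


if __name__ == "__main__":
    print("max nodes per tree:", max_nodes(MAX_DEPTH))
    size = histogram_bytes(MAX_BINS)
    print("histogram bytes per feature per node:", size)
    print("approx. MiB:", round(size / 2 ** 20, 1))
```

Under these assumptions a single fully binned feature needs roughly 171 MiB of statistics per node, which is why very large Maximum Bins values are usually the first thing to lower when a run like this fails.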
Java Options:
"-Xms16384M" "-Xmx16384M" "-Dspark.locality.wait=0s"
"-Dspark.driver.extraJavaOptions=-Xss10240k -XX:+PrintGCDetails
-XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution -XX:+UseConcMarkSweepGC
-XX:+UseParNewGC -XX:ParallelGCThreads=2 -XX:-UseAdaptiveSizePolicy
-XX:ConcGCThreads=2 -XX:-UseGCOverheadLimit
-XX:CMSInitiatingOccupancyFraction=75 -XX:NewSize=8g -XX:MaxNewSize=8g
-XX:SurvivorRatio=3 -DnumPartitions=36" "-Dspark.submit.deployMode=cluster"
"-Dspark.speculation=true" "-Dspark.speculation.multiplier=2"
"-Dspark.driver.memory=16g" "-Dspark.speculation.interval=300ms"
"-Dspark.speculation.quantile=0.5" "-Dspark.akka.frameSize=768"
"-Dspark.driver.supervise=false" "-Dspark.executor.cores=6"
"-Dspark.executor.extraJavaOptions=-Xss10240k -XX:+PrintGCDetails
-XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution
-XX:-UseAdaptiveSizePolicy -XX:+UseParallelGC -XX:+UseParallelOldGC
-XX:ParallelGCThreads=6 -XX:NewSize=22g -XX:MaxNewSize=22g -XX:SurvivorRatio=2
-XX:+PrintAdaptiveSizePolicy -XX:+PrintGCDateStamps"
"-Dspark.rpc.askTimeout=10" "-Dspark.executor.memory=40g"
"-Dspark.driver.maxResultSize=3g" "-Xss10240k" "-XX:+PrintGCDetails"
"-XX:+PrintGCTimeStamps" "-XX:+PrintTenuringDistribution"
"-XX:+UseConcMarkSweepGC" "-XX:+UseParNewGC" "-XX:ParallelGCThreads=2"
"-XX:-UseAdaptiveSizePolicy" "-XX:ConcGCThreads=2" "-XX:-UseGCOverheadLimit"
"-XX:CMSInitiatingOccupancyFraction=75" "-XX:NewSize=8g" "-XX:MaxNewSize=8g"
"-XX:SurvivorRatio=3" "-DnumPartitions=36"
Executor and Driver ErrorLog
https://gist.github.com/anonymous/603ac7f8f17e43c51ba93b2934cd4cb6
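The heap settings in the Java options above can be worked through as plain arithmetic. The sketch below is a minimal illustration, assuming the usual HotSpot meaning of -XX:SurvivorRatio (eden size divided by the size of one survivor space, with the young generation made up of eden plus two survivor spaces) and assuming the executor heap equals spark.executor.memory=40g. The function names are hypothetical and exist only for this example.

```python
# Young/old generation split implied by the reported JVM flags.
# Assumes: young = eden + 2 * survivor, SurvivorRatio = eden / survivor.


def young_gen_layout(new_size_gb: float, survivor_ratio: int):
    """Return (eden_gb, survivor_gb) for a fixed-size young generation."""
    survivor = new_size_gb / (survivor_ratio + 2)
    eden = survivor * survivor_ratio
    return eden, survivor


def old_gen_gb(heap_gb: float, new_size_gb: float) -> float:
    """Heap left for the old generation once the young generation is fixed."""
    return heap_gb - new_size_gb


if __name__ == "__main__":
    # Driver: -Xmx16384M, -XX:NewSize=8g, -XX:SurvivorRatio=3
    print("driver eden/survivor:", young_gen_layout(8, 3))
    print("driver old gen:", old_gen_gb(16, 8))
    # Executor: 40g heap (assumed), -XX:NewSize=22g, -XX:SurvivorRatio=2
    print("executor eden/survivor:", young_gen_layout(22, 2))
    print("executor old gen:", old_gen_gb(40, 22))
```

With these flags the driver keeps 8 GB for the old generation and each executor 18 GB, so any long-lived data that outgrows those sizes has nowhere to be promoted.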
was:
Random Forest Regression
Data:https://www.kaggle.com/c/grupo-bimbo-inventory-demand/download/train.csv.zip
Parameters:
NumTrees:500
Maximum Bins:7477383
MaxDepth:27
MinInstancesPerNode:8648
SamplingRate:1.0
Executor and Driver ErrorLog
https://gist.github.com/anonymous/603ac7f8f17e43c51ba93b2934cd4cb6
> [ML]Random Forest Regression fails for large input
> --------------------------------------------------
>
> Key: SPARK-17801
> URL: https://issues.apache.org/jira/browse/SPARK-17801
> Project: Spark
> Issue Type: Bug
> Components: ML
> Affects Versions: 1.6.1
> Environment: Ubuntu 14.04
> Reporter: samkit
> Priority: Minor