[jira] [Created] (SPARK-19581) running NaiveBayes model with 0 features can crash the executor with DGEMV error

Barry Becker (JIRA) Mon, 13 Feb 2017 06:55:37 -0800

Barry Becker created SPARK-19581:
------------------------------------

             Summary: running NaiveBayes model with 0 features can crash the executor with DGEMV error
                 Key: SPARK-19581
                 URL: https://issues.apache.org/jira/browse/SPARK-19581
             Project: Spark
          Issue Type: Bug
          Components: ML
    Affects Versions: 2.1.0
         Environment: Spark development or standalone mode on Windows or Linux.
            Reporter: Barry Becker



The severity of this bug is high (because nothing should cause Spark to crash 
like this), but the priority may be low (because there is an easy workaround).

In our application, a user can select features and a target to run the 
NaiveBayes inducer. Columns that have too many distinct values, or only a 
single value, are removed before we call the inducer to create the model. As a 
result, there are some cases where all the features get removed. When this 
happens, executors crash and get restarted (if on a cluster), or Spark crashes 
and needs to be manually restarted (if in development mode).
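
One way to guard against the empty-feature case is sketched below. The names 
`ColumnStats`, `usableFeatures`, `checkFeatures`, and the `maxDistinct` 
threshold are hypothetical; they are not taken from the application described 
here or from Spark. The idea is to apply the same column filtering up front 
and skip the NaiveBayes call when nothing survives.

```scala
object FeatureGuard {
  // Hypothetical summary of one candidate feature column.
  final case class ColumnStats(name: String, distinctValues: Long)

  // Keep columns that have more than one value and at most maxDistinct values.
  def usableFeatures(columns: Seq[ColumnStats], maxDistinct: Long): Seq[String] =
    columns
      .filter(c => c.distinctValues > 1 && c.distinctValues <= maxDistinct)
      .map(_.name)

  // Left(reason) when no feature survives the filter, Right(names) otherwise.
  def checkFeatures(columns: Seq[ColumnStats],
                    maxDistinct: Long): Either[String, Seq[String]] = {
    val kept = usableFeatures(columns, maxDistinct)
    if (kept.isEmpty) Left("no usable features; skipping NaiveBayes")
    else Right(kept)
  }
}
```

A caller would train and score only on a `Right`, and report the `Left` 
message to the user instead of invoking the inducer.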

It looks like NaiveBayes uses BLAS, and BLAS does not handle this case well 
when it is encountered. It emits this vague error:
** On entry to DGEMV  parameter number  6 had an illegal value
and terminates.
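
For context, in the reference BLAS the sixth argument of DGEMV is LDA, the 
leading dimension of the matrix, and the routine rejects any LDA smaller than 
max(1, M). The sketch below mirrors those argument checks in plain Scala; it 
is not Spark code, and `firstIllegalArg` is a hypothetical name. That a model 
with 0 features ends up passing a leading dimension of 0 is an assumption 
here, not something confirmed against the Spark 2.1.0 source.

```scala
object DgemvCheck {
  // Argument order of the reference routine:
  //   DGEMV(TRANS, M, N, ALPHA, A, LDA, X, INCX, BETA, Y, INCY)
  // Returns the 1-based position of the first illegal argument, or 0 if all
  // checks pass, following the tests at the top of the reference DGEMV.
  def firstIllegalArg(trans: Char, m: Int, n: Int,
                      lda: Int, incx: Int, incy: Int): Int = {
    val t = trans.toUpper
    if (t != 'N' && t != 'T' && t != 'C') 1
    else if (m < 0) 2
    else if (n < 0) 3
    else if (lda < math.max(1, m)) 6 // LDA must be at least max(1, M)
    else if (incx == 0) 8
    else if (incy == 0) 11
    else 0
  }
}
```

Under that assumption, `DgemvCheck.firstIllegalArg('T', 0, 3, 0, 1, 1)` 
reports argument 6, which matches the message above. The reference error 
handler then stops the process rather than returning an error code, which 
would explain why the executor dies instead of throwing an exception.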

My code looks like this:
{code}
    val predictions = model.transform(testData)  // Make predictions
    // figure out how many were correctly predicted
    val numCorrect = predictions
      .filter(new Column(actualTarget) === new Column(PREDICTION_LABEL_COLUMN))
      .count()
    val numIncorrect = testRowCount - numCorrect
{code}
The failure surfaces at the line that does the count, but the count is not the 
cause; the problem is in the model.transform step (where the model contains 
the NaiveBayes classifier).
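
The reason the failure shows up at count rather than at transform is that 
Spark evaluates Dataset transformations lazily: transform only builds a plan, 
and the first action runs it. Below is a minimal sketch of the same effect, 
using a plain Scala Iterator in place of a Dataset. The `score` function and 
the row type are hypothetical stand-ins, not Spark APIs.

```scala
object LazyFailure {
  // Hypothetical stand-in for per-row scoring; fails when there are no features.
  def score(features: Array[Double]): Double = {
    require(features.nonEmpty, "no features to score")
    features.sum
  }

  // Building the pipeline does no scoring yet: Iterator.map is lazy.
  def predictions(rows: Seq[Array[Double]]): Iterator[Double] =
    rows.iterator.map(score)

  // Counting consumes the iterator, so this is where score() actually runs
  // and where any failure inside it is reported.
  def countPositive(rows: Seq[Array[Double]]): Int =
    predictions(rows).count(_ > 0.0)
}
```

Calling `predictions` on a row with no features returns normally; the 
exception is only raised once `countPositive` forces the evaluation.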

Here is the stack trace (in development mode):
{code}
[2017-02-13 06:28:39,946] TRACE evidence.EvidenceVizModel$ [] [akka://JobServer/user/context-supervisor/sql-context] - done making predictions in 232
 ** On entry to DGEMV  parameter number  6 had an illegal value
 ** On entry to DGEMV  parameter number  6 had an illegal value
 ** On entry to DGEMV  parameter number  6 had an illegal value
[2017-02-13 06:28:40,506] ERROR .scheduler.LiveListenerBus [] [akka://JobServer/user/context-supervisor/sql-context] - SparkListenerBus has already stopped! Dropping event SparkListenerSQLExecutionEnd(9,1486996120505)
[2017-02-13 06:28:40,506] ERROR .scheduler.LiveListenerBus [] [akka://JobServer/user/context-supervisor/sql-context] - SparkListenerBus has already stopped! Dropping event SparkListenerStageCompleted(org.apache.spark.scheduler.StageInfo@1f6c4a29)
[2017-02-13 06:28:40,508] ERROR .scheduler.LiveListenerBus [] [akka://JobServer/user/context-supervisor/sql-context] - SparkListenerBus has already stopped! Dropping event SparkListenerJobEnd(12,1486996120507,JobFailed(org.apache.spark.SparkException: Job 12 cancelled because SparkContext was shut down))
[2017-02-13 06:28:40,509] ERROR .jobserver.JobManagerActor [] [akka://JobServer/user/context-supervisor/sql-context] - Got Throwable
org.apache.spark.SparkException: Job 12 cancelled because SparkContext was shut down
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:808)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:806)
        at scala.collection.mutable.HashSet.foreach(HashSet.scala:78)
        at org.apache.spark.scheduler.DAGScheduler.cleanUpAfterSchedulerStop(DAGScheduler.scala:806)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onStop(DAGScheduler.scala:1668)
        at org.apache.spark.util.EventLoop.stop(EventLoop.scala:83)
        at org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:1587)
        at org.apache.spark.SparkContext$$anonfun$stop$8.apply$mcV$sp(SparkContext.scala:1826)
        at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1283)
        at org.apache.spark.SparkContext.stop(SparkContext.scala:1825)
        at org.apache.spark.SparkContext$$anonfun$2.apply$mcV$sp(SparkContext.scala:581)
        at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:216)
        at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:188)
        at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
        at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
{code}

and here it is when running in standalone mode:
{code}
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 7134.0 failed 4 times, most recent failure: Lost task 0.3 in stage 7134.0 (TID 13671, 192.168.124.23, executor 8): ExecutorLostFailure (executor 8 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
Driver stacktrace:
        org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1435)
        org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1423)
        org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1422)
        scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
        org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1422)
        org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
        org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
        scala.Option.foreach(Option.scala:257)
        org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:802)
        org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1650)
        org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1605)
        org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1594)
        org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
        org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:628)
        org.apache.spark.SparkContext.runJob(SparkContext.scala:1918)
        org.apache.spark.SparkContext.runJob(SparkContext.scala:1931)
        org.apache.spark.SparkContext.runJob(SparkContext.scala:1944)
        org.apache.spark.SparkContext.runJob(SparkContext.scala:1958)
        org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:935)
        org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
        org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
        org.apache.spark.rdd.RDD.collect(RDD.scala:934)
        org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:275)
        org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1.apply(Dataset.scala:2371)
        org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
        org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2765)
        org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$execute$1(Dataset.scala:2370)
        org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collect(Dataset.scala:2377)
        org.apache.spark.sql.Dataset$$anonfun$count$1.apply(Dataset.scala:2405)
        org.apache.spark.sql.Dataset$$anonfun$count$1.apply(Dataset.scala:2404)
        org.apache.spark.sql.Dataset.withCallback(Dataset.scala:2778)
        org.apache.spark.sql.Dataset.count(Dataset.scala:2404)
        com.mineset.spark.ml.evidence.EvidenceVizModel.getModelValidationInfo(EvidenceVizModel.scala:338)
        com.mineset.spark.ml.evidence.EvidenceVizModel.getJsonObject(EvidenceVizModel.scala:97)
        com.mineset.spark.ml.evidence.EvidenceInducer.execute(EvidenceInducer.scala:129)
        com.mineset.spark.ml.evidence.EvidenceInducer.execute(EvidenceInducer.scala:83)
        com.mineset.spark.common.util.CommandProcessor.process(CommandProcessor.scala:39)
        com.mineset.spark.ml.MinesetMachineLearning.processCommands(MinesetMachineLearning.scala:79)
        com.mineset.spark.ml.MachineLearningJob$.runJob(MachineLearningJob.scala:53)
        com.mineset.spark.ml.MachineLearningJob$.runJob(MachineLearningJob.scala:39)
        spark.jobserver.SparkJobBase$class.runJob(SparkJob.scala:31)
        com.mineset.spark.ml.MachineLearningJob$.runJob(MachineLearningJob.scala:39)
        com.mineset.spark.ml.MachineLearningJob$.runJob(MachineLearningJob.scala:39)
        spark.jobserver.JobManagerActor$$anonfun$getJobFuture$4.apply(JobManagerActor.scala:292)
        scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
        scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
        java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        java.lang.Thread.run(Thread.java:745)
{code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
