[ https://issues.apache.org/jira/browse/SPARK-19581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen updated SPARK-19581:
------------------------------
Priority: Minor (was: Major)
It's basically invalid input, so it should result in some kind of error, and
that's what you see. However, the error could be better, I'm sure. Feel free to
improve it in a PR.
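One way such a PR might produce a clearer error is to validate the feature count before any matrix-vector multiply reaches native BLAS. The sketch below is plain Scala, not Spark's code: `ToyNaiveBayesModel`, its fields and its methods are hypothetical stand-ins for a multinomial model where `pi(c)` is the log prior of class `c` and `theta(c)(j)` is the log probability of feature `j` given class `c`. The check is an ordinary `require`, which fails with an `IllegalArgumentException` and a descriptive message instead of a BLAS parameter error.

```scala
// Hypothetical, simplified stand-in for a multinomial Naive Bayes model;
// this is NOT Spark's NaiveBayesModel. pi(c) = log P(class c),
// theta(c)(j) = log P(feature j | class c).
final case class ToyNaiveBayesModel(pi: Array[Double], theta: Array[Array[Double]]) {
  val numClasses: Int = pi.length
  val numFeatures: Int = if (theta.isEmpty) 0 else theta.head.length

  // Fail fast with a readable message instead of passing a degenerate
  // (numClasses x 0) matrix on to a matrix-vector multiply.
  require(numFeatures > 0,
    s"NaiveBayes model has $numFeatures features; at least one feature is required")

  /** Unnormalized log scores: pi + theta * features. */
  def predictRaw(features: Array[Double]): Array[Double] = {
    require(features.length == numFeatures,
      s"Expected $numFeatures features but got ${features.length}")
    Array.tabulate(numClasses) { c =>
      pi(c) + theta(c).indices.map(j => theta(c)(j) * features(j)).sum
    }
  }

  /** Index of the class with the highest raw score. */
  def predict(features: Array[Double]): Int = {
    val raw = predictRaw(features)
    raw.indices.maxBy(i => raw(i))
  }
}
```

With two features, `ToyNaiveBayesModel(Array(math.log(0.5), math.log(0.5)), Array(Array(-1.0, -2.0), Array(-2.0, -1.0))).predict(Array(3.0, 0.0))` picks class 0, while constructing the model with empty `theta` rows throws before any prediction is attempted.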
> Running NaiveBayes model with 0 features can crash the executor with DGEMV
> error
> --------------------------------------------------------------------------------
>
> Key: SPARK-19581
> URL: https://issues.apache.org/jira/browse/SPARK-19581
> Project: Spark
> Issue Type: Bug
> Components: ML
> Affects Versions: 2.1.0
> Environment: Spark development or standalone mode on Windows or Linux.
> Reporter: Barry Becker
> Priority: Minor
>
> The severity of this bug is high (because nothing should cause Spark to crash
> like this), but the priority may be low (because there is an easy workaround).
> In our application, a user can select features and a target to run the
> NaiveBayes inducer. If columns have too many values or all one value, they
> are removed before we call the inducer to create the model. As a result,
> in some cases all of the features may be removed. When this happens,
> executors crash and get restarted (if on a cluster), or Spark crashes and
> must be restarted manually (if in development mode).
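The easy workaround on the application side is to check, after filtering, that at least one feature column is left, and to skip training otherwise. Below is a minimal sketch of that guard, mirroring the filtering rule described above (drop columns with a single value or with too many values). `ColumnStats`, `selectFeatures`, `featuresOrError` and the `maxDistinct` threshold are hypothetical; in a real job the distinct counts would be computed from the data beforehand, which is not shown here.

```scala
// Hypothetical per-column summary computed before training.
final case class ColumnStats(name: String, distinctValues: Long)

// Keep columns that have more than one value but not "too many" values.
// maxDistinct is an assumed, application-specific threshold.
def selectFeatures(stats: Seq[ColumnStats], maxDistinct: Long): Seq[String] =
  stats.collect {
    case ColumnStats(name, n) if n > 1 && n <= maxDistinct => name
  }

// Either the feature columns to train on, or a user-facing reason to skip
// training, so that NaiveBayes is never called with zero features.
def featuresOrError(stats: Seq[ColumnStats], maxDistinct: Long): Either[String, Seq[String]] = {
  val kept = selectFeatures(stats, maxDistinct)
  if (kept.isEmpty)
    Left("No usable feature columns remain after filtering; NaiveBayes was not run.")
  else
    Right(kept)
}
```

For example, with columns `id` (10000 distinct values), `constant` (1) and `color` (3) and `maxDistinct = 100`, only `color` survives; if only `constant` were selected, the result is a `Left` and the inducer is never invoked.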
> It looks like NaiveBayes uses BLAS, and BLAS does not handle this case well.
> It emits this vague error:
> ** On entry to DGEMV parameter number 6 had an illegal value
> and terminates.
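The message matches the format printed by the reference BLAS error handler `XERBLA`. In the reference `DGEMV(TRANS, M, N, ALPHA, A, LDA, X, INCX, BETA, Y, INCY)`, parameter 6 is `LDA`, the leading dimension of `A`, which must be at least `max(1, M)`; in the reference implementation `XERBLA` then stops the program, which would fit the process exiting, although behaviour depends on the BLAS backend actually loaded. The sketch below mirrors those argument checks. It assumes, from our reading of Spark's BLAS wrapper rather than from this report, that a row-major `numClasses x numFeatures` `theta` is passed with `TRANS = "T"`, `M = numFeatures` and `LDA = numFeatures`; with zero features that gives `LDA = 0`, which fails check 6, consistent with the reported parameter number. `dgemvInfo` and `thetaTimesFeaturesInfo` are hypothetical helpers, not a Spark or BLAS API.

```scala
// Simplified mirror of the argument checks at the top of the reference
// BLAS DGEMV. Returns 0 if the arguments are valid, otherwise the 1-based
// index of the first illegal parameter (the number XERBLA would report).
def dgemvInfo(trans: String, m: Int, n: Int, lda: Int, incx: Int, incy: Int): Int = {
  val t = trans.toUpperCase
  if (t != "N" && t != "T" && t != "C") 1
  else if (m < 0) 2
  else if (n < 0) 3
  else if (lda < math.max(1, m)) 6 // LDA must be >= max(1, M)
  else if (incx == 0) 8
  else if (incy == 0) 11
  else 0
}

// ASSUMED mapping of a row-major numClasses x numFeatures theta onto DGEMV:
// the stored values are read as a column-major numFeatures x numClasses
// matrix and transposed, so M = numFeatures, N = numClasses, LDA = numFeatures.
def thetaTimesFeaturesInfo(numClasses: Int, numFeatures: Int): Int =
  dgemvInfo("T", m = numFeatures, n = numClasses, lda = numFeatures, incx = 1, incy = 1)
```

Under this assumption, `thetaTimesFeaturesInfo(2, 3)` is valid (returns 0), while `thetaTimesFeaturesInfo(2, 0)` returns 6, the same parameter number as in the log above.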
> My code looks like this:
> {code}
> import org.apache.spark.sql.Column
>
> val predictions = model.transform(testData) // Make predictions
> // Figure out how many were correctly predicted
> val numCorrect = predictions.filter(
>   new Column(actualTarget) === new Column(PREDICTION_LABEL_COLUMN)).count()
> val numIncorrect = testRowCount - numCorrect
> {code}
> The failure surfaces at the line that does the count, but the count is not
> what causes the problem; it is the model.transform step (where the model
> contains the NaiveBayes classifier). Because transform is lazy, the
> prediction code only runs when count() triggers the job.
> Here is the stack trace (in development mode):
> {code}
> [2017-02-13 06:28:39,946] TRACE evidence.EvidenceVizModel$ [] [akka://JobServer/user/context-supervisor/sql-context] - done making predictions in 232
> ** On entry to DGEMV parameter number 6 had an illegal value
> ** On entry to DGEMV parameter number 6 had an illegal value
> ** On entry to DGEMV parameter number 6 had an illegal value
> [2017-02-13 06:28:40,506] ERROR .scheduler.LiveListenerBus [] [akka://JobServer/user/context-supervisor/sql-context] - SparkListenerBus has already stopped! Dropping event SparkListenerSQLExecutionEnd(9,1486996120505)
> [2017-02-13 06:28:40,506] ERROR .scheduler.LiveListenerBus [] [akka://JobServer/user/context-supervisor/sql-context] - SparkListenerBus has already stopped! Dropping event SparkListenerStageCompleted(org.apache.spark.scheduler.StageInfo@1f6c4a29)
> [2017-02-13 06:28:40,508] ERROR .scheduler.LiveListenerBus [] [akka://JobServer/user/context-supervisor/sql-context] - SparkListenerBus has already stopped! Dropping event SparkListenerJobEnd(12,1486996120507,JobFailed(org.apache.spark.SparkException: Job 12 cancelled because SparkContext was shut down))
> [2017-02-13 06:28:40,509] ERROR .jobserver.JobManagerActor [] [akka://JobServer/user/context-supervisor/sql-context] - Got Throwable
> org.apache.spark.SparkException: Job 12 cancelled because SparkContext was shut down
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:808)
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:806)
> at scala.collection.mutable.HashSet.foreach(HashSet.scala:78)
> at org.apache.spark.scheduler.DAGScheduler.cleanUpAfterSchedulerStop(DAGScheduler.scala:806)
> at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onStop(DAGScheduler.scala:1668)
> at org.apache.spark.util.EventLoop.stop(EventLoop.scala:83)
> at org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:1587)
> at org.apache.spark.SparkContext$$anonfun$stop$8.apply$mcV$sp(SparkContext.scala:1826)
> at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1283)
> at org.apache.spark.SparkContext.stop(SparkContext.scala:1825)
> at org.apache.spark.SparkContext$$anonfun$2.apply$mcV$sp(SparkContext.scala:581)
> at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:216)
> at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:188)
> at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
> at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
> {code}
> and here it is when running in standalone mode:
> {code}
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 7134.0 failed 4 times, most recent failure: Lost task 0.3 in stage 7134.0 (TID 13671, 192.168.124.23, executor 8): ExecutorLostFailure (executor 8 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
> Driver stacktrace:
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1435)
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1423)
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1422)
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1422)
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
> scala.Option.foreach(Option.scala:257)
> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:802)
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1650)
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1605)
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1594)
> org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
> org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:628)
> org.apache.spark.SparkContext.runJob(SparkContext.scala:1918)
> org.apache.spark.SparkContext.runJob(SparkContext.scala:1931)
> org.apache.spark.SparkContext.runJob(SparkContext.scala:1944)
> org.apache.spark.SparkContext.runJob(SparkContext.scala:1958)
> org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:935)
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
> org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
> org.apache.spark.rdd.RDD.collect(RDD.scala:934)
> org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:275)
> org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1.apply(Dataset.scala:2371)
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
> org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2765)
> org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$execute$1(Dataset.scala:2370)
> org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collect(Dataset.scala:2377)
> org.apache.spark.sql.Dataset$$anonfun$count$1.apply(Dataset.scala:2405)
> org.apache.spark.sql.Dataset$$anonfun$count$1.apply(Dataset.scala:2404)
> org.apache.spark.sql.Dataset.withCallback(Dataset.scala:2778)
> org.apache.spark.sql.Dataset.count(Dataset.scala:2404)
> com.mineset.spark.ml.evidence.EvidenceVizModel.getModelValidationInfo(EvidenceVizModel.scala:338)
> com.mineset.spark.ml.evidence.EvidenceVizModel.getJsonObject(EvidenceVizModel.scala:97)
> com.mineset.spark.ml.evidence.EvidenceInducer.execute(EvidenceInducer.scala:129)
> com.mineset.spark.ml.evidence.EvidenceInducer.execute(EvidenceInducer.scala:83)
> com.mineset.spark.common.util.CommandProcessor.process(CommandProcessor.scala:39)
> com.mineset.spark.ml.MinesetMachineLearning.processCommands(MinesetMachineLearning.scala:79)
> com.mineset.spark.ml.MachineLearningJob$.runJob(MachineLearningJob.scala:53)
> com.mineset.spark.ml.MachineLearningJob$.runJob(MachineLearningJob.scala:39)
> spark.jobserver.SparkJobBase$class.runJob(SparkJob.scala:31)
> com.mineset.spark.ml.MachineLearningJob$.runJob(MachineLearningJob.scala:39)
> com.mineset.spark.ml.MachineLearningJob$.runJob(MachineLearningJob.scala:39)
> spark.jobserver.JobManagerActor$$anonfun$getJobFuture$4.apply(JobManagerActor.scala:292)
> scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
> scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> java.lang.Thread.run(Thread.java:745)
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)