[
https://issues.apache.org/jira/browse/SPARK-19581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16015345#comment-16015345
]
Yan Facai (颜发才) commented on SPARK-19581:
-----------------------------------------
[~barrybecker4] Hi, Becker.
I can't reproduce the bug on spark-2.1.1-bin-hadoop2.7.
1) For a feature vector of size 0, an exception is thrown, but it is harmless: the transform fails with an ordinary SparkException.
```scala
val data =
spark.read.format("libsvm").load("/user/facai/data/libsvm/sample_libsvm_data.txt").cache
import org.apache.spark.ml.classification.NaiveBayes
val model = new NaiveBayes().fit(data)
import org.apache.spark.ml.linalg.{Vectors => SV}
case class TestData(features: org.apache.spark.ml.linalg.Vector)
val emptyVector = SV.sparse(0, Array.empty[Int], Array.empty[Double])
val test = Seq(TestData(emptyVector)).toDF
scala> test.show
+---------+
| features|
+---------+
|(0,[],[])|
+---------+
scala> model.transform(test).show
org.apache.spark.SparkException: Failed to execute user defined
function($anonfun$1: (vector) => vector)
at
org.apache.spark.sql.catalyst.expressions.ScalaUDF.eval(ScalaUDF.scala:1072)
... 48 elided
Caused by: java.lang.IllegalArgumentException: requirement failed: The columns
of A don't match the number of elements of x. A: 692, x: 0
at scala.Predef$.require(Predef.scala:224)
... 99 more
```
2) For an empty feature vector of size 692, it works.
```scala
scala> val emptyVector = SV.sparse(692, Array.empty[Int], Array.empty[Double])
emptyVector: org.apache.spark.ml.linalg.Vector = (692,[],[])
scala> val test = Seq(TestData(emptyVector)).toDF
test: org.apache.spark.sql.DataFrame = [features: vector]
scala> test.show
+-----------+
| features|
+-----------+
|(692,[],[])|
+-----------+
scala> model.transform(test).show
+-----------+--------------------+--------------------+----------+
| features| rawPrediction| probability|prediction|
+-----------+--------------------+--------------------+----------+
|(692,[],[])|[-0.8407831793660...|[0.43137254901960...| 1.0|
+-----------+--------------------+--------------------+----------+
```
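The two cases above differ only in whether the vector's declared size matches the 692 columns the model was trained on. A minimal sketch of that kind of dimension check follows, in plain Scala with no Spark dependency. The `gemv` function is a hypothetical stand-in written for illustration, not Spark's BLAS code; only the message text is copied from the exception in case 1, and the two-row matrix is an assumption.

```scala
import scala.util.Try

// Hypothetical stand-in for a matrix-vector product y = A * x.
// `a` holds a numRows x numCols matrix in row-major order.
def gemv(numRows: Int, numCols: Int, a: Array[Double], x: Array[Double]): Array[Double] = {
  // Fail fast on the JVM when the dimensions disagree.
  require(numCols == x.length,
    s"The columns of A don't match the number of elements of x. A: $numCols, x: ${x.length}")
  Array.tabulate(numRows) { i =>
    var sum = 0.0
    var j = 0
    while (j < numCols) { sum += a(i * numCols + j) * x(j); j += 1 }
    sum
  }
}

// Assumed shape: 2 rows x 692 columns, all ones.
val a = Array.fill(2 * 692)(1.0)
// Case 1: a size-0 vector is rejected by the require check.
val sizeZero = Try(gemv(2, 692, a, Array.empty[Double]))
// Case 2: a size-692 vector with no non-zero entries passes the check.
val allZeros = Try(gemv(2, 692, a, Array.fill(692)(0.0)))
```

Here `sizeZero` is a `Failure` wrapping an `IllegalArgumentException`, while `allZeros` is a `Success`. A caller could run the same size comparison on the test data before calling `model.transform`.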
> running NaiveBayes model with 0 features can crash the executor with DGEMV error
> --------------------------------------------------------------------------------
>
> Key: SPARK-19581
> URL: https://issues.apache.org/jira/browse/SPARK-19581
> Project: Spark
> Issue Type: Bug
> Components: ML
> Affects Versions: 2.1.0
> Environment: spark development or standalone mode on windows or linux.
> Reporter: Barry Becker
> Priority: Minor
>
> The severity of this bug is high (because nothing should cause spark to crash
> like this) but the priority may be low (because there is an easy workaround).
> In our application, a user can select features and a target to run the
> NaiveBayes inducer. If columns have too many values or all one value, they
> will be removed before we call the inducer to create the model. As a result,
> there are some cases where all the features get removed. When this
> happens, executors crash and get restarted (if on a cluster), or Spark
> crashes and needs to be manually restarted (if in development mode).
> It looks like NaiveBayes uses BLAS, and BLAS does not handle this case well
> when it is encountered. It emits this vague error:
> ** On entry to DGEMV parameter number 6 had an illegal value
> and terminates.
> My code looks like this:
> {code}
> val predictions = model.transform(testData) // Make predictions
> // figure out how many were correctly predicted
> val numCorrect = predictions.filter(new Column(actualTarget) === new Column(PREDICTION_LABEL_COLUMN)).count()
> val numIncorrect = testRowCount - numCorrect
> {code}
> The failure is at the line that does the count, but it is not the count that
> causes the problem; it is the model.transform step (where the model contains
> the NaiveBayes classifier).
> Here is the stack trace (in development mode):
> {code}
> [2017-02-13 06:28:39,946] TRACE evidence.EvidenceVizModel$ []
> [akka://JobServer/user/context-supervisor/sql-context] - done making
> predictions in 232
> ** On entry to DGEMV parameter number 6 had an illegal value
> ** On entry to DGEMV parameter number 6 had an illegal value
> ** On entry to DGEMV parameter number 6 had an illegal value
> [2017-02-13 06:28:40,506] ERROR .scheduler.LiveListenerBus []
> [akka://JobServer/user/context-supervisor/sql-context] - SparkListenerBus has
> already stopped! Dropping event SparkListenerSQLExecutionEnd(9,1486996120505)
> [2017-02-13 06:28:40,506] ERROR .scheduler.LiveListenerBus []
> [akka://JobServer/user/context-supervisor/sql-context] - SparkListenerBus has
> already stopped! Dropping event
> SparkListenerStageCompleted(org.apache.spark.scheduler.StageInfo@1f6c4a29)
> [2017-02-13 06:28:40,508] ERROR .scheduler.LiveListenerBus []
> [akka://JobServer/user/context-supervisor/sql-context] - SparkListenerBus has
> already stopped! Dropping event
> SparkListenerJobEnd(12,1486996120507,JobFailed(org.apache.spark.SparkException:
> Job 12 cancelled because SparkContext was shut down))
> [2017-02-13 06:28:40,509] ERROR .jobserver.JobManagerActor []
> [akka://JobServer/user/context-supervisor/sql-context] - Got Throwable
> org.apache.spark.SparkException: Job 12 cancelled because SparkContext was
> shut down
> at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:808)
> at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:806)
> at scala.collection.mutable.HashSet.foreach(HashSet.scala:78)
> at
> org.apache.spark.scheduler.DAGScheduler.cleanUpAfterSchedulerStop(DAGScheduler.scala:806)
> at
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onStop(DAGScheduler.scala:1668)
> at org.apache.spark.util.EventLoop.stop(EventLoop.scala:83)
> at
> org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:1587)
> at
> org.apache.spark.SparkContext$$anonfun$stop$8.apply$mcV$sp(SparkContext.scala:1826)
> at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1283)
> at org.apache.spark.SparkContext.stop(SparkContext.scala:1825)
> at
> org.apache.spark.SparkContext$$anonfun$2.apply$mcV$sp(SparkContext.scala:581)
> at
> org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:216)
> at
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:188)
> at
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
> at
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
> {code}
> and here it is when running in standalone mode:
> {code}
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in
> stage 7134.0 failed 4 times, most recent failure: Lost task 0.3 in stage
> 7134.0 (TID 13671, 192.168.124.23, executor 8): ExecutorLostFailure (executor
> 8 exited caused by one of the running tasks) Reason: Remote RPC client
> disassociated. Likely due to containers exceeding thresholds, or network
> issues. Check driver logs for WARN messages. Driver stacktrace:
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1435)
>
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1423)
>
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1422)
>
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1422)
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
>
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
> scala.Option.foreach(Option.scala:257)
> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:802)
>
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1650)
>
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1605)
>
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1594)
> org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
> org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:628)
> org.apache.spark.SparkContext.runJob(SparkContext.scala:1918)
> org.apache.spark.SparkContext.runJob(SparkContext.scala:1931)
> org.apache.spark.SparkContext.runJob(SparkContext.scala:1944)
> org.apache.spark.SparkContext.runJob(SparkContext.scala:1958)
> org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:935)
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
> org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
> org.apache.spark.rdd.RDD.collect(RDD.scala:934)
> org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:275)
> org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1.apply(Dataset.scala:2371)
>
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
> org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2765)
> org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$execute$1(Dataset.scala:2370)
>
> org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collect(Dataset.scala:2377)
> org.apache.spark.sql.Dataset$$anonfun$count$1.apply(Dataset.scala:2405)
> org.apache.spark.sql.Dataset$$anonfun$count$1.apply(Dataset.scala:2404)
> org.apache.spark.sql.Dataset.withCallback(Dataset.scala:2778)
> org.apache.spark.sql.Dataset.count(Dataset.scala:2404)
> com.mineset.spark.ml.evidence.EvidenceVizModel.getModelValidationInfo(EvidenceVizModel.scala:338)
>
> com.mineset.spark.ml.evidence.EvidenceVizModel.getJsonObject(EvidenceVizModel.scala:97)
>
> com.mineset.spark.ml.evidence.EvidenceInducer.execute(EvidenceInducer.scala:129)
>
> com.mineset.spark.ml.evidence.EvidenceInducer.execute(EvidenceInducer.scala:83)
>
> com.mineset.spark.common.util.CommandProcessor.process(CommandProcessor.scala:39)
>
> com.mineset.spark.ml.MinesetMachineLearning.processCommands(MinesetMachineLearning.scala:79)
> com.mineset.spark.ml.MachineLearningJob$.runJob(MachineLearningJob.scala:53)
> com.mineset.spark.ml.MachineLearningJob$.runJob(MachineLearningJob.scala:39)
> spark.jobserver.SparkJobBase$class.runJob(SparkJob.scala:31)
> com.mineset.spark.ml.MachineLearningJob$.runJob(MachineLearningJob.scala:39)
> com.mineset.spark.ml.MachineLearningJob$.runJob(MachineLearningJob.scala:39)
> spark.jobserver.JobManagerActor$$anonfun$getJobFuture$4.apply(JobManagerActor.scala:292)
>
> scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
> scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> java.lang.Thread.run(Thread.java:745)
> {code}