[
https://issues.apache.org/jira/browse/SPARK-7362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sean Owen resolved SPARK-7362.
------------------------------
Resolution: Invalid
Hi [~doye], could you start by asking on the mailing list and/or searching
JIRA? There's not enough info here to reproduce this or understand what the
problem might be. See
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark
> Spark MLlib libsvm issues with data
> -----------------------------------
>
> Key: SPARK-7362
> URL: https://issues.apache.org/jira/browse/SPARK-7362
> Project: Spark
> Issue Type: Question
> Components: MLlib
> Affects Versions: 1.3.1
> Environment: Linux version 3.13.0-45-generic (buildd@phianna) (gcc
> version 4.8.2 (Ubuntu 4.8.2-19ubuntu1))
> Reporter: doyexie
>
> Hi, I'm trying the Scala example from the demo at
> http://spark.apache.org/docs/1.2.1/mllib-linear-methods.html. The demo works
> fine as given, but when I swap in my own data, the training step fails with
> the log below (the code I run is sketched after the log):
> 15/05/05 16:32:02 INFO TaskSetManager: Starting task 0.0 in stage 12.0 (TID
> 21, localhost, PROCESS_LOCAL, 1447 bytes)
> 15/05/05 16:32:02 INFO TaskSetManager: Starting task 1.0 in stage 12.0 (TID
> 22, localhost, PROCESS_LOCAL, 1447 bytes)
> 15/05/05 16:32:02 INFO Executor: Running task 0.0 in stage 12.0 (TID 21)
> 15/05/05 16:32:02 INFO Executor: Running task 1.0 in stage 12.0 (TID 22)
> 15/05/05 16:32:02 INFO BlockManager: Found block rdd_7_1 locally
> 15/05/05 16:32:02 ERROR Executor: Exception in task 1.0 in stage 12.0 (TID 22)
> java.lang.ArrayIndexOutOfBoundsException: -1
> at org.apache.spark.mllib.linalg.BLAS$.dot(BLAS.scala:136)
> at org.apache.spark.mllib.linalg.BLAS$.dot(BLAS.scala:106)
> at org.apache.spark.mllib.optimization.HingeGradient.compute(Gradient.scala:313)
> at org.apache.spark.mllib.optimization.GradientDescent$$anonfun$runMiniBatchSGD$1$$anonfun$1.apply(GradientDescent.scala:192)
> at org.apache.spark.mllib.optimization.GradientDescent$$anonfun$runMiniBatchSGD$1$$anonfun$1.apply(GradientDescent.scala:190)
> at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:144)
> at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:144)
> at scala.collection.Iterator$class.foreach(Iterator.scala:727)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
> at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:144)
> at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1157)
> at scala.collection.TraversableOnce$class.aggregate(TraversableOnce.scala:201)
> at scala.collection.AbstractIterator.aggregate(Iterator.scala:1157)
> at org.apache.spark.rdd.RDD$$anonfun$28.apply(RDD.scala:988)
> at org.apache.spark.rdd.RDD$$anonfun$28.apply(RDD.scala:988)
> at org.apache.spark.rdd.RDD$$anonfun$29.apply(RDD.scala:989)
> at org.apache.spark.rdd.RDD$$anonfun$29.apply(RDD.scala:989)
> at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634)
> at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
> at org.apache.spark.scheduler.Task.run(Task.scala:64)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> 15/05/05 16:32:02 INFO BlockManager: Found block rdd_7_0 locally
> 15/05/05 16:32:02 ERROR Executor: Exception in task 0.0 in stage 12.0 (TID 21)
> java.lang.ArrayIndexOutOfBoundsException: -1
> [stack trace identical to task 1.0 above]
> 15/05/05 16:32:02 WARN TaskSetManager: Lost task 1.0 in stage 12.0 (TID 22,
> localhost): java.lang.ArrayIndexOutOfBoundsException: -1
> [stack trace identical to task 1.0 above]
> 15/05/05 16:32:02 ERROR TaskSetManager: Task 1 in stage 12.0 failed 1 times;
> aborting job
> 15/05/05 16:32:02 INFO TaskSchedulerImpl: Removed TaskSet 12.0, whose tasks
> have all completed, from pool
> 15/05/05 16:32:02 INFO TaskSetManager: Lost task 0.0 in stage 12.0 (TID 21)
> on executor localhost: java.lang.ArrayIndexOutOfBoundsException (-1)
> [duplicate 1]
> 15/05/05 16:32:02 INFO TaskSchedulerImpl: Removed TaskSet 12.0, whose tasks
> have all completed, from pool
> 15/05/05 16:32:02 INFO TaskSchedulerImpl: Cancelling stage 12
> 15/05/05 16:32:02 INFO DAGScheduler: Job 12 failed: treeAggregate at
> GradientDescent.scala:189, took 0.032101 s
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in
> stage 12.0 failed 1 times, most recent failure: Lost task 1.0 in stage 12.0
> (TID 22, localhost): java.lang.ArrayIndexOutOfBoundsException: -1
> [stack trace identical to task 1.0 above]
> Driver stacktrace:
> at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1203)
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1191)
> at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
> at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1191)
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
> at scala.Option.foreach(Option.scala:236)
> at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
> at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
> at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
> at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
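> The code I run is essentially the Scala example from that docs page, roughly
> as follows; only the data path differs ("data/my_libsvm_data.txt" here is a
> placeholder for my file, and sc is the SparkContext from spark-shell):
>
> import org.apache.spark.mllib.classification.SVMWithSGD
> import org.apache.spark.mllib.util.MLUtils
>
> // Load training data in LIBSVM format.
> val data = MLUtils.loadLibSVMFile(sc, "data/my_libsvm_data.txt")
>
> // Split data into training (60%) and test (40%).
> val splits = data.randomSplit(Array(0.6, 0.4), seed = 11L)
> val training = splits(0).cache()
> val test = splits(1)
>
> // Run the training algorithm to build the model.
> val numIterations = 100
> val model = SVMWithSGD.train(training, numIterations)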
> Here is my test data file: https://github.com/hermitD/temp. I have used it
> to train with the libsvm tools under Linux and it works, and checking the
> format with the libsvm Python tool reports it as OK. I just don't know why
> it errors in Spark. Please tell me how to fix this, or how to narrow down
> the problem so I can fix it myself. :(
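> One thing I have not ruled out (this is only a guess): whether my file's
> feature indices start at 0 rather than 1. As far as I understand,
> MLUtils.loadLibSVMFile expects one-based indices and converts them to
> zero-based internally, so an index of 0 would become -1, which would match
> the ArrayIndexOutOfBoundsException: -1 above. A quick check on the raw file
> ("data/my_libsvm_data.txt" again standing in for mine):
>
> // Smallest feature index in the raw file; it should be >= 1.
> val minIndex = sc.textFile("data/my_libsvm_data.txt")
>   .flatMap(_.trim.split("\\s+").drop(1)) // drop the label, keep index:value pairs
>   .filter(_.nonEmpty)
>   .map(_.split(":")(0).toInt) // keep only the feature index
>   .min()
> println("smallest feature index: " + minIndex)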
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]