doyexie created SPARK-7362:
------------------------------

             Summary: Spark MLlib libsvm isssues with data
                 Key: SPARK-7362
                 URL: https://issues.apache.org/jira/browse/SPARK-7362
             Project: Spark
          Issue Type: Question
          Components: MLlib
    Affects Versions: 1.3.1
         Environment: Linux version 3.13.0-45-generic (buildd@phianna) (gcc 
version 4.8.2 (Ubuntu 4.8.2-19ubuntu1) )
            Reporter: doyexie


Hi I'm trying with the demo in 
http://spark.apache.org/docs/1.2.1/mllib-linear-methods.html with the example 
with scala version. I run the demo it was worked fine but when I change data 
with mine the step of train it just error with

15/05/05 16:32:02 INFO TaskSetManager: Starting task 0.0 in stage 12.0 (TID 21, 
localhost, PROCESS_LOCAL, 1447 bytes)
15/05/05 16:32:02 INFO TaskSetManager: Starting task 1.0 in stage 12.0 (TID 22, 
localhost, PROCESS_LOCAL, 1447 bytes)
15/05/05 16:32:02 INFO Executor: Running task 0.0 in stage 12.0 (TID 21)
15/05/05 16:32:02 INFO Executor: Running task 1.0 in stage 12.0 (TID 22)
15/05/05 16:32:02 INFO BlockManager: Found block rdd_7_1 locally
15/05/05 16:32:02 ERROR Executor: Exception in task 1.0 in stage 12.0 (TID 22)
java.lang.ArrayIndexOutOfBoundsException: -1
    at org.apache.spark.mllib.linalg.BLAS$.dot(BLAS.scala:136)
    at org.apache.spark.mllib.linalg.BLAS$.dot(BLAS.scala:106)
    at 
org.apache.spark.mllib.optimization.HingeGradient.compute(Gradient.scala:313)
    at 
org.apache.spark.mllib.optimization.GradientDescent$$anonfun$runMiniBatchSGD$1$$anonfun$1.apply(GradientDescent.scala:192)
    at 
org.apache.spark.mllib.optimization.GradientDescent$$anonfun$runMiniBatchSGD$1$$anonfun$1.apply(GradientDescent.scala:190)
    at 
scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:144)
    at 
scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:144)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at 
scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:144)
    at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1157)
    at 
scala.collection.TraversableOnce$class.aggregate(TraversableOnce.scala:201)
    at scala.collection.AbstractIterator.aggregate(Iterator.scala:1157)
    at org.apache.spark.rdd.RDD$$anonfun$28.apply(RDD.scala:988)
    at org.apache.spark.rdd.RDD$$anonfun$28.apply(RDD.scala:988)
    at org.apache.spark.rdd.RDD$$anonfun$29.apply(RDD.scala:989)
    at org.apache.spark.rdd.RDD$$anonfun$29.apply(RDD.scala:989)
    at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634)
    at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
    at org.apache.spark.scheduler.Task.run(Task.scala:64)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
    at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
15/05/05 16:32:02 INFO BlockManager: Found block rdd_7_0 locally
15/05/05 16:32:02 ERROR Executor: Exception in task 0.0 in stage 12.0 (TID 21)
java.lang.ArrayIndexOutOfBoundsException: -1
    at org.apache.spark.mllib.linalg.BLAS$.dot(BLAS.scala:136)
    at org.apache.spark.mllib.linalg.BLAS$.dot(BLAS.scala:106)
    at 
org.apache.spark.mllib.optimization.HingeGradient.compute(Gradient.scala:313)
    at 
org.apache.spark.mllib.optimization.GradientDescent$$anonfun$runMiniBatchSGD$1$$anonfun$1.apply(GradientDescent.scala:192)
    at 
org.apache.spark.mllib.optimization.GradientDescent$$anonfun$runMiniBatchSGD$1$$anonfun$1.apply(GradientDescent.scala:190)
    at 
scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:144)
    at 
scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:144)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at 
scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:144)
    at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1157)
    at 
scala.collection.TraversableOnce$class.aggregate(TraversableOnce.scala:201)
    at scala.collection.AbstractIterator.aggregate(Iterator.scala:1157)
    at org.apache.spark.rdd.RDD$$anonfun$28.apply(RDD.scala:988)
    at org.apache.spark.rdd.RDD$$anonfun$28.apply(RDD.scala:988)
    at org.apache.spark.rdd.RDD$$anonfun$29.apply(RDD.scala:989)
    at org.apache.spark.rdd.RDD$$anonfun$29.apply(RDD.scala:989)
    at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634)
    at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
    at org.apache.spark.scheduler.Task.run(Task.scala:64)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
    at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
15/05/05 16:32:02 WARN TaskSetManager: Lost task 1.0 in stage 12.0 (TID 22, 
localhost): java.lang.ArrayIndexOutOfBoundsException: -1
    at org.apache.spark.mllib.linalg.BLAS$.dot(BLAS.scala:136)
    at org.apache.spark.mllib.linalg.BLAS$.dot(BLAS.scala:106)
    at 
org.apache.spark.mllib.optimization.HingeGradient.compute(Gradient.scala:313)
    at 
org.apache.spark.mllib.optimization.GradientDescent$$anonfun$runMiniBatchSGD$1$$anonfun$1.apply(GradientDescent.scala:192)
    at 
org.apache.spark.mllib.optimization.GradientDescent$$anonfun$runMiniBatchSGD$1$$anonfun$1.apply(GradientDescent.scala:190)
    at 
scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:144)
    at 
scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:144)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at 
scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:144)
    at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1157)
    at 
scala.collection.TraversableOnce$class.aggregate(TraversableOnce.scala:201)
    at scala.collection.AbstractIterator.aggregate(Iterator.scala:1157)
    at org.apache.spark.rdd.RDD$$anonfun$28.apply(RDD.scala:988)
    at org.apache.spark.rdd.RDD$$anonfun$28.apply(RDD.scala:988)
    at org.apache.spark.rdd.RDD$$anonfun$29.apply(RDD.scala:989)
    at org.apache.spark.rdd.RDD$$anonfun$29.apply(RDD.scala:989)
    at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634)
    at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
    at org.apache.spark.scheduler.Task.run(Task.scala:64)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
    at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

15/05/05 16:32:02 ERROR TaskSetManager: Task 1 in stage 12.0 failed 1 times; 
aborting job
15/05/05 16:32:02 INFO TaskSchedulerImpl: Removed TaskSet 12.0, whose tasks 
have all completed, from pool 
15/05/05 16:32:02 INFO TaskSetManager: Lost task 0.0 in stage 12.0 (TID 21) on 
executor localhost: java.lang.ArrayIndexOutOfBoundsException (-1) [duplicate 1]
15/05/05 16:32:02 INFO TaskSchedulerImpl: Removed TaskSet 12.0, whose tasks 
have all completed, from pool 
15/05/05 16:32:02 INFO TaskSchedulerImpl: Cancelling stage 12
15/05/05 16:32:02 INFO DAGScheduler: Job 12 failed: treeAggregate at 
GradientDescent.scala:189, took 0.032101 s
org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in 
stage 12.0 failed 1 times, most recent failure: Lost task 1.0 in stage 12.0 
(TID 22, localhost): java.lang.ArrayIndexOutOfBoundsException: -1
    at org.apache.spark.mllib.linalg.BLAS$.dot(BLAS.scala:136)
    at org.apache.spark.mllib.linalg.BLAS$.dot(BLAS.scala:106)
    at 
org.apache.spark.mllib.optimization.HingeGradient.compute(Gradient.scala:313)
    at 
org.apache.spark.mllib.optimization.GradientDescent$$anonfun$runMiniBatchSGD$1$$anonfun$1.apply(GradientDescent.scala:192)
    at 
org.apache.spark.mllib.optimization.GradientDescent$$anonfun$runMiniBatchSGD$1$$anonfun$1.apply(GradientDescent.scala:190)
    at 
scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:144)
    at 
scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:144)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at 
scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:144)
    at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1157)
    at 
scala.collection.TraversableOnce$class.aggregate(TraversableOnce.scala:201)
    at scala.collection.AbstractIterator.aggregate(Iterator.scala:1157)
    at org.apache.spark.rdd.RDD$$anonfun$28.apply(RDD.scala:988)
    at org.apache.spark.rdd.RDD$$anonfun$28.apply(RDD.scala:988)
    at org.apache.spark.rdd.RDD$$anonfun$29.apply(RDD.scala:989)
    at org.apache.spark.rdd.RDD$$anonfun$29.apply(RDD.scala:989)
    at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634)
    at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
    at org.apache.spark.scheduler.Task.run(Task.scala:64)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
    at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
    at 
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1203)
    at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
    at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1191)
    at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at 
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1191)
    at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
    at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
    at scala.Option.foreach(Option.scala:236)
    at 
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
    at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
    at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
https://github.com/hermitD/temp here's my test data file I've use it to train 
with libsvm-tools under linux and it works! and exam format with libsvm python 
tool it shows ok. just don't know why it error.so please tell me how to fix 
this.or tell me how to address the problem so to fix it :(



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to