Hi, I think this should raise an error in both the Scala code and the Python API.
Please open a JIRA.

On Thu, Jul 23, 2015 at 4:22 PM, Andrew Vykhodtsev <yoz...@gmail.com> wrote:
> Dear Developers,
>
> I found that one can create a SparseVector inconsistently, and it will
> lead to a Java error at runtime, for example when training
> LogisticRegressionWithSGD.
>
> Here is the test case:
>
> In [2]: sc.version
> Out[2]: u'1.3.1'
>
> In [13]: from pyspark.mllib.linalg import SparseVector
>          from pyspark.mllib.regression import LabeledPoint
>          from pyspark.mllib.classification import LogisticRegressionWithSGD
>
> In [3]: x = SparseVector(2, {1:1, 2:2, 3:3, 4:4, 5:5})
>
> In [10]: l = LabeledPoint(0, x)
>
> In [12]: r = sc.parallelize([l])
>
> In [14]: m = LogisticRegressionWithSGD.train(r)
>
> Error:
>
> Py4JJavaError: An error occurred while calling
> o86.trainLogisticRegressionModelWithSGD.
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 7
> in stage 11.0 failed 1 times, most recent failure: Lost task 7.0 in stage
> 11.0 (TID 47, localhost): *java.lang.ArrayIndexOutOfBoundsException: 2*
>
> Attached is the notebook with the scenario and the full message.
>
> Should I raise a JIRA for this? (Forgive me if there is such a JIRA and I
> did not notice it.)

--
Godspeed,
Manoj Kumar,
http://manojbits.wordpress.com
http://github.com/MechCoder
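The missing check the reply alludes to could be sketched as follows. This is a pure-Python illustration, not the actual pyspark.mllib implementation: the helper name `make_sparse_vector` and its return shape are hypothetical, chosen only to show where an eager index-bounds check would turn the runtime ArrayIndexOutOfBoundsException into an immediate, understandable error at construction time.

```python
def make_sparse_vector(size, entries):
    """Hypothetical constructor-side validation for a sparse vector.

    `entries` is a dict mapping index -> value, mirroring the dict form
    accepted by SparseVector. Every index must lie in [0, size); otherwise
    we fail fast here rather than inside a distributed Java task later.
    """
    for idx in entries:
        if not (0 <= idx < size):
            raise ValueError(
                "index %d is out of range for a vector of size %d"
                % (idx, size))
    # Return the validated (size, sorted (index, value) pairs) form.
    return (size, sorted(entries.items()))
```

With this check in place, the inconsistent vector from the test case, `make_sparse_vector(2, {1:1, 2:2, 3:3, 4:4, 5:5})`, raises a ValueError immediately instead of surviving until LogisticRegressionWithSGD.train touches the out-of-range indices.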