Hi,

I think this should raise an error in both the Scala code and the Python API.

Please open a JIRA.
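
For reference, here is a minimal sketch of the kind of check the constructor could do; this is not the actual pyspark.mllib.linalg code, and the function name below is only illustrative:

    # Hypothetical validation, mirroring what SparseVector.__init__ could do:
    # reject any index outside [0, size) before the data reaches the JVM.
    def validate_sparse_indices(size, pairs):
        """Raise ValueError if any index in `pairs` is out of range for `size`."""
        for index in pairs:
            if not 0 <= index < size:
                raise ValueError(
                    "index %d is out of range for a vector of size %d" % (index, size))

    # With the reported example, this would fail fast on the Python side
    # instead of surfacing later as an ArrayIndexOutOfBoundsException in the JVM:
    # validate_sparse_indices(2, {1: 1, 2: 2, 3: 3, 4: 4, 5: 5})  # raises ValueError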

On Thu, Jul 23, 2015 at 4:22 PM, Andrew Vykhodtsev <yoz...@gmail.com> wrote:

> Dear Developers,
>
> I found that one can create a SparseVector inconsistently, and it will lead
> to a Java error at runtime, for example when training
> LogisticRegressionWithSGD.
>
> Here is the test case:
>
>
> In [2]:
> sc.version
> Out[2]:
> u'1.3.1'
> In [13]:
> from pyspark.mllib.linalg import SparseVector
> from pyspark.mllib.regression import LabeledPoint
> from pyspark.mllib.classification import LogisticRegressionWithSGD
> In [3]:
> x =  SparseVector(2, {1:1, 2:2, 3:3, 4:4, 5:5})
> In [10]:
> l = LabeledPoint(0, x)
> In [12]:
> r = sc.parallelize([l])
> In [14]:
> m = LogisticRegressionWithSGD.train(r)
>
> Error:
>
>
> Py4JJavaError: An error occurred while calling o86.trainLogisticRegressionModelWithSGD.
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 7 in stage 11.0 failed 1 times, most recent failure: Lost task 7.0 in stage 11.0 (TID 47, localhost): *java.lang.ArrayIndexOutOfBoundsException: 2*
>
>
>
> Attached is the notebook with the scenario and the full message:
>
>
>
> Should I raise a JIRA for this (forgive me if there is such a JIRA and I did
> not notice it)?
>
>
>
>
>



-- 
Godspeed,
Manoj Kumar,
http://manojbits.wordpress.com
http://github.com/MechCoder
