[ https://issues.apache.org/jira/browse/SPARK-20214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-20214:
------------------------------------

    Assignee:     (was: Apache Spark)

> pyspark.mllib SciPyTests test_serialize
> ---------------------------------------
>
>                 Key: SPARK-20214
>                 URL: https://issues.apache.org/jira/browse/SPARK-20214
>             Project: Spark
>          Issue Type: Bug
>          Components: ML, MLlib, PySpark, Tests
>    Affects Versions: 2.0.2, 2.1.1, 2.2.0
>            Reporter: Joseph K. Bradley
>
> I've seen a few failures of this line:
> https://github.com/apache/spark/blame/402bf2a50ddd4039ff9f376b641bd18fffa54171/python/pyspark/mllib/tests.py#L847
> It converts a scipy.sparse.lil_matrix to a dok_matrix and then to a
> pyspark.mllib.linalg.Vector. The failure happens in the conversion to a
> vector and indicates that the dok_matrix is not returning its values in
> sorted order. (Actually, the failure is in _convert_to_vector, which converts
> the dok_matrix to a csc_matrix and then passes the CSC data to the MLlib
> Vector constructor.) Here's the stack trace:
> {code}
> Traceback (most recent call last):
>   File "/home/jenkins/workspace/python/pyspark/mllib/tests.py", line 847, in test_serialize
>     self.assertEqual(sv, _convert_to_vector(lil.todok()))
>   File "/home/jenkins/workspace/python/pyspark/mllib/linalg/__init__.py", line 78, in _convert_to_vector
>     return SparseVector(l.shape[0], csc.indices, csc.data)
>   File "/home/jenkins/workspace/python/pyspark/mllib/linalg/__init__.py", line 556, in __init__
>     % (self.indices[i], self.indices[i + 1]))
> TypeError: Indices 3 and 1 are not strictly increasing
> {code}
> This seems like a bug in _convert_to_vector, where we really should check
> {{csc_matrix.has_sorted_indices}} first.
> I haven't seen this bug in pyspark.ml.linalg, but it probably exists there too.
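As a minimal standalone sketch of the check the reporter suggests (this is not the actual Spark patch, just an illustration in plain scipy): a dok_matrix stores entries in a hash map, so a csc_matrix built from it is not guaranteed to have sorted row indices, and guarding with {{has_sorted_indices}} / {{sort_indices()}} before handing {{csc.indices}} and {{csc.data}} to the SparseVector constructor would avoid the "not strictly increasing" failure.

```python
import numpy as np
from scipy.sparse import lil_matrix

# Reproduce the conversion chain from the failing test:
# lil_matrix -> dok_matrix -> csc_matrix.
lil = lil_matrix((4, 1))
lil[1, 0] = 1.0
lil[3, 0] = 2.0

csc = lil.todok().tocsc()

# The guard _convert_to_vector could apply: dok-derived CSC data may
# have unsorted row indices, so sort them in place before use.
if not csc.has_sorted_indices:
    csc.sort_indices()

# After the guard, indices are safe to pass to a constructor that
# requires strictly increasing indices (like MLlib's SparseVector).
assert list(csc.indices) == sorted(csc.indices)
print(csc.indices, csc.data)
```

This only illustrates the idea; the real fix would live inside pyspark's _convert_to_vector (and likely the pyspark.ml.linalg copy as well, per the last paragraph above).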
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org