[ https://issues.apache.org/jira/browse/SPARK-21723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16126939#comment-16126939 ]

Nick Pentreath commented on SPARK-21723:
----------------------------------------

Yes, we should definitely be able to write LibSVM format regardless of whether 
the original data was read from that format, or whether ML metadata is 
attached to the DataFrame. In the absence of the metadata, we should be able to 
inspect the vectors themselves to determine the size.
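A minimal sketch of that fallback (an illustration only, not the code from the linked PR; it assumes a DataFrame {{df}} with a {{features}} column of {{ml.linalg.Vector}}):

{code:scala}
import org.apache.spark.ml.linalg.Vector

// When the "features" column carries no ML attribute metadata,
// infer numFeatures by scanning the vectors' declared sizes.
val numFeatures = df.select("features").rdd
  .map(_.getAs[Vector](0).size)
  .reduce(math.max)
{code}

Since both sparse and dense {{Vector}} instances know their own size, a single pass like this recovers the value that would otherwise come from the metadata.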



> Can't write LibSVM - key not found: numFeatures
> -----------------------------------------------
>
>                 Key: SPARK-21723
>                 URL: https://issues.apache.org/jira/browse/SPARK-21723
>             Project: Spark
>          Issue Type: Bug
>          Components: Input/Output, ML
>    Affects Versions: 2.2.0, 2.3.0
>            Reporter: Jan Vršovský
>
> Writing a dataset to LibSVM format raises an exception
> {{java.util.NoSuchElementException: key not found: numFeatures}}
> Happens only when the dataset was NOT read from a LibSVM format before 
> (because otherwise numFeatures is in its metadata). Steps to reproduce:
> {code:scala}
> import org.apache.spark.ml.linalg.Vectors
> val rawData = Seq((1.0, Vectors.sparse(3, Seq((0, 2.0), (1, 3.0)))),
>                   (4.0, Vectors.sparse(3, Seq((0, 5.0), (2, 6.0)))))
> val dfTemp = spark.sparkContext.parallelize(rawData).toDF("label", "features")
> dfTemp.coalesce(1).write.format("libsvm").save("...filename...")
> {code}
> PR with a fix and unit test is ready.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
