cqfrog created SPARK-35423:
------------------------------
Summary: The output of PCA is inconsistent
Key: SPARK-35423
URL: https://issues.apache.org/jira/browse/SPARK-35423
Project: Spark
Issue Type: Bug
Components: MLlib
Affects Versions: 3.1.1
Environment: Spark Version: 3.1.1
Reporter: cqfrog
1. The example from the documentation
{code:java}
import org.apache.spark.ml.feature.PCA
import org.apache.spark.ml.linalg.Vectors
val data = Array(
  Vectors.sparse(5, Seq((1, 1.0), (3, 7.0))),
  Vectors.dense(2.0, 0.0, 3.0, 4.0, 5.0),
  Vectors.dense(4.0, 0.0, 0.0, 6.0, 7.0)
)
val df = spark.createDataFrame(data.map(Tuple1.apply)).toDF("features")
val pca = new PCA()
  .setInputCol("features")
  .setOutputCol("pcaFeatures")
  .setK(3)
  .fit(df)
val result = pca.transform(df).select("pcaFeatures")
result.show(false)
{code}
The output shows:
{code:java}
+-----------------------------------------------------------+
|pcaFeatures |
+-----------------------------------------------------------+
|[1.6485728230883807,-4.013282700516296,-5.524543751369388] |
|[-4.645104331781534,-1.1167972663619026,-5.524543751369387]|
|[-6.428880535676489,-5.337951427775355,-5.524543751369389] |
+-----------------------------------------------------------+
{code}
2. Change the vector format
I changed the first row from "Vectors.sparse(5, Seq((1, 1.0), (3, 7.0)))" to the equivalent dense vector
"Vectors.dense(0.0,1.0,0.0,7.0,0.0)",
but the output now shows:
{code:java}
+------------------------------------------------------------+
|pcaFeatures |
+------------------------------------------------------------+
|[1.6485728230883814,-4.0132827005162985,-1.0091435193998504]|
|[-4.645104331781533,-1.1167972663619048,-1.0091435193998501]|
|[-6.428880535676488,-5.337951427775359,-1.009143519399851] |
+------------------------------------------------------------+
{code}
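To make the comparison easier to reproduce, here is a minimal sketch that fits both variants in one spark-shell session and prints the per-component absolute differences. The helper name "project" and the value names are hypothetical, introduced only for illustration; the sketch also assumes collect() returns the rows in their original order, which is what I would expect for such a small local dataset.
{code:java}
import org.apache.spark.ml.feature.PCA
import org.apache.spark.ml.linalg.{Vector, Vectors}

// Same three rows; only the representation of the first row differs.
val sparseFirst = Vectors.sparse(5, Seq((1, 1.0), (3, 7.0)))
val denseFirst = Vectors.dense(0.0, 1.0, 0.0, 7.0, 0.0)
val rest = Seq(
  Vectors.dense(2.0, 0.0, 3.0, 4.0, 5.0),
  Vectors.dense(4.0, 0.0, 0.0, 6.0, 7.0)
)

// Hypothetical helper: fit PCA with k = 3 and return the projected rows.
def project(rows: Seq[Vector]): Array[Vector] = {
  val df = spark.createDataFrame(rows.map(Tuple1.apply)).toDF("features")
  val model = new PCA()
    .setInputCol("features")
    .setOutputCol("pcaFeatures")
    .setK(3)
    .fit(df)
  model.transform(df).select("pcaFeatures").collect().map(_.getAs[Vector](0))
}

val a = project(sparseFirst +: rest)
val b = project(denseFirst +: rest)

// Print |a - b| for each component of each row.
a.zip(b).foreach { case (x, y) =>
  val diffs = x.toArray.zip(y.toArray).map { case (p, q) => math.abs(p - q) }
  println(diffs.mkString(", "))
}
{code}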
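Both inputs represent the same data, and the first two components agree to within floating-point rounding, but the third component differs (about -5.52 versus -1.01). Why are the two outputs inconsistent?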
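One possible check, sketched here under the assumption that "pca" is the fitted PCAModel from step 1: print the model's explainedVariance and pc. With only three rows, the mean-centered data can span at most two dimensions, so the third component may have an explained variance close to zero. In that case its direction would not be uniquely determined, and small numerical differences between the sparse and dense inputs could plausibly select different vectors. That would also be consistent with the third column being constant across rows in both outputs. This is a hypothesis, not a confirmed diagnosis.
{code:java}
// Assumes `pca` is the PCAModel fitted in step 1.
// Proportion of variance explained by each of the k = 3 components.
println(pca.explainedVariance)
// Principal components matrix (5 rows, 3 columns); each column is one component.
println(pca.pc)
{code}
If the third entry of explainedVariance is effectively zero, setK(2) should give results that do not depend on the vector format.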
Thanks.