[
https://issues.apache.org/jira/browse/SPARK-15509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15304777#comment-15304777
]
Xin Ren commented on SPARK-15509:
---------------------------------
Hi [~josephkb], I tried many times but cannot reproduce your error message here.
I tried both the R naiveBayes package and spark.naiveBayes; each failed, the former with
{code}
naiveBayes formula interface handles data frames or arrays only
{code}
Below is what I did:
{code}
./bin/sparkR --master "local[2]"
> training <- loadDF(sqlContext, "data/mllib/sample_libsvm_data.txt", "libsvm")
> model <- spark.naiveBayes(label ~ features, training)
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘spark.naiveBayes’ for
signature ‘"formula", "SparkDataFrame"’
> model <- naiveBayes(label ~ features, training)
Error in naiveBayes.formula(label ~ features, training) :
naiveBayes formula interface handles data frames or arrays only
{code}
Then I tried the Gaussian GLM example from the SparkR docs, and it works:
http://spark.apache.org/docs/latest/sparkr.html#gaussian-glm-model
{code}
df <- createDataFrame(sqlContext, iris)
model <- glm(Sepal_Length ~ Sepal_Width + Species, data = df, family =
"gaussian")
{code}
Comparing the two examples, the difference I see is that the LibSVM data already
has a 'features' column of vector type, while the iris df above has only plain
scalar columns:
{code}
> df
SparkDataFrame[Sepal_Length:double, Sepal_Width:double, Petal_Length:double,
Petal_Width:double, Species:string]
> training
SparkDataFrame[label:double, features:vector]
{code}
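If it helps to see why the iris example succeeds while the LibSVM input fails, here is a minimal sketch in plain Python (hypothetical names, not the actual Spark implementation) of the schema check that VectorAssembler.transformSchema performs: the R wrappers build a "features" output column from the formula, and the assembler refuses to add an output column whose name already exists in the input schema.

```python
def transform_schema(input_cols, output_col="features"):
    # Simplified stand-in for VectorAssembler.transformSchema: adding the
    # assembled output column fails if the name is already taken.
    if output_col in input_cols:
        raise ValueError("Output column %s already exists." % output_col)
    return input_cols + [output_col]

# iris-style frame has no "features" column, so the check passes:
print(transform_schema(["Sepal_Length", "Sepal_Width", "Species"]))

# The LibSVM loader already yields a "features" column, so the check fails:
try:
    transform_schema(["label", "features"])
except ValueError as e:
    print(e)  # Output column features already exists.
```

So the collision is between the loader's "features" column and the column the formula machinery tries to create, independent of which naiveBayes implementation is called.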
I also downloaded the "mnist" LibSVM dataset and got the same error:
https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass.html#mnist
Is there anything I'm doing wrong? I'm using the naiveBayes function from the
e1071 R package (http://www.inside-r.org/packages/cran/e1071/docs/naivebayes);
maybe I'm using the wrong package?
Thank you very much Joseph.
> R MLlib algorithms should support input columns "features" and "label"
> ----------------------------------------------------------------------
>
> Key: SPARK-15509
> URL: https://issues.apache.org/jira/browse/SPARK-15509
> Project: Spark
> Issue Type: Improvement
> Components: ML, SparkR
> Reporter: Joseph K. Bradley
>
> Currently in SparkR, when you load a LibSVM dataset using the sqlContext and
> then pass it to an MLlib algorithm, the ML wrappers will fail since they will
> try to create a "features" column, which conflicts with the existing
> "features" column from the LibSVM loader. E.g., using the "mnist" dataset
> from LibSVM:
> {code}
> training <- loadDF(sqlContext, ".../mnist", "libsvm")
> model <- naiveBayes(label ~ features, training)
> {code}
> This fails with:
> {code}
> 16/05/24 11:52:41 ERROR RBackendHandler: fit on org.apache.spark.ml.r.NaiveBayesWrapper failed
> Error in invokeJava(isStatic = TRUE, className, methodName, ...) :
>   java.lang.IllegalArgumentException: Output column features already exists.
>   at org.apache.spark.ml.feature.VectorAssembler.transformSchema(VectorAssembler.scala:120)
>   at org.apache.spark.ml.Pipeline$$anonfun$transformSchema$4.apply(Pipeline.scala:179)
>   at org.apache.spark.ml.Pipeline$$anonfun$transformSchema$4.apply(Pipeline.scala:179)
>   at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57)
>   at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66)
>   at scala.collection.mutable.ArrayOps$ofRef.foldLeft(ArrayOps.scala:186)
>   at org.apache.spark.ml.Pipeline.transformSchema(Pipeline.scala:179)
>   at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:67)
>   at org.apache.spark.ml.Pipeline.fit(Pipeline.scala:131)
>   at org.apache.spark.ml.feature.RFormula.fit(RFormula.scala:169)
>   at org.apache.spark.ml.r.NaiveBayesWrapper$.fit(NaiveBayesWrapper.scala:62)
>   at org.apache.spark.ml.r.NaiveBayesWrapper.fit(NaiveBayesWrapper.sca
> {code}
> The same issue appears for the "label" column once you rename the "features"
> column.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)