Hi,
I need to change toString on LabeledPoint to emit libsvm format so that I
can dump an RDD[LabeledPoint] in a form that sparse glmnet-R and other
packages can read, in order to benchmark mllib classification accuracy.
Basically I have to change the toString of both LabeledPoint and
SparseVector.
Should I add it as a PR, or is it already being added?
For now I have added these toLibSvm functions to my internal util class:
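import org.apache.spark.mllib.linalg.SparseVector
import org.apache.spark.mllib.regression.LabeledPoint

// Format one LabeledPoint as a libsvm line: "<label> <index>:<value> ..."
// Only sparse features are handled; a DenseVector would fail the cast.
def toLibSvm(labelPoint: LabeledPoint): String = {
  labelPoint.label.toString + " " +
    toLibSvm(labelPoint.features.asInstanceOf[SparseVector])
}

// Format the stored entries as space-separated "index:value" pairs.
// libsvm indices are conventionally 1-based; SparseVector's are 0-based, hence i + 1.
def toLibSvm(features: SparseVector): String = {
  features.indices.zip(features.values)
    .map { case (i, v) => s"${i + 1}:$v" }
    .mkString(" ")
}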
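To make the target line format concrete, here is a minimal sketch in plain Scala with no Spark dependency. The object name LibSvmLineSketch and the helper formatLine are hypothetical, made up for illustration only, and the sketch assumes the input indices are 0-based and should be written 1-based, as libsvm files conventionally are.

```scala
// Hypothetical stand-alone helper (not part of MLlib): formats one example
// as a libsvm line from a label plus parallel index/value arrays.
object LibSvmLineSketch {
  def formatLine(label: Double, indices: Array[Int], values: Array[Double]): String = {
    require(indices.length == values.length, "indices and values must align")
    // Assumption: input indices are 0-based, so shift them to 1-based on output.
    val pairs = indices.zip(values).map { case (i, v) => s"${i + 1}:$v" }
    (label.toString +: pairs).mkString(" ")
  }

  def main(args: Array[String]): Unit = {
    // Label 1.0 with stored values at 0-based positions 0 and 3.
    println(formatLine(1.0, Array(0, 3), Array(2.5, -1.0)))
    // prints: 1.0 1:2.5 4:-1.0
  }
}
```

Building each pair with string interpolation avoids depending on how a tuple's toString happens to be rendered.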
Thanks.
Deb