Andrew Davidson created SPARK-12606:
---------------------------------------
Summary: Scala/Java compatibility issue Re: how to extend Java transformer from Scala UnaryTransformer?
Key: SPARK-12606
URL: https://issues.apache.org/jira/browse/SPARK-12606
Project: Spark
Issue Type: Bug
Components: ML
Affects Versions: 1.5.2
Environment: Java 8, Mac OS, Spark-1.5.2
Reporter: Andrew Davidson
Hi Andy,
I suspect that you hit the Scala/Java compatibility issue. I can also reproduce this issue, so could you file a JIRA to track it?
Yanbo
2016-01-02 3:38 GMT+08:00 Andy Davidson <[email protected]>:
I am trying to write a trivial transformer to use in my pipeline. I am using Java and Spark 1.5.2. It was suggested that I use the Tokenizer.scala class as an example. This should be very easy; however, I do not understand Scala, and I am having trouble debugging the following exception.
Any help would be greatly appreciated.
Happy New Year
Andy
java.lang.IllegalArgumentException: requirement failed: Param null__inputCol does not belong to Stemmer_2f3aa96d-7919-4eaa-ad54-f7c620b92d1c.
    at scala.Predef$.require(Predef.scala:233)
    at org.apache.spark.ml.param.Params$class.shouldOwn(params.scala:557)
    at org.apache.spark.ml.param.Params$class.set(params.scala:436)
    at org.apache.spark.ml.PipelineStage.set(Pipeline.scala:37)
    at org.apache.spark.ml.param.Params$class.set(params.scala:422)
    at org.apache.spark.ml.PipelineStage.set(Pipeline.scala:37)
    at org.apache.spark.ml.UnaryTransformer.setInputCol(Transformer.scala:83)
    at com.pws.xxx.ml.StemmerTest.test(StemmerTest.java:30)
import org.junit.Test;

public class StemmerTest extends AbstractSparkTest {
    @Test
    public void test() {
        Stemmer stemmer = new Stemmer()
                .setInputCol("raw") // line 30
                .setOutputCol("filtered");
    }
}
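The message "Param null__inputCol does not belong to Stemmer_..." is consistent with a Java initialization-order problem (an interpretation, not something stated in the thread): if UnaryTransformer creates its inputCol param while its own constructor is still running and records the owner's uid() at that moment, a Java field initializer such as `private final String uid = ...` has not run yet, so the param is tagged with a null owner; the later setInputCol call then fails the ownership check. The sketch below reproduces only that ordering in plain Java. FakeParams, JavaStyleStemmer and InitOrderDemo are hypothetical stand-ins, not Spark classes.

```java
import java.util.UUID;

// Hypothetical stand-ins that only imitate the construction order assumed for
// Spark's Params/UnaryTransformer; they are NOT Spark's real classes.
abstract class FakeParams {
    // Plays the role of the inputCol param's parent uid, which is captured
    // while this (superclass) constructor runs.
    final String inputColParent;

    FakeParams() {
        // Virtual call: dispatches to the subclass override of uid().
        inputColParent = uid();
    }

    abstract String uid();

    // Plays the role of the ownership check that fails in Params.shouldOwn.
    boolean ownsInputCol() {
        return uid().equals(inputColParent);
    }
}

class JavaStyleStemmer extends FakeParams {
    // Java runs this initializer only after super() has returned.
    // (It is deliberately not a compile-time constant, like the original.)
    private final String uid = "Stemmer_" + UUID.randomUUID();

    @Override
    String uid() {
        return uid;
    }
}

class InitOrderDemo {
    public static void main(String[] args) {
        JavaStyleStemmer s = new JavaStyleStemmer();
        System.out.println(s.inputColParent + "__inputCol"); // null__inputCol
        System.out.println(s.ownsInputCol());               // false
    }
}
```

Running InitOrderDemo prints null__inputCol and then false, mirroring the parent name in the exception and the failed ownership check. A Scala class such as Tokenizer, which receives uid as a constructor parameter, is likely not affected because Scala assigns constructor-parameter fields before calling the superclass constructor.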
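import java.io.Serializable;
import java.util.List;
import java.util.UUID;

import org.apache.spark.ml.UnaryTransformer;
import org.apache.spark.sql.types.ArrayType;
import org.apache.spark.sql.types.DataType;
import org.apache.spark.sql.types.DataTypes;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import scala.Function1;
import scala.runtime.AbstractFunction1;

/**
 * @see spark-1.5.1/mllib/src/main/scala/org/apache/spark/ml/feature/Tokenizer.scala
 * @see https://chimpler.wordpress.com/2014/06/11/classifiying-documents-using-naive-bayes-on-apache-spark-mllib/
 * @see http://www.tonytruong.net/movie-rating-prediction-with-apache-spark-and-hortonworks/
 *
 * @author andrewdavidson
 */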
public class Stemmer extends UnaryTransformer<List<String>, List<String>, Stemmer>
        implements Serializable {
    private static final Logger logger = LoggerFactory.getLogger(Stemmer.class);
    private static final long serialVersionUID = 1L;
    private static final ArrayType inputType =
            DataTypes.createArrayType(DataTypes.StringType, true);

    // Java runs this field initializer only after the UnaryTransformer
    // superclass constructor returns, so any uid() call made inside that
    // constructor sees null here.
    private final String uid = Stemmer.class.getSimpleName() + "_" +
            UUID.randomUUID().toString();

    @Override
    public String uid() {
        return uid;
    }

    /*
     * Scala version of this check, from Tokenizer.scala:
     *
     * override protected def validateInputType(inputType: DataType): Unit = {
     *   require(inputType == StringType, s"Input type must be string type but got $inputType.")
     * }
     */
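
    @Override
    public void validateInputType(DataType inputTypeArg) {
        // Throw explicitly, like Scala's require(...); a Java assert is skipped
        // unless the JVM is started with -ea.
        if (!inputType.equals(inputTypeArg)) {
            throw new IllegalArgumentException("inputType must be "
                    + inputType.simpleString() + " but got " + inputTypeArg.simpleString());
        }
    }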
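
    @Override
    public Function1<List<String>, List<String>> createTransformFunc() {
        // http://stackoverflow.com/questions/6545066/using-scala-from-java-passing-functions-as-parameters
        // Note: Spark likely hands an ArrayType column value to this function as a
        // scala.collection.Seq rather than a java.util.List, so this signature may
        // fail with a ClassCastException once transform() actually runs.
        Function1<List<String>, List<String>> f =
                new AbstractFunction1<List<String>, List<String>>() {
                    @Override
                    public List<String> apply(List<String> words) {
                        // Debug logging only; the words are returned unchanged.
                        for (String word : words) {
                            logger.error("AEDWIP input word: {}", word);
                        }
                        return words;
                    }
                };
        return f;
    }

    @Override
    public DataType outputDataType() {
        return DataTypes.createArrayType(DataTypes.StringType, true);
    }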
}
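One possible Java-side workaround, offered as an untested sketch rather than a confirmed fix: create the uid lazily inside uid() and declare the field without any initializer, so a value assigned while the superclass constructor runs is not overwritten afterwards. This assumes the failure comes from the superclass constructor calling uid() before Stemmer's field initializer has run. ParamsBase, LazyUidStemmer and LazyUidDemo are hypothetical stand-ins that only imitate such a base class; they are not Spark code.

```java
import java.util.UUID;

// Hypothetical stand-in for a base class that reads uid() from its
// constructor (assumed here to match UnaryTransformer); NOT Spark's code.
abstract class ParamsBase {
    // Owner uid captured during construction, like a param's parent.
    final String inputColParent;

    ParamsBase() {
        inputColParent = uid();
    }

    abstract String uid();

    // Ownership check analogous to Params.shouldOwn.
    boolean ownsInputCol() {
        return uid().equals(inputColParent);
    }
}

class LazyUidStemmer extends ParamsBase {
    // Deliberately declared WITHOUT an initializer: even "= null" would run
    // after super() returns and wipe out the value assigned lazily below.
    private String uid;

    @Override
    String uid() {
        // The first call happens inside ParamsBase's constructor; create the
        // id then and return the same value on every later call.
        if (uid == null) {
            uid = "Stemmer_" + UUID.randomUUID();
        }
        return uid;
    }
}

class LazyUidDemo {
    public static void main(String[] args) {
        LazyUidStemmer s = new LazyUidStemmer();
        System.out.println(s.ownsInputCol()); // true
    }
}
```

The trade-off is that uid can no longer be final. The null check is not synchronized, which should be acceptable here because the first uid() call happens during construction, on a single thread.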
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)