Vinh Tran created ZEPPELIN-4971:
-----------------------------------
Summary: XGBoost4J-Spark fails on StringIndexer
Key: ZEPPELIN-4971
URL: https://issues.apache.org/jira/browse/ZEPPELIN-4971
Project: Zeppelin
Issue Type: Bug
Components: conf, Interpreters, spark, zeppelin-server
Reporter: Vinh Tran
I'm trying to follow the [XGBoost4J-Spark tutorial|https://xgboost.readthedocs.io/en/latest/jvm/xgboost4j_spark_tutorial.html]
on a Spark 3.0.0 cluster in Apache Zeppelin 0.8.2.
I load the dependencies with:
{code:bash}
export SPARK_SUBMIT_OPTIONS="--packages ml.dmlc:xgboost4j-spark_2.12:1.0.0"
{code}
When I then fit the following StringIndexer, it fails with the error below.
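{code:scala}
import org.apache.spark.ml.feature.StringIndexer

// rawInput is the DataFrame loaded earlier in the tutorial
val stringIndexer = new StringIndexer().
  setInputCol("class").
  setOutputCol("classIndex").
  fit(rawInput)
{code}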
{code:java}
java.lang.NoSuchMethodError: com.esotericsoftware.kryo.Kryo.setInstantiatorStrategy(Lorg/objenesis/strategy/InstantiatorStrategy;)V
  at com.twitter.chill.KryoBase.setInstantiatorStrategy(KryoBase.scala:99)
  at com.twitter.chill.EmptyScalaKryoInstantiator.newKryo(ScalaKryoInstantiator.scala:62)
  at org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:131)
  at org.apache.spark.serializer.KryoSerializer$$anon$1.create(KryoSerializer.scala:102)
  at com.esotericsoftware.kryo.pool.KryoPoolQueueImpl.borrow(KryoPoolQueueImpl.java:48)
  at org.apache.spark.serializer.KryoSerializer$PoolWrapper.borrow(KryoSerializer.scala:109)
  at org.apache.spark.serializer.KryoSerializerInstance.borrowKryo(KryoSerializer.scala:336)
  at org.apache.spark.serializer.KryoSerializerInstance.deserialize(KryoSerializer.scala:389)
  at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificSafeProjection.apply(Unknown Source)
  at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Deserializer.apply(ExpressionEncoder.scala:184)
  at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Deserializer.apply(ExpressionEncoder.scala:175)
  at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:237)
  at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
  at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
  at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
  at scala.collection.TraversableLike.map(TraversableLike.scala:237)
  at scala.collection.TraversableLike.map$(TraversableLike.scala:230)
  at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:198)
  at org.apache.spark.sql.Dataset.collectFromPlan(Dataset.scala:3625)
  at org.apache.spark.sql.Dataset.$anonfun$collect$1(Dataset.scala:2938)
  at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3616)
  at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:100)
  at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:160)
  at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:87)
  at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:763)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
  at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3614)
  at org.apache.spark.sql.Dataset.collect(Dataset.scala:2938)
  at org.apache.spark.ml.feature.StringIndexer.countByValue(StringIndexer.scala:204)
  at org.apache.spark.ml.feature.StringIndexer.sortByFreq(StringIndexer.scala:212)
  at org.apache.spark.ml.feature.StringIndexer.fit(StringIndexer.scala:241)
  ... 46 elided
{code}
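A {{NoSuchMethodError}} means the calling class was compiled against a version of a method that is missing (or has a different signature) in the class actually loaded at runtime, which usually points to two incompatible versions of the same library on the classpath. Here the caller is Twitter Chill ({{KryoBase}}) and the missing method is on Kryo, so one plausible cause, not confirmed in this report, is a Kryo/Objenesis version mismatch between the jars shipped with Spark 3.0.0 and those pulled in by Zeppelin 0.8.2 (which was released before Spark 3.0.0) or by {{--packages}}. The sketch below is a diagnostic aid under that assumption: {{WhichJar}} and {{locationOf}} are hypothetical names, not part of any library, and the code uses only standard JVM reflection to print which jar each class from the failing frames was loaded from.

{code:scala}
import scala.util.Try

// Hypothetical helper (not a library API): reports where the JVM loaded a class from.
object WhichJar {
  // Returns the code-source location (usually a jar path) of a class, or None
  // if the class cannot be found or has no code source (e.g. JDK bootstrap classes).
  // The class is looked up without running its static initializer.
  def locationOf(className: String): Option[String] =
    Try(Class.forName(className, false, getClass.getClassLoader)).toOption
      .flatMap(cls => Option(cls.getProtectionDomain.getCodeSource))
      .flatMap(src => Option(src.getLocation))
      .map(_.toString)

  def main(args: Array[String]): Unit = {
    // Classes that appear in the failing frames of the stack trace above.
    val suspects = Seq(
      "com.esotericsoftware.kryo.Kryo",
      "org.objenesis.strategy.InstantiatorStrategy",
      "com.twitter.chill.KryoBase"
    )
    for (name <- suspects) {
      val where = locationOf(name).getOrElse("not found or no code source")
      println(s"$name -> $where")
    }
  }
}
{code}

In a {{%spark}} paragraph, define the object and then call {{WhichJar.main(Array.empty)}}. If Kryo resolves to a jar outside Spark's own {{jars/}} directory (for example one bundled with the Zeppelin interpreter or fetched via {{--packages}}), that jar may be the one shadowing Spark's Kryo.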