Renan Paes Leme created GRIFFIN-274:
---------------------------------------
Summary: Error: java.lang.NoClassDefFoundError: scala/collection/GenTraversableOnce$class
Key: GRIFFIN-274
URL: https://issues.apache.org/jira/browse/GRIFFIN-274
Project: Griffin
Issue Type: Bug
Components: completeness-batch
Affects Versions: 0.5.0, 0.6.0
Reporter: Renan Paes Leme
I'm trying to run a completeness job to test Griffin's features, but when I run it against an Avro file I get the following error:
{code:java}
19/08/07 15:38:47 ERROR executor.Executor: Exception in task 2.2 in stage 2.0 (TID 13)
java.lang.NoClassDefFoundError: scala/collection/GenTraversableOnce$class
    at com.databricks.spark.avro.DefaultSource$$anonfun$buildReader$1$$anon$1.<init>(DefaultSource.scala:205)
    at com.databricks.spark.avro.DefaultSource$$anonfun$buildReader$1.apply(DefaultSource.scala:205)
    at com.databricks.spark.avro.DefaultSource$$anonfun$buildReader$1.apply(DefaultSource.scala:160)
    at org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(FileFormat.scala:136)
    at org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(FileFormat.scala:120)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:124)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:174)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:105)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395)
    at org.apache.spark.sql.execution.columnar.InMemoryRelation$$anonfun$1$$anon$1.hasNext(InMemoryRelation.scala:133)
    at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:215)
    at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1038)
    at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1029)
    at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:969)
    at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1029)
    at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:760)
    at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:334)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:285)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:336)
    at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:334)
    at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1038)
    at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1029)
    at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:969)
    at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1029)
    at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:760)
    at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:334)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:285)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
    at org.apache.spark.scheduler.Task.run(Task.scala:108)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)
{code}
*I have already checked and changed the dependency versions (Spark/Scala/Databricks spark-avro), but that still didn't fix it.*
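As far as I understand, {{scala/collection/GenTraversableOnce$class}} only exists in the Scala 2.11 (and earlier) standard library, so this error usually means a 2.11-compiled artifact (here spark-avro_2.11:4.0.0, see the pom below) is being run on a Scala 2.12 runtime. As a sanity check (just a minimal sketch, assuming spark-shell is launched from the same Spark installation that executes the Griffin job), this prints the Scala version on the driver and on an executor:
{code:java}
// Scala version of the driver (the spark-shell JVM itself)
println(scala.util.Properties.versionString)

// Scala version of an executor: run one tiny task and collect the result
val executorScalaVersion = spark.sparkContext
  .parallelize(Seq(1), 1)
  .map(_ => scala.util.Properties.versionString)
  .collect()
  .head
println(executorScalaVersion)
{code}
If either side reports 2.12.x while a _2.11 spark-avro jar is on the classpath, that alone would produce exactly this NoClassDefFoundError.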
*Pom dependencies:*
{code:java}
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <parent>
        <groupId>org.apache.griffin</groupId>
        <artifactId>griffin</artifactId>
        <version>0.6.0-SNAPSHOT</version>
    </parent>

    <artifactId>measure</artifactId>
    <packaging>jar</packaging>
    <name>Apache Griffin :: Measures</name>
    <url>http://maven.apache.org</url>

    <properties>
        <scala.version>2.11.8</scala.version>
        <spark.version>2.2.1</spark.version>
        <scala.binary.version>2.11</scala.binary.version>
        <avro.version>1.7.7</avro.version>
        <jackson.version>2.8.7</jackson.version>
        <scalaj.version>2.3.0</scalaj.version>
        <mongo.version>2.1.0</mongo.version>
        <scalatest.version>3.0.0</scalatest.version>
        <slf4j.version>1.7.21</slf4j.version>
        <log4j.version>1.2.16</log4j.version>
        <curator.version>2.10.0</curator.version>
        <scalamock.version>3.6.0</scalamock.version>
        <spark.testing.version>0.9.0</spark.testing.version>
    </properties>

    <dependencies>
        <!--scala-->
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>${scala.version}</version>
        </dependency>
        <!--spark, spark streaming, spark hive-->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_${scala.binary.version}</artifactId>
            <version>${spark.version}</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming_${scala.binary.version}</artifactId>
            <version>${spark.version}</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-hive_${scala.binary.version}</artifactId>
            <version>${spark.version}</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming-kafka-0-8_${scala.binary.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <!--scalaj for http request-->
        <dependency>
            <groupId>org.scalaj</groupId>
            <artifactId>scalaj-http_${scala.binary.version}</artifactId>
            <version>${scalaj.version}</version>
        </dependency>
        <!--avro-->
        <dependency>
            <groupId>com.databricks</groupId>
            <artifactId>spark-avro_${scala.binary.version}</artifactId>
            <version>4.0.0</version>
        </dependency>
    </dependencies>
{code}
*Measure creation request:*
{code:java}
{
  "name": "completeness2",
  "measure.type": "griffin",
  "owner": "test",
  "process.type": "BATCH",
  "timestamp": 1565174991000,
  "data.sources": [
    {
      "name": "source",
      "connectors": [
        {
          "name": "avro_source",
          "type": "AVRO",
          "version": "1.7",
          "config": {
            "file.name": "/datasets/stackoverflow/000000000000.avro"
          }
        }
      ]
    }
  ],
  "evaluate.rule": {
    "rules": [
      {
        "dsl.type": "griffin-dsl",
        "dq.type": "COMPLETENESS",
        "out.dataframe.name": "comp",
        "rule": "favorite_count, community_owned_date",
        "out": [
          {
            "type": "metric",
            "name": "comp"
          }
        ]
      }
    ]
  },
  "sinks": ["CONSOLE", "ELASTICSEARCH"]
}
{code}
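Judging from the stack trace, the AVRO connector reads the configured file.name through the Databricks spark-avro data source, so the failure should be reproducible outside Griffin. A minimal standalone sketch (the object name AvroReadCheck is just for illustration; it assumes the job is submitted to the same cluster with the same spark-avro_2.11:4.0.0 jar on the classpath):
{code:java}
import org.apache.spark.sql.SparkSession

object AvroReadCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("avro-read-check").getOrCreate()

    // Same data source and file that the "avro_source" connector points at
    val df = spark.read
      .format("com.databricks.spark.avro")
      .load("/datasets/stackoverflow/000000000000.avro")

    df.printSchema()
    println(s"rows: ${df.count()}")  // count() forces the executors to actually read the file

    spark.stop()
  }
}
{code}
If this small job fails with the same NoClassDefFoundError, the problem is in the Spark/Scala/spark-avro combination on the cluster rather than in the measure or job definition.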
*Job creation request:*
{code:java}
{
  "measure.id": 1002,
  "job.name": "COMPLETENESS",
  "job.type": "batch",
  "cron.expression": "0 0/3 0 ? * * *",
  "cron.time.zone": "GMT+1:00",
  "predicate.config": {
    "checkdonefile.schedule": {
      "interval": "1m",
      "repeat": 2
    }
  },
  "data.segments": [
    {
      "as.baseline": true,
      "data.connector.name": "avro_source",
      "segment.range": {
        "begin": "0day",
        "length": "0day"
      }
    }
  ],
  "sinks": ["CONSOLE", "ELASTICSEARCH"]
}
{code}
Does anyone know how I can fix this?