firecast commented on issue #894: Getting java.lang.NoSuchMethodError while doing Hive sync
URL: https://github.com/apache/incubator-hudi/issues/894#issuecomment-533140792

> Can't understand where Hive 2.3.2-amzn-2 comes from still from what you shared

The job connects to a remote Hive 2.3.2-amzn-2 server to create the tables there.

@vinothchandar I have now tried to pull in the Hudi dependency from the local Maven repository, which was built from the `release-0.5.0-incubating-rc2` tag with the command `mvn clean install -DskipTests -DskipITs`.

Here is the `build.sbt` file:

```sbt
scalaVersion := "2.11.12"

val sparkVersion = "2.4.3"

resolvers += Resolver.mavenLocal

libraryDependencies ++= Seq(
  "org.scala-lang" % "scala-compiler" % scalaVersion.value % Provided,
  "org.apache.spark" %% "spark-core" % sparkVersion % Provided,
  "org.apache.spark" %% "spark-sql" % sparkVersion % Provided,
  "org.apache.spark" %% "spark-hive" % sparkVersion % Provided,
  "org.apache.spark" %% "spark-sql-kafka-0-10" % sparkVersion % Provided,
  "com.amazonaws" % "aws-java-sdk-s3" % "1.11.633",
  "org.apache.hadoop" % "hadoop-aws" % "2.8.5",
  "org.apache.hudi" % "hudi-spark" % "0.5.0-incubating-rc2"
)

dependencyOverrides ++= Seq(
  "com.fasterxml.jackson.core" % "jackson-databind" % "2.6.7",
  "org.slf4j" % "slf4j-log4j12" % "1.7.28" % Test
)
```

If I look at the dependency graph now, it is missing some transitive dependencies.
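For reference, the resolved tree can be printed from sbt itself rather than the IDE. This is a sketch assuming the sbt-dependency-graph plugin (not part of the original build) is added to `project/plugins.sbt`:

```scala
// project/plugins.sbt — hypothetical addition, not in the build above.
// Provides the `dependencyTree` task used to inspect resolved dependencies.
addSbtPlugin("net.virtual-void" % "sbt-dependency-graph" % "0.9.2")
```

Running `sbt dependencyTree` then shows what `hudi-spark` actually pulls in, which makes the missing transitive dependencies easy to spot.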
![Screenshot 2019-09-19 at 7 19 30 PM](https://user-images.githubusercontent.com/2487532/65250006-7a581d00-db12-11e9-8770-de12f036ae35.png)

The missing dependencies are:

```xml
<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro</artifactId>
</dependency>

<!-- Parquet -->
<dependency>
  <groupId>org.apache.parquet</groupId>
  <artifactId>parquet-avro</artifactId>
</dependency>

<!-- Spark -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.11</artifactId>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_2.11</artifactId>
</dependency>

<!-- Spark (Packages) -->
<dependency>
  <groupId>com.databricks</groupId>
  <artifactId>spark-avro_2.11</artifactId>
  <version>4.0.0</version>
</dependency>

<!-- Hadoop -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <exclusions>
    <exclusion>
      <groupId>javax.servlet</groupId>
      <artifactId>*</artifactId>
    </exclusion>
  </exclusions>
  <scope>provided</scope>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <scope>provided</scope>
</dependency>

<!-- Hive -->
<dependency>
  <groupId>${hive.groupid}</groupId>
  <artifactId>hive-service</artifactId>
  <version>${hive.version}</version>
</dependency>
<dependency>
  <groupId>${hive.groupid}</groupId>
  <artifactId>hive-jdbc</artifactId>
  <version>${hive.version}</version>
</dependency>
<dependency>
  <groupId>${hive.groupid}</groupId>
  <artifactId>hive-metastore</artifactId>
  <version>${hive.version}</version>
</dependency>
<dependency>
  <groupId>${hive.groupid}</groupId>
  <artifactId>hive-common</artifactId>
  <version>${hive.version}</version>
</dependency>
```

If I add the `parquet-avro` dependency manually, the client is able to write data, but the Hive sync fails with a class-not-found error, probably because the Hive dependencies are still missing.
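The Maven coordinates above can be translated into sbt additions. This is a rough sketch only: the version numbers are assumptions (2.3.2 mirrors the remote Hive 2.3.2-amzn-2 server, and the `parquet-avro` version is a guess), not values confirmed by the Hudi POM:

```scala
// Hypothetical build.sbt additions — versions are assumptions, verify against
// the hudi-spark POM and the cluster's Hive/Parquet versions before using.
libraryDependencies ++= Seq(
  "org.apache.parquet" % "parquet-avro"   % "1.10.1",
  "com.databricks"    %% "spark-avro"     % "4.0.0",
  "org.apache.hive"    % "hive-service"   % "2.3.2",
  "org.apache.hive"    % "hive-jdbc"      % "2.3.2",
  "org.apache.hive"    % "hive-metastore" % "2.3.2",
  "org.apache.hive"    % "hive-common"    % "2.3.2"
)
```

On EMR it may make more sense to mark the Hive artifacts `Provided`, since the cluster already ships its own `-amzn-` builds of them.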
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

With regards,
Apache Git Services