firecast edited a comment on issue #894: Getting java.lang.NoSuchMethodError while doing Hive sync
URL: https://github.com/apache/incubator-hudi/issues/894#issuecomment-533140792
 
 
   > Cant understand where Hive 2.3.2-amzn-2 comes from still from what you shared
   
   So the job connects to a remote Hive 2.3.2-amzn-2 server to create tables there.
   
   @vinothchandar I have now tried to pull in the Hudi dependency from my local Maven repository, which was built from the `release-0.5.0-incubating-rc2` tag with `mvn clean install -DskipTests -DskipITs`.
   
   Here is the `build.sbt` file:
   ```sbt
   scalaVersion := "2.11.12"
   val sparkVersion = "2.4.3"
   
   resolvers += Resolver.mavenLocal
   
   libraryDependencies ++= Seq(
       "org.scala-lang" % "scala-compiler" % scalaVersion.value % Provided,
   
       "org.apache.spark" %% "spark-core" % sparkVersion % Provided,
       "org.apache.spark" %% "spark-sql" % sparkVersion % Provided,
       "org.apache.spark" %% "spark-hive" % sparkVersion % Provided,
       "org.apache.spark" %% "spark-sql-kafka-0-10" % sparkVersion % Provided,
   
       "com.amazonaws" % "aws-java-sdk-s3" % "1.11.633",
       "org.apache.hadoop" % "hadoop-aws" % "2.8.5",
   
       "org.apache.hudi" % "hudi-spark" % "0.5.0-incubating-rc2"
   )
   
   dependencyOverrides ++= Seq(
       "com.fasterxml.jackson.core" % "jackson-databind" % "2.6.7",
       "org.slf4j" % "slf4j-log4j12" % "1.7.28" % Test
   )
   ```
   
   If I look at the dependency graph now, it is missing some transitive dependencies.
   ![Screenshot 2019-09-19 at 7 23 40 PM](https://user-images.githubusercontent.com/2487532/65250335-0a966200-db13-11e9-9ea3-f6fc7e883342.png)
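   
   In case it helps reproduce the graph, something like the sbt-dependency-graph plugin should print the same tree for the build above (the plugin version here is just an example):
   ```sbt
   // project/plugins.sbt -- assumed plugin setup; prints the tree with `sbt dependencyTree`
   addSbtPlugin("net.virtual-void" % "sbt-dependency-graph" % "0.9.2")
   ```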
   
   
   The missing dependencies are:
   ```xml
       <dependency>
         <groupId>org.apache.avro</groupId>
         <artifactId>avro</artifactId>
       </dependency>
   
       <!-- Parquet -->
       <dependency>
         <groupId>org.apache.parquet</groupId>
         <artifactId>parquet-avro</artifactId>
       </dependency>
   
       <!-- Spark -->
       <dependency>
         <groupId>org.apache.spark</groupId>
         <artifactId>spark-core_2.11</artifactId>
       </dependency>
       <dependency>
         <groupId>org.apache.spark</groupId>
         <artifactId>spark-sql_2.11</artifactId>
       </dependency>
   
       <!-- Spark (Packages) -->
       <dependency>
         <groupId>com.databricks</groupId>
         <artifactId>spark-avro_2.11</artifactId>
         <version>4.0.0</version>
       </dependency>
   
       <!-- Hadoop -->
       <dependency>
         <groupId>org.apache.hadoop</groupId>
         <artifactId>hadoop-client</artifactId>
         <exclusions>
           <exclusion>
             <groupId>javax.servlet</groupId>
             <artifactId>*</artifactId>
           </exclusion>
         </exclusions>
         <scope>provided</scope>
       </dependency>
       <dependency>
         <groupId>org.apache.hadoop</groupId>
         <artifactId>hadoop-common</artifactId>
         <scope>provided</scope>
       </dependency>
   
       <!-- Hive -->
       <dependency>
         <groupId>${hive.groupid}</groupId>
         <artifactId>hive-service</artifactId>
         <version>${hive.version}</version>
       </dependency>
       <dependency>
         <groupId>${hive.groupid}</groupId>
         <artifactId>hive-jdbc</artifactId>
         <version>${hive.version}</version>
       </dependency>
       <dependency>
         <groupId>${hive.groupid}</groupId>
         <artifactId>hive-metastore</artifactId>
         <version>${hive.version}</version>
       </dependency>
       <dependency>
         <groupId>${hive.groupid}</groupId>
         <artifactId>hive-common</artifactId>
         <version>${hive.version}</version>
       </dependency>
   ```
   
   If I add the parquet-avro dependency manually, the client is able to write data, but the Hive sync then fails with a class-not-found error, probably because the Hive dependencies are missing. I am fairly new to SBT and Maven, so I might be missing something simple here, but I would have expected all the transitive dependencies to be resolved automatically.
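   
   For reference, this is roughly what declaring those missing pieces in `build.sbt` by hand would look like. It is only a sketch: the versions below are my guesses (Hive 2.3.2 to match the remote server), not the ones actually pinned in the Hudi pom.
   ```sbt
   // Sketch only: manually declaring the transitive dependencies that are not being resolved.
   // All versions are assumptions; the Hudi pom is the authoritative source.
   libraryDependencies ++= Seq(
       "org.apache.avro" % "avro" % "1.8.2",
       "org.apache.parquet" % "parquet-avro" % "1.10.1",
       "com.databricks" %% "spark-avro" % "4.0.0",
   
       "org.apache.hive" % "hive-common" % "2.3.2",
       "org.apache.hive" % "hive-service" % "2.3.2",
       "org.apache.hive" % "hive-jdbc" % "2.3.2",
       "org.apache.hive" % "hive-metastore" % "2.3.2"
   )
   ```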
