[ 
https://issues.apache.org/jira/browse/HUDI-467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

cdmikechen updated HUDI-467:
----------------------------
    Description: 
When creating a *MERGE_ON_READ* table in Hudi and syncing it to Hive, Hudi 
creates two tables named *table_name* and *table_name_rt*. When I query 
*table_name_rt*, I get a *java.lang.NoClassDefFoundError* exception:

{code}
java.lang.RuntimeException: java.lang.NoClassDefFoundError: org/apache/parquet/avro/AvroSchemaConverter
        at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:89) ~[hive-service-2.3.3.jar:2.3.3]
        at org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36) ~[hive-service-2.3.3.jar:2.3.3]
        at org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63) ~[hive-service-2.3.3.jar:2.3.3]
        at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_201]
        at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_201]
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844) ~[hadoop-common-2.8.5.jar:?]
        at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59) ~[hive-service-2.3.3.jar:2.3.3]
        at com.sun.proxy.$Proxy47.fetchResults(Unknown Source) ~[?:?]
        at org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:559) ~[hive-service-2.3.3.jar:2.3.3]
        at org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:751) ~[hive-service-2.3.3.jar:2.3.3]
        at org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1717) ~[hive-exec-2.3.3.jar:2.3.3]
        at org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1702) ~[hive-exec-2.3.3.jar:2.3.3]
        at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) ~[hive-exec-2.3.3.jar:2.3.3]
        at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) ~[hive-exec-2.3.3.jar:2.3.3]
        at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56) ~[hive-service-2.3.3.jar:2.3.3]
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) ~[hive-exec-2.3.3.jar:2.3.3]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_201]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_201]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_201]
Caused by: java.lang.NoClassDefFoundError: org/apache/parquet/avro/AvroSchemaConverter
        at org.apache.hudi.hadoop.realtime.AbstractRealtimeRecordReader.init(AbstractRealtimeRecordReader.java:341) ~[?:?]
        at org.apache.hudi.hadoop.realtime.AbstractRealtimeRecordReader.<init>(AbstractRealtimeRecordReader.java:108) ~[?:?]
        at org.apache.hudi.hadoop.realtime.RealtimeCompactedRecordReader.<init>(RealtimeCompactedRecordReader.java:50) ~[?:?]
        at org.apache.hudi.hadoop.realtime.HoodieRealtimeRecordReader.constructRecordReader(HoodieRealtimeRecordReader.java:69) ~[?:?]
        at org.apache.hudi.hadoop.realtime.HoodieRealtimeRecordReader.<init>(HoodieRealtimeRecordReader.java:47) ~[?:?]
        at org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat.getRecordReader(HoodieParquetRealtimeInputFormat.java:254) ~[?:?]
        at org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit.getRecordReader(FetchOperator.java:695) ~[hive-exec-2.3.3.jar:2.3.3]
        at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:333) ~[hive-exec-2.3.3.jar:2.3.3]
        at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:459) ~[hive-exec-2.3.3.jar:2.3.3]
        at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:428) ~[hive-exec-2.3.3.jar:2.3.3]
        at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:147) ~[hive-exec-2.3.3.jar:2.3.3]
        at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2208) ~[hive-exec-2.3.3.jar:2.3.3]
        at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:494) ~[hive-service-2.3.3.jar:2.3.3]
        at org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:307) ~[hive-service-2.3.3.jar:2.3.3]
        at org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:878) ~[hive-service-2.3.3.jar:2.3.3]
        at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) ~[?:?]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_201]
        at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_201]
        at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78) ~[hive-service-2.3.3.jar:2.3.3]
        ... 18 more
{code}
I checked the Hive lib folder and did not find a parquet-avro jar. At first I 
thought we needed to add the parquet-avro dependency to the *hudi-hive-bundle* 
pom.xml. So I checked this pom.xml and found that parquet-avro is already 
listed in the artifactSet includes:
{code}
<artifactSet>
  <includes>
    <include>org.apache.hudi:hudi-common</include>
    <include>org.apache.hudi:hudi-hadoop-mr</include>
    <include>org.apache.hudi:hudi-hive</include>
    <include>com.beust:jcommander</include>
    <include>org.apache.parquet:parquet-avro</include>
    <include>com.esotericsoftware:kryo-shaded</include>
    <include>org.objenesis:objenesis</include>
    <include>com.esotericsoftware:minlog</include>
  </includes>
</artifactSet>
{code}
However, parquet-avro is still not packaged into hudi-hive-bundle.jar. Maybe we 
need to declare it as a dependency in the dependencies section.
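
A minimal sketch of what that could look like, under stated assumptions. As far 
as I understand the shade plugin, the artifactSet includes only filter artifacts 
that are already part of the module's compile or runtime dependency graph, so an 
include for an artifact that is not a dependency (or has provided scope) has no 
effect. The ${parquet.version} property and the compile scope below are 
assumptions for illustration and are not taken from the actual pom.xml:
{code}
<dependencies>
  <!-- Assumption: the version is managed by a parquet.version property
       defined elsewhere (for example in a parent pom). -->
  <dependency>
    <groupId>org.apache.parquet</groupId>
    <artifactId>parquet-avro</artifactId>
    <version>${parquet.version}</version>
    <!-- compile scope so the shade plugin can see and bundle the artifact -->
    <scope>compile</scope>
  </dependency>
</dependencies>
{code}
After rebuilding, listing the bundle jar contents (for example with jar tf) and 
looking for org/apache/parquet/avro/AvroSchemaConverter.class should show 
whether the class was actually packaged.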


> Query RT Table in Hive found java.lang.NoClassDefFoundError Exception
> ---------------------------------------------------------------------
>
>                 Key: HUDI-467
>                 URL: https://issues.apache.org/jira/browse/HUDI-467
>             Project: Apache Hudi (incubating)
>          Issue Type: Bug
>          Components: Hive Integration
>            Reporter: cdmikechen
>            Assignee: cdmikechen
>            Priority: Major
>
> When creating a *MERGE_ON_READ* table in Hudi and syncing it to Hive, Hudi 
> creates two tables named *table_name* and *table_name_rt*. When I query 
> *table_name_rt*, I get a *java.lang.NoClassDefFoundError* exception:
> {code}
> java.lang.RuntimeException: java.lang.NoClassDefFoundError: org/apache/parquet/avro/AvroSchemaConverter
>       at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:89) ~[hive-service-2.3.3.jar:2.3.3]
>       at org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36) ~[hive-service-2.3.3.jar:2.3.3]
>       at org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63) ~[hive-service-2.3.3.jar:2.3.3]
>       at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_201]
>       at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_201]
>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844) ~[hadoop-common-2.8.5.jar:?]
>       at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59) ~[hive-service-2.3.3.jar:2.3.3]
>       at com.sun.proxy.$Proxy47.fetchResults(Unknown Source) ~[?:?]
>       at org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:559) ~[hive-service-2.3.3.jar:2.3.3]
>       at org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:751) ~[hive-service-2.3.3.jar:2.3.3]
>       at org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1717) ~[hive-exec-2.3.3.jar:2.3.3]
>       at org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1702) ~[hive-exec-2.3.3.jar:2.3.3]
>       at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) ~[hive-exec-2.3.3.jar:2.3.3]
>       at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) ~[hive-exec-2.3.3.jar:2.3.3]
>       at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56) ~[hive-service-2.3.3.jar:2.3.3]
>       at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) ~[hive-exec-2.3.3.jar:2.3.3]
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_201]
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_201]
>       at java.lang.Thread.run(Thread.java:748) [?:1.8.0_201]
> Caused by: java.lang.NoClassDefFoundError: org/apache/parquet/avro/AvroSchemaConverter
>       at org.apache.hudi.hadoop.realtime.AbstractRealtimeRecordReader.init(AbstractRealtimeRecordReader.java:341) ~[?:?]
>       at org.apache.hudi.hadoop.realtime.AbstractRealtimeRecordReader.<init>(AbstractRealtimeRecordReader.java:108) ~[?:?]
>       at org.apache.hudi.hadoop.realtime.RealtimeCompactedRecordReader.<init>(RealtimeCompactedRecordReader.java:50) ~[?:?]
>       at org.apache.hudi.hadoop.realtime.HoodieRealtimeRecordReader.constructRecordReader(HoodieRealtimeRecordReader.java:69) ~[?:?]
>       at org.apache.hudi.hadoop.realtime.HoodieRealtimeRecordReader.<init>(HoodieRealtimeRecordReader.java:47) ~[?:?]
>       at org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat.getRecordReader(HoodieParquetRealtimeInputFormat.java:254) ~[?:?]
>       at org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit.getRecordReader(FetchOperator.java:695) ~[hive-exec-2.3.3.jar:2.3.3]
>       at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:333) ~[hive-exec-2.3.3.jar:2.3.3]
>       at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:459) ~[hive-exec-2.3.3.jar:2.3.3]
>       at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:428) ~[hive-exec-2.3.3.jar:2.3.3]
>       at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:147) ~[hive-exec-2.3.3.jar:2.3.3]
>       at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2208) ~[hive-exec-2.3.3.jar:2.3.3]
>       at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:494) ~[hive-service-2.3.3.jar:2.3.3]
>       at org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:307) ~[hive-service-2.3.3.jar:2.3.3]
>       at org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:878) ~[hive-service-2.3.3.jar:2.3.3]
>       at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) ~[?:?]
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_201]
>       at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_201]
>       at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78) ~[hive-service-2.3.3.jar:2.3.3]
>       ... 18 more
> {code}
> I checked the Hive lib folder and did not find a parquet-avro jar. At first I 
> thought we needed to add the parquet-avro dependency to the 
> *hudi-hive-bundle* pom.xml. So I checked this pom.xml and found that 
> parquet-avro is already listed in the artifactSet includes:
> {code}
> <artifactSet>
>   <includes>
>     <include>org.apache.hudi:hudi-common</include>
>     <include>org.apache.hudi:hudi-hadoop-mr</include>
>     <include>org.apache.hudi:hudi-hive</include>
>     <include>com.beust:jcommander</include>
>     <include>org.apache.parquet:parquet-avro</include>
>     <include>com.esotericsoftware:kryo-shaded</include>
>     <include>org.objenesis:objenesis</include>
>     <include>com.esotericsoftware:minlog</include>
>   </includes>
> </artifactSet>
> {code}
> However, parquet-avro is still not packaged into hudi-hive-bundle.jar. Maybe 
> we need to declare it as a dependency in the dependencies section.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
