MarlboroBoy opened a new issue, #8838:
URL: https://github.com/apache/hudi/issues/8838

   
   **Describe the problem you faced**
   
   
   
   When I create a partitioned table with the Flink SQL client and insert data through an INSERT statement, querying the table with `select *` from the Hive beeline client throws an exception and reports an error.
   
   The error message is:
   
   `java.lang.UnsupportedOperationException: org.apache.hadoop.hive.ql.io.parquet.convert.ETypeConverter`
   
   
   However, there is no problem when I select the columns by name instead of using `select *`.
   
   
   I would like to know whether this is caused by a compilation issue or a package conflict. Can you help me solve this problem?
   
   
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. ./bin/sql-client.sh embedded -j /tmp/hudi-flink1.15-bundle-0.13.1.jar shell
   2.
   ```sql
   CREATE TABLE t7(
     uuid VARCHAR(20),
     name VARCHAR(10),
     age INT,
     ts TIMESTAMP(3),
     `partition` VARCHAR(20)
   )
   PARTITIONED BY (`partition`)
   WITH (
     'connector' = 'hudi',
     'path' = '/warehouse/tablespace/external/hive/test.db/t7',
     'table.type' = 'COPY_ON_WRITE',  -- If MERGE_ON_READ, hive query will not have output until the parquet file is generated
     'hive_sync.enable' = 'true',     -- Required. To enable hive synchronization
     'hive_sync.mode' = 'hms',        -- Required. Setting hive sync mode to hms, default jdbc
     'hive_sync.metastore.uris' = 'thrift://node200.xxx.com:9083' -- Required. The port needs to be set in hive-site.xml
   );
   
   INSERT INTO t7 VALUES
     ('id1','Danny',23,TIMESTAMP '1970-01-01 00:00:01','par1'),
     ('id2','Stephen',33,TIMESTAMP '1970-01-01 00:00:02','par1'),
     ('id3','Julian',53,TIMESTAMP '1970-01-01 00:00:03','par2'),
     ('id4','Fabian',31,TIMESTAMP '1970-01-01 00:00:04','par2'),
     ('id5','Sophia',18,TIMESTAMP '1970-01-01 00:00:05','par3'),
     ('id6','Emma',20,TIMESTAMP '1970-01-01 00:00:06','par3'),
     ('id7','Bob',44,TIMESTAMP '1970-01-01 00:00:07','par4'),
     ('id8','Han',56,TIMESTAMP '1970-01-01 00:00:08','par4');
   ```
   3.
   ```shell
   beeline > select * from t7
   
   Error: java.io.IOException: org.apache.parquet.io.ParquetDecodingException: Can not read value at 1 in block 0 in file hdfs://node200.zetyun.com:8020/warehouse/tablespace/external/hive/test.db/t13/par1/22d15c8b-8ca5-4071-8ae2-e7c0754b2c56_0-1-0_20230529172155679.parquet (state=,code=0)
   ```
   4.
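   As noted in the description, querying the same table with explicit column names reportedly succeeds. A minimal sketch of such a query for comparison (the column names are taken from the Flink DDL in step 2; the exact statement I used is an assumption):
   
   ```sql
   -- Hypothetical working query: selecting the columns explicitly instead of `select *`.
   -- Column names come from the CREATE TABLE in step 2; `partition` must be backquoted
   -- because it is a reserved word in Hive.
   select uuid, name, age, ts, `partition` from t7;
   ```
   
   If this query succeeds while `select *` fails, the difference lies only in which columns Hive has to materialize, which would be consistent with the column conversion failure (`ETypeConverter ... addLong`) shown in the stacktrace below.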
   
   **Expected behavior**
   
   `select * from t7` in beeline should return the eight inserted rows without throwing an exception, just as querying by explicit column names does.
   
   **Environment Description**
   
   * Hudi version: 0.13.1
   
   * Spark version: 2.4.7.7.1.7.0-551
   
   * Hive version: 3.1.3000.7.1.7.0-551
   
   * Hadoop version: 3.1.1.7.1.7.0-551
   
   * Storage (HDFS/S3/GCS..): HDFS
   
   * Running on Docker? (yes/no): no
   
   
   **Additional context**
   
   
![image](https://github.com/apache/hudi/assets/31274802/dfb9e0a7-bbab-4470-b84d-e0ab3836b838)
   
   **Stacktrace**
   
   ```shell
   HiveServer2-Handler-Pool: Thread-105]: Error fetching results: 
   org.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.parquet.io.ParquetDecodingException: Can not read value at 1 in block 0 in file hdfs://node200.xxx.com:8020/warehouse/tablespace/external/hive/test.db/t13/par1/22d15c8b-8ca5-4071-8ae2-e7c0754b2c56_0-1-0_20230529172155679.parquet
        at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:476) ~[hive-service-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
        at org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:328) ~[hive-service-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
        at org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:946) ~[hive-service-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
        at org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:567) ~[hive-service-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
        at org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:798) [hive-service-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
        at org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1837) [hive-exec-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
        at org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1822) [hive-exec-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
        at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) [hive-exec-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
        at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) [hive-exec-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
        at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56) [hive-service-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) [hive-exec-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_232]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_232]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_232]
   Caused by: java.io.IOException: org.apache.parquet.io.ParquetDecodingException: Can not read value at 1 in block 0 in file hdfs://node200.xxx.com:8020/warehouse/tablespace/external/hive/test.db/t13/par1/22d15c8b-8ca5-4071-8ae2-e7c0754b2c56_0-1-0_20230529172155679.parquet
        at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:638) ~[hive-exec-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
        at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:545) ~[hive-exec-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
        at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:150) ~[hive-exec-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
        at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:901) ~[hive-exec-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
        at org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:243) ~[hive-exec-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
        at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:471) ~[hive-service-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
        ... 13 more
   Caused by: org.apache.parquet.io.ParquetDecodingException: Can not read value at 1 in block 0 in file hdfs://node200.xxx.com:8020/warehouse/tablespace/external/hive/test.db/t13/par1/22d15c8b-8ca5-4071-8ae2-e7c0754b2c56_0-1-0_20230529172155679.parquet
        at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:251) ~[hive-exec-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
        at org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:207) ~[hive-exec-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
        at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:98) ~[hive-exec-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
        at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:60) ~[hive-exec-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
        at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:93) ~[hive-exec-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
        at org.apache.hudi.hadoop.HoodieParquetInputFormat.getRecordReaderInternal(HoodieParquetInputFormat.java:97) ~[hudi-hadoop-mr-bundle-0.13.1.jar:0.13.1]
        at org.apache.hudi.hadoop.HoodieParquetInputFormat.getRecordReader(HoodieParquetInputFormat.java:91) ~[hudi-hadoop-mr-bundle-0.13.1.jar:0.13.1]
        at org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit.getRecordReader(FetchOperator.java:810) ~[hive-exec-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
        at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:365) ~[hive-exec-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
        at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:576) ~[hive-exec-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
        at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:545) ~[hive-exec-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
        at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:150) ~[hive-exec-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
        at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:901) ~[hive-exec-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
        at org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:243) ~[hive-exec-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
        at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:471) ~[hive-service-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
        ... 13 more
   Caused by: java.lang.UnsupportedOperationException: org.apache.hadoop.hive.ql.io.parquet.convert.ETypeConverter$10$1
        at org.apache.parquet.io.api.PrimitiveConverter.addLong(PrimitiveConverter.java:105) ~[hive-exec-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
        at org.apache.parquet.column.impl.ColumnReaderBase$2$4.writeValue(ColumnReaderBase.java:301) ~[hive-exec-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
        at org.apache.parquet.column.impl.ColumnReaderBase.writeCurrentValueToConverter(ColumnReaderBase.java:410) ~[hive-exec-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
        at org.apache.parquet.column.impl.ColumnReaderImpl.writeCurrentValueToConverter(ColumnReaderImpl.java:30) ~[hive-exec-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
        at org.apache.parquet.io.RecordReaderImplementation.read(RecordReaderImplementation.java:406) ~[hive-exec-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
        at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:226) ~[hive-exec-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
        at org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:207) ~[hive-exec-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
        at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:98) ~[hive-exec-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
        at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:60) ~[hive-exec-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
        at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:93) ~[hive-exec-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
        at org.apache.hudi.hadoop.HoodieParquetInputFormat.getRecordReaderInternal(HoodieParquetInputFormat.java:97) ~[hudi-hadoop-mr-bundle-0.13.1.jar:0.13.1]
        at org.apache.hudi.hadoop.HoodieParquetInputFormat.getRecordReader(HoodieParquetInputFormat.java:91) ~[hudi-hadoop-mr-bundle-0.13.1.jar:0.13.1]
        at org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit.getRecordReader(FetchOperator.java:810) ~[hive-exec-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
        at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:365) ~[hive-exec-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
        at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:576) ~[hive-exec-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
        at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:545) ~[hive-exec-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
        at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:150) ~[hive-exec-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
        at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:901) ~[hive-exec-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
        at org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:243) ~[hive-exec-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
        at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:471) ~[hive-service-3.1.3000.7.1.7.0-551.jar:3.1.3000.7.1.7.0-551]
        ... 13 more
   ```
   
   

