gubinjie opened a new issue #4600:
URL: https://github.com/apache/hudi/issues/4600
The structure of the created Hudi table:
CREATE EXTERNAL TABLE `guhudi_ro`(
`_hoodie_commit_time` string COMMENT '',
`_hoodie_commit_seqno` string COMMENT '',
`_hoodie_record_key` string COMMENT '',
`_hoodie_partition_path` string COMMENT '',
`_hoodie_file_name` string COMMENT '',
`id` bigint COMMENT '',
`name` string COMMENT '',
`birthday` bigint COMMENT '',
`ts` bigint COMMENT '')
PARTITIONED BY (
`partition` string COMMENT '')
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES (
'hoodie.query.as.ro.table'='true',
'path'='hdfs://paat-dev/user/hudi/guhudi')
STORED AS INPUTFORMAT
'org.apache.hudi.hadoop.HoodieParquetInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
'hdfs://paat-dev/user/hudi/guhudi'
TBLPROPERTIES (
'last_commit_time_sync'='20220114154921888',
'spark.sql.sources.provider'='hudi',
'spark.sql.sources.schema.numPartCols'='1',
'spark.sql.sources.schema.numParts'='1',
'spark.sql.sources.schema.part.0'='{"type":"struct","fields":[{"name":"_hoodie_commit_time","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_commit_seqno","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_record_key","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_partition_path","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_file_name","type":"string","nullable":true,"metadata":{}},{"name":"id","type":"long","nullable":false,"metadata":{}},{"name":"name","type":"string","nullable":true,"metadata":{}},{"name":"birthday","type":"timestamp","nullable":true,"metadata":{}},{"name":"ts","type":"timestamp","nullable":true,"metadata":{}},{"name":"partition","type":"string","nullable":true,"metadata":{}}]}',
'spark.sql.sources.schema.partCol.0'='partition',
'transient_lastDdlTime'='1642146555')
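For reference, this is how I cross-check what Hive has registered for the table from the CLI (just standard Hive commands, nothing specific to this setup is assumed):

DESCRIBE FORMATTED guhudi_ro;   -- shows the registered Location, InputFormat, and SerDe properties
SHOW PARTITIONS guhudi_ro;      -- confirms the 'partition' column was synced
SHOW CREATE TABLE guhudi_ro;    -- prints the DDL shown above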
But when I query this table, it reads Hudi metadata from the temporary path highlighted below instead of the table location, so no data can be queried. Why?
hive> select count(1) from guhudi_ro;
Query ID = root_20220114155139_4d8fae2e-3b06-4bdb-94c6-d4da9cdd22c6
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
22/01/14 15:51:39 INFO client.RMProxy: Connecting to ResourceManager at
lo-t-bd-nn/172.16.7.55:8032
22/01/14 15:51:39 INFO client.RMProxy: Connecting to ResourceManager at
lo-t-bd-nn/172.16.7.55:8032
22/01/14 15:51:39 INFO hive.HoodieCombineHiveInputFormat: Total number of
paths: 1, launching 1 threads to check non-combinable ones.
22/01/14 15:51:39 INFO hive.HoodieCombineHiveInputFormat: Input Format =>
org.apache.hudi.hadoop.HoodieParquetInputFormat
22/01/14 15:51:39 INFO hive.HoodieCombineHiveInputFormat:
CombineHiveInputSplit creating pool for
hdfs://paat-dev/tmp/hive/root/b13bb5f0-aabd-4bac-9d72-596b53cfd41a/hive_2022-01-14_15-51-39_382_6318212808265460380-1/-mr-10003/fb4a19a7-08a2-436d-8af3-eab9a59b7be7;
using filter path
hdfs://paat-dev/tmp/hive/root/b13bb5f0-aabd-4bac-9d72-596b53cfd41a/hive_2022-01-14_15-51-39_382_6318212808265460380-1/-mr-10003/fb4a19a7-08a2-436d-8af3-eab9a59b7be7
22/01/14 15:51:39 INFO hive.HoodieCombineHiveInputFormat:
mapreduce.input.fileinputformat.split.minsize=1,
mapreduce.input.fileinputformat.split.maxsize=256000000
22/01/14 15:51:39 INFO hive.HoodieCombineHiveInputFormat: Listing status in
HoodieCombineHiveInputFormat.HoodieCombineFileInputFormatShim
22/01/14 15:51:39 INFO hive.HoodieCombineHiveInputFormat: Using
HoodieInputFormat
22/01/14 15:51:39 INFO utils.HoodieInputFormatUtils: Reading hoodie metadata
from path **hdfs://paat-dev/tmp/hive/root/b13bb5f0-aabd-4bac-9d72-596b53cfd41a**
22/01/14 15:51:39 INFO table.HoodieTableMetaClient: Loading
HoodieTableMetaClient from
hdfs://paat-dev/tmp/hive/root/b13bb5f0-aabd-4bac-9d72-596b53cfd41a
22/01/14 15:51:39 INFO hadoop.InputPathHandler: Handling a non-hoodie path
**hdfs://paat-dev/tmp/hive/root/b13bb5f0-aabd-4bac-9d72-596b53cfd41a/hive_2022-01-14_15-51-39_382_6318212808265460380-1/-mr-10003/fb4a19a7-08a2-436d-8af3-eab9a59b7be7**
22/01/14 15:51:39 INFO hive.HoodieCombineHiveInputFormat: number of splits 1
22/01/14 15:51:39 INFO hive.HoodieCombineHiveInputFormat: Number of all
splits 1
Starting Job = job_1642128573447_0025, Tracking URL =
**http://lo-t-bd-nn:8088/proxy/application_1642128573447_0025/
Kill Command = /opt/cloudera/parcels/CDH/lib/hadoop/bin/hadoop job -kill
job_1642128573447_0025**
Hadoop job information for Stage-1: number of mappers: 1; number of
reducers: 1
2022-01-14 15:51:44,886 Stage-1 map = 0%, reduce = 0%
2022-01-14 15:51:48,953 Stage-1 map = 100%, reduce = 0%, Cumulative CPU
2.18 sec
2022-01-14 15:51:53,011 Stage-1 map = 100%, reduce = 100%, Cumulative CPU
3.65 sec
MapReduce Total cumulative CPU time: 3 seconds 650 msec
Ended Job = job_1642128573447_0025
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 3.65 sec HDFS Read:
11212 HDFS Write: 101 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 650 msec
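For what it's worth, these are the session settings I have seen suggested for querying Hudi tables from Hive on MapReduce; whether either of them is relevant to this problem is only my assumption:

-- Hand whole splits to the Hudi input format instead of combining them:
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
-- Or keep small-file combining but use the Hudi-aware combine input format:
set hive.input.format=org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat;
select count(1) from guhudi_ro;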