gubinjie opened a new issue #4600:
URL: https://github.com/apache/hudi/issues/4600
The structure of the created Hudi table:
CREATE EXTERNAL TABLE `guhudi_ro`(
`_hoodie_commit_time` string COMMENT '',
`_hoodie_commit_seqno` string COMMENT '',
`_hoodie_record_key` string COMMENT '',
`_hoodie_partition_path` string COMMENT '',
`_hoodie_file_name` string COMMENT '',
`id` bigint COMMENT '',
`name` string COMMENT '',
`birthday` bigint COMMENT '',
`ts` bigint COMMENT '')
PARTITIONED BY (
`partition` string COMMENT '')
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES (
'hoodie.query.as.ro.table'='true',
'path'='hdfs://paat-dev/user/hudi/guhudi')
STORED AS INPUTFORMAT
'org.apache.hudi.hadoop.HoodieParquetInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
'hdfs://paat-dev/user/hudi/guhudi'
TBLPROPERTIES (
'last_commit_time_sync'='20220114154921888',
'spark.sql.sources.provider'='hudi',
'spark.sql.sources.schema.numPartCols'='1',
'spark.sql.sources.schema.numParts'='1',
'spark.sql.sources.schema.part.0'='{"type":"struct","fields":[{"name":"_hoodie_commit_time","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_commit_seqno","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_record_key","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_partition_path","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_file_name","type":"string","nullable":true,"metadata":{}},{"name":"id","type":"long","nullable":false,"metadata":{}},{"name":"name","type":"string","nullable":true,"metadata":{}},{"name":"birthday","type":"timestamp","nullable":true,"metadata":{}},{"name":"ts","type":"timestamp","nullable":true,"metadata":{}},{"name":"partition","type":"string","nullable":true,"metadata":{}}]}',
'spark.sql.sources.schema.partCol.0'='partition',
'transient_lastDdlTime'='1642146555')
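For reference, this is how I cross-check what Hive has registered for the table from the CLI (just standard Hive commands, nothing specific to this setup is assumed):

DESCRIBE FORMATTED guhudi_ro;   -- shows the registered Location, InputFormat, and SerDe properties
SHOW PARTITIONS guhudi_ro;      -- confirms the 'partition' column was synced
SHOW CREATE TABLE guhudi_ro;    -- prints the DDL shown above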
But when I query this table, it reads Hudi metadata from the temporary path highlighted below instead of the table location, so no data can be queried. Why?
hive> select count(1) from guhudi_ro;
Query ID = root_20220114155139_4d8fae2e-3b06-4bdb-94c6-d4da9cdd22c6
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
22/01/14 15:51:39 INFO client.RMProxy: Connecting to ResourceManager at
lo-t-bd-nn/172.16.7.55:8032
22/01/14 15:51:39 INFO client.RMProxy: Connecting to ResourceManager at
lo-t-bd-nn/172.16.7.55:8032
22/01/14 15:51:39 INFO hive.HoodieCombineHiveInputFormat: Total number of
paths: 1, launching 1 threads to check non-combinable ones.
22/01/14 15:51:39 INFO hive.HoodieCombineHiveInputFormat: Input Format =>
org.apache.hudi.hadoop.HoodieParquetInputFormat
22/01/14 15:51:39 INFO hive.HoodieCombineHiveInputFormat:
CombineHiveInputSplit creating pool for
hdfs://paat-dev/tmp/hive/root/b13bb5f0-aabd-4bac-9d72-596b53cfd41a/hive_2022-01-14_15-51-39_382_6318212808265460380-1/-mr-10003/fb4a19a7-08a2-436d-8af3-eab9a59b7be7;
using filter path
hdfs://paat-dev/tmp/hive/root/b13bb5f0-aabd-4bac-9d72-596b53cfd41a/hive_2022-01-14_15-51-39_382_6318212808265460380-1/-mr-10003/fb4a19a7-08a2-436d-8af3-eab9a59b7be7
22/01/14 15:51:39 INFO hive.HoodieCombineHiveInputFormat:
mapreduce.input.fileinputformat.split.minsize=1,
mapreduce.input.fileinputformat.split.maxsize=256000000
22/01/14 15:51:39 INFO hive.HoodieCombineHiveInputFormat: Listing status in
HoodieCombineHiveInputFormat.HoodieCombineFileInputFormatShim
22/01/14 15:51:39 INFO hive.HoodieCombineHiveInputFormat: Using
HoodieInputFormat
22/01/14 15:51:39 INFO utils.HoodieInputFormatUtils: Reading hoodie metadata
from path **hdfs://paat-dev/tmp/hive/root/b13bb5f0-aabd-4bac-9d72-596b53cfd41a**
22/01/14 15:51:39 INFO table.HoodieTableMetaClient: Loading
HoodieTableMetaClient from
hdfs://paat-dev/tmp/hive/root/b13bb5f0-aabd-4bac-9d72-596b53cfd41a
22/01/14 15:51:39 INFO hadoop.InputPathHandler: Handling a non-hoodie path
**hdfs://paat-dev/tmp/hive/root/b13bb5f0-aabd-4bac-9d72-596b53cfd41a/hive_2022-01-14_15-51-39_382_6318212808265460380-1/-mr-10003/fb4a19a7-08a2-436d-8af3-eab9a59b7be7**
22/01/14 15:51:39 INFO hive.HoodieCombineHiveInputFormat: number of splits 1
22/01/14 15:51:39 INFO hive.HoodieCombineHiveInputFormat: Number of all
splits 1
Starting Job = job_1642128573447_0025, Tracking URL =
**http://lo-t-bd-nn:8088/proxy/application_1642128573447_0025/
Kill Command = /opt/cloudera/parcels/CDH/lib/hadoop/bin/hadoop job -kill
job_1642128573447_0025**
Hadoop job information for Stage-1: number of mappers: 1; number of
reducers: 1
2022-01-14 15:51:44,886 Stage-1 map = 0%, reduce = 0%
2022-01-14 15:51:48,953 Stage-1 map = 100%, reduce = 0%, Cumulative CPU
2.18 sec
2022-01-14 15:51:53,011 Stage-1 map = 100%, reduce = 100%, Cumulative CPU
3.65 sec
MapReduce Total cumulative CPU time: 3 seconds 650 msec
Ended Job = job_1642128573447_0025
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 3.65 sec HDFS Read:
11212 HDFS Write: 101 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 650 msec
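For what it's worth, these are the session settings I have seen suggested for querying Hudi tables from Hive on MapReduce; whether either of them is relevant to this problem is only my assumption:

-- Hand whole splits to the Hudi input format instead of combining them:
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
-- Or keep small-file combining but use the Hudi-aware combine input format:
set hive.input.format=org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat;
select count(1) from guhudi_ro;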