gubinjie opened a new issue, #6894:
URL: https://github.com/apache/hudi/issues/6894
CDH 6.3.2
Hudi 0.10.1
When querying a Hudi table through Hive with the following statement, I get the error below:

```sql
select * from hudi_flink_tyc_company_rt where name = '3213'
```
```text
2022-10-08 16:30:27,365 INFO [main]
org.apache.hadoop.conf.Configuration.deprecation: session.id is deprecated.
Instead, use dfs.metrics.session-id
2022-10-08 16:30:27,661 INFO [main] org.apache.hadoop.mapred.Task: Using
ResourceCalculatorProcessTree : [ ]
2022-10-08 16:30:27,819 INFO [main] org.apache.hadoop.mapred.MapTask:
Processing split:
HoodieCombineRealtimeFileSplit{realtimeFileSplits=[HoodieRealtimeFileSplit{DataPath=hdfs://paat-dev/user/hudi/warehouse/paat_ods_hudi.db/5652dad0-9e32-43f5-99c4-eff0a89c6a79_0-1-0_20220929181835942.parquet,
deltaLogPaths=[hdfs://paat-dev/user/hudi/warehouse/paat_ods_hudi.db/.5652dad0-9e32-43f5-99c4-eff0a89c6a79_20220929181835942.log.1_0-1-0],
maxCommitTime='20220929190955221',
basePath='hdfs://paat-dev/user/hudi/warehouse/paat_ods_hudi.db'},
HoodieRealtimeFileSplit{DataPath=hdfs://paat-dev/user/hudi/warehouse/paat_ods_hudi.db/6f026a25-797e-4a8b-9382-b426b94fd034_0-1-0_20220929181835942.parquet,
deltaLogPaths=[hdfs://paat-dev/user/hudi/warehouse/paat_ods_hudi.db/.6f026a25-797e-4a8b-9382-b426b94fd034_20220929181835942.log.1_0-1-0],
maxCommitTime='20220929190955221',
basePath='hdfs://paat-dev/user/hudi/warehouse/paat_ods_hudi.db'},
HoodieRealtimeFileSplit{DataPath=hdfs://paat-dev/user/hudi/warehouse/paat_ods_hudi.db/4f90e72d-d205-4640-975f-09ebb2ad136a_0-1-0_20220929180105887.parquet,
deltaLogPaths=[], maxCommitTime='20220929190955221',
basePath='hdfs://paat-dev/user/hudi/warehouse/paat_ods_hudi.db'},
HoodieRealtimeFileSplit{DataPath=hdfs://paat-dev/user/hudi/warehouse/paat_ods_hudi.db/b72b41e5-7bd9-4a87-a91d-86a368a2f7b7_0-1-0_20220929181835942.parquet,
deltaLogPaths=[], maxCommitTime='20220929190955221',
basePath='hdfs://paat-dev/user/hudi/warehouse/paat_ods_hudi.db'}]}
InputFormatClass: org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat
2022-10-08 16:30:27,873 INFO [main] org.apache.hadoop.hive.conf.HiveConf:
Found configuration file null
2022-10-08 16:30:27,980 INFO [main]
org.apache.hadoop.hive.ql.exec.SerializationUtilities: Deserializing MapWork
using kryo
2022-10-08 16:30:28,110 INFO [main]
org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat: Before adding
Hoodie columns, Projections
:_hoodie_commit_time,_hoodie_commit_seqno,_hoodie_record_key,_hoodie_partition_path,_hoodie_file_name,company_id,company_name,legal_person_name,establish_time,reg_capital,reg_status,reg_number,org_number,credit_code,reg_location,phone_num,province_code,city_code,district_code,province,city,district,company_type,tax_code,category_code_std,social_security_staff_num,update_time,
Ids :0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26
2022-10-08 16:30:28,110 INFO [main]
org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat: Creating
record reader with readCols
:_hoodie_commit_time,_hoodie_commit_seqno,_hoodie_record_key,_hoodie_partition_path,_hoodie_file_name,company_id,company_name,legal_person_name,establish_time,reg_capital,reg_status,reg_number,org_number,credit_code,reg_location,phone_num,province_code,city_code,district_code,province,city,district,company_type,tax_code,category_code_std,social_security_staff_num,update_time,
Ids :0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26
2022-10-08 16:30:28,361 INFO [main]
org.apache.hadoop.conf.Configuration.deprecation: mapred.task.id is deprecated.
Instead, use mapreduce.task.attempt.id
2022-10-08 16:30:28,366 ERROR [main]
org.apache.parquet.hadoop.ParquetRecordReader: Can not initialize counter due
to context is not a instance of TaskInputOutputContext, but is
org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
2022-10-08 16:30:28,390 INFO [main]
org.apache.parquet.hadoop.InternalParquetRecordReader: RecordReader initialized
will read a total of 44225 records.
2022-10-08 16:30:28,390 INFO [main]
org.apache.parquet.hadoop.InternalParquetRecordReader: at row 0. reading next
block
2022-10-08 16:30:28,412 INFO [main]
org.apache.hadoop.io.compress.zlib.ZlibFactory: Successfully loaded &
initialized native-zlib library
2022-10-08 16:30:28,413 INFO [main] org.apache.hadoop.io.compress.CodecPool:
Got brand-new decompressor [.gz]
2022-10-08 16:30:28,418 INFO [main]
org.apache.parquet.hadoop.InternalParquetRecordReader: block read in memory in
28 ms. row count = 44225
2022-10-08 16:30:28,565 INFO [main]
org.apache.hudi.hadoop.realtime.HoodieRealtimeRecordReader: Enabling merged
reading of realtime records for split
HoodieRealtimeFileSplit{DataPath=hdfs://paat-dev/user/hudi/warehouse/paat_ods_hudi.db/5652dad0-9e32-43f5-99c4-eff0a89c6a79_0-1-0_20220929181835942.parquet,
deltaLogPaths=[hdfs://paat-dev/user/hudi/warehouse/paat_ods_hudi.db/.5652dad0-9e32-43f5-99c4-eff0a89c6a79_20220929181835942.log.1_0-1-0],
maxCommitTime='20220929190955221',
basePath='hdfs://paat-dev/user/hudi/warehouse/paat_ods_hudi.db'}
2022-10-08 16:30:28,566 INFO [main]
org.apache.hudi.hadoop.realtime.AbstractRealtimeRecordReader: cfg ==>
_hoodie_commit_time,_hoodie_commit_seqno,_hoodie_record_key,_hoodie_partition_path,_hoodie_file_name,company_id,company_name,legal_person_name,establish_time,reg_capital,reg_status,reg_number,org_number,credit_code,reg_location,phone_num,province_code,city_code,district_code,province,city,district,company_type,tax_code,category_code_std,social_security_staff_num,update_time
2022-10-08 16:30:28,566 INFO [main]
org.apache.hudi.hadoop.realtime.AbstractRealtimeRecordReader: columnIds ==>
0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26
2022-10-08 16:30:28,566 INFO [main]
org.apache.hudi.hadoop.realtime.AbstractRealtimeRecordReader:
partitioningColumns ==>
2022-10-08 16:30:28,574 INFO [main]
org.apache.hudi.common.table.HoodieTableMetaClient: Loading
HoodieTableMetaClient from hdfs://paat-dev/user/hudi/warehouse/paat_ods_hudi.db
2022-10-08 16:30:28,586 INFO [main]
org.apache.hudi.common.table.HoodieTableConfig: Loading table properties from
hdfs://paat-dev/user/hudi/warehouse/paat_ods_hudi.db/.hoodie/hoodie.properties
2022-10-08 16:30:28,589 INFO [main]
org.apache.hudi.common.table.HoodieTableMetaClient: Finished Loading Table of
type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from
hdfs://paat-dev/user/hudi/warehouse/paat_ods_hudi.db
2022-10-08 16:30:28,590 INFO [main]
org.apache.hudi.hadoop.realtime.AbstractRealtimeRecordReader: usesCustomPayload
==> true
2022-10-08 16:30:28,590 INFO [main]
org.apache.hudi.common.table.HoodieTableMetaClient: Loading
HoodieTableMetaClient from hdfs://paat-dev/user/hudi/warehouse/paat_ods_hudi.db
2022-10-08 16:30:28,592 INFO [main]
org.apache.hudi.common.table.HoodieTableConfig: Loading table properties from
hdfs://paat-dev/user/hudi/warehouse/paat_ods_hudi.db/.hoodie/hoodie.properties
2022-10-08 16:30:28,594 INFO [main]
org.apache.hudi.common.table.HoodieTableMetaClient: Finished Loading Table of
type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from
hdfs://paat-dev/user/hudi/warehouse/paat_ods_hudi.db
2022-10-08 16:30:28,609 ERROR [main] org.apache.hadoop.mapred.YarnChild:
Error running child : java.lang.NoSuchMethodError:
org.apache.parquet.avro.AvroSchemaConverter.convert(Lorg/apache/parquet/schema/MessageType;)Lorg/apache/hudi/org/apache/avro/Schema;
    at org.apache.hudi.common.util.ParquetUtils.readAvroSchema(ParquetUtils.java:217)
    at org.apache.hudi.io.storage.HoodieParquetReader.getSchema(HoodieParquetReader.java:71)
    at org.apache.hudi.hadoop.utils.HoodieRealtimeRecordReaderUtils.readSchema(HoodieRealtimeRecordReaderUtils.java:72)
    at org.apache.hudi.hadoop.InputSplitUtils.getBaseFileSchema(InputSplitUtils.java:70)
    at org.apache.hudi.hadoop.realtime.AbstractRealtimeRecordReader.init(AbstractRealtimeRecordReader.java:87)
    at org.apache.hudi.hadoop.realtime.AbstractRealtimeRecordReader.<init>(AbstractRealtimeRecordReader.java:67)
    at org.apache.hudi.hadoop.realtime.RealtimeCompactedRecordReader.<init>(RealtimeCompactedRecordReader.java:62)
    at org.apache.hudi.hadoop.realtime.HoodieRealtimeRecordReader.constructRecordReader(HoodieRealtimeRecordReader.java:70)
    at org.apache.hudi.hadoop.realtime.HoodieRealtimeRecordReader.<init>(HoodieRealtimeRecordReader.java:47)
    at org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat.getRecordReader(HoodieParquetRealtimeInputFormat.java:323)
    at org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat$HoodieCombineFileInputFormatShim.getRecordReader(HoodieCombineHiveInputFormat.java:974)
    at org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat.getRecordReader(HoodieCombineHiveInputFormat.java:555)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:175)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:444)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
2022-10-08 16:30:28,713 INFO [main]
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping MapTask metrics
system...
2022-10-08 16:30:28,714 INFO [main]
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system
stopped.
2022-10-08 16:30:28,714 INFO [main]
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system
shutdown complete.
```
Hive table structure:
```sql
CREATE EXTERNAL TABLE `paat_ods_hudi.paat_hudi_flink_tyc_company_rt`(
  `_hoodie_commit_time` string COMMENT '',
  `_hoodie_commit_seqno` string COMMENT '',
  `_hoodie_record_key` string COMMENT '',
  `_hoodie_partition_path` string COMMENT '',
  `_hoodie_file_name` string COMMENT '',
  `company_id` string COMMENT '',
  `company_name` string COMMENT '',
  `legal_person_name` string COMMENT '',
  `establish_time` string COMMENT '',
  `reg_capital` string COMMENT '',
  `reg_status` string COMMENT '',
  `reg_number` string COMMENT '',
  `org_number` string COMMENT '',
  `credit_code` string COMMENT '',
  `reg_location` string COMMENT '',
  `phone_num` string COMMENT '',
  `province_code` string COMMENT '',
  `city_code` string COMMENT '',
  `district_code` string COMMENT '',
  `province` string COMMENT '',
  `city` string COMMENT '',
  `district` string COMMENT '',
  `company_type` string COMMENT '',
  `tax_code` string COMMENT '',
  `category_code_std` string COMMENT '',
  `social_security_staff_num` string COMMENT '',
  `update_time` bigint COMMENT '')
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES (
  'hoodie.query.as.ro.table'='false',
  'path'='hdfs://paat-dev/user/hudi/warehouse/paat_ods_hudi.db/')
STORED AS INPUTFORMAT
  'org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  'hdfs://paat-dev/user/hudi/warehouse/paat_ods_hudi.db'
TBLPROPERTIES (
  'last_commit_time_sync'='20220929181835978',
  'spark.sql.sources.provider'='hudi',
  'spark.sql.sources.schema.numParts'='1',
  'spark.sql.sources.schema.part.0'='{"type":"struct","fields":[{"name":"_hoodie_commit_time","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_commit_seqno","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_record_key","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_partition_path","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_file_name","type":"string","nullable":true,"metadata":{}},{"name":"company_id","type":"string","nullable":false,"metadata":{}},{"name":"company_name","type":"string","nullable":true,"metadata":{}},{"name":"legal_person_name","type":"string","nullable":true,"metadata":{}},{"name":"establish_time","type":"string","nullable":true,"metadata":{}},{"name":"reg_capital","type":"string","nullable":true,"metadata":{}},{"name":"reg_status","type":"string","nullable":true,"metadata":{}},{"name":"reg_number","type":"string","nullable":true,"metadata":{}},{"name":"org_number","type":"string","nullable":true,"metadata":{}},{"name":"credit_code","type":"string","nullable":true,"metadata":{}},{"name":"reg_location","type":"string","nullable":true,"metadata":{}},{"name":"phone_num","type":"string","nullable":true,"metadata":{}},{"name":"province_code","type":"string","nullable":true,"metadata":{}},{"name":"city_code","type":"string","nullable":true,"metadata":{}},{"name":"district_code","type":"string","nullable":true,"metadata":{}},{"name":"province","type":"string","nullable":true,"metadata":{}},{"name":"city","type":"string","nullable":true,"metadata":{}},{"name":"district","type":"string","nullable":true,"metadata":{}},{"name":"company_type","type":"string","nullable":true,"metadata":{}},{"name":"tax_code","type":"string","nullable":true,"metadata":{}},{"name":"category_code_std","type":"string","nullable":true,"metadata":{}},{"name":"social_security_staff_num","type":"string","nullable":true,"metadata":{}},{"name":"update_time","type":"timestamp","nullable":true,"metadata":{}}]}',
  'transient_lastDdlTime'='1664444316')
```
I have already placed hudi-hadoop-mr-bundle-0.10.1.jar in Hive's /etc/hive/auxlib directory. Is there another jar still missing, or is something else wrong?
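One detail that may be relevant: the `NoSuchMethodError` signature ends in the relocated type `Lorg/apache/hudi/org/apache/avro/Schema;`, which suggests the bundle's shaded code is calling into an unshaded `parquet-avro` `AvroSchemaConverter` picked up elsewhere on the classpath, rather than a jar being missing. A rough way to see which jars ship that class (a sketch only — `find_jars_with_class` is a hypothetical helper, and the CDH parcel path is an assumed example):

```shell
# List jars whose zip directory mentions AvroSchemaConverter.
# Zip entry names are stored uncompressed inside the archive, so a
# plain grep on the jar file is enough to detect the class entry.
find_jars_with_class() {
  for jar in "$@"; do
    if grep -q 'parquet/avro/AvroSchemaConverter' "$jar" 2>/dev/null; then
      echo "$jar"
    fi
  done
}

# Example invocation (directories are assumptions for a CDH 6.3.2 install):
# find_jars_with_class /etc/hive/auxlib/*.jar /opt/cloudera/parcels/CDH/jars/*.jar
```

If this prints a `parquet-avro` jar besides the Hudi bundle, that copy winning the classloading race would explain the error.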