gubinjie opened a new issue, #6894:
URL: https://github.com/apache/hudi/issues/6894
CDH 6.3.2
Hudi 0.10.1
When querying a Hudi table through Hive with the following statement, I get the error below:

```sql
select * from hudi_flink_tyc_company_rt where name = '3213'
```
```text
2022-10-08 16:30:27,365 INFO [main]
org.apache.hadoop.conf.Configuration.deprecation: session.id is deprecated.
Instead, use dfs.metrics.session-id
2022-10-08 16:30:27,661 INFO [main] org.apache.hadoop.mapred.Task: Using
ResourceCalculatorProcessTree : [ ]
2022-10-08 16:30:27,819 INFO [main] org.apache.hadoop.mapred.MapTask:
Processing split:
HoodieCombineRealtimeFileSplit{realtimeFileSplits=[HoodieRealtimeFileSplit{DataPath=hdfs://paat-dev/user/hudi/warehouse/paat_ods_hudi.db/5652dad0-9e32-43f5-99c4-eff0a89c6a79_0-1-0_20220929181835942.parquet,
deltaLogPaths=[hdfs://paat-dev/user/hudi/warehouse/paat_ods_hudi.db/.5652dad0-9e32-43f5-99c4-eff0a89c6a79_20220929181835942.log.1_0-1-0],
maxCommitTime='20220929190955221',
basePath='hdfs://paat-dev/user/hudi/warehouse/paat_ods_hudi.db'},
HoodieRealtimeFileSplit{DataPath=hdfs://paat-dev/user/hudi/warehouse/paat_ods_hudi.db/6f026a25-797e-4a8b-9382-b426b94fd034_0-1-0_20220929181835942.parquet,
deltaLogPaths=[hdfs://paat-dev/user/hudi/warehouse/paat_ods_hudi.db/.6f026a25-797e-4a8b-9382-b426b94fd034_20220929181835942.log.1_0-1-0],
maxCommitTime='20220929190955221',
basePath='hdfs://paat-dev/user/hudi/warehouse/paat_ods_hudi.db'},
HoodieRealtimeFileSplit{DataPath=hdfs://paat-dev/user/hudi/warehouse/paat_ods_hudi.db/4f90e72d-d205-4640-975f-09ebb2ad136a_0-1-0_20220929180105887.parquet,
deltaLogPaths=[], maxCommitTime='20220929190955221',
basePath='hdfs://paat-dev/user/hudi/warehouse/paat_ods_hudi.db'},
HoodieRealtimeFileSplit{DataPath=hdfs://paat-dev/user/hudi/warehouse/paat_ods_hudi.db/b72b41e5-7bd9-4a87-a91d-86a368a2f7b7_0-1-0_20220929181835942.parquet,
deltaLogPaths=[], maxCommitTime='20220929190955221',
basePath='hdfs://paat-dev/user/hudi/warehouse/paat_ods_hudi.db'}]}
InputFormatClass: org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat
2022-10-08 16:30:27,873 INFO [main] org.apache.hadoop.hive.conf.HiveConf:
Found configuration file null
2022-10-08 16:30:27,980 INFO [main]
org.apache.hadoop.hive.ql.exec.SerializationUtilities: Deserializing MapWork
using kryo
2022-10-08 16:30:28,110 INFO [main]
org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat: Before adding
Hoodie columns, Projections
:_hoodie_commit_time,_hoodie_commit_seqno,_hoodie_record_key,_hoodie_partition_path,_hoodie_file_name,company_id,company_name,legal_person_name,establish_time,reg_capital,reg_status,reg_number,org_number,credit_code,reg_location,phone_num,province_code,city_code,district_code,province,city,district,company_type,tax_code,category_code_std,social_security_staff_num,update_time,
Ids :0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26
2022-10-08 16:30:28,110 INFO [main]
org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat: Creating
record reader with readCols
:_hoodie_commit_time,_hoodie_commit_seqno,_hoodie_record_key,_hoodie_partition_path,_hoodie_file_name,company_id,company_name,legal_person_name,establish_time,reg_capital,reg_status,reg_number,org_number,credit_code,reg_location,phone_num,province_code,city_code,district_code,province,city,district,company_type,tax_code,category_code_std,social_security_staff_num,update_time,
Ids :0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26
2022-10-08 16:30:28,361 INFO [main]
org.apache.hadoop.conf.Configuration.deprecation: mapred.task.id is deprecated.
Instead, use mapreduce.task.attempt.id
2022-10-08 16:30:28,366 ERROR [main]
org.apache.parquet.hadoop.ParquetRecordReader: Can not initialize counter due
to context is not a instance of TaskInputOutputContext, but is
org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
2022-10-08 16:30:28,390 INFO [main]
org.apache.parquet.hadoop.InternalParquetRecordReader: RecordReader initialized
will read a total of 44225 records.
2022-10-08 16:30:28,390 INFO [main]
org.apache.parquet.hadoop.InternalParquetRecordReader: at row 0. reading next
block
2022-10-08 16:30:28,412 INFO [main]
org.apache.hadoop.io.compress.zlib.ZlibFactory: Successfully loaded &
initialized native-zlib library
2022-10-08 16:30:28,413 INFO [main] org.apache.hadoop.io.compress.CodecPool:
Got brand-new decompressor [.gz]
2022-10-08 16:30:28,418 INFO [main]
org.apache.parquet.hadoop.InternalParquetRecordReader: block read in memory in
28 ms. row count = 44225
2022-10-08 16:30:28,565 INFO [main]
org.apache.hudi.hadoop.realtime.HoodieRealtimeRecordReader: Enabling merged
reading of realtime records for split
HoodieRealtimeFileSplit{DataPath=hdfs://paat-dev/user/hudi/warehouse/paat_ods_hudi.db/5652dad0-9e32-43f5-99c4-eff0a89c6a79_0-1-0_20220929181835942.parquet,
deltaLogPaths=[hdfs://paat-dev/user/hudi/warehouse/paat_ods_hudi.db/.5652dad0-9e32-43f5-99c4-eff0a89c6a79_20220929181835942.log.1_0-1-0],
maxCommitTime='20220929190955221',
basePath='hdfs://paat-dev/user/hudi/warehouse/paat_ods_hudi.db'}
2022-10-08 16:30:28,566 INFO [main]
org.apache.hudi.hadoop.realtime.AbstractRealtimeRecordReader: cfg ==>
_hoodie_commit_time,_hoodie_commit_seqno,_hoodie_record_key,_hoodie_partition_path,_hoodie_file_name,company_id,company_name,legal_person_name,establish_time,reg_capital,reg_status,reg_number,org_number,credit_code,reg_location,phone_num,province_code,city_code,district_code,province,city,district,company_type,tax_code,category_code_std,social_security_staff_num,update_time
2022-10-08 16:30:28,566 INFO [main]
org.apache.hudi.hadoop.realtime.AbstractRealtimeRecordReader: columnIds ==>
0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26
2022-10-08 16:30:28,566 INFO [main]
org.apache.hudi.hadoop.realtime.AbstractRealtimeRecordReader:
partitioningColumns ==>
2022-10-08 16:30:28,574 INFO [main]
org.apache.hudi.common.table.HoodieTableMetaClient: Loading
HoodieTableMetaClient from hdfs://paat-dev/user/hudi/warehouse/paat_ods_hudi.db
2022-10-08 16:30:28,586 INFO [main]
org.apache.hudi.common.table.HoodieTableConfig: Loading table properties from
hdfs://paat-dev/user/hudi/warehouse/paat_ods_hudi.db/.hoodie/hoodie.properties
2022-10-08 16:30:28,589 INFO [main]
org.apache.hudi.common.table.HoodieTableMetaClient: Finished Loading Table of
type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from
hdfs://paat-dev/user/hudi/warehouse/paat_ods_hudi.db
2022-10-08 16:30:28,590 INFO [main]
org.apache.hudi.hadoop.realtime.AbstractRealtimeRecordReader: usesCustomPayload
==> true
2022-10-08 16:30:28,590 INFO [main]
org.apache.hudi.common.table.HoodieTableMetaClient: Loading
HoodieTableMetaClient from hdfs://paat-dev/user/hudi/warehouse/paat_ods_hudi.db
2022-10-08 16:30:28,592 INFO [main]
org.apache.hudi.common.table.HoodieTableConfig: Loading table properties from
hdfs://paat-dev/user/hudi/warehouse/paat_ods_hudi.db/.hoodie/hoodie.properties
2022-10-08 16:30:28,594 INFO [main]
org.apache.hudi.common.table.HoodieTableMetaClient: Finished Loading Table of
type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from
hdfs://paat-dev/user/hudi/warehouse/paat_ods_hudi.db
2022-10-08 16:30:28,609 ERROR [main] org.apache.hadoop.mapred.YarnChild:
Error running child : java.lang.NoSuchMethodError:
org.apache.parquet.avro.AvroSchemaConverter.convert(Lorg/apache/parquet/schema/MessageType;)Lorg/apache/hudi/org/apache/avro/Schema;
    at org.apache.hudi.common.util.ParquetUtils.readAvroSchema(ParquetUtils.java:217)
    at org.apache.hudi.io.storage.HoodieParquetReader.getSchema(HoodieParquetReader.java:71)
    at org.apache.hudi.hadoop.utils.HoodieRealtimeRecordReaderUtils.readSchema(HoodieRealtimeRecordReaderUtils.java:72)
    at org.apache.hudi.hadoop.InputSplitUtils.getBaseFileSchema(InputSplitUtils.java:70)
    at org.apache.hudi.hadoop.realtime.AbstractRealtimeRecordReader.init(AbstractRealtimeRecordReader.java:87)
    at org.apache.hudi.hadoop.realtime.AbstractRealtimeRecordReader.<init>(AbstractRealtimeRecordReader.java:67)
    at org.apache.hudi.hadoop.realtime.RealtimeCompactedRecordReader.<init>(RealtimeCompactedRecordReader.java:62)
    at org.apache.hudi.hadoop.realtime.HoodieRealtimeRecordReader.constructRecordReader(HoodieRealtimeRecordReader.java:70)
    at org.apache.hudi.hadoop.realtime.HoodieRealtimeRecordReader.<init>(HoodieRealtimeRecordReader.java:47)
    at org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat.getRecordReader(HoodieParquetRealtimeInputFormat.java:323)
    at org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat$HoodieCombineFileInputFormatShim.getRecordReader(HoodieCombineHiveInputFormat.java:974)
    at org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat.getRecordReader(HoodieCombineHiveInputFormat.java:555)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:175)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:444)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
2022-10-08 16:30:28,713 INFO [main]
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping MapTask metrics
system...
2022-10-08 16:30:28,714 INFO [main]
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system
stopped.
2022-10-08 16:30:28,714 INFO [main]
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system
shutdown complete.
```
Hive table structure:
```sql
CREATE EXTERNAL TABLE `paat_ods_hudi.paat_hudi_flink_tyc_company_rt`(
  `_hoodie_commit_time` string COMMENT '',
  `_hoodie_commit_seqno` string COMMENT '',
  `_hoodie_record_key` string COMMENT '',
  `_hoodie_partition_path` string COMMENT '',
  `_hoodie_file_name` string COMMENT '',
  `company_id` string COMMENT '',
  `company_name` string COMMENT '',
  `legal_person_name` string COMMENT '',
  `establish_time` string COMMENT '',
  `reg_capital` string COMMENT '',
  `reg_status` string COMMENT '',
  `reg_number` string COMMENT '',
  `org_number` string COMMENT '',
  `credit_code` string COMMENT '',
  `reg_location` string COMMENT '',
  `phone_num` string COMMENT '',
  `province_code` string COMMENT '',
  `city_code` string COMMENT '',
  `district_code` string COMMENT '',
  `province` string COMMENT '',
  `city` string COMMENT '',
  `district` string COMMENT '',
  `company_type` string COMMENT '',
  `tax_code` string COMMENT '',
  `category_code_std` string COMMENT '',
  `social_security_staff_num` string COMMENT '',
  `update_time` bigint COMMENT '')
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES (
  'hoodie.query.as.ro.table'='false',
  'path'='hdfs://paat-dev/user/hudi/warehouse/paat_ods_hudi.db/')
STORED AS INPUTFORMAT
  'org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  'hdfs://paat-dev/user/hudi/warehouse/paat_ods_hudi.db'
TBLPROPERTIES (
  'last_commit_time_sync'='20220929181835978',
  'spark.sql.sources.provider'='hudi',
  'spark.sql.sources.schema.numParts'='1',
  'spark.sql.sources.schema.part.0'='{"type":"struct","fields":[{"name":"_hoodie_commit_time","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_commit_seqno","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_record_key","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_partition_path","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_file_name","type":"string","nullable":true,"metadata":{}},{"name":"company_id","type":"string","nullable":false,"metadata":{}},{"name":"company_name","type":"string","nullable":true,"metadata":{}},{"name":"legal_person_name","type":"string","nullable":true,"metadata":{}},{"name":"establish_time","type":"string","nullable":true,"metadata":{}},{"name":"reg_capital","type":"string","nullable":true,"metadata":{}},{"name":"reg_status","type":"string","nullable":true,"metadata":{}},{"name":"reg_number","type":"string","nullable":true,"metadata":{}},{"name":"org_number","type":"string","nullable":true,"metadata":{}},{"name":"credit_code","type":"string","nullable":true,"metadata":{}},{"name":"reg_location","type":"string","nullable":true,"metadata":{}},{"name":"phone_num","type":"string","nullable":true,"metadata":{}},{"name":"province_code","type":"string","nullable":true,"metadata":{}},{"name":"city_code","type":"string","nullable":true,"metadata":{}},{"name":"district_code","type":"string","nullable":true,"metadata":{}},{"name":"province","type":"string","nullable":true,"metadata":{}},{"name":"city","type":"string","nullable":true,"metadata":{}},{"name":"district","type":"string","nullable":true,"metadata":{}},{"name":"company_type","type":"string","nullable":true,"metadata":{}},{"name":"tax_code","type":"string","nullable":true,"metadata":{}},{"name":"category_code_std","type":"string","nullable":true,"metadata":{}},{"name":"social_security_staff_num","type":"string","nullable":true,"metadata":{}},{"name":"update_time","type":"timestamp","nullable":true,"metadata":{}}]}',
  'transient_lastDdlTime'='1664444316')
```
I have already placed hudi-hadoop-mr-bundle-0.10.1.jar in Hive's /etc/hive/auxlib directory. Is there another jar still missing, or is something else wrong?
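One detail that may be relevant: the `NoSuchMethodError` signature ends in the relocated type `Lorg/apache/hudi/org/apache/avro/Schema;`, which suggests the bundle's shaded code is calling into an unshaded `parquet-avro` `AvroSchemaConverter` picked up elsewhere on the classpath, rather than a jar being missing. A rough way to see which jars ship that class (a sketch only — `find_jars_with_class` is a hypothetical helper, and the CDH parcel path is an assumed example):

```shell
# List jars whose zip directory mentions AvroSchemaConverter.
# Zip entry names are stored uncompressed inside the archive, so a
# plain grep on the jar file is enough to detect the class entry.
find_jars_with_class() {
  for jar in "$@"; do
    if grep -q 'parquet/avro/AvroSchemaConverter' "$jar" 2>/dev/null; then
      echo "$jar"
    fi
  done
}

# Example invocation (directories are assumptions for a CDH 6.3.2 install):
# find_jars_with_class /etc/hive/auxlib/*.jar /opt/cloudera/parcels/CDH/jars/*.jar
```

If this prints a `parquet-avro` jar besides the Hudi bundle, that copy winning the classloading race would explain the error.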