BalaMahesh commented on issue #2251:
URL: https://github.com/apache/hudi/issues/2251#issuecomment-726732563
Update 1: After adding extra log statements to the HoodieParquetInputFormat
and InputPathHandler classes, I found the following:
1) [InputInitializer {Map 1} #0] |hadoop.InputPathHandler|: Got the input
paths :
[s3a://xxx/test/hudi/data/xxx/xxx/dt=2020-11-13/.hoodie_partition_metadata,
s3a://xxx/test/hudi/data/xxx/xxx/dt=2020-11-13/4e5582b0-ceb4-4d7c-ab98-bb9dfb0962e6-0_0-17038-5024094_20201113170011.parquet]conf
: Configuration: incrementalTables : []
The query job received the files inside the partition directory as its input
paths, instead of the partition directory itself. The Hudi MR bundle then tries
to append the metadata file name to these base file paths and fails to find the
partition metadata file.
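
To make the failure mode concrete, here is a minimal sketch in Python. It is not the Hudi code, only an illustration under the assumption that the metadata path is built by appending .hoodie_partition_metadata to whatever input path the job receives. The function names and the example bucket are hypothetical.

```python
import posixpath

# File name taken from the log output in this comment.
PARTITION_METAFILE = ".hoodie_partition_metadata"


def naive_metadata_path(input_path):
    """Append the metadata file name to the input path as-is (assumed behaviour)."""
    return posixpath.join(input_path, PARTITION_METAFILE)


def partition_dir(input_path):
    """Hypothetical helper: map a file-level input path back to its partition directory."""
    path = input_path.rstrip("/")
    name = posixpath.basename(path)
    if name == PARTITION_METAFILE or name.endswith(".parquet"):
        return posixpath.dirname(path)
    return path


# A directory-level input path yields a metadata path that can exist.
dir_input = "s3a://bucket/table/dt=2020-11-13"
print(naive_metadata_path(dir_input))

# A file-level input path yields a path nested under a data file, which cannot exist.
file_input = "s3a://bucket/table/dt=2020-11-13/base-file.parquet"
print(naive_metadata_path(file_input))

# Resolving the partition directory first gives the expected metadata path again.
print(naive_metadata_path(partition_dir(file_input)))
```

With a file-level input path, the naive join produces a path below the parquet file itself, which matches the symptom of the metadata file not being found.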
In the same Hive session, a query on a different Hudi table produces the
logs below:
hadoop.InputPathHandler|: Got the input paths :
[s3a://xxxx/test/hudi/data/xxx/xxx/dt=2020-11-13]conf : Configuration:
incrementalTables : []
Here the input path goes only up to the partition directory, unlike the base
file paths above. In this case the partition metadata file is accessible and
the query finishes.
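
For comparing the two cases across many log lines, a small script along these lines could help. It is only a sketch: it assumes the log text keeps the "Got the input paths : [...]" shape shown above, and the function names and sample paths are hypothetical.

```python
import re

# Matches the bracketed path list that follows "Got the input paths :".
# \s+ tolerates the line wrap that can fall inside the phrase.
INPUT_PATHS_RE = re.compile(r"Got the input\s+paths\s*:\s*\[(.*?)\]", re.S)


def input_paths_from_log(text):
    """Return the list of input paths found in one log entry, or [] if none."""
    match = INPUT_PATHS_RE.search(text)
    if match is None:
        return []
    return [p.strip() for p in match.group(1).split(",") if p.strip()]


def looks_like_file(path):
    """Heuristic: treat parquet files and the partition metadata file as file-level paths."""
    name = path.rstrip("/").rsplit("/", 1)[-1]
    return name.endswith(".parquet") or name == ".hoodie_partition_metadata"


failing = ("hadoop.InputPathHandler|: Got the input paths : "
           "[s3a://bucket/t/dt=2020-11-13/.hoodie_partition_metadata, "
           "s3a://bucket/t/dt=2020-11-13/base-file.parquet]conf : Configuration: "
           "incrementalTables : []")
working = ("hadoop.InputPathHandler|: Got the input paths : "
           "[s3a://bucket/t/dt=2020-11-13]conf : Configuration: "
           "incrementalTables : []")

for entry in (failing, working):
    for path in input_paths_from_log(entry):
        kind = "file" if looks_like_file(path) else "directory"
        print(kind, path)
```

Any entry flagged as "file" points to a query that was handed base files rather than a partition directory.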
I need help figuring out where the job is getting the base files as input
paths instead of the directory. I ran describe formatted table partition(val)
on both tables, and they both have the same directory structure.