cb149 commented on issue #4830:
URL: https://github.com/apache/hudi/issues/4830#issuecomment-1048874783


   > `SHOW TABLE STATS myTable;` shows 0 files and size 0B for that partition. -> This makes me wonder if the partition is somehow not registered. Since this is an external table, we can check whether the data is physically present by checking the partition path. Does Impala depend on the Hive metastore to gather these stats, or does it compute them on its own? If the former, we need to check the HMS to see whether this partition is registered with the table.
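
As a rough sketch of the "check the partition path" step, the function below counts the Parquet files and total bytes under one partition directory. These are the two numbers `SHOW TABLE STATS` reported as 0. It assumes the table's base path is reachable as a local or mounted filesystem (on HDFS or S3 you would use the matching client instead). It also assumes a Hive-style `year=/month=/day=` layout, inferred from the query in this thread. The base path `/data/myTable` is hypothetical.

```python
from pathlib import Path


def partition_file_stats(base_path: str, year: int, month: int, day: int) -> tuple[int, int]:
    """Return (number of .parquet files, total size in bytes) under one partition.

    Assumes a Hive-style layout year=YYYY/month=M/day=D below the table base path;
    how the partition values are formatted (e.g. month=2 vs month=02) is an assumption.
    """
    partition = Path(base_path) / f"year={year}" / f"month={month}" / f"day={day}"
    if not partition.is_dir():
        # Missing directory: report it the same way the stats did (0 files, 0 bytes).
        return 0, 0
    files = [p for p in partition.rglob("*.parquet") if p.is_file()]
    return len(files), sum(p.stat().st_size for p in files)


if __name__ == "__main__":
    # Hypothetical base path; replace with the table's actual location.
    count, size = partition_file_stats("/data/myTable", 2022, 2, 20)
    print(f"day=20: {count} files, {size} bytes")
```

Non-zero numbers here alongside 0 files in `SHOW TABLE STATS` would point at Impala's view of the partition rather than the data itself.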
   
   The weird part is that every earlier day's partition shows up except one: today, for example, I can see data for days 23, 22, 21, 19, 18, etc., but not day 20.
   Tonight, when `day=21` gets cleaned, `day=20` will show up and `day=21` will be missing from Impala for the next 24 hours.
   
   So the data is physically present and registered, and there are no issues when I query it with Spark or with `spark.sql("select day from myTable where year=2022 and month = 2 group by day ORDER BY day asc")` (although spark.sql needs the hudi-spark-bundle, it still reads the data as plain Parquet and returns duplicates).
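
One plausible reading of the duplicates, assumed here rather than confirmed in this thread, is that a plain Parquet read picks up every file version the Hudi cleaner has not yet removed. Hudi stores meta columns such as `_hoodie_record_key` and `_hoodie_commit_time` in each row. The pure-Python sketch below models rows as dicts and keeps only the latest commit per record key, approximating what a Hudi-aware read of a copy-on-write table would return. It is an illustration, not Hudi's reader, and the sample rows are hypothetical.

```python
def latest_per_record_key(rows: list[dict]) -> list[dict]:
    """Keep the row with the greatest _hoodie_commit_time for each _hoodie_record_key.

    Assumes commit times are fixed-width timestamp strings (e.g. "20220223101500"),
    so comparing them as strings orders them chronologically.
    """
    latest: dict[str, dict] = {}
    for row in rows:
        key = row["_hoodie_record_key"]
        current = latest.get(key)
        if current is None or row["_hoodie_commit_time"] > current["_hoodie_commit_time"]:
            latest[key] = row
    return list(latest.values())


if __name__ == "__main__":
    # Hypothetical rows: record "k1" appears in two retained file versions.
    raw = [
        {"_hoodie_record_key": "k1", "_hoodie_commit_time": "20220222010000", "day": 20},
        {"_hoodie_record_key": "k1", "_hoodie_commit_time": "20220223010000", "day": 20},
        {"_hoodie_record_key": "k2", "_hoodie_commit_time": "20220223010000", "day": 21},
    ]
    deduped = latest_per_record_key(raw)
    print(len(raw), "->", len(deduped))  # 3 -> 2
```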

