cb149 commented on issue #4830:
URL: https://github.com/apache/hudi/issues/4830#issuecomment-1048874783
> `SHOW TABLE STATS myTable ; shows 0 files and size 0B for that
partition.`. -> This makes me wonder if the partition is somehow not
registered. Since this is an external table, we can check if the data is
present physically by checking the partition path. Does Impala depend on the
Hive metastore to gather these stats, or does it gather them on its own? If the
former, we need to check the HMS to see if this partition is registered with
the table.

The weird part is that the partitions for all the other days show up, e.g.
today I can see data for days 23, 22, 21, 19, 18, etc., but not for day 20.
Tonight, when `day=21` gets cleaned, `day=20` will show up and `day=21` will
be missing from Impala for the next 24 hours.
So the data is physically present and registered, and there are no issues
when I query it with Spark or with
`spark.sql("select day from myTable where year=2022 and month = 2 group by day ORDER BY day asc")`
(although using spark.sql needs hudi-spark-bundle, it still reads the data as
parquet and returns duplicates).
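
For reference, the registration check suggested in the quote (compare the partition directories that exist on storage with the partitions the metastore knows about) could be sketched roughly as below. This is only an illustration: `parse_partition` and `unregistered_partitions` are hypothetical helper names, the two input lists are made-up sample values, and in practice they would have to come from a listing of the table's base path and from something like `SHOW PARTITIONS myTable`. Values are compared as plain strings, so `month=2` and `month=02` would count as different partitions.

```python
from typing import Iterable, List, Tuple

# A partition is identified by its ordered (column, value) pairs.
PartitionKey = Tuple[Tuple[str, str], ...]


def parse_partition(path: str) -> PartitionKey:
    """Parse a Hive-style partition path such as 'year=2022/month=2/day=20'."""
    pairs = []
    for segment in path.strip("/").split("/"):
        if "=" not in segment:
            continue  # ignore segments that are not 'column=value'
        column, value = segment.split("=", 1)
        pairs.append((column, value))
    return tuple(pairs)


def unregistered_partitions(on_disk: Iterable[str],
                            registered: Iterable[str]) -> List[str]:
    """Return partition paths that exist on disk but are not registered."""
    registered_keys = {parse_partition(p) for p in registered}
    return sorted(p for p in on_disk if parse_partition(p) not in registered_keys)


if __name__ == "__main__":
    # Hypothetical sample inputs, not real output from the table in this issue.
    on_disk = ["year=2022/month=2/day=%d" % d for d in (18, 19, 20, 21, 22, 23)]
    registered = ["year=2022/month=2/day=%d" % d for d in (18, 19, 21, 22, 23)]
    print(unregistered_partitions(on_disk, registered))
```

An empty result would suggest that every partition on disk is registered, which would point the investigation away from the metastore and towards how Impala loads file metadata for the partition.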