umehrot2 commented on issue #1798:
URL: https://github.com/apache/hudi/issues/1798#issuecomment-656320283


   @zherenyu831 yes, I am also confused by the difference in the number of files in 
the two experiments you have provided. Are both of these queries run on the same 
dataset, with the same number of files underneath?
   
   Regardless, the listing happens internally through Spark's `parquet` data 
source. The only difference is that Hudi passes `HoodieROTablePathFilter` to Spark's 
implementation so that only the latest files are listed. At this point I don't understand 
why that would cause a difference between the two queries you mentioned, 
but we would be happy to look into it.
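
   To make "list only the latest files" concrete, here is a rough, simplified 
sketch of the idea, not Hudi's actual implementation. It assumes base files are 
named `<fileId>_<writeToken>_<instantTime>.parquet` and keeps only the newest 
file per file ID. The function names and the sample file names are hypothetical, 
and as far as I know the real `HoodieROTablePathFilter` relies on the table's 
metadata rather than on parsing names alone.

```python
from typing import Dict, Iterable, List, Tuple


def parse_base_file(name: str) -> Tuple[str, str]:
    """Split '<fileId>_<writeToken>_<instantTime>.parquet' into (fileId, instantTime).

    Assumes none of the three parts contains an underscore.
    """
    stem = name.rsplit(".", 1)[0]
    file_id, _write_token, instant_time = stem.split("_")
    return file_id, instant_time


def latest_base_files(names: Iterable[str]) -> List[str]:
    """Keep one file per file ID: the one with the greatest instant time."""
    latest: Dict[str, Tuple[str, str]] = {}
    for name in names:
        file_id, instant = parse_base_file(name)
        # Instant times are fixed-width timestamps, so string comparison
        # orders them chronologically.
        if file_id not in latest or instant > latest[file_id][0]:
            latest[file_id] = (instant, name)
    return sorted(name for _instant, name in latest.values())


# Hypothetical listing of one partition: two versions of f1, one of f2.
listing = [
    "f1_1-0-1_20200701000000.parquet",
    "f1_1-0-1_20200709000000.parquet",
    "f2_1-0-1_20200701000000.parquet",
]
print(latest_base_files(listing))
```

   The point of the sketch is that the filter is applied per listed path, so the 
listing itself still has to enumerate every file under the table, including 
older versions.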
   
   Can you provide a snapshot of your Spark history server showing the 
difference in Spark's listing time between these two queries on the same table?
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

