[GitHub] [hudi] stayrascal commented on issue #8571: [ISSUE] spark-sql doesn't read the latest snapshot of MOR table

via GitHub Tue, 25 Apr 2023 05:27:38 -0700


stayrascal commented on issue #8571:
URL: https://github.com/apache/hudi/issues/8571#issuecomment-1521704245


   After a deeper analysis, the root cause seems to be that Spark caches the 
logical plan (see 
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala#L245-L257).
   
   Once a Hudi table has been scanned, its logical plan, including the file 
index info, is cached, and the next scan of the table returns that cached 
plan directly. By then the metadata of the Hudi table/file index may be 
stale, e.g. a new snapshot has been committed, or the commit/snapshot has 
been cleaned up.
   
   I'm not sure whether there is any workaround that can solve this problem.
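   
   To make the caching behaviour described above concrete, here is a minimal 
toy sketch in Python. It is not Spark or Hudi code: `ToyCatalog`, `resolve`, 
`refresh` and the file names are hypothetical and only model the idea of a 
plan cached per table name. In Spark itself, `REFRESH TABLE` / 
`spark.catalog.refreshTable` might play the role of `refresh` here, but that 
is an assumption I have not verified for this Hudi MOR case.

```python
# Toy model only: none of these names are real Spark or Hudi APIs.
class ToyCatalog:
    def __init__(self):
        # table name -> cached "plan" (here simply the list of files to scan)
        self._plan_cache = {}

    def resolve(self, table, list_files):
        # Build the plan on first access, then keep returning the cached one.
        if table not in self._plan_cache:
            self._plan_cache[table] = list(list_files())
        return self._plan_cache[table]

    def refresh(self, table):
        # Drop the cached plan so the next resolve() lists the files again.
        self._plan_cache.pop(table, None)


files_on_storage = ["commit_001.parquet"]
catalog = ToyCatalog()

# First scan: the plan is built from the current files and cached.
first = catalog.resolve("hudi_tbl", lambda: files_on_storage)

# A new commit lands on storage after the first scan.
files_on_storage.append("commit_002.log")

# Second scan: the cached plan is returned, so the new commit is not visible.
stale = catalog.resolve("hudi_tbl", lambda: files_on_storage)

# After invalidating the cache, the next scan sees the new commit.
catalog.refresh("hudi_tbl")
fresh = catalog.resolve("hudi_tbl", lambda: files_on_storage)
```

   In this sketch `stale` still lists only the first commit, while `fresh` 
includes the second one, which matches the symptom reported in this issue.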


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
