stayrascal commented on issue #8571: URL: https://github.com/apache/hudi/issues/8571#issuecomment-1521704245
After a deeper analysis, the root cause appears to be that Spark caches the resolved logical plan (see https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala#L245-L257). Once a Hudi table has been scanned, its logical plan, including the file index info, is cached, and the next scan of the table returns the cached plan directly. By then the Hudi table metadata/file index held in that plan may be stale, e.g. because a new snapshot has been committed, or because the commit/snapshot it refers to has been cleaned up. I'm not sure whether there is a workaround that solves this problem.
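To make the suspected failure mode concrete, here is a minimal toy model in plain Python. It is only a sketch of the behaviour described above, not Spark or Hudi code: `Table`, `LogicalPlan`, `PlanCache` and the file names are all hypothetical. It shows a plan cache keyed by table name handing back a file listing captured before a later commit, and an explicit invalidation step forcing re-resolution. Spark does expose `spark.catalog.refreshTable` / `REFRESH TABLE` for invalidating cached table metadata, but whether that resolves this particular issue is an assumption I have not verified.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """Hypothetical stand-in for a Hudi table: a commit replaces the live files."""
    files: list = field(default_factory=lambda: ["f1_c1.parquet"])

    def commit(self, new_files):
        # A new commit (or a clean) changes which files are valid to read.
        self.files = list(new_files)


@dataclass(frozen=True)
class LogicalPlan:
    """Hypothetical stand-in for a resolved plan embedding a file-index snapshot."""
    files: tuple


class PlanCache:
    """Toy cache keyed by table name, mimicking the suspected behaviour."""

    def __init__(self):
        self._plans = {}

    def resolve(self, name, table):
        # Build the plan only on a cache miss; otherwise reuse the cached one,
        # even if the table has changed since it was built.
        if name not in self._plans:
            self._plans[name] = LogicalPlan(tuple(table.files))
        return self._plans[name]

    def invalidate(self, name):
        # Drop the cached plan so the next resolve() re-lists the files.
        self._plans.pop(name, None)


table = Table()
cache = PlanCache()

first = cache.resolve("hudi_tbl", table)   # first scan populates the cache
table.commit(["f1_c2.parquet"])            # new commit replaces the file
stale = cache.resolve("hudi_tbl", table)   # cached plan still lists the old file

cache.invalidate("hudi_tbl")               # explicit refresh
fresh = cache.resolve("hudi_tbl", table)   # plan is rebuilt from current files

print(stale.files)
print(fresh.files)
```

In this model the second `resolve` returns the very same plan object as the first, which is why it still points at a file that the later commit replaced. Only the explicit `invalidate` makes the new file visible.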
