alexeykudinkin opened a new pull request, #7847:
URL: https://github.com/apache/hudi/pull/7847

   ### Change Logs
   
   Currently, after most DML operations in Spark SQL, Hudi invokes `Catalog.refreshTable`.
   
   Prior to Spark 3.2, this essentially did the following:
   
   1. Invalidated the relation cache (forcing the relation to be re-resolved on the next access, creating a new `FileIndex`, re-listing files, etc.)
   2. Triggered cascading invalidation (re-caching) of any cached data in the `CacheManager`
   
   As of Spark 3.2, it additionally invokes `LogicalRelation.refresh` for ALL tables (previously this was done only for temporary views), entailing
    - The whole table being re-listed again via `FileIndex.refresh`, which might be a costly operation.
   
   This change attempts to revert to the previous (Spark 3.1) behavior, where `LogicalRelation`s are not refreshed until they are actually read again.
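   The distinction between the two behaviors can be illustrated with a minimal sketch (plain Java, not Hudi or Spark code; all class and method names here are hypothetical stand-ins). Eager invalidation pays the costly file-listing immediately on the write path, while lazy invalidation just drops the cache entry and defers the cost until the next read:

   ```java
   import java.util.HashMap;
   import java.util.Map;
   import java.util.function.Supplier;

   // Hypothetical sketch contrasting the two invalidation styles described above:
   // - invalidate():        lazy, Spark 3.1-style — drop the entry, next read pays the cost
   // - invalidateEagerly(): Spark 3.2-style — drop AND immediately rebuild on the write path
   public class LazyRelationCache {
       private final Map<String, Object> cache = new HashMap<>();
       private final Supplier<Object> costlyListing; // stands in for the file re-listing
       public int listings = 0;                      // counts how often the costly listing ran

       public LazyRelationCache(Supplier<Object> costlyListing) {
           this.costlyListing = costlyListing;
       }

       // Lazy style: just drop the entry; no listing happens here.
       public void invalidate(String table) {
           cache.remove(table);
       }

       // Eager style: drop and immediately rebuild, running the listing right away.
       public void invalidateEagerly(String table) {
           cache.remove(table);
           get(table);
       }

       // Resolves the relation, running the costly listing only on a cache miss.
       public Object get(String table) {
           return cache.computeIfAbsent(table, t -> {
               listings++;
               return costlyListing.get();
           });
       }

       public static void main(String[] args) {
           LazyRelationCache lazy = new LazyRelationCache(Object::new);
           lazy.get("t");                      // initial resolution: 1 listing
           lazy.invalidate("t");               // write path: no listing
           lazy.invalidate("t");               // repeated writes stay cheap
           System.out.println(lazy.listings);  // 1

           LazyRelationCache eager = new LazyRelationCache(Object::new);
           eager.get("t");
           eager.invalidateEagerly("t");       // listing runs on the write path
           eager.invalidateEagerly("t");
           System.out.println(eager.listings); // 3
       }
   }
   ```

   In the eager variant, every write re-lists the table even if nothing reads it afterwards; the lazy variant runs the listing at most once per subsequent read, which is the behavior this change restores.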
   
   ### Impact
   
   Avoids costly `FileIndex.refresh` operations on the write path.
   
   ### Risk level (write none, low medium or high below)
   
   High (we need to carefully explore repercussions of adjusting refreshing 
behavior on all Spark versions)
   
   ### Documentation Update
   
   N/A
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
