jbewing opened a new pull request, #14681:
URL: https://github.com/apache/iceberg/pull/14681

   ### What
   
   This PR updates view creation in Spark 4, 3.5, and 3.4 to analyze, but not 
optimize, the view body when creating a view. Previously, the view body was 
optimized, which could result in long view creation times over larger tables. 
For example, over a larger table (hundreds of TBs), creating a small 
number of views (a couple thousand) takes about 12 hours and requires 
a moderately sized Spark cluster (~100 CPUs). Without the optimization step, 
the view body is still analyzed for invalid syntax or references.
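
To illustrate the distinction between the two phases, here is a minimal sketch against the public Spark 3.5 `QueryExecution` API. It is not the code changed in this PR. The `events` table and the view body are hypothetical, and the sketch assumes `spark-sql` 3.5 is on the classpath.

```scala
import org.apache.spark.sql.SparkSession
import scala.util.Try

object AnalyzeVsOptimize {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("analyze-vs-optimize")
      .getOrCreate()

    // Hypothetical table, registered only for this illustration.
    spark.range(10).toDF("id").createOrReplaceTempView("events")

    // Hypothetical view body.
    val viewBody = "SELECT id FROM events WHERE id > 5"

    // spark.sql parses and analyzes the query: table and column names are
    // resolved here, and invalid references raise an AnalysisException.
    val qe = spark.sql(viewBody).queryExecution
    println(qe.analyzed.schema.simpleString)

    // Optimization is a separate, lazily evaluated phase. It only runs when
    // optimizedPlan (or anything downstream of it) is accessed, so leaving
    // the next line commented out skips the optimizer entirely.
    // println(qe.optimizedPlan)

    // A bad column reference is expected to fail during analysis alone.
    val invalid = Try(spark.sql("SELECT missing_col FROM events"))
    println(s"invalid reference rejected: ${invalid.isFailure}")

    spark.stop()
  }
}
```

The point of the sketch is only that validation of syntax and references happens in the analysis phase, so it does not depend on the optimizer having run.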
   
   
   ### Testing
   I've run the existing test suite locally for Spark 3.4, 3.5, and 4 to verify 
that the tests still pass. Additionally, I've run this Iceberg patch on a fork of 
Iceberg 1.10.0 with a fork of Spark 3.5 and observed in a staging environment that 
a task that creates some views over a smaller (~10TB) table, which used to take 
2 hours, now consistently takes 14 minutes. No errors or bugs were 
observed with the created views when testing in this staging environment.
   
   
   Issue: https://github.com/apache/iceberg/issues/14680


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

