jbewing opened a new pull request, #14681: URL: https://github.com/apache/iceberg/pull/14681
### What

This PR updates view creation for Spark 4, 3.5, and 3.4 to analyze, but not optimize, the view body when creating a view. Previously, the view body was also optimized, which could result in long view creation times over larger tables. Over a larger table (hundreds of TBs), creating a small number of views (say, a couple thousand) takes about 12 hours and requires a moderately sized Spark cluster (~100 CPUs). Without optimization, the view body is still analyzed for invalid syntax or references.

### Testing

I've run the existing test suite locally for Spark 3.4, 3.5, and 4 to verify that the tests still pass. Additionally, I've run this Iceberg patch on a fork of Iceberg 1.10.0 on a fork of Spark 3.5 and observed in a staging environment that a task which creates some views over a smaller (~10 TB) table and used to take 2 hours now consistently takes 14 minutes. No errors or bugs were observed with the created views when testing in this staging environment.

Issue: https://github.com/apache/iceberg/issues/14680
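
For readers unfamiliar with the distinction, here is a minimal sketch of analyzing a query without optimizing it. It is not the diff from this PR. It assumes Spark 3.5 and uses the `@Unstable` `SparkSession.sessionState` accessor. The object name, the temp view `t`, and both queries are hypothetical and exist only for illustration.

```scala
import org.apache.spark.sql.SparkSession
import scala.util.Try

// Hypothetical illustration only; not code from the Iceberg PR.
object AnalyzeWithoutOptimizeSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("analyze-only-sketch")
      .getOrCreate()

    // Hypothetical source relation standing in for a large table.
    spark.range(10).createOrReplaceTempView("t")

    // Parse the view body, then run analysis only.
    val parsed = spark.sessionState.sqlParser.parsePlan("SELECT id FROM t WHERE id > 5")
    val qe = spark.sessionState.executePlan(parsed)
    qe.assertAnalyzed()

    // The analyzed plan is enough to resolve references and derive the schema.
    // qe.optimizedPlan is a lazy val and is never accessed here, so the
    // optimizer does not run.
    println(qe.analyzed.schema.treeString)

    // Analysis alone still rejects an invalid column reference.
    val badPlan = spark.sessionState.sqlParser.parsePlan("SELECT missing_col FROM t")
    val result = Try(spark.sessionState.executePlan(badPlan).assertAnalyzed())
    println(s"invalid reference rejected: ${result.isFailure}")

    spark.stop()
  }
}
```

Because the optimized plan is computed lazily, code that reads only the analyzed plan skips the optimizer's rules, which is the cost the description above attributes to slow view creation.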
