aokolnychyi commented on PR #7430:
URL: https://github.com/apache/iceberg/pull/7430#issuecomment-1528008416
> Do you mean you are fine with the current change but against making it correlate to "spark.sql.files.maxPartitionBytes"?

Yeah, I don't mind adding an Iceberg SQL property if it benefits you and other folks also support it but I would like to think through other alternatives to make sure we are not overlooking a better approach. I don't think it is a good idea to support any properties for built-in sources, though. Split planning is a bit different. If we support one config, will we have to support others?

> this will lead to driver memory consumption increased and cause job driver OOM. To fix that, we will have to increase driver memory manually for that job.

Is there a way to set this value correctly during the migration or is the split size different for different workloads?

> To fix that, we have to add explicit coalesce to match the behavior.

I assume there is no shuffle in that read-write job so that AQE cannot coalesce/split tasks during writes?

Let's hear what others think. I am OK to add this property to unblock you but it would be great to explore the automatic split configuration. I created #7465 for that.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
