puchengy commented on PR #7430: URL: https://github.com/apache/iceberg/pull/7430#issuecomment-1526199803
Hi @aokolnychyi, I think there is legitimate value in this. We are migrating hundreds of Hive tables to Iceberg, and ensuring that the SparkSQL consumers of these tables don't fail is our top priority. A SparkSQL job that used to read a Hive table with a particular `spark.sql.files.maxPartitionBytes` value can fail if the Iceberg table's split size differs significantly, since many more splits get generated, leading to job failures. It is even more complicated if different downstream jobs use different `spark.sql.files.maxPartitionBytes` values (I am not sure whether this actually happens, but in theory it could).
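To illustrate the mismatch described above, here is a minimal sketch of aligning Iceberg's `read.split.target-size` table property (a real Iceberg property that controls split planning) with a consumer's `spark.sql.files.maxPartitionBytes` value. The table name `db.my_table` is a placeholder, and this is an assumed mitigation, not the approach proposed in the PR:

```python
# Sketch: build an ALTER TABLE statement that sets Iceberg's split
# target size to match a consumer's spark.sql.files.maxPartitionBytes,
# so the number of generated splits stays close to the old Hive behavior.

def split_size_property(max_partition_bytes: int) -> str:
    """Return SQL aligning Iceberg split size with a Spark setting.
    'db.my_table' is a hypothetical placeholder table name."""
    return (
        "ALTER TABLE db.my_table SET TBLPROPERTIES "
        f"('read.split.target-size'='{max_partition_bytes}')"
    )

# 128 MB is Spark's default for spark.sql.files.maxPartitionBytes.
print(split_size_property(128 * 1024 * 1024))
```

In practice this would need to be applied per consumer, which is exactly why differing `maxPartitionBytes` values across downstream jobs make the problem harder: a single table property cannot satisfy all of them at once.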
