puchengy commented on PR #7430: URL: https://github.com/apache/iceberg/pull/7430#issuecomment-1526199803
Hi @aokolnychyi, I think there is legitimate value in this. We are migrating hundreds of Hive tables to Iceberg, and ensuring that the SparkSQL consumers of these tables don't fail is our top priority. A SparkSQL job that used to read a Hive table with a particular `spark.sql.files.maxPartitionBytes` value can fail if the Iceberg table's split size differs significantly, since many more splits get generated, leading to job failures. It is even more complicated if different downstream jobs use different `spark.sql.files.maxPartitionBytes` values (I am not sure whether this actually happens, but in theory it could).
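To illustrate the mismatch described above, here is a minimal sketch of aligning Iceberg's `read.split.target-size` table property (a real Iceberg property that controls split planning) with a consumer's `spark.sql.files.maxPartitionBytes` value. The table name `db.my_table` is a placeholder, and this is an assumed mitigation, not the approach proposed in the PR:

```python
# Sketch: build an ALTER TABLE statement that sets Iceberg's split
# target size to match a consumer's spark.sql.files.maxPartitionBytes,
# so the number of generated splits stays close to the old Hive behavior.

def split_size_property(max_partition_bytes: int) -> str:
    """Return SQL aligning Iceberg split size with a Spark setting.
    'db.my_table' is a hypothetical placeholder table name."""
    return (
        "ALTER TABLE db.my_table SET TBLPROPERTIES "
        f"('read.split.target-size'='{max_partition_bytes}')"
    )

# 128 MB is Spark's default for spark.sql.files.maxPartitionBytes.
print(split_size_property(128 * 1024 * 1024))
```

In practice this would need to be applied per consumer, which is exactly why differing `maxPartitionBytes` values across downstream jobs make the problem harder: a single table property cannot satisfy all of them at once.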
