puchengy commented on PR #7430:
URL: https://github.com/apache/iceberg/pull/7430#issuecomment-1526936388
Hi @aokolnychyi,
> I am not going to oppose a SQL config but I don't think we should rely on an internal SQL property for built-in file sources.
I am trying to understand your stance here. Do you mean you are fine with the current change but against tying it to "spark.sql.files.maxPartitionBytes"? If so, I am fine with that.
> Can we identify exact scenarios when the default split size performs poorly and check if we can solve the underlying problem?
I can share two scenarios. They don't really lead to poor performance, but they made our platform team's life harder (by "harder" I mean the migration work became more challenging).
(1) As mentioned above, when a SparkSQL job used to read a Hive table with a large "spark.sql.files.maxPartitionBytes" value (for example, 1GB), switching the underlying table to Iceberg (which defaults to a 128MB split size) immediately increases the split count by 8x (in theory). This increases driver memory consumption and can cause the job driver to OOM.
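A rough back-of-the-envelope sketch of the 8x figure in (1), in Python. It assumes a hypothetical table of 100 data files of 1 GiB each and a simplified planner that cuts every file into ceil(size / split_size) chunks; real Spark and Iceberg split planning also bin-packs small files and adds a per-file open cost, so actual counts will differ. `estimate_splits` is a hypothetical helper, not a Spark or Iceberg API.

```python
import math

MiB = 1024 * 1024
GiB = 1024 * MiB


def estimate_splits(file_sizes, split_size):
    """Rough split count: each file is cut into ceil(size / split_size) chunks.

    Simplified model (assumption): ignores bin-packing of small files,
    per-file open cost, and row-group boundaries used by real planners.
    """
    return sum(max(1, math.ceil(size / split_size)) for size in file_sizes)


# Hypothetical table: 100 data files of 1 GiB each.
files = [1 * GiB] * 100

# Hive read with spark.sql.files.maxPartitionBytes set to 1GB.
hive_splits = estimate_splits(files, 1 * GiB)
# Iceberg read with its default 128MB split size.
iceberg_splits = estimate_splits(files, 128 * MiB)

print(hive_splits, iceberg_splits, iceberg_splits // hive_splits)
```

With these inputs it prints `100 800 8`: the same data, but eight times as many splits for the driver to plan and track.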
(2) We have strict SLAs with our customers. This usually means that when we change a SparkSQL job, we need to make sure the output stays the same (the number of files and the size of each file). In an Iceberg migration, when the source table changes from Hive to Iceberg, the change in split count directly increases the number of output files of the SparkSQL job by 8x (in theory). While we could make a case that the increase is OK, doing so enlarges the scope of the work and thus slows down innovation.
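The same arithmetic applied to (2), as a sketch under two stated assumptions: the job is map-only (no shuffle), so each read task writes exactly one output file, and the output volume roughly equals the 100 GiB input. `estimate_output_files` is a hypothetical helper for illustration only.

```python
MiB = 1024 * 1024
GiB = 1024 * MiB


def estimate_output_files(input_bytes, split_size):
    """Estimate (file count, average file size) for a map-only write.

    Assumptions: one read task per split, one output file per task,
    and output bytes roughly equal to input bytes.
    """
    tasks = -(-input_bytes // split_size)  # integer ceiling division
    return tasks, input_bytes / tasks


before = estimate_output_files(100 * GiB, 1 * GiB)   # Hive, 1GB splits
after = estimate_output_files(100 * GiB, 128 * MiB)  # Iceberg, 128MB splits

print(before[0], after[0])              # output file counts
print(before[1] / MiB, after[1] / MiB)  # average output file size in MiB
```

Under those assumptions it prints `100 800` and then `1024.0 128.0`: eight times as many files, each about one eighth the size, which is exactly the kind of output change an SLA check would flag.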