kbendick commented on issue #5453:
URL: https://github.com/apache/iceberg/issues/5453#issuecomment-1207564941

   To work around the issue, assuming you're using `S3FileIO` (which it seems 
like you are), consider increasing the number of multipart upload threads, 
provided the issue is indeed just a plain S3 upload timeout:
   - https://iceberg.apache.org/docs/latest/aws/#progressive-multipart-upload
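As a rough sketch of what that could look like: the helper below builds the Spark conf entries that pass S3FileIO properties through an Iceberg catalog. The catalog name `my_catalog` and the thread count `16` are placeholders, not values from this issue, and the property name `s3.multipart.num-threads` is taken from the linked AWS docs page, so it is worth double-checking against the Iceberg version you run.

```python
# Minimal sketch: Spark conf entries that raise the S3FileIO multipart
# upload thread count for one Iceberg catalog.
# "my_catalog" and 16 are hypothetical placeholders.

def s3_multipart_conf(catalog: str, num_threads: int) -> dict:
    """Return Spark conf key/value pairs for the given catalog name."""
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        # Use S3FileIO for this catalog.
        f"{prefix}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
        # Threads used to upload parts (property name per the Iceberg AWS docs).
        f"{prefix}.s3.multipart.num-threads": str(num_threads),
    }

conf = s3_multipart_conf("my_catalog", 16)

# Print the entries as flags for spark-submit / spark-sql.
for key, value in conf.items():
    print(f"--conf {key}={value}")
```

The same pairs can be passed to `SparkSession.builder.config(key, value)` if the session is built in code rather than from the command line.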
   
   Another thing that might help is letting Spark sort the arrays with the 
built-in `array_sort` instead of sorting them inside a UDF. If you need a 
custom ordering, you could possibly use a UDF for just the comparison logic, 
though that might not be needed. Keep in mind that UDFs are best avoided 
whenever Spark's built-in functions can achieve the same thing.
   
   Examples of `sort_array` and `array_sort`, both without a UDF and with a 
UDF used only for the comparison rather than for sorting the whole array: 
https://towardsdatascience.com/the-definitive-way-to-sort-arrays-in-spark-1224f5529961
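
For illustration, here is a minimal PySpark sketch of sorting arrays with built-in functions only, assuming Spark 3.0+ (where `array_sort` accepts a comparator lambda in SQL). The column names `xs` and `items` and the struct field `score` are hypothetical and not taken from the issue.

```python
# Sketch: sort array columns with Spark built-ins instead of a Python UDF.
# Column names "xs", "items" and the field "score" are hypothetical.

# Natural ordering, ascending and descending.
ASCENDING = "sort_array(xs)"
DESCENDING = "sort_array(xs, false)"

# Custom ordering via a SQL lambda comparator (Spark 3.0+): order structs by score.
BY_SCORE = (
    "array_sort(items, (l, r) -> "
    "CASE WHEN l.score < r.score THEN -1 "
    "WHEN l.score > r.score THEN 1 ELSE 0 END)"
)

def demo():
    """Run the expressions on a tiny local DataFrame (requires pyspark)."""
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").getOrCreate()
    df = spark.createDataFrame(
        [([3, 1, 2], [("a", 2), ("b", 1)])],
        "xs array<int>, items array<struct<name:string,score:int>>",
    )
    df.selectExpr(
        f"{ASCENDING} AS xs_asc",
        f"{DESCENDING} AS xs_desc",
        f"{BY_SCORE} AS items_by_score",
    ).show(truncate=False)
    spark.stop()
```

Calling `demo()` in an environment with PySpark installed shows the sorted columns. Because the comparison runs as a Spark SQL expression, no rows are shipped to a Python worker, which is the main reason to prefer this over a whole-array sorting UDF.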


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

