Re: [PR] Spark 4.1: Set data file sort_order_id in manifest for writes from Spark [iceberg]

via GitHub Thu, 12 Mar 2026 13:59:53 -0700


RussellSpitzer commented on PR #15150:
URL: https://github.com/apache/iceberg/pull/15150#issuecomment-4050029493


   I think we've tried to talk about this a bit before, but the main concern we 
have is changing existing classes and adding additional ones in order to 
essentially carry an integer through the pipeline.
   
   We are always attempting to make the least invasive changes possible, and I 
think in this case we are modifying a class that has a clear purpose (carrying 
Spark-specific requirements which change how Spark performs the write) to also 
carry an Iceberg concern (how we annotate the files produced).
   
   This is not a performance issue but an architectural one in the codebase. 
That's the kind of thing @aokolnychyi and I are mentioning. This is a pretty 
opinionated project on that sort of thing. I know this feels a little arbitrary 
but it is something we take pretty seriously.
   
   Ideally, we keep these concerns separate unless there is a really strong 
argument as to why they belong together. 
   
   I'm glad to keep working with you on this PR, but we need to pass the Iceberg 
information in its own class/pathway/container or just as an int. Any time we 
have a "This and That" class it's going to be an issue. 
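
   As a rough illustration of the "just as an int" option, here is a minimal, 
self-contained sketch. Every name in it (`SparkWriteSettings`, `FileAnnotator`, 
`write`) is hypothetical and is not taken from the PR or from the Iceberg 
codebase; it only shows the shape of keeping the Spark-side holder free of the 
Iceberg-side value.

```java
import java.util.Objects;

public class SeparateConcernsSketch {

    // Hypothetical stand-in for the Spark-side holder: it carries only
    // settings that change how Spark performs the write.
    static final class SparkWriteSettings {
        private final String distributionMode;
        private final boolean useFanoutWriter;

        SparkWriteSettings(String distributionMode, boolean useFanoutWriter) {
            this.distributionMode = Objects.requireNonNull(distributionMode);
            this.useFanoutWriter = useFanoutWriter;
        }

        String distributionMode() {
            return distributionMode;
        }

        boolean useFanoutWriter() {
            return useFanoutWriter;
        }
    }

    // Hypothetical stand-in for the Iceberg-side component that annotates
    // produced files. It receives the sort order id as a plain int.
    static final class FileAnnotator {
        private final int sortOrderId;

        FileAnnotator(int sortOrderId) {
            this.sortOrderId = sortOrderId;
        }

        String annotate(String path) {
            return path + " [sort_order_id=" + sortOrderId + "]";
        }
    }

    // The two concerns travel side by side instead of inside one
    // "This and That" class.
    static String write(SparkWriteSettings settings, int sortOrderId, String path) {
        FileAnnotator annotator = new FileAnnotator(sortOrderId);
        return settings.distributionMode() + " -> " + annotator.annotate(path);
    }

    public static void main(String[] args) {
        SparkWriteSettings settings = new SparkWriteSettings("hash", false);
        System.out.println(write(settings, 1, "data/file-0.parquet"));
    }
}
```

   The point of the sketch is only that `SparkWriteSettings` never learns about 
the sort order id; whether the real change uses a bare int or a small dedicated 
container is left open.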

