ForeverAngry commented on PR #2205:
URL: https://github.com/apache/iceberg-python/pull/2205#issuecomment-3322452187

   > thanks for your patience with the reviews!
   > 
   > I'm +1 to what jayce mentioned; I think we'd want to use the existing table 
retry mechanisms instead of adding retry to a specific user-facing table 
function.
   
   Yeah, @jayceslesar, I think using the table properties to set or seed the 
retry arguments is fine. That being said, I don't think that addresses the 
issue. The problem here is a bit different, at least from my perspective.
   
   I do think retry logic for **this** function alone would actually make 
sense. The `add_files` method can be used to commit large batches of files. 
The overhead of collecting the metadata from every file in such a batch, just 
to get to the point of trying to commit, is what the PR seeks to address: if 
the commit fails, all of that work has to be repeated. There isn't a good or 
clean way to pull that metadata and store it before calling `add_files` 
_(besides using pyarrow directly)_, and even if there were, I don't think 
there is a public API for `add_datafiles` in pyiceberg _(though I know the 
function is one of the private methods used in the call stack of the 
`add_files` function)_.
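
   As a rough illustration of that argument, here is a minimal sketch of the 
"collect once, retry only the commit" pattern. Every name in it 
(`CommitConflict`, `commit_with_retry`, the `collect` and `commit` callables) 
is hypothetical and stands in for the expensive metadata scan and the 
optimistic commit; none of it is pyiceberg API, and it is not the PR's actual 
implementation.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class CommitConflict(Exception):
    """Hypothetical stand-in for a failed optimistic commit."""


def commit_with_retry(
    collect: Callable[[], T],
    commit: Callable[[T], None],
    max_attempts: int = 4,
    base_delay_s: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run the expensive collection once, then retry only the commit.

    Returns the number of commit attempts that were needed.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    # Expensive step (e.g. reading footers of many files): done exactly once.
    metadata = collect()

    for attempt in range(1, max_attempts + 1):
        try:
            commit(metadata)
            return attempt
        except CommitConflict:
            if attempt == max_attempts:
                raise  # out of attempts: surface the conflict to the caller
            # Exponential backoff before the next commit attempt.
            sleep(base_delay_s * 2 ** (attempt - 1))

    raise AssertionError("unreachable")
```

   The point of the design is that a conflict costs only another commit 
attempt, not another pass over every file in the batch. The retry limits here 
are plain arguments; they could just as well be seeded from table properties.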
   
   From my perspective, the way an `append` is used in a common data pipeline 
is likely much different in scope and magnitude from what the `add_files` 
function could see, or typically sees. @kevinjqliu does this context help, and 
is it persuasive, or is there maybe something I'm not seeing?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

