szehon-ho commented on PR #5373:
URL: https://github.com/apache/iceberg/pull/5373#issuecomment-1203268307

   > I'm not sure that throwing an exception when a user specifies a delete 
function and the file IO supports bulk delete is the way to go, because then 
we're changing the behavior of the exposed deleteFunc API. I think if deleteFunc 
is set, the procedure should continue to use it as the source of truth, 
regardless of bulk delete support. If we throw an exception, users' code would 
need to be rewritten if it uses S3FileIO and runs this procedure with a custom 
delete. Let me know if I'm misunderstanding!
   
   > That being said, now we are changing the behavior if they do not specify a 
delete func and if it supports bulk delete. This change is less intrusive 
because it changes internally how the procedure runs and is not really exposed 
to a user. Let me know what you think.
   
   Maybe there was a bit of a misunderstanding. I was talking about the flag you 
are discussing with @dramaticlly (useBulkDelete). My thinking was: if 
useBulkDelete is on && deleteFunc is set, then it's a misconfiguration.
   
   But are we doing the flag? Or are you suggesting that deleteFunc should 
always take precedence, i.e. if deleteFunc is set, always use the single-file 
deleteFunc, and otherwise, if the FileIO supports bulkOperations, automatically 
use the bulk delete?
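   
   A minimal sketch of that second option, to make the precedence concrete. The 
names here (`SimpleFileIO`, `BulkFileIO`, `deleteAll`) are hypothetical stand-ins 
written for this comment, not the actual interfaces or procedure code in the PR:

```java
import java.util.List;
import java.util.function.Consumer;

public class DeleteDispatchSketch {

  /** Hypothetical stand-in for a file IO that deletes one path at a time. */
  public interface SimpleFileIO {
    void deleteFile(String path);
  }

  /** Hypothetical stand-in for a file IO that can also delete many paths per call. */
  public interface BulkFileIO extends SimpleFileIO {
    void deleteFiles(Iterable<String> paths);
  }

  /**
   * Picks a delete strategy and returns its name:
   * a user-supplied deleteFunc always wins, then bulk delete if the
   * file IO supports it, then plain per-file deletes.
   */
  public static String deleteAll(
      SimpleFileIO io, Consumer<String> deleteFunc, List<String> paths) {
    if (deleteFunc != null) {
      // Custom delete is the source of truth, even on a bulk-capable file IO.
      paths.forEach(deleteFunc);
      return "custom";
    }
    if (io instanceof BulkFileIO) {
      // No custom delete: hand the whole batch to the file IO.
      ((BulkFileIO) io).deleteFiles(paths);
      return "bulk";
    }
    // Fallback: one call per file.
    paths.forEach(io::deleteFile);
    return "single";
  }
}
```

   With this shape there is no flag and no exception path: existing callers that 
pass a custom delete keep their behavior, and only callers that pass nothing 
see the switch to bulk delete.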
   
   
   > Also now we are delegating task management to the file IO, which I think 
makes sense but there's another argument that each procedure should control 
this since failure handling or retries would depend on the desired behavior for 
the procedure. What are people's thoughts here? @dramaticlly @aokolnychyi 
@RussellSpitzer @karuppayya
   
   What's the choice here? I suppose we would need an extra parameter on 
supportBulkOperations FileIOs to control retry, which the various procedures 
can then set. I think after #5379 it will be easy to implement, as we 
can just set that parameter on the Tasks?
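   
   Roughly what I have in mind, as a sketch only. `RetryingBulkDelete` and its 
`retries` setter are hypothetical names made up for illustration; this is not the 
Tasks API and not what #5379 adds, just the idea of a retry count that each 
procedure sets before handing the batch to the file IO:

```java
import java.util.List;
import java.util.function.Consumer;

public class RetryingBulkDelete {
  // Hypothetical hook standing in for the file IO's bulk delete call.
  private final Consumer<List<String>> bulkDelete;
  // Extra attempts after the first failure; 0 means no retry.
  private int maxRetries = 0;

  public RetryingBulkDelete(Consumer<List<String>> bulkDelete) {
    this.bulkDelete = bulkDelete;
  }

  /** Set by the calling procedure according to its own failure-handling needs. */
  public RetryingBulkDelete retries(int n) {
    if (n < 0) {
      throw new IllegalArgumentException("retries must be >= 0");
    }
    this.maxRetries = n;
    return this;
  }

  /** Runs the bulk delete and returns the number of attempts it took. */
  public int deleteFiles(List<String> paths) {
    RuntimeException last = null;
    for (int attempt = 0; attempt <= maxRetries; attempt++) {
      try {
        bulkDelete.accept(paths);
        return attempt + 1;
      } catch (RuntimeException e) {
        // Remember the failure and retry the whole batch.
        last = e;
      }
    }
    // All attempts failed: surface the last error to the procedure.
    throw last;
  }
}
```

   A procedure that tolerates leftovers could leave the default of 0, while one 
that must not leave files behind could call `retries(n)` with a higher count.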

