dramaticlly commented on PR #5373:
URL: https://github.com/apache/iceberg/pull/5373#issuecomment-1201876458

   > In the updated logic for the procedure, we will always use the delete func 
if it's specified.
   > 
   > If a delete function is not specified and the file io does not support 
bulk operations we would use the default delete function. Otherwise the bulk 
delete is used by default.
   > 
   > What this means is bulk operations will always be used (if the file io 
supports it) without considering user input. I was discussing with @dramaticlly 
perhaps we want more control over this and it should be specified in the action 
(something like useBulkDelete())?
   > 
   
   Thanks @amogh-jahagirdar , I was mostly thinking about this along the same 
line for #5412. In ideal case, we shall try to use Bulk operation if available 
to us (by inspecting the fileIO of given table, S3FileIO is the only supported 
one as of now ). On the other side, I think we need to maintain the backward 
compatibility, if some iceberg user already provide custom `deleteFunc` to the 
SparkAction in the past, everything shall work as expected.
   
   In my change, I added a new public method called `bulkDeleteFunc` to allow 
overrides if needed, this provides a way for customization even for tables on 
S3fileIO, or it can be used to disable the Bulk deletion for some 
troubleshooting and debugging needs. 
   
   
   > Also now we are delegating task management to the file IO, which I think 
makes sense but there's another argument that each procedure should control 
this since failure handling or retries would depend on the desired behavior for 
the procedure. What are peoples thoughts here?
   
   Personally I think delegate to fileIO for batching make sense to me, add 
additional parameter like batchSize in procedure call might cause confusion and 
hard to make it right. The downside I saw is about test hardness, wish for more 
inputs around it for test between classess. Current test class are all using 
Hadooptables and pick HiveTable with fileIO support SupportsBulkOperations seem 
to be overkill for unit tests


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to