rdblue commented on issue #351: Provide an API to modify records within files
URL: https://github.com/apache/incubator-iceberg/pull/351#issuecomment-518439477
 
 
   My high-level feedback is that I'd prefer to reuse `OverwriteFiles` and 
update it to expose the behavior required by this. If the required behavior is 
to fail if any file is added that matches the delete filter, then we can add a 
flag to set that like `validateAppendOnly` in `ReplacePartitions`. How about 
`validateNoConflictingAppends`?
   
   I think we want to make both behaviors available, but we should also 
consider making failure on a detected conflict the default. 
`OverwriteFiles` currently implements an idempotent change: replace all data 
matching a filter with new data. The intent is for cases like overwriting an 
aggregation: you update the aggregation every hour and always produce a 
completely new copy independent of the data in the table. But that use case is 
unlikely to run into a problem if `validateNoConflictingAppends` were the 
default. Instead of two concurrent runs both succeeding, one would fail.
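   
   To make the proposed check concrete, here is a rough sketch using a toy 
in-memory table. It is not Iceberg code: `ToyTable`, `DataFile`, `Overwrite`, 
and `liveAfterTwoRuns` are hypothetical names made up for illustration, and 
only the flag name `validateNoConflictingAppends` comes from the suggestion 
above. It assumes a conflict means "a file matching the delete filter was 
added after the overwrite was started".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Toy model of the proposed validation; NOT Iceberg's implementation. */
public class OverwriteSketch {

  /** A data file reduced to a name and one partition value (hypothetical). */
  static final class DataFile {
    final String name;
    final String partition;

    DataFile(String name, String partition) {
      this.name = name;
      this.partition = partition;
    }
  }

  /** Hypothetical table: the live files plus a log of every file ever added. */
  static final class ToyTable {
    final List<DataFile> live = new ArrayList<>();
    final List<DataFile> addLog = new ArrayList<>();

    void append(DataFile file) {
      live.add(file);
      addLog.add(file);
    }

    /** Starts an overwrite and remembers how long the add log was at that point. */
    Overwrite newOverwrite(Predicate<DataFile> deleteFilter) {
      return new Overwrite(this, deleteFilter, addLog.size());
    }
  }

  /** Replace all files matching a filter with new files. */
  static final class Overwrite {
    private final ToyTable table;
    private final Predicate<DataFile> deleteFilter;
    private final int startPosition;
    private final List<DataFile> newFiles = new ArrayList<>();
    private boolean failOnConflict = false;

    Overwrite(ToyTable table, Predicate<DataFile> deleteFilter, int startPosition) {
      this.table = table;
      this.deleteFilter = deleteFilter;
      this.startPosition = startPosition;
    }

    Overwrite addFile(DataFile file) {
      newFiles.add(file);
      return this;
    }

    /** Flag name taken from the proposal; the semantics here are an assumption. */
    Overwrite validateNoConflictingAppends() {
      this.failOnConflict = true;
      return this;
    }

    void commit() {
      if (failOnConflict) {
        // Inspect only files added since this overwrite was started.
        for (DataFile added : table.addLog.subList(startPosition, table.addLog.size())) {
          if (deleteFilter.test(added)) {
            throw new IllegalStateException("Conflicting append: " + added.name);
          }
        }
      }
      table.live.removeIf(deleteFilter);
      for (DataFile file : newFiles) {
        table.append(file);
      }
    }
  }

  /** Two overwrites of the same partition start together, then commit in turn. */
  static List<String> liveAfterTwoRuns(boolean validate) {
    ToyTable table = new ToyTable();
    table.append(new DataFile("old.parquet", "hour=10"));
    Predicate<DataFile> hour10 = f -> f.partition.equals("hour=10");

    Overwrite first = table.newOverwrite(hour10).addFile(new DataFile("a.parquet", "hour=10"));
    Overwrite second = table.newOverwrite(hour10).addFile(new DataFile("b.parquet", "hour=10"));
    if (validate) {
      first.validateNoConflictingAppends();
      second.validateNoConflictingAppends();
    }

    first.commit();
    try {
      second.commit();
    } catch (IllegalStateException e) {
      System.out.println("second run failed: " + e.getMessage());
    }

    List<String> names = new ArrayList<>();
    for (DataFile file : table.live) {
      names.add(file.name);
    }
    return names;
  }

  public static void main(String[] args) {
    System.out.println(liveAfterTwoRuns(false));
    System.out.println(liveAfterTwoRuns(true));
  }
}
```

   In this sketch, without the flag both runs succeed and the second silently 
replaces the first run's output. With the flag, the second commit throws and 
the first run's file stays in place.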
   
   If we want to fail if there are conflicting appends by default, then we 
could add `allowConflictingAppends` instead. We would also want to decide 
whether the current overwrite behavior in Spark should allow or not allow 
conflicting appends. I think if the default is to not allow them, then we 
should go with that. We can add a write option to allow conflicting appends, 
like "is-idempotent".
   
   @aokolnychyi, what do you think?

