[GitHub] [iceberg] GrigorievNick edited a comment on issue #1202: [Question] Do Spark Iceberg module implement Copy On Write Delete and Update?

GitBox Mon, 20 Jul 2020 00:40:53 -0700


GrigorievNick edited a comment on issue #1202:
URL: https://github.com/apache/iceberg/issues/1202#issuecomment-659308492



   > Both Spark 2.4 and Spark 3.0 support dynamic partition overwrite. Spark 
3.0 also supports overwrite by expression, although the expression must match 
all rows in a data file or no rows of a data file, or else it will cause an 
exception because the granularity of delete is a whole data file.
   
   @rdblue 
   But the `Overwrite` that is implemented for delete is much smarter than overwriting all the data in a partition. 
   It rewrites only the files that contain changed rows, while a simple overwrite rewrites the whole partition.
   Of course, I can read all the data from a partition -> manipulate it -> overwrite. 
   But I can do that with any code. What I am looking for is a way to update only the files that match the changes.
   So, as I understand it, there is no such solution right now, correct?
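
To make the difference concrete, here is a minimal in-memory sketch of the file-level copy-on-write behaviour I mean. It is only a model: plain Python dicts stand in for data files, and the function name `copy_on_write_delete`, the `.rewritten` suffix, and the sample paths are all hypothetical. It does not use any Iceberg or Spark API.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
Files = Dict[str, List[Row]]  # hypothetical model: file path -> rows in that file


def copy_on_write_delete(
    files: Files, predicate: Callable[[Row], bool]
) -> Tuple[Files, List[str], Files]:
    """Delete rows matching `predicate`, rewriting only the files that contain them.

    Returns (kept, removed_paths, added):
      kept          - files with no matching rows, left untouched
      removed_paths - files that contained at least one matching row
      added         - replacement files holding the surviving rows
    """
    kept: Files = {}
    removed: List[str] = []
    added: Files = {}
    for path, rows in files.items():
        survivors = [r for r in rows if not predicate(r)]
        if len(survivors) == len(rows):
            kept[path] = rows  # nothing matched: no rewrite needed
        else:
            removed.append(path)
            if survivors:  # a fully deleted file needs no replacement
                added[path + ".rewritten"] = survivors
    return kept, removed, added


# One partition with three data files; only file-2 holds the row to delete.
partition: Files = {
    "part=a/file-1": [{"id": 1}, {"id": 2}],
    "part=a/file-2": [{"id": 3}, {"id": 4}],
    "part=a/file-3": [{"id": 5}],
}
kept, removed, added = copy_on_write_delete(partition, lambda r: r["id"] == 3)
print("untouched:", sorted(kept))
print("removed:  ", removed)
print("added:    ", added)
```

A plain partition overwrite would rewrite all three files here, while the sketch touches only `file-2`.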
   
   I can implement it manually using the low-level (Java core) API.
   But in that case I have one more question, which I can't find answered in the docs. 
   Is it possible to run concurrent [Table Operation](https://iceberg.apache.org/api/#table-metadata) -> `newRewrite` calls?
   A short explanation: I will have different Spark partitions, each of which will overwrite one or a few data files.
   And of course, each partition is idempotent, and the partitions run in parallel.


