aokolnychyi opened a new pull request #2945:
URL: https://github.com/apache/iceberg/pull/2945


   This PR adds new writer interfaces in `core` and an example of how they can 
be consumed in Spark 3. This will allow us to write position deletes as well as 
deltas in Spark. One of the major design changes is using composition over 
inheritance.
   
   ### Writer
   
   The first major proposed API is the `Writer` interface, which defines a 
contract for writing a number of files of a single type within one 
spec/partition. The existing `DataWriter`, `EqualityDeleteWriter`, and 
`PositionDeleteWriter` classes are the simplest implementations of that API.
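   A minimal sketch of what a single-spec/partition writer contract could look 
like. The method names (`write`, `recordCount`, `result`, `close`) and the 
`SingleFileWriter` class are hypothetical simplifications, not the signatures 
in this PR; "writing" here just buffers records in memory.

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified contract: a writer that produces files of a single
// type for one spec/partition. T is the record type.
interface Writer<T> extends Closeable {
  void write(T record);

  // Number of records written so far (a stand-in for size tracking).
  long recordCount();

  // Paths of the files produced; only meaningful after close().
  List<String> result();

  @Override
  void close(); // no checked IOException in this sketch
}

// Simplest implementation, similar in spirit to DataWriter: one writer, one file.
final class SingleFileWriter<T> implements Writer<T> {
  private final String path;
  private final List<T> buffer = new ArrayList<>();
  private boolean closed = false;

  SingleFileWriter(String path) {
    this.path = path;
  }

  @Override
  public void write(T record) {
    if (closed) {
      throw new IllegalStateException("Writer is closed: " + path);
    }
    buffer.add(record);
  }

  @Override
  public long recordCount() {
    return buffer.size();
  }

  @Override
  public List<String> result() {
    if (!closed) {
      throw new IllegalStateException("Cannot get the result of an open writer");
    }
    return Collections.singletonList(path);
  }

  @Override
  public void close() {
    closed = true;
  }
}
```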
   
   Then we have `RollingWriter`, which implements `Writer` and wraps another 
writer to split the incoming records into multiple files within one 
spec/partition. `RollingDataWriter`, `RollingEqualityDeleteWriter`, and 
`RollingPositionDeleteWriter` are the actual implementations.
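   To illustrate the wrapping relationship, here is a hedged sketch of a rolling 
writer that delegates to single-file writers and opens a new one once a target 
is reached. It assumes a simplified, hypothetical `Writer` interface and rolls 
on record count instead of file size in bytes purely to keep the example small; 
`CountingFileWriter` and the `IntFunction` factory are illustrative stand-ins.

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.IntFunction;

// Hypothetical simplified writer contract (one spec/partition).
interface Writer<T> extends Closeable {
  void write(T record);

  long recordCount();

  List<String> result(); // paths written; valid after close()

  @Override
  void close();
}

// Minimal stand-in for a single-file writer: counts records, persists nothing.
final class CountingFileWriter<T> implements Writer<T> {
  private final String path;
  private long count = 0;

  CountingFileWriter(String path) {
    this.path = path;
  }

  @Override
  public void write(T record) {
    count++;
  }

  @Override
  public long recordCount() {
    return count;
  }

  @Override
  public List<String> result() {
    return Collections.singletonList(path);
  }

  @Override
  public void close() {}
}

// Rolling writer: wraps single-file writers and starts a new file whenever the
// current one reaches targetRecordsPerFile (a stand-in for a target size in bytes).
final class RollingWriter<T> implements Writer<T> {
  private final IntFunction<Writer<T>> newFileWriter; // file index -> fresh writer
  private final long targetRecordsPerFile;
  private final List<String> completedFiles = new ArrayList<>();
  private Writer<T> current = null;
  private int nextFileIndex = 0;
  private long totalRecords = 0;

  RollingWriter(IntFunction<Writer<T>> newFileWriter, long targetRecordsPerFile) {
    this.newFileWriter = newFileWriter;
    this.targetRecordsPerFile = targetRecordsPerFile;
  }

  @Override
  public void write(T record) {
    if (current == null) {
      current = newFileWriter.apply(nextFileIndex++); // open lazily: no empty files
    }
    current.write(record);
    totalRecords++;
    if (current.recordCount() >= targetRecordsPerFile) {
      closeCurrent(); // roll over; the next record opens a new file
    }
  }

  private void closeCurrent() {
    current.close();
    completedFiles.addAll(current.result());
    current = null;
  }

  @Override
  public long recordCount() {
    return totalRecords;
  }

  @Override
  public List<String> result() {
    return Collections.unmodifiableList(completedFiles);
  }

  @Override
  public void close() {
    if (current != null) {
      closeCurrent();
    }
  }
}
```

   Because files are opened lazily, closing right after a roll does not leave an 
empty file behind.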
   
   ### PartitionAwareWriter
   
   All `Writer` implementations are limited to writing to a single 
spec/partition. To support writes to multiple specs and partitions, we have 
`PartitionAwareWriter`. Iceberg supports two types of writes: fanout and 
clustered. That's why I am proposing to add `ClusteredWriter` and 
`FanoutWriter`. `ClusteredWriter` will write to multiple specs and partitions 
while checking that the incoming data is properly clustered. `FanoutWriter`, on 
the other hand, will keep a number of writers open and will not require any 
particular order of data. `ClusteredWriter` is very similar to our existing 
`PartitionedWriter`, but it also detects changes in the spec, not only in 
partition values.
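   The difference between the two strategies can be sketched with a 
hypothetical, in-memory `PartitionAwareWriter` that takes a spec ID and a 
partition string per record. `ClusteredWriter` keeps a single active file and 
fails if a spec, or a partition within a spec, reappears after it was finished; 
`FanoutWriter` keeps one open writer per spec/partition in a map. The method 
signatures, the exception type, and the `spec=<id>/<partition> records=<n>` 
result strings are assumptions made for this example, not the PR's API.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical multi-spec/partition contract; each record carries its spec ID and partition.
interface PartitionAwareWriter<T> {
  void write(int specId, String partition, T record);

  void close();

  List<String> result(); // one entry per file: "spec=<id>/<partition> records=<n>"
}

// Clustered: one active file at a time; finished specs/partitions must not reappear.
final class ClusteredWriter<T> implements PartitionAwareWriter<T> {
  private final Set<Integer> completedSpecs = new HashSet<>();
  private final Set<String> completedPartitions = new HashSet<>(); // within current spec
  private final List<String> files = new ArrayList<>();
  private Integer currentSpec = null;
  private String currentPartition = null;
  private long currentCount = 0;

  @Override
  public void write(int specId, String partition, T record) {
    if (currentSpec == null || currentSpec != specId) {
      // Spec changed (or first record): reject specs that were already closed.
      if (completedSpecs.contains(specId)) {
        throw new IllegalStateException("Data is not clustered by spec: " + specId);
      }
      closeCurrentFile();
      if (currentSpec != null) {
        completedSpecs.add(currentSpec);
      }
      completedPartitions.clear();
      currentSpec = specId;
      currentPartition = partition;
    } else if (!currentPartition.equals(partition)) {
      // Partition changed within the same spec.
      if (completedPartitions.contains(partition)) {
        throw new IllegalStateException("Data is not clustered by partition: " + partition);
      }
      closeCurrentFile();
      completedPartitions.add(currentPartition);
      currentPartition = partition;
    }
    currentCount++; // stand-in for writing the record to the active file
  }

  private void closeCurrentFile() {
    if (currentCount > 0) {
      files.add("spec=" + currentSpec + "/" + currentPartition + " records=" + currentCount);
      currentCount = 0;
    }
  }

  @Override
  public void close() {
    closeCurrentFile();
  }

  @Override
  public List<String> result() {
    return files;
  }
}

// Fanout: one open "writer" (here, a record counter) per spec/partition, any input order.
final class FanoutWriter<T> implements PartitionAwareWriter<T> {
  private final Map<String, Long> openWriters = new LinkedHashMap<>(); // first-seen order
  private final List<String> files = new ArrayList<>();

  @Override
  public void write(int specId, String partition, T record) {
    openWriters.merge("spec=" + specId + "/" + partition, 1L, Long::sum);
  }

  @Override
  public void close() {
    openWriters.forEach((key, count) -> files.add(key + " records=" + count));
    openWriters.clear();
  }

  @Override
  public List<String> result() {
    return files;
  }
}
```

   The clustered variant needs only one open file at a time but relies on the 
input order; the fanout variant accepts any order at the cost of keeping a 
writer open for every spec/partition it has seen until `close()`.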
   
   ### V2TaskWriter
   
   This PR also introduces new `TaskWriter` (I call it v2, but we had better 
replace the existing API) and `DeltaTaskWriter` interfaces. They will be used 
by query engine integrations to write data from a single task. One notable 
difference from the existing code is that I am using composition instead of 
inheritance: query engine sinks delegate to `TaskWriter`.
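   A hedged sketch of the composition approach: the engine-side sink below holds 
a `TaskWriter` and forwards rows to it instead of inheriting from a writer base 
class. The `TaskWriter` methods shown here, `InMemoryTaskWriter`, and 
`EngineSink` (loosely modeled on a write/commit/abort lifecycle) are 
hypothetical and do not reflect the exact interfaces in this PR; 
`DeltaTaskWriter` is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical task-level contract: everything one engine task writes goes through it.
interface TaskWriter<T> {
  void write(T row);

  // Closes underlying writers and returns descriptions of the files written.
  List<String> complete();

  // Gives up on the task; a real implementation would clean up written files.
  void abort();
}

// Toy TaskWriter that buffers all rows into a single in-memory "file".
final class InMemoryTaskWriter<T> implements TaskWriter<T> {
  private final List<T> rows = new ArrayList<>();
  private boolean finished = false;

  @Override
  public void write(T row) {
    if (finished) {
      throw new IllegalStateException("Task writer is already completed or aborted");
    }
    rows.add(row);
  }

  @Override
  public List<String> complete() {
    finished = true;
    List<String> files = new ArrayList<>();
    if (!rows.isEmpty()) {
      files.add("task-file-0 records=" + rows.size());
    }
    return files;
  }

  @Override
  public void abort() {
    finished = true;
    rows.clear(); // nothing was persisted in this toy, so dropping the buffer suffices
  }
}

// Hypothetical engine-side sink: it HAS a TaskWriter (composition) and forwards
// converted rows to it, instead of BEING a writer subclass (inheritance).
final class EngineSink {
  private final TaskWriter<String> delegate;

  EngineSink(TaskWriter<String> delegate) {
    this.delegate = delegate;
  }

  void write(Object[] engineRow) {
    delegate.write(convert(engineRow));
  }

  List<String> commit() {
    return delegate.complete();
  }

  void abort() {
    delegate.abort();
  }

  // Stand-in for engine-to-Iceberg row conversion.
  private static String convert(Object[] engineRow) {
    return Arrays.toString(engineRow);
  }
}
```

   Because the sink depends only on the `TaskWriter` interface, it can be handed 
a different task writer (for example, one that fans out or writes deltas) 
without changing its class hierarchy.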
   

