aokolnychyi commented on issue #179: Use Iceberg tables as sources for Spark Structured Streaming
URL: https://github.com/apache/incubator-iceberg/issues/179#issuecomment-554666093
 
 
   I haven't heard from anyone for a while. I would like to pick this up myself 
unless someone has a prototype to share.
   
   Here are my thoughts:
   - When we start a stream, we should be able to stream out all currently 
present data in addition to what will arrive later.
   - As we might start a stream on a table with a lot of content, we should be 
able to split the existing data into multiple batches, capped either by the 
number of files/records or by batch size. We also need to take into account 
that files in the current snapshot could have been added by snapshots that 
have already expired, so we might not have metadata for those snapshots.
   - It should be possible to configure which operations are allowed on the 
table while a stream is reading from it. We can have a config per operation or 
a list of allowed operations.
     - If the operation is not supported, then throw an exception.
     - If `add`, then stream the added data out (allowed by default).
     - If `rewrite`, then ignore such snapshots (allowed by default).
     - If `delete`, then ignore such snapshots (NOT allowed by default). 
In some cases, data can be deleted without impacting the correctness of a 
stream, so users can opt in.
     - If `overwrite`, then stream the data out (NOT allowed by default). 
Consumers will have to deal with duplicates because we use this operation for 
eager updates.
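
To make the proposal above more concrete, here is a rough sketch of the two pieces of logic: the per-operation policy and the splitting of existing data into batches. It is written in plain Python and does not use any Iceberg or Spark API. `DataFile`, `Snapshot`, `DEFAULT_POLICY`, `files_to_stream` and `plan_batches` are hypothetical names made up for illustration, the operation names are the ones used in this comment, and treating "batch size" as a byte limit is an assumption. Batching works on a flat list of files, since the files of the current snapshot may come from expired snapshots with no metadata left.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List, Optional, Tuple


# Hypothetical stand-ins for illustration only; not Iceberg or Spark classes.
@dataclass(frozen=True)
class DataFile:
    path: str
    record_count: int
    size_bytes: int


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    operation: str  # operation names as used in this comment
    added_files: Tuple[DataFile, ...] = ()


# operation -> (allowed by default, what to do when it is allowed)
DEFAULT_POLICY: Dict[str, Tuple[bool, str]] = {
    "add": (True, "stream"),
    "rewrite": (True, "ignore"),
    "delete": (False, "ignore"),
    "overwrite": (False, "stream"),
}


def files_to_stream(
    snapshot: Snapshot, allowed: Optional[Dict[str, bool]] = None
) -> List[DataFile]:
    """Return the files to emit for one new snapshot, or raise if not allowed."""
    if snapshot.operation not in DEFAULT_POLICY:
        raise ValueError(f"Unsupported operation: {snapshot.operation}")
    default_allowed, action = DEFAULT_POLICY[snapshot.operation]
    # A per-operation config entry overrides the default.
    is_allowed = (allowed or {}).get(snapshot.operation, default_allowed)
    if not is_allowed:
        raise ValueError(
            f"Operation '{snapshot.operation}' in snapshot "
            f"{snapshot.snapshot_id} is not allowed for this stream"
        )
    return list(snapshot.added_files) if action == "stream" else []


def plan_batches(
    files: Iterable[DataFile],
    max_files: Optional[int] = None,
    max_bytes: Optional[int] = None,
) -> Iterator[List[DataFile]]:
    """Greedily split files into batches capped by file count and/or bytes."""
    batch: List[DataFile] = []
    batch_bytes = 0
    for f in files:
        too_many = max_files is not None and len(batch) >= max_files
        too_big = max_bytes is not None and batch_bytes + f.size_bytes > max_bytes
        # Never emit an empty batch: a file larger than max_bytes goes alone.
        if batch and (too_many or too_big):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(f)
        batch_bytes += f.size_bytes
    if batch:
        yield batch
```

For example, `files_to_stream(Snapshot(3, "delete"))` raises under the defaults, while `files_to_stream(Snapshot(3, "delete"), allowed={"delete": True})` returns an empty list, i.e. the snapshot is skipped. A real implementation would also need to persist the stream offset between batches, which this sketch leaves out.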
