[GitHub] [iceberg] karim-ramadan opened a new pull request, #7295: Spark3 structured streaming enable updates

via GitHub Fri, 07 Apr 2023 07:04:03 -0700


karim-ramadan opened a new pull request, #7295:
URL: https://github.com/apache/iceberg/pull/7295


   ### Context
   
   As raised in issue #2788, when an Iceberg table is read as a Spark streaming
   DataFrame and the stream reaches an overwrite snapshot, there are currently
   only two possible actions: skip the snapshot or fail. A third option would be
   to process only the added files and ignore the deleted files.
   
   ### Proposal 
   
   In this PR I propose a new Spark read option,
   `streaming-overwrite-snapshots-read-mode`,
   with three possible values: SKIP, BREAK, and ADDED_FILES_ONLY.
   It is meant to supersede the existing
   `streaming-skip-overwrite-snapshots` (true|false) option.
   
   With the new ADDED_FILES_ONLY value, the stream would process only the files added by an overwrite snapshot and ignore the deleted ones.
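
As a rough illustration, the proposed option might be set on a streaming read from PySpark as sketched below. The option name and its three values come from this PR description; the helper `read_iceberg_stream`, its parameters, and the client-side validation are hypothetical and not part of the PR. The option only takes effect on an Iceberg build that includes this change.

```python
# Option name and values as named in the PR description.
OPTION_NAME = "streaming-overwrite-snapshots-read-mode"
VALID_MODES = {"SKIP", "BREAK", "ADDED_FILES_ONLY"}


def read_iceberg_stream(spark, table, mode="ADDED_FILES_ONLY"):
    """Return a streaming DataFrame over `table` with the proposed option set.

    Hypothetical helper: `spark` is an existing SparkSession and `table`
    is an Iceberg table identifier. The mode check here is only a
    client-side convenience, not behaviour described in the PR.
    """
    if mode not in VALID_MODES:
        raise ValueError(
            f"unknown mode {mode!r}; expected one of {sorted(VALID_MODES)}"
        )
    return (
        spark.readStream
        .format("iceberg")
        .option(OPTION_NAME, mode)
        .load(table)
    )
```

A call such as `read_iceberg_stream(spark, "db.events")` (with a made-up table name) would then yield a stream that, under the proposal, picks up only the files added by overwrite snapshots.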
   
   ### Notes
   
   - The old option `streaming-skip-overwrite-snapshots` has been kept and is
   integrated with the new one (the new one takes precedence).
   - Some unit tests have been fixed so that they run on Windows. I could
   revert those changes and address them in a separate PR if needed.
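
The precedence rule in the first note can be sketched as follows. This is a minimal model, not the code in the PR: the names `OverwriteReadMode` and `resolve_mode` are hypothetical, and mapping the old option's `false` (and the unset case) to BREAK is an assumption based on the skip-or-fail behaviour described under Context.

```python
from enum import Enum
from typing import Mapping

NEW_OPTION = "streaming-overwrite-snapshots-read-mode"
OLD_OPTION = "streaming-skip-overwrite-snapshots"


class OverwriteReadMode(Enum):
    SKIP = "SKIP"
    BREAK = "BREAK"
    ADDED_FILES_ONLY = "ADDED_FILES_ONLY"


def resolve_mode(options: Mapping[str, str]) -> OverwriteReadMode:
    """Pick the effective mode from a dict of read options."""
    new_value = options.get(NEW_OPTION)
    if new_value is not None:
        # The new option has higher precedence whenever it is present.
        # Case-insensitive parsing is an assumption of this sketch.
        return OverwriteReadMode(new_value.strip().upper())
    # Fall back to the old boolean option.
    # Assumption: true maps to SKIP, anything else (or unset) to BREAK.
    old_value = options.get(OLD_OPTION, "false")
    if old_value.strip().lower() == "true":
        return OverwriteReadMode.SKIP
    return OverwriteReadMode.BREAK
```

With both options set, for example `{OLD_OPTION: "true", NEW_OPTION: "ADDED_FILES_ONLY"}`, this sketch resolves to `ADDED_FILES_ONLY`, so existing jobs that only set the old flag keep their behaviour.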


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


