[GitHub] [iceberg] kbendick commented on a change in pull request #4121: add Spark structured streaming option skip-overwrites to docs

GitBox Tue, 15 Feb 2022 10:40:08 -0800


kbendick commented on a change in pull request #4121:
URL: https://github.com/apache/iceberg/pull/4121#discussion_r806194062




##########
File path: docs/versioned/spark/spark-structured-streaming.md
##########
@@ -44,7 +44,7 @@ val df = spark.readStream
 ```
 
 {{< hint warning >}}
-Iceberg only supports reading data from append snapshots. Overwrite snapshots 
cannot be processed and will cause an exception. Similarly, delete snapshots 
will cause an exception by default, but deletes may be ignored by setting 
`streaming-skip-delete-snapshots=true`.
+Iceberg only supports reading data from snapshots of type `Append` or 
`Replace`. Snapshots of type `Delete` & `Overwrite` cannot be processed and 
will cause an exception. The spark options 
`streaming-skip-delete-snapshots=true` & 
`streaming-skip-overwrite-snapshots=true` may be used while reading from the 
iceberg tables where, there are known cases of deletes and overwrites performed 
on the table that can be ignored.

Review comment:
       Nit: The wording of the last sentence reads a little weird to me.
   
   I'd maybe say "The spark options `streaming-skip-delete-snapshots` and 
`streaming-skip-overwrite-snapshots` may be set to true to skip snapshots of 
type `Delete` and `Overwrite`, respectively, in cases where it's acceptable to 
ignore the table changes from those snapshots."
   
   I'll leave that up to you though.
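
   For reference, here is a minimal sketch of how the two options would be set on a streaming read. The table name `db.events`, the app name, and the checkpoint path are hypothetical placeholders, and the console sink is only there to make the sketch complete. It assumes the Iceberg Spark runtime is on the classpath and a catalog is already configured.

```scala
import org.apache.spark.sql.SparkSession

object SkipSnapshotsSketch {
  def main(args: Array[String]): Unit = {
    // Assumes catalog settings are supplied externally (e.g. via spark-defaults).
    val spark = SparkSession.builder()
      .appName("iceberg-streaming-skip-sketch") // hypothetical app name
      .getOrCreate()

    // Skip snapshots of type Delete and Overwrite instead of failing on them.
    // Only appropriate when ignoring those table changes is acceptable.
    val df = spark.readStream
      .format("iceberg")
      .option("streaming-skip-delete-snapshots", "true")
      .option("streaming-skip-overwrite-snapshots", "true")
      .load("db.events") // hypothetical table name

    // Console sink used purely for illustration.
    val query = df.writeStream
      .format("console")
      .option("checkpointLocation", "/tmp/iceberg-skip-sketch-checkpoint") // hypothetical path
      .start()

    query.awaitTermination()
  }
}
```

   With both options left at their defaults, the same read would be expected to throw when it reaches a `Delete` or `Overwrite` snapshot, per the hint text in the diff.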

##########
File path: docs/versioned/spark/spark-structured-streaming.md
##########
@@ -44,7 +44,7 @@ val df = spark.readStream
 ```
 
 {{< hint warning >}}
-Iceberg only supports reading data from append snapshots. Overwrite snapshots 
cannot be processed and will cause an exception. Similarly, delete snapshots 
will cause an exception by default, but deletes may be ignored by setting 
`streaming-skip-delete-snapshots=true`.
+Iceberg only supports reading data from snapshots of type `Append` or 
`Replace`. Snapshots of type `Delete` & `Overwrite` cannot be processed and 
will cause an exception. The spark options 
`streaming-skip-delete-snapshots=true` & 
`streaming-skip-overwrite-snapshots=true` may be used while reading from the 
iceberg tables where, there are known cases of deletes and overwrites performed 
on the table that can be ignored.

Review comment:
       Also, maybe don't use `&` (though if we use that elsewhere in the docs, 
then by all means ignore my comment 🙂)

##########
File path: docs/versioned/spark/spark-structured-streaming.md
##########
@@ -44,7 +44,7 @@ val df = spark.readStream
 ```
 
 {{< hint warning >}}
-Iceberg only supports reading data from append snapshots. Overwrite snapshots 
cannot be processed and will cause an exception. Similarly, delete snapshots 
will cause an exception by default, but deletes may be ignored by setting 
`streaming-skip-delete-snapshots=true`.
+Iceberg only supports reading data from snapshots of type `Append` or 
`Replace`. The snapshots of type `Delete` and `Overwrite` cannot be processed 
and will cause an exception. The spark options 
`streaming-skip-delete-snapshots` and `streaming-skip-overwrite-snapshots` may 
be set to `true` to skip snapshots of type `Delete` and `Overwrite`, 
respectively, in cases where it's acceptable to ignore the table changes from 
those snapshots.

Review comment:
       That's a fair point about `Replace` snapshots. Even as somebody working 
on Iceberg, I wasn't able to point out what was really meant by that until 
looking into it a bit more. It's definitely an implementation detail.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


