kbendick commented on a change in pull request #4121:
URL: https://github.com/apache/iceberg/pull/4121#discussion_r806194062
##########
File path: docs/versioned/spark/spark-structured-streaming.md
##########
@@ -44,7 +44,7 @@ val df = spark.readStream
```
{{< hint warning >}}
-Iceberg only supports reading data from append snapshots. Overwrite snapshots
cannot be processed and will cause an exception. Similarly, delete snapshots
will cause an exception by default, but deletes may be ignored by setting
`streaming-skip-delete-snapshots=true`.
+Iceberg only supports reading data from snapshots of type `Append` or
`Replace`. Snapshots of type `Delete` & `Overwrite` cannot be processed and
will cause an exception. The spark options
`streaming-skip-delete-snapshots=true` &
`streaming-skip-overwrite-snapshots=true` may be used while reading from the
iceberg tables where, there are known cases of deletes and overwrites performed
on the table that can be ignored.
Review comment:
Nit: The wording is a little weird for me in the last sentence.
I'd maybe say "The spark options `streaming-skip-delete-snapshots` and
`streaming-skip-overwrite-snapshots` may be set to true to skip snapshots of
type `Delete` and `Overwrite`, respectively, in cases where it's acceptable to
ignore the table changes from those snapshots."
I'll leave that up to you though.
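For what it's worth, a reader-side sketch of what that sentence describes might look like the following. It assumes the Iceberg Spark runtime is on the classpath and a catalog is already configured. The helper name `readSkippingDeletesAndOverwrites` and the table identifier are hypothetical; only the two option keys come from the doc text above.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper (not part of Iceberg or Spark): opens a streaming read
// on an Iceberg table and opts in to skipping Delete and Overwrite snapshots.
def readSkippingDeletesAndOverwrites(spark: SparkSession, table: String): DataFrame =
  spark.readStream
    .format("iceberg")
    // Skip snapshots of type Delete instead of failing on them.
    .option("streaming-skip-delete-snapshots", "true")
    // Skip snapshots of type Overwrite instead of failing on them.
    .option("streaming-skip-overwrite-snapshots", "true")
    .load(table)

// Usage, with a placeholder table name:
// val df = readSkippingDeletesAndOverwrites(spark, "database.table_name")
```

Both options are opt-in, so this only makes sense when it's acceptable for the stream to ignore the table changes from those snapshots.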
##########
File path: docs/versioned/spark/spark-structured-streaming.md
##########
@@ -44,7 +44,7 @@ val df = spark.readStream
```
{{< hint warning >}}
-Iceberg only supports reading data from append snapshots. Overwrite snapshots
cannot be processed and will cause an exception. Similarly, delete snapshots
will cause an exception by default, but deletes may be ignored by setting
`streaming-skip-delete-snapshots=true`.
+Iceberg only supports reading data from snapshots of type `Append` or
`Replace`. Snapshots of type `Delete` & `Overwrite` cannot be processed and
will cause an exception. The spark options
`streaming-skip-delete-snapshots=true` &
`streaming-skip-overwrite-snapshots=true` may be used while reading from the
iceberg tables where, there are known cases of deletes and overwrites performed
on the table that can be ignored.
Review comment:
Also, maybe don't use `&` (though if you use that elsewhere in the docs then
by all means ignore my comment 🙂 )
##########
File path: docs/versioned/spark/spark-structured-streaming.md
##########
@@ -44,7 +44,7 @@ val df = spark.readStream
```
{{< hint warning >}}
-Iceberg only supports reading data from append snapshots. Overwrite snapshots
cannot be processed and will cause an exception. Similarly, delete snapshots
will cause an exception by default, but deletes may be ignored by setting
`streaming-skip-delete-snapshots=true`.
+Iceberg only supports reading data from snapshots of type `Append` or
`Replace`. The snapshots of type `Delete` and `Overwrite` cannot be processed
and will cause an exception. The spark options
`streaming-skip-delete-snapshots` and `streaming-skip-overwrite-snapshots` may
be set to `true` to skip snapshots of type `Delete` and `Overwrite`,
respectively, in cases where it's acceptable to ignore the table changes from
those snapshots.
Review comment:
That's a fair point about `Replace` snapshots. Even as somebody working
on Iceberg, I wasn't able to point out what was really meant by that until
looking into it a bit more. It's definitely an implementation detail.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]