rdblue commented on a change in pull request #3749:
URL: https://github.com/apache/iceberg/pull/3749#discussion_r773430186
##########
File path: site/docs/spark-structured-streaming.md
##########
@@ -26,6 +26,25 @@ As of Spark 3.0, DataFrame reads and writes are supported.
|--------------------------------------------------|----------|------------|------------------------------------------------|
| [DataFrame write](#writing-with-streaming-query) | ✔ | ✔ | |
+## Streaming Reads
+
+Iceberg supports processing incremental data in spark structured streaming jobs which starts from a historical timestamp:
+
+```scala
+val spark:SparkSession = ...
+val tableIdentifier: String = ...
+
+val df = spark.readStream
+ .format("iceberg")
+ .option(SparkReadOptions.STREAM_FROM_TIMESTAMP, Long.toString(streamStartTimestamp))
+ .load(tableIdentifier)
+```
+
+The `tableIdentifier` can be any valid table identifier or table path. Refer [TableIdentifier](https://iceberg.apache.org/javadoc/0.12.1/org/apache/iceberg/catalog/TableIdentifier.html)
+
+!!! Note
+ Iceberg only supports read data from snapshot whose type of Data Operations is APPEND\REPLACE\DELETE. In particular if some of your snapshots are of DELETE type, you need to add 'streaming-skip-delete-snapshots' option to skip it, otherwise the task will fail.
Review comment:
There are a few issues with this paragraph:
* Typo: "only supports reading"
* Typo: "from snapshots"
* Change "snapshots whose type of Data Operations ..." to "append snapshots"
because it is much shorter
* As a separate sentence, add that delete and overwrite cannot be processed
and will cause an exception
* Then in the last sentence add that deletes can be ignored: "To ignore
delete snapshots, add `streaming-skip-delete-snapshots=true`"
Keep in mind that people reading the documentation probably don't know
Iceberg internals. Referring to "Data Operations" is not very clear to most
readers.
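
For illustration, here is a rough sketch of how the streaming read might look with that option set. It is not text proposed for the docs: the table name and start timestamp are hypothetical placeholders, the timestamp is assumed to be in epoch milliseconds, and the snippet assumes a spark-shell session (where `spark` is already defined) with the Iceberg Spark runtime on the classpath.

```scala
import org.apache.iceberg.spark.SparkReadOptions

// Hypothetical placeholders for illustration; neither value comes from the PR.
val tableIdentifier: String = "db.events"
val streamStartTimestamp: Long = 0L // assumed to be epoch milliseconds

// `spark` is the SparkSession that spark-shell provides.
val df = spark.readStream
  .format("iceberg")
  .option(SparkReadOptions.STREAM_FROM_TIMESTAMP, streamStartTimestamp.toString)
  // Ignore delete snapshots instead of failing when one is encountered.
  .option("streaming-skip-delete-snapshots", "true")
  .load(tableIdentifier)
```

Showing the option next to the read call may make the note easier to follow for readers who do not know the snapshot internals.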