rdblue commented on a change in pull request #1261:
URL: https://github.com/apache/iceberg/pull/1261#discussion_r461904586
##########
File path: site/docs/spark.md
##########
@@ -520,6 +520,28 @@ data.writeTo("prod.db.table")
.createOrReplace()
```
+### Writing from a streaming query (Structured Streaming)
+
+To write values from a streaming query to an Iceberg table, use `writeStream`:
+
+```scala
+data.writeStream
+ .format("iceberg")
+ .outputMode("append")
+ .option("path", pathToTable)
+ .option("checkpointLocation", checkpointPath)
+ .start()
Review comment:
This looks specific to 2.4. Should we have a 3.0 example and a separate
2.4 example like the other sections?
An alternative is to create a new page for Spark Streaming and add the docs
there. Then we could have a table like the one at the top of the Spark page
that explains what is supported in different versions.
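For reference, here is one possible shape for the Spark 3.0 counterpart. It is only a sketch. It assumes the Iceberg release in use supports Structured Streaming writes on Spark 3.0 and accepts a catalog table name such as `prod.db.table` in the `path` option. The wrapper object and method names are hypothetical, and the one-minute trigger is an arbitrary illustrative value.

```scala
import java.util.concurrent.TimeUnit

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.streaming.{StreamingQuery, Trigger}

// Hypothetical wrapper; assumes the Iceberg Spark 3 runtime jar is on the
// classpath and that a catalog resolving `tableIdent` is configured.
object IcebergStreamingSketch {
  def startIcebergStream(
      data: DataFrame,
      tableIdent: String,     // e.g. "prod.db.table" (assumed accepted as `path`)
      checkpointPath: String
  ): StreamingQuery =
    data.writeStream
      .format("iceberg")
      .outputMode("append")
      // Assuming one table commit per micro-batch, the trigger interval
      // bounds how quickly new snapshots (and metadata files) are created.
      .trigger(Trigger.ProcessingTime(1, TimeUnit.MINUTES))
      .option("path", tableIdent)
      .option("checkpointLocation", checkpointPath)
      .start()
}
```

As with the 2.4 example, the target table would still need to exist before `start()` is called.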
##########
File path: site/docs/spark.md
##########
@@ -520,6 +520,28 @@ data.writeTo("prod.db.table")
.createOrReplace()
```
+### Writing from a streaming query (Structured Streaming)
+
+To write values from a streaming query to an Iceberg table, use `writeStream`:
+
+```scala
+data.writeStream
+ .format("iceberg")
+ .outputMode("append")
+ .option("path", pathToTable)
+ .option("checkpointLocation", checkpointPath)
+ .start()
+```
+
+`append` and `complete` modes are supported. The table should be created prior to starting the streaming query.
+
+!!! Note
+ To keep metadata from growing too large, there are several guidelines you may want to follow:
Review comment:
I think this is worth a section, not just a note.
> Streaming queries can create new table versions quickly, which creates
lots of table metadata to track those versions. Maintaining metadata by tuning
the rate of commits, expiring old snapshots, and automatically cleaning up
metadata files is highly recommended.
Then you could give an overview of those options and links to further docs,
like the table property docs for delete-after-commit.
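To make those recommendations concrete, here is a minimal pure-Scala sketch. `write.metadata.delete-after-commit.enabled` and `write.metadata.previous-versions-max` are Iceberg table properties. The object name, method names, and retention numbers are hypothetical illustration values. The commit rate itself is governed by the streaming query's trigger (for example `Trigger.ProcessingTime`). The cutoff computed here is intended for `table.expireSnapshots().expireOlderThan(...)` in the Iceberg Java API.

```scala
import java.util.concurrent.TimeUnit

// Hypothetical helpers illustrating metadata maintenance for streaming writes.
object StreamingMetadataMaintenance {
  // Keep at most `maxPreviousVersions` old metadata files and delete older
  // ones automatically after each commit.
  def cleanupProperties(maxPreviousVersions: Int): Map[String, String] = Map(
    "write.metadata.delete-after-commit.enabled" -> "true",
    "write.metadata.previous-versions-max" -> maxPreviousVersions.toString
  )

  // Builds a Spark SQL statement applying the properties to an existing table.
  // Keys are sorted for a stable statement; values are not quote-escaped.
  def alterTableSql(table: String, props: Map[String, String]): String = {
    val assignments = props.toSeq
      .sortBy(_._1)
      .map { case (k, v) => s"'$k'='$v'" }
      .mkString(", ")
    s"ALTER TABLE $table SET TBLPROPERTIES ($assignments)"
  }

  // Epoch-millis cutoff: snapshots older than `retainDays` become eligible
  // for expiration.
  def expirationCutoffMillis(nowMillis: Long, retainDays: Long): Long =
    nowMillis - TimeUnit.DAYS.toMillis(retainDays)
}
```

On Spark 3 the generated statement could be passed to `spark.sql(...)`, and the cutoff to `table.expireSnapshots().expireOlderThan(cutoff).commit()`, run periodically outside the streaming query.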
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.