rdblue commented on a change in pull request #3796:
URL: https://github.com/apache/iceberg/pull/3796#discussion_r777625163
##########
File path: site/docs/spark-queries.md
##########
@@ -104,6 +104,28 @@ spark.read
Time travel is not yet supported by Spark's SQL syntax.
+### Incremental read
+
+To read data incrementally between two snapshots, configure the following Spark read options:
+
+* `start-snapshot-id` Start snapshot ID used in incremental scans (exclusive)
+* `end-snapshot-id` End snapshot ID used in incremental scans (inclusive)
+
+```scala
+// get the data added after start-snapshot-id (10963874102873L) until end-snapshot-id (63874143573109L)
+spark.read
+ .format("iceberg")
+ .option("start-snapshot-id", "10963874102873")
+ .option("end-snapshot-id", "63874143573109")
+ .load("path/to/table")
+```
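A hedged aside, not part of the quoted diff: assuming `end-snapshot-id` is optional and that omitting it scans from `start-snapshot-id` (exclusive) up to the table's current snapshot, a start-only incremental read might look like the sketch below.

```scala
// Sketch only, not part of the proposed diff. Assumes end-snapshot-id is
// optional and that omitting it reads from start-snapshot-id (exclusive)
// up to the table's current snapshot.
spark.read
  .format("iceberg")
  .option("start-snapshot-id", "10963874102873")
  .load("path/to/table")
```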
+
+!!! Note
+Currently gets only the data from `append` operation. Cannot support `replace`, `overwrite`, `delete` operations yet.
+Works with both V1 and V2 format-version.
Review comment:
Is this part of the note or a separate paragraph? Also, could you expand this to be a complete sentence?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]