cshuo commented on code in PR #18867:
URL: https://github.com/apache/hudi/pull/18867#discussion_r3315522081
##########
website/docs/reading_tables_batch_reads.md:
##########
@@ -19,6 +19,42 @@ val tripsDF = spark.read.
tripsDF.where(tripsDF.fare > 20.0).show()
```
+## Flink Batch (Snapshot) Read
+
+Flink can read a Hudi table as a snapshot (batch) query by leaving `read.streaming.enabled` at its default value of `false`.
+
+```sql
+CREATE TABLE hudi_table (
+ uuid VARCHAR(20) PRIMARY KEY NOT ENFORCED,
+ name VARCHAR(10),
+ age INT,
+ ts TIMESTAMP(3),
+ `partition` VARCHAR(20)
+)
+PARTITIONED BY (`partition`)
+WITH (
+ 'connector' = 'hudi',
+ 'path' = '${path}',
+ 'table.type' = 'MERGE_ON_READ'
+ -- read.streaming.enabled defaults to false → batch/snapshot read
+);
+
+-- Snapshot query
+SELECT * FROM hudi_table WHERE age > 25;
+```
+
+### LIMIT Push-Down with Source V2
+
+When [Source V2](ingestion_flink.md#flink-source-v2) is enabled (`read.source-v2.enabled=true`), `LIMIT` clauses are pushed down to the source, reducing the number of files scanned:
+
+```sql
+SELECT * FROM hudi_table LIMIT 100;
+```
+
+Without Source V2, the `LIMIT` is applied after all data is read from storage. With Source V2 it is pushed to the split enumeration layer, stopping file scanning early.
Review Comment:
The Source V2 `LIMIT` description seems overstated. The implementation
enforces the limit in the source reader via `RecordLimiter`, not during split
enumeration. In addition, the legacy source also supports limit pushdown.
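To compare how the limit is planned in each mode, a pair of `EXPLAIN` queries along these lines may help. This is a sketch that assumes the Hudi connector accepts `read.source-v2.enabled` as a dynamic table option through Flink's `OPTIONS` hint. The exact plan text varies by Flink version.

```sql
-- Legacy source: per the comment above, LIMIT is still pushed down to the source
EXPLAIN SELECT * FROM hudi_table /*+ OPTIONS('read.source-v2.enabled' = 'false') */ LIMIT 100;

-- Source V2: the limit is enforced in the source reader (RecordLimiter)
EXPLAIN SELECT * FROM hudi_table /*+ OPTIONS('read.source-v2.enabled' = 'true') */ LIMIT 100;
```

When pushdown applies, the source scan node in each plan would be expected to carry the limit. The plan alone does not show whether the limit takes effect in the reader or at split enumeration, so the docs wording should rely on the implementation rather than on `EXPLAIN`.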