luoyuxia commented on code in PR #300:
URL: https://github.com/apache/fluss-rust/pull/300#discussion_r2806734522
##########
website/docs/user-guide/python/installation.md:
##########
@@ -0,0 +1,41 @@
+---
+sidebar_position: 1
+---
+# Installation
+
+```bash
+pip install pyfluss
+```
+
+To build from source instead:
Review Comment:
Make this a heading to highlight that the following content is about building from source instead.
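For example, something along these lines (just a sketch; exact wording is up to you):

```markdown
## Build from Source

To build the Python client from source instead:
...
```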
##########
website/docs/user-guide/cpp/example/configuration.md:
##########
@@ -0,0 +1,35 @@
+---
+sidebar_position: 2
+---
+# Configuration
Review Comment:
Maybe something like:
# Fluss Connection
## Connection Setup
## Configuration options to Setup Connection
?
`Configuration` is just a class used to create a connection, so it may not be suitable as the main title.
WDYT?
##########
website/docs/user-guide/cpp/example/configuration.md:
##########
@@ -0,0 +1,35 @@
+---
+sidebar_position: 2
+---
+# Configuration
Review Comment:
The same applies to the other language clients.
##########
website/docs/user-guide/python/example/log-tables.md:
##########
@@ -0,0 +1,122 @@
+---
+sidebar_position: 4
+---
+# Log Tables
+
+Log tables are append-only tables without primary keys, suitable for event streaming.
+
+## Creating a Log Table
+
+```python
+import pyarrow as pa
+
+schema = fluss.Schema(pa.schema([
+    pa.field("id", pa.int32()),
+    pa.field("name", pa.string()),
+    pa.field("score", pa.float32()),
+]))
+
+table_path = fluss.TablePath("fluss", "events")
+await admin.create_table(table_path, fluss.TableDescriptor(schema), ignore_if_exists=True)
+```
+
+## Writing
+
+Rows can be appended as dicts, lists, or tuples. For bulk writes, use `write_arrow()`, `write_arrow_batch()`, or `write_pandas()`.
+
+Write methods like `append()` and `write_arrow_batch()` return a `WriteResultHandle`. You can ignore it for fire-and-forget semantics (flush at the end), or `await handle.wait()` to block until the server acknowledges that specific write.
+
+```python
+table = await conn.get_table(table_path)
+writer = table.new_append().create_writer()
+
+# Fire-and-forget: queue writes, flush at the end
+writer.append({"id": 1, "name": "Alice", "score": 95.5})
+writer.append([2, "Bob", 87.0])
+await writer.flush()
+
+# Per-record acknowledgment
+handle = writer.append({"id": 3, "name": "Charlie", "score": 91.0})
+await handle.wait()
+
+# Bulk writes
+writer.write_arrow(pa_table) # PyArrow Table
+writer.write_arrow_batch(record_batch) # PyArrow RecordBatch
+writer.write_pandas(df) # Pandas DataFrame
+await writer.flush()
+```
+
+## Reading
+
+There are two scanner types:
+- **Batch scanner** (`create_record_batch_log_scanner()`): returns Arrow Tables or DataFrames, best for analytics
+- **Record scanner** (`create_log_scanner()`): returns individual records with metadata (offset, timestamp, change type), best for streaming
+
+And two reading modes:
+- **`to_arrow()` / `to_pandas()`**: reads all data from subscribed buckets up to the current latest offset, then returns. Best for one-shot batch reads.
+- **`poll_arrow()` / `poll()` / `poll_record_batch()`**: returns whatever data is available within the timeout. Call in a loop for continuous streaming.
+
+### Batch Read (One-Shot)
+
+```python
+num_buckets = (await admin.get_table_info(table_path)).num_buckets
+
+scanner = await table.new_scan().create_record_batch_log_scanner()
+scanner.subscribe_buckets({i: fluss.EARLIEST_OFFSET for i in range(num_buckets)})
+
+# Reads everything up to current latest offset, then returns
+arrow_table = scanner.to_arrow()
+df = scanner.to_pandas()
+```
+
+### Continuous Polling
+
+Use `poll_arrow()` or `poll()` in a loop for streaming consumption:
+
+```python
+# Batch scanner: poll as Arrow Tables
+scanner = await table.new_scan().create_record_batch_log_scanner()
+scanner.subscribe(bucket_id=0, start_offset=fluss.EARLIEST_OFFSET)
+
+while True:
+    result = scanner.poll_arrow(timeout_ms=5000)
+    if result.num_rows > 0:
+        print(result.to_pandas())
+
+# Record scanner: poll individual records with metadata
+scanner = await table.new_scan().create_log_scanner()
+scanner.subscribe_buckets({i: fluss.EARLIEST_OFFSET for i in range(num_buckets)})
+
+while True:
+    for record in scanner.poll(timeout_ms=5000):
+        print(f"offset={record.offset}, change={record.change_type.short_string()}, row={record.row}")
+```
+
+### Unsubscribing
+
+To stop consuming from a bucket, use `unsubscribe()`:
+
+```python
+scanner.unsubscribe(bucket_id=0)
+```
+
+### Subscribe from Latest Offset
Review Comment:
Seems it's not valid, right? We can remove it.
##########
website/docs/index.md:
##########
@@ -0,0 +1,33 @@
+---
+slug: /
+sidebar_position: 1
+title: Introduction
+---
+
+# Introduction
+
+[Apache Fluss](https://fluss.apache.org/) (incubating) is a streaming storage system built for real-time analytics, serving as the real-time data layer for Lakehouse architectures.
+
+This documentation covers the **Fluss client libraries** for Rust, Python, and C++, which are developed in the [fluss-rust](https://github.com/apache/fluss-rust) repository. These clients allow you to:
+
+- **Create and manage** databases, tables, and partitions
+- **Write** data to log tables (append-only) and primary key tables (upsert/delete)
+- **Read** data via log scanning and key lookups
+- **Integrate** with the broader Fluss ecosystem including lakehouse snapshots
+
+## Client Overview
+
+|                        | Rust                                                       | Python                   | C++                                            |
+|------------------------|------------------------------------------------------------|--------------------------|------------------------------------------------|
+| **Package**            | [fluss-rs](https://crates.io/crates/fluss-rs) on crates.io | Build from source (PyO3) | Build from source (CMake)                      |
+| **Async runtime**      | Tokio                                                      | asyncio                  | Synchronous (Tokio runtime managed internally) |
+| **Data format**        | Arrow RecordBatch / GenericRow                             | PyArrow / Pandas / dict  | Arrow RecordBatch / GenericRow                 |
+| **Log tables**         | Read + Write                                               | Read + Write             | Read + Write                                   |
+| **Primary key tables** | Upsert + Delete + Lookup                                   | Upsert + Delete + Lookup | Upsert + Delete + Lookup                       |
+| **Partitioned tables** | Full support                                               | Write support            | Full support                                   |
Review Comment:
IIUC, Python should also support reads for partitioned tables?
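If so, this cell probably could say "Full support" as well, and the partitioned-tables page could show a read example too. A rough sketch of what that might look like, reusing the scanner API from the log-tables page (the `sales` table path is hypothetical, `conn`/`admin` are assumed to be set up as in the other Python examples, and this assumes partitioned reads go through the same scanner flow as non-partitioned log tables):

```python
# Sketch only: reading back a partitioned log table with the same scanner
# API shown in log-tables.md. The "sales" table is hypothetical; conn and
# admin come from the usual connection setup.
table_path = fluss.TablePath("fluss", "sales")
table = await conn.get_table(table_path)
num_buckets = (await admin.get_table_info(table_path)).num_buckets

scanner = await table.new_scan().create_record_batch_log_scanner()
scanner.subscribe_buckets({i: fluss.EARLIEST_OFFSET for i in range(num_buckets)})

# One-shot read across all subscribed buckets up to the current latest offset
df = scanner.to_pandas()
print(df)
```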
##########
website/docs/user-guide/python/example/log-tables.md:
##########
@@ -0,0 +1,122 @@
+---
+sidebar_position: 4
+---
+# Log Tables
+
+Log tables are append-only tables without primary keys, suitable for event streaming.
+
+## Creating a Log Table
+
+```python
+import pyarrow as pa
+
+schema = fluss.Schema(pa.schema([
+    pa.field("id", pa.int32()),
+    pa.field("name", pa.string()),
+    pa.field("score", pa.float32()),
+]))
+
+table_path = fluss.TablePath("fluss", "events")
+await admin.create_table(table_path, fluss.TableDescriptor(schema), ignore_if_exists=True)
+```
+
+## Writing
+
+Rows can be appended as dicts, lists, or tuples. For bulk writes, use `write_arrow()`, `write_arrow_batch()`, or `write_pandas()`.
+
+Write methods like `append()` and `write_arrow_batch()` return a `WriteResultHandle`. You can ignore it for fire-and-forget semantics (flush at the end), or `await handle.wait()` to block until the server acknowledges that specific write.
+
+```python
+table = await conn.get_table(table_path)
+writer = table.new_append().create_writer()
+
+# Fire-and-forget: queue writes, flush at the end
+writer.append({"id": 1, "name": "Alice", "score": 95.5})
+writer.append([2, "Bob", 87.0])
+await writer.flush()
+
+# Per-record acknowledgment
+handle = writer.append({"id": 3, "name": "Charlie", "score": 91.0})
+await handle.wait()
+
+# Bulk writes
+writer.write_arrow(pa_table) # PyArrow Table
+writer.write_arrow_batch(record_batch) # PyArrow RecordBatch
+writer.write_pandas(df) # Pandas DataFrame
+await writer.flush()
+```
+
+## Reading
+
+There are two scanner types:
+- **Batch scanner** (`create_record_batch_log_scanner()`): returns Arrow Tables or DataFrames, best for analytics
+- **Record scanner** (`create_log_scanner()`): returns individual records with metadata (offset, timestamp, change type), best for streaming
+
+And two reading modes:
+- **`to_arrow()` / `to_pandas()`**: reads all data from subscribed buckets up to the current latest offset, then returns. Best for one-shot batch reads.
+- **`poll_arrow()` / `poll()` / `poll_record_batch()`**: returns whatever data is available within the timeout. Call in a loop for continuous streaming.
+
+### Batch Read (One-Shot)
+
+```python
+num_buckets = (await admin.get_table_info(table_path)).num_buckets
+
+scanner = await table.new_scan().create_record_batch_log_scanner()
+scanner.subscribe_buckets({i: fluss.EARLIEST_OFFSET for i in range(num_buckets)})
+
+# Reads everything up to current latest offset, then returns
+arrow_table = scanner.to_arrow()
+df = scanner.to_pandas()
+```
+
+### Continuous Polling
+
+Use `poll_arrow()` or `poll()` in a loop for streaming consumption:
+
+```python
+# Batch scanner: poll as Arrow Tables
+scanner = await table.new_scan().create_record_batch_log_scanner()
+scanner.subscribe(bucket_id=0, start_offset=fluss.EARLIEST_OFFSET)
+
+while True:
+    result = scanner.poll_arrow(timeout_ms=5000)
+    if result.num_rows > 0:
+        print(result.to_pandas())
+
+# Record scanner: poll individual records with metadata
+scanner = await table.new_scan().create_log_scanner()
+scanner.subscribe_buckets({i: fluss.EARLIEST_OFFSET for i in range(num_buckets)})
+
+while True:
+    for record in scanner.poll(timeout_ms=5000):
+        print(f"offset={record.offset}, change={record.change_type.short_string()}, row={record.row}")
+```
+
+### Unsubscribing
+
+To stop consuming from a bucket, use `unsubscribe()`:
+
+```python
+scanner.unsubscribe(bucket_id=0)
+```
+
+### Subscribe from Latest Offset
Review Comment:
The same applies to the other clients.