(paimon) branch master updated: [Python] Update Doc for Read Splits and Data Types (#6254)

lzljs3620320 Sun, 14 Sep 2025 21:17:23 -0700

This is an automated email from the ASF dual-hosted git repository.

lzljs3620320 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/paimon.git



The following commit(s) were added to refs/heads/master by this push:
     new bfa48045ef [Python] Update Doc for Read Splits and Data Types (#6254)
bfa48045ef is described below

commit bfa48045ef60ef922a673b292464d25506ace276
Author: ChengHui Chen <27797326+chenghuic...@users.noreply.github.com>
AuthorDate: Mon Sep 15 11:59:50 2025 +0800

    [Python] Update Doc for Read Splits and Data Types (#6254)
---
 docs/content/program-api/python-api.md | 53 ++++++++++++++++++++++------------
 1 file changed, 35 insertions(+), 18 deletions(-)

diff --git a/docs/content/program-api/python-api.md b/docs/content/program-api/python-api.md
index ab4967a895..a5ab249cc5 100644
--- a/docs/content/program-api/python-api.md
+++ b/docs/content/program-api/python-api.md
@@ -25,9 +25,7 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-# Java-based Implementation For Python API
-
-[Python SDK ](https://github.com/apache/paimon-python) has defined Python API for Paimon.
+# Python API
 
 ## Environment Settings
 
@@ -65,7 +63,7 @@ Table is located in a database. If you want to create table in a new database, y
 ```python
 catalog.create_database(
     name='database_name',
-    ignore_if_exists=True,  # If you want to raise error if the database exists, set False
+    ignore_if_exists=True,  # To raise error if the database exists, set False
     properties={'key': 'value'}  # optional database properties
 )
 ```
@@ -138,7 +136,7 @@ schema = ...
 catalog.create_table(
     identifier='database_name.table_name',
     schema=schema,
-    ignore_if_exists=True  # If you want to raise error if the table exists, set False
+    ignore_if_exists=True  # To raise error if the table exists, set False
 )
 ```
 
@@ -193,10 +191,10 @@ API:
 
 ```python
 # overwrite whole table
-write_builder.overwrite()
+write_builder = table.new_batch_write_builder().overwrite()
 
 # overwrite partition 'dt=2024-01-01'
-write_builder.overwrite({'dt': '2024-01-01'})
+write_builder = table.new_batch_write_builder().overwrite({'dt': '2024-01-01'})
 ```
 
 ## Batch Read
@@ -272,7 +270,7 @@ You can also read data into a `pyarrow.RecordBatchReader` and iterate record bat
 
 ```python
 table_read = read_builder.new_read()
-for batch in table_read.to_iterator(splits):
+for batch in table_read.to_arrow_batch_reader(splits):
     print(batch)
 
 # pyarrow.RecordBatch
@@ -283,6 +281,19 @@ for batch in table_read.to_iterator(splits):
 # f1: ["a","b","c"]
 ```
 
+#### Python Iterator
+You can read the data row by row into a native Python iterator. 
+This is convenient for custom row-based processing logic.
+
+```python
+table_read = read_builder.new_read()
+for row in table_read.to_iterator(splits):
+    print(row)
+
+# [1,2,3]
+# ["a","b","c"]
+```
+
 #### Pandas
 
 This requires `pandas` to be installed.
@@ -351,16 +362,22 @@ print(ray_dataset.to_pandas())
 ```
 
 ## Data Types
-
-| pyarrow                                                          | Paimon   |
-|:-----------------------------------------------------------------|:---------|
-| pyarrow.int8()                                                   | TINYINT  |
-| pyarrow.int16()                                                  | SMALLINT |
-| pyarrow.int32()                                                  | INT      |
-| pyarrow.int64()                                                  | BIGINT   |
-| pyarrow.float16() <br/>pyarrow.float32()  <br/>pyarrow.float64() | FLOAT    |
-| pyarrow.string()                                                 | STRING   |
-| pyarrow.boolean()                                                | BOOLEAN  |
+| Python Native Type | PyArrow Type | Paimon Type |
+| :--- | :--- | :--- |
+| `int` | `pyarrow.int8()` | `TINYINT` |
+| `int` | `pyarrow.int16()` | `SMALLINT` |
+| `int` | `pyarrow.int32()` | `INT` |
+| `int` | `pyarrow.int64()` | `BIGINT` |
+| `float` | `pyarrow.float32()` | `FLOAT` |
+| `float` | `pyarrow.float64()` | `DOUBLE` |
+| `bool` | `pyarrow.bool_()` | `BOOLEAN` |
+| `str` | `pyarrow.string()` | `STRING`, `CHAR(n)`, `VARCHAR(n)` |
+| `bytes` | `pyarrow.binary()` | `BYTES`, `VARBINARY(n)` |
+| `bytes` | `pyarrow.binary(length)` | `BINARY(length)` |
+| `decimal.Decimal` | `pyarrow.decimal128(precision, scale)` | `DECIMAL(precision, scale)` |
+| `datetime.datetime` | `pyarrow.timestamp(unit, tz=None)` | `TIMESTAMP(p)` |
+| `datetime.date` | `pyarrow.date32()` | `DATE` |
+| `datetime.time` | `pyarrow.time32(unit)` or `pyarrow.time64(unit)` | `TIME(p)` |
 
 ## Predicate
