Re: [PR] Added test for count() method and documentation for count() [iceberg-python]

via GitHub Thu, 04 Sep 2025 23:31:25 -0700


Fokko commented on code in PR #2423:
URL: https://github.com/apache/iceberg-python/pull/2423#discussion_r2324224411



##########
mkdocs/docs/recipe-count.md:
##########
@@ -0,0 +1,114 @@
+---
+title: Count Recipe - Efficiently Count Rows in Iceberg Tables
+---
+
+# Counting Rows in an Iceberg Table
+
+This recipe demonstrates how to use the `count()` function to efficiently count rows in an Iceberg table using PyIceberg. The count operation is optimized for performance by reading file metadata rather than scanning actual data.
+
+## How Count Works
+
+The `count()` method leverages Iceberg's metadata architecture to provide fast row counts by:
+
+1. **Reading file manifests**: Examines metadata about data files without loading the actual data
+2. **Aggregating record counts**: Sums up record counts stored in Parquet file footers
+3. **Applying filters at metadata level**: Pushes down predicates to skip irrelevant files
+4. **Handling deletes**: Automatically accounts for delete files and tombstones
+
+## Basic Usage
+
+Count all rows in a table:
+
+```python
+from pyiceberg.catalog import load_catalog
+
+catalog = load_catalog("default")
+table = catalog.load_table("default.cities")
+
+# Get total row count
+row_count = table.scan().count()
+print(f"Total rows in table: {row_count}")
+```
+
+## Count with Filters
+
+Count rows matching specific conditions:
+
+```python
+from pyiceberg.expressions import GreaterThan, EqualTo, And
+
+# Count rows with population > 1,000,000
+large_cities = table.scan().filter(GreaterThan("population", 1000000)).count()

Review Comment:
   I think the SQL-like expressions are easier to read:
   ```suggestion
   # Count rows with population > 1,000,000
   large_cities = table.scan().filter("population > 1000000").count()
   ```
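
For context on the filtered count in the hunk above, here is a rough, stdlib-only sketch of how a predicate can be answered mostly from per-file metadata. It is a simplified model, not PyIceberg's implementation: `FileStats`, `count_greater_than`, and the sample values are all hypothetical names and data invented for illustration. The assumption is that each data file carries a record count and min/max bounds for the filtered column, so only files that straddle the threshold need their rows read.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FileStats:
    """Hypothetical per-file metadata plus a stand-in for the file's rows."""

    record_count: int
    min_population: int
    max_population: int
    rows: List[int]  # only consulted when metadata alone cannot decide


def count_greater_than(files: List[FileStats], threshold: int) -> Tuple[int, int]:
    """Return (matching row count, number of files whose rows were read)."""
    total = 0
    files_read = 0
    for f in files:
        if f.max_population <= threshold:
            # No row in this file can match: skip it entirely.
            continue
        if f.min_population > threshold:
            # Every row matches: the stored record count is enough.
            total += f.record_count
        else:
            # Bounds straddle the threshold: fall back to reading rows.
            files_read += 1
            total += sum(1 for value in f.rows if value > threshold)
    return total, files_read


# Made-up sample files for illustration only.
files = [
    FileStats(3, 10_000, 500_000, [10_000, 250_000, 500_000]),
    FileStats(2, 2_000_000, 9_000_000, [2_000_000, 9_000_000]),
    FileStats(3, 800_000, 1_500_000, [800_000, 1_200_000, 1_500_000]),
]

total, files_read = count_greater_than(files, 1_000_000)
print(f"matching rows: {total}, files read: {files_read}")
```

In this toy model the first file is skipped, the second is counted from its record count alone, and only the third has its rows inspected. Delete files are left out of the sketch.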



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

