Fokko commented on code in PR #2423:
URL: https://github.com/apache/iceberg-python/pull/2423#discussion_r2324224411
##########
mkdocs/docs/recipe-count.md:
##########
@@ -0,0 +1,114 @@
+---
+title: Count Recipe - Efficiently Count Rows in Iceberg Tables
+---
+
+# Counting Rows in an Iceberg Table
+
+This recipe demonstrates how to use the `count()` function to efficiently count rows in an Iceberg table using PyIceberg. The count operation is optimized for performance by reading file metadata rather than scanning actual data.
+
+## How Count Works
+
+The `count()` method leverages Iceberg's metadata architecture to provide fast row counts by:
+
+1. **Reading file manifests**: Examines metadata about data files without loading the actual data
+2. **Aggregating record counts**: Sums up record counts stored in Parquet file footers
+3. **Applying filters at metadata level**: Pushes down predicates to skip irrelevant files
+4. **Handling deletes**: Automatically accounts for delete files and tombstones
+
+## Basic Usage
+
+Count all rows in a table:
+
+```python
+from pyiceberg.catalog import load_catalog
+
+catalog = load_catalog("default")
+table = catalog.load_table("default.cities")
+
+# Get total row count
+row_count = table.scan().count()
+print(f"Total rows in table: {row_count}")
+```
+
+## Count with Filters
+
+Count rows matching specific conditions:
+
+```python
+from pyiceberg.expressions import GreaterThan, EqualTo, And
+
+# Count rows with population > 1,000,000
+large_cities = table.scan().filter(GreaterThan("population", 1000000)).count()
Review Comment:
I think the SQL-like string expressions are easier to read:
```suggestion
# Count rows with population > 1,000,000
large_cities = table.scan().filter("population > 1000000").count()
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]