gabeiglio commented on code in PR #2423: URL: https://github.com/apache/iceberg-python/pull/2423#discussion_r2319791508
##########
tests/table/test_count.py:
##########
@@ -0,0 +1,129 @@
+"""
+Unit tests for the DataScan.count() method in PyIceberg.
+
+The count() method is essential for determining the number of rows in an Iceberg table
+without having to load the actual data. It works by examining file metadata and task
+plans to efficiently calculate row counts across distributed data files.
+
+These tests validate the count functionality across different scenarios:
+1. Basic counting with single file tasks
+2. Empty table handling (zero records)
+3. Large-scale counting with multiple file tasks
+
+The tests use mocking to simulate different table states without requiring actual
+Iceberg table infrastructure, ensuring fast and isolated unit tests.
+"""
+
+import pytest
+from unittest.mock import MagicMock, Mock, patch
+from pyiceberg.table import DataScan
+from pyiceberg.expressions import AlwaysTrue
+
+
+class DummyFile:

Review Comment:
I think we could write real data files and use them for testing, wdyt? Here are some fixtures we could use to get a `FileScanTask` backed by a file that has some rows in it: [example](https://github.com/apache/iceberg-python/blob/52d810efb62e39ec6d8d6a2f4cd2cad8165e2d2c/tests/conftest.py#L2408)

Maybe we can also add some more fixtures to get FileScanTasks for empty files and large ones.

--
This is an automated message from the Apache Git Service.
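The following is a minimal, library-free sketch of the reviewer's suggestion. It uses stand-ins in place of PyIceberg's real classes. Each scenario (empty, single file, many files) is backed by a real file written to a temporary directory, so the fast metadata-based count is checked against an actual read of the data rather than against mocked attributes.

`FakeScanTask`, `write_task`, `count_from_metadata`, and `count_by_reading` are hypothetical names. They are not PyIceberg's `FileScanTask` or `DataScan.count()`. CSV is used only to keep the example stdlib-only, whereas the linked fixtures produce real `FileScanTask` objects.

```python
from __future__ import annotations

import csv
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class FakeScanTask:
    """Hypothetical stand-in for a scan task: a real file on disk plus the
    record count a writer would have stored in metadata."""
    path: Path
    record_count: int


def write_task(directory: Path, name: str, num_rows: int) -> FakeScanTask:
    """Write a real CSV file with num_rows data rows and describe it as a task."""
    path = directory / f"{name}.csv"
    with path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id"])  # header row, not counted as data
        for i in range(num_rows):
            writer.writerow([i])
    return FakeScanTask(path=path, record_count=num_rows)


def count_from_metadata(tasks: list[FakeScanTask]) -> int:
    """Fast path: sum the recorded counts without opening any file."""
    return sum(task.record_count for task in tasks)


def count_by_reading(tasks: list[FakeScanTask]) -> int:
    """Ground truth: open each file and count its data rows."""
    total = 0
    for task in tasks:
        with task.path.open(newline="") as f:
            total += sum(1 for _ in csv.reader(f)) - 1  # subtract the header
    return total


with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    # One "fixture" per scenario from the PR: empty, single file, many files.
    scenarios = {
        "empty": [write_task(tmp_dir, "empty", 0)],
        "single": [write_task(tmp_dir, "single", 3)],
        "large": [write_task(tmp_dir, f"part-{i}", 1_000) for i in range(5)],
    }
    # Compute both counts while the files still exist.
    results = {
        name: (count_from_metadata(tasks), count_by_reading(tasks))
        for name, tasks in scenarios.items()
    }

print(results)
```

In the actual test suite, each scenario would more naturally be a pytest fixture, as the comment proposes. Comparing the metadata count with a real read is the part that mocked `DummyFile` objects cannot exercise.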
