erikcw opened a new issue, #6568:
URL: https://github.com/apache/iceberg/issues/6568
### Apache Iceberg version
1.1.0 (latest release)
### Query engine
Other
### Please describe the bug 🐞
I originally raised this issue in #6567. After deleting rows from
a table (in my case with Athena), pyiceberg still returns Parquet files
containing those records from a table scan. Shouldn't those files no longer be
in the current manifest, and hence not be returned by the table scan?
```sql
-- Executed in Athena
DELETE FROM iceberg_test WHERE uid = '200441';
select count(uid) from "iceberg_test"
where uid = '200441';
-- Returns 0.
```
```py
from pyiceberg.catalog import load_catalog
from pyiceberg.expressions import NotEqualTo

# Glue catalog type.
catalog = load_catalog("default")
table = catalog.load_table("testing.iceberg_test")
scan = table.scan(
    # Doesn't seem to make a difference with or without this line.
    row_filter=NotEqualTo("uid", "200441"),
    selected_fields=("uid",),  # trailing comma makes this a one-element tuple
)
files = [task.file.file_path for task in scan.plan_files()]
# files all contain the deleted value.
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]