rambleraptor commented on PR #3320:
URL: https://github.com/apache/iceberg-python/pull/3320#issuecomment-4675718202
Alright, thanks for hanging on. There's a lot of complicated logic here that
I'm still working through.
`_resolve_parent_snapshot` only looks at the committed metadata, not at the
commits that still need to happen. That means you get into a very fun situation
if you have mixed deletes.
Here's the test I wrote up:
```
def test_mixed_delete_overwrite_retries_successfully(catalog: Catalog) -> None:
    """A mixed full-file + partial delete should succeed via retry, not raise ValidationException."""
    import pyarrow as pa

    from pyiceberg.partitioning import PartitionField, PartitionSpec
    from pyiceberg.transforms import IdentityTransform

    catalog.create_namespace("default")
    schema = Schema(
        NestedField(1, "category", StringType(), required=False),
        NestedField(2, "value", LongType(), required=False),
    )
    spec = PartitionSpec(PartitionField(source_id=1, field_id=1000, transform=IdentityTransform(), name="category"))
    catalog.create_table("default.mixed_retry_test", schema=schema, partition_spec=spec)

    tbl = catalog.load_table("default.mixed_retry_test")
    # 3 partitions, one data file each: a→[1,2], b→[3,4], c→[5,6]
    tbl.append(pa.table({"category": ["a", "a", "b", "b", "c", "c"], "value": [1, 2, 3, 4, 5, 6]}))

    tbl1 = catalog.load_table("default.mixed_retry_test")
    tbl2 = catalog.load_table("default.mixed_retry_test")
    tbl1.append(pa.table({"category": ["c"], "value": [7]}))

    # This is your problem.
    # This is in multiple partitions.
    # partition 'a' is a partial rewrite (a has 1,2 - we're only deleting 1), we get _OverwriteFiles
    # partition 'b' is a full rewrite (category == 'b'), we get _DeleteFiles
    tbl2.delete("value == 1 or category == 'b'")

    result = catalog.load_table("default.mixed_retry_test").scan().to_arrow()
    assert sorted(result.column("value").to_pylist()) == [2, 5, 6, 7]
```
What would you think about creating some kind of `CommitWindow` class that
tracks all of the commits that have been made since we attempted to commit? I'm
hoping that would make it easier for us to understand the code.
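To make the idea a bit more concrete, here is a rough, standalone sketch of what I have in mind. Everything in it is hypothetical: `SnapshotInfo`, `CommitWindow`, `stage`, `concurrent_commits` and `resolve_parent` are names I made up for illustration, and none of them exist in pyiceberg today. It uses plain dataclasses instead of the real metadata types, and it assumes snapshots form a simple parent chain.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class SnapshotInfo:
    """Hypothetical stand-in for a snapshot; not a pyiceberg type."""

    snapshot_id: int
    parent_id: Optional[int]
    operation: str  # e.g. "append", "overwrite", "delete"


@dataclass
class CommitWindow:
    """Hypothetical helper: what changed between our base snapshot and now."""

    base_snapshot_id: Optional[int]
    # Snapshots produced by this commit attempt that are not committed yet.
    pending: List[SnapshotInfo] = field(default_factory=list)

    def stage(self, snapshot: SnapshotInfo) -> None:
        """Record a snapshot that this commit attempt still has to write."""
        self.pending.append(snapshot)

    def concurrent_commits(
        self, committed: Dict[int, SnapshotInfo], head_id: Optional[int]
    ) -> List[SnapshotInfo]:
        """Committed snapshots after our base, oldest first.

        Walks parent pointers from the committed head back to the base.
        If the base is not an ancestor of the head, the whole chain is returned.
        """
        found: List[SnapshotInfo] = []
        current = head_id
        while current is not None and current != self.base_snapshot_id:
            snap = committed[current]
            found.append(snap)
            current = snap.parent_id
        return list(reversed(found))

    def resolve_parent(self, head_id: Optional[int]) -> Optional[int]:
        """Parent for the next snapshot: last staged one, else the committed head."""
        return self.pending[-1].snapshot_id if self.pending else head_id


# Usage: we started from snapshot 1, someone else appended snapshot 2 meanwhile.
committed = {
    1: SnapshotInfo(1, None, "append"),
    2: SnapshotInfo(2, 1, "append"),
}
window = CommitWindow(base_snapshot_id=1)
concurrent_ops = [s.operation for s in window.concurrent_commits(committed, head_id=2)]

parent_before_staging = window.resolve_parent(head_id=2)
# First half of a mixed delete (the overwrite) is staged but not committed.
window.stage(SnapshotInfo(3, 2, "overwrite"))
parent_after_staging = window.resolve_parent(head_id=2)
```

The point of `resolve_parent` in this sketch is that the second half of a mixed delete would pick the staged overwrite as its parent instead of only looking at committed metadata, and the retry validation could ask `concurrent_commits` for the one list of changes it needs to check.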