kevinjqliu opened a new issue, #3493:
URL: https://github.com/apache/iceberg-python/issues/3493

   `TruncateTransform.project` appears to incorrectly project `NOT STARTS WITH` predicates for truncated string/binary partition fields.
   
   For `truncate[2]`, PyIceberg currently projects:
   
   ```text
   NOT STARTS WITH "aaa" -> NOT STARTS WITH "aa"
   ```
   
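   That is unsafe: the truncated partition value does not carry enough information to prove that every row in a file fails the original predicate. A file in partition `"aa"` can contain the row `"aab"`, which satisfies `NOT STARTS WITH "aaa"`, but the projected predicate `NOT STARTS WITH "aa"` is false for the partition value `"aa"`. The file is therefore pruned even though it contains matching rows.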
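To make the failure concrete, here is a small simulation in plain Python. It does not call PyIceberg; `truncate` and `not_starts_with` are hypothetical helpers written only for illustration. It assumes a single data file whose rows all truncate to the partition value `"aa"` under `truncate[2]`:

```python
# Simulation only: plain strings stand in for rows and partition values.
WIDTH = 2


def truncate(value: str, width: int = WIDTH) -> str:
    """Keep at most `width` characters, mimicking truncate[W] on strings."""
    return value[:width]


def not_starts_with(value: str, prefix: str) -> bool:
    return not value.startswith(prefix)


# Two rows in the same data file; both truncate to the partition value "aa".
rows = ["aab", "aaab"]
partition_value = truncate(rows[0])
assert all(truncate(r) == partition_value for r in rows)

# Row level: "aab" does not start with "aaa", so the file has a matching row.
file_has_match = any(not_starts_with(r, "aaa") for r in rows)

# Current projection: NOT STARTS WITH "aaa" -> NOT STARTS WITH "aa",
# evaluated against the partition value.
buggy_keeps_file = not_starts_with(partition_value, truncate("aaa"))

print(file_has_match)    # True
print(buggy_keeps_file)  # False: the file is pruned despite a matching row
```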
   
   The expected behavior should match apache/iceberg-go#1193 and the Java truncate projection:
   
   - prefix length < truncate width: keep `NOT STARTS WITH` with the original literal
   - prefix length == truncate width: project to `!=` with the original literal
   - prefix length > truncate width: no inclusive projection
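A minimal sketch of the three rules above in plain Python. The name `project_not_starts_with` and the `(operator, literal)` tuple result are hypothetical; they only stand in for the expression objects that `_truncate_array` would actually build, and `None` stands for "no inclusive projection". Lengths use Python `len`, i.e. code points for `str` and bytes for `bytes`:

```python
from typing import Optional, Tuple, Union

PrefixValue = Union[str, bytes]
# Hypothetical stand-in for a projected partition predicate: (operator, literal).
Projection = Optional[Tuple[str, PrefixValue]]


def project_not_starts_with(prefix: PrefixValue, width: int) -> Projection:
    """Sketch: inclusive projection of NOT STARTS WITH `prefix` onto truncate[width]."""
    if len(prefix) < width:
        # The partition value keeps more characters than the prefix has, so
        # "starts with prefix" is decided exactly by the partition value.
        return ("not_starts_with", prefix)
    if len(prefix) == width:
        # A value starts with `prefix` exactly when its truncation equals `prefix`.
        return ("not_eq", prefix)
    # The partition value is shorter than the prefix, so it cannot prove that
    # every row in the file starts with `prefix`: do not filter on it.
    return None


for p in ("a", "aa", "aaa"):
    print(p, project_not_starts_with(p, 2))
# a ('not_starts_with', 'a')
# aa ('not_eq', 'aa')
# aaa None
```

The `<` and `==` cases are exact because the truncated value still holds the whole prefix; only when the prefix is longer than the width is information lost, and then the only safe choice is to not prune.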
   
   Relevant code: `_truncate_array` in `pyiceberg/transforms.py`, plus the existing `test_projection_truncate_string_not_starts_with` expectation.


-- 
This is an automated message from the Apache Git Service.