GayathriSrividya commented on code in PR #3448:
URL: https://github.com/apache/iceberg-python/pull/3448#discussion_r3336422892
##########
tests/io/test_pyarrow.py:
##########
@@ -3267,6 +3267,59 @@ def _expected_batch(unit: str) -> pa.RecordBatch:
assert _expected_batch("ns" if format_version > 2 else
"us").equals(actual_result)
+def
test_task_to_record_batches_filter_without_positional_deletes_avoids_table_refilter(tmpdir:
str) -> None:
+ arrow_schema = pa.schema((pa.field("id", pa.int32(), nullable=True,
metadata={PYARROW_PARQUET_FIELD_ID_KEY: "1"}),))
+ arrow_table = pa.table([pa.array([1, 2, 3], type=pa.int32())],
schema=arrow_schema)
+ data_file = _write_table_to_data_file(
+ f"{tmpdir}/test_task_to_record_batches_filter_no_positional.parquet",
arrow_schema, arrow_table
+ )
+
+ table_schema = Schema(NestedField(1, "id", IntegerType(), required=False))
+ from pyiceberg.expressions.visitors import bind
+
+ result_batches = list(
+ _task_to_record_batches(
+ PyArrowFileIO(),
+ FileScanTask(data_file),
+ bound_row_filter=bind(table_schema, GreaterThan("id", 1),
case_sensitive=True),
+ projected_schema=table_schema,
+ table_schema=table_schema,
+ projected_field_ids={1},
+ positional_deletes=None,
+ case_sensitive=True,
+ )
+ )
+ assert len(result_batches) == 1
+ assert result_batches[0].column(0).to_pylist() == [2, 3]
+
+
+def
test_task_to_record_batches_filter_with_positional_deletes_handles_empty_batch(tmpdir:
str) -> None:
Review Comment:
Thanks for the feedback, @ebyhr! You're right — the original tests were
passing on the unfixed code because re-filtering is idempotent (the scanner
returns already-filtered rows, so applying the same predicate again gives the
same result).
I've replaced both tests with proper regression coverage:
[test_task_to_record_batches_does_not_use_table_filter_without_positional_deletes](vscode-file://vscode-app/Applications/Visual%20Studio%20Code.app/Contents/Resources/app/out/vs/code/electron-browser/workbench/workbench.html)
Detects the actual regression: replaces
[pyarrow.Table](vscode-file://vscode-app/Applications/Visual%20Studio%20Code.app/Contents/Resources/app/out/vs/code/electron-browser/workbench/workbench.html)
with a sentinel class whose
[from_batches](vscode-file://vscode-app/Applications/Visual%20Studio%20Code.app/Contents/Resources/app/out/vs/code/electron-browser/workbench/workbench.html)
raises
[AssertionError](vscode-file://vscode-app/Applications/Visual%20Studio%20Code.app/Contents/Resources/app/out/vs/code/electron-browser/workbench/workbench.html).
A custom metaclass preserves [isinstance(x,
pa.Table)](vscode-file://vscode-app/Applications/Visual%20Studio%20Code.app/Contents/Resources/app/out/vs/code/electron-browser/workbench/workbench.html)
checks so the rest of the code-path is unaffected. With the fix the sentinel
is never reached; without the fix it fires immediately — confirmed by running
the test against the original
[pyarrow.py](vscode-file://vscode-app/Applications/Visual
%20Studio%20Code.app/Contents/Resources/app/out/vs/code/electron-browser/workbench/workbench.html).
[test_task_to_record_batches_filter_applied_after_positional_deletes](vscode-file://vscode-app/Applications/Visual%20Studio%20Code.app/Contents/Resources/app/out/vs/code/electron-browser/workbench/workbench.html)
Functional test with actual positional deletes: data [1,2,3,4,5], delete
positions 1 and 3 (values 2 and 4), then filter [id >
2](vscode-file://vscode-app/Applications/Visual%20Studio%20Code.app/Contents/Resources/app/out/vs/code/electron-browser/workbench/workbench.html).
Expected [3, 5] — a wrong result if either positional deletes or the
subsequent filter were skipped.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]