[GitHub] [iceberg] rdblue commented on a diff in pull request #6714: Python: Filter on Datafile metrics

via GitHub Fri, 17 Feb 2023 10:02:15 -0800


rdblue commented on code in PR #6714:
URL: https://github.com/apache/iceberg/pull/6714#discussion_r1110159056



##########
python/pyiceberg/table/__init__.py:
##########
@@ -314,11 +314,19 @@ def _check_content(file: DataFile) -> DataFile:
         return file
 
 
-def _open_manifest(io: FileIO, manifest: ManifestFile, partition_filter: Callable[[DataFile], bool]) -> List[FileScanTask]:
+def _open_manifest(
+    io: FileIO,
+    schema: Schema,
+    row_filter: BooleanExpression,
+    manifest: ManifestFile,
+    partition_filter: Callable[[DataFile], bool],
+    case_sensitive: bool,
+) -> List[FileScanTask]:
     all_files = files(io.new_input(manifest.manifest_path))
     matching_partition_files = filter(partition_filter, all_files)
     matching_partition_data_files = map(_check_content, matching_partition_files)
-    return [FileScanTask(file) for file in matching_partition_data_files]
+    evaluator = _InclusiveMetricsEvaluator(schema, row_filter, case_sensitive)

Review Comment:
   Do we need to create a new metrics evaluator for each manifest? The table 
schema, row filter, and case-sensitivity setting are the same for every 
manifest.
   
   I think we could handle this like `partition_filter` and pass in a 
`Callable`.
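   
   A minimal sketch of that idea, not the PR's actual code: `DataFile`, `FileScanTask`, and `CountingEvaluator` below are simplified stand-ins so the snippet runs on its own, the parameter name `metrics_evaluator` is hypothetical, and the `_check_content` step is left out for brevity. The evaluator is built once by the caller, and only its bound method is handed to `_open_manifest`.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class DataFile:
    """Stand-in for pyiceberg's DataFile; only the fields this sketch needs."""
    file_path: str
    partition: str
    record_count: int


@dataclass
class FileScanTask:
    """Stand-in for pyiceberg's FileScanTask."""
    file: DataFile


class CountingEvaluator:
    """Hypothetical stand-in for the metrics evaluator; counts constructions."""
    instances = 0

    def __init__(self, min_records: int) -> None:
        CountingEvaluator.instances += 1
        self.min_records = min_records

    def eval(self, file: DataFile) -> bool:
        # Pretend metrics check: keep files that may contain matching rows.
        return file.record_count >= self.min_records


def _open_manifest(
    all_files: Iterable[DataFile],  # stand-in for reading the manifest via FileIO
    partition_filter: Callable[[DataFile], bool],
    metrics_evaluator: Callable[[DataFile], bool],  # hypothetical parameter name
) -> List[FileScanTask]:
    # Both predicates arrive prebuilt, so nothing is constructed per manifest.
    return [
        FileScanTask(file)
        for file in all_files
        if partition_filter(file) and metrics_evaluator(file)
    ]


# Two "manifests", each represented here as a plain list of data files.
manifests = [
    [DataFile("a.parquet", "p1", 10), DataFile("b.parquet", "p2", 0)],
    [DataFile("c.parquet", "p1", 5)],
]

# Built once, outside the per-manifest loop.
metrics_evaluator = CountingEvaluator(min_records=1).eval

tasks = [
    task
    for manifest in manifests
    for task in _open_manifest(manifest, lambda f: f.partition == "p1", metrics_evaluator)
]
```

   With this shape `_open_manifest` no longer needs `schema`, `row_filter`, or `case_sensitive`, and the number of evaluators constructed does not grow with the number of manifests.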



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

