szehon-ho commented on code in PR #4520:
URL: https://github.com/apache/iceberg/pull/4520#discussion_r845760260
##########
core/src/main/java/org/apache/iceberg/BaseFilesTable.java:
##########
@@ -92,7 +94,26 @@ public TableScan appendsAfter(long fromSnapshotId) {
protected CloseableIterable<FileScanTask> planFiles(TableOperations ops,
Snapshot snapshot, Expression rowFilter,
boolean
ignoreResiduals, boolean caseSensitive,
boolean colStats) {
- CloseableIterable<ManifestFile> filtered = filterManifests(manifests(),
rowFilter, caseSensitive);
+ Map<Integer, PartitionSpec> specsById = table().specs();
+
+ LoadingCache<Integer, ManifestEvaluator> evalCache =
Caffeine.newBuilder().build(specId -> {
+ PartitionSpec spec = specsById.get(specId);
+ PartitionSpec transformedSpec = transformSpec(fileSchema, spec,
PARTITION_FIELD_PREFIX);
+ return ManifestEvaluator.forRowFilter(rowFilter, transformedSpec,
caseSensitive);
+ });
+
+ CloseableIterable<ManifestFile> filtered = CloseableIterable.filter(
+ manifests(),
+ manifest -> {
+ PartitionSpec spec = specsById.get(manifest.partitionSpecId());
+
+ if (spec.fields().stream().anyMatch(f ->
f.transform().equals(Transforms.alwaysNull()))) {
Review Comment:
This case is a bit ugly. For V1 tables in the case of removing partition
fields, it replaces the transform with a "void" transform. What happens is
that when we use this spec to evaluate a manifest file, it filters them all out
as it searches for null value for this partition key, which seems incorrect.
Example, if we remove a field "data_bucket" from partition spec, then this
code would check whether manifest files have null "data_bucket" values.
So hence skipping this check for now.
##########
core/src/main/java/org/apache/iceberg/BaseFilesTable.java:
##########
@@ -92,7 +94,26 @@ public TableScan appendsAfter(long fromSnapshotId) {
protected CloseableIterable<FileScanTask> planFiles(TableOperations ops,
Snapshot snapshot, Expression rowFilter,
boolean
ignoreResiduals, boolean caseSensitive,
boolean colStats) {
- CloseableIterable<ManifestFile> filtered = filterManifests(manifests(),
rowFilter, caseSensitive);
+ Map<Integer, PartitionSpec> specsById = table().specs();
+
+ LoadingCache<Integer, ManifestEvaluator> evalCache =
Caffeine.newBuilder().build(specId -> {
+ PartitionSpec spec = specsById.get(specId);
+ PartitionSpec transformedSpec = transformSpec(fileSchema, spec,
PARTITION_FIELD_PREFIX);
+ return ManifestEvaluator.forRowFilter(rowFilter, transformedSpec,
caseSensitive);
+ });
+
+ CloseableIterable<ManifestFile> filtered = CloseableIterable.filter(
+ manifests(),
+ manifest -> {
+ PartitionSpec spec = specsById.get(manifest.partitionSpecId());
+
+ if (spec.fields().stream().anyMatch(f ->
f.transform().equals(Transforms.alwaysNull()))) {
Review Comment:
This case is a bit ugly. For V1 tables in the case of removing partition
fields, it replaces the transform with a "void" transform. What happens is
that when we use this spec to evaluate a manifest file, it filters them all out
as it searches for null value for this partition key, which seems incorrect.
Example, if we remove a field "data_bucket" from partition spec, then this
code would check whether manifest files have null "data_bucket" values.
So hence skipping filtering for this case.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]