TGooch44 commented on a change in pull request #2115:
URL: https://github.com/apache/iceberg/pull/2115#discussion_r561240703



##########
File path: python/iceberg/api/partition_spec.py
##########
@@ -117,9 +118,9 @@ def compatible_with(self, other):
 
     def lazy_fields_by_source_id(self):
         if self.fields_by_source_id is None:
-            self.fields_by_source_id = dict()
+            self.fields_by_source_id = defaultdict(list)
             for field in self.fields:
-                self.fields_by_source_id[field.source_id] = field
+                self.fields_by_source_id[field.source_id].append(field)

Review comment:
       I believe this is because there may be multiple hidden partitions 
derived from the same table column.  I think Ryan Blue introduced this in the 
Java implementation here:
   
https://github.com/apache/iceberg/commit/649cbdde83693ebda8e8dc6e75857426d25414ec#diff-d1905822d843dea78ebe5404ee9ce885b7adbc46970fbaf931a87ae2758abeb6
   
   Let me double-check, though. I wrote this in our internal repo a while back 
and it's a little foggy now.
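
   To illustrate the point above, here is a minimal sketch of why a plain dict 
drops fields when two partition fields share a source column.  `Field` is a 
hypothetical stand-in rather than the real partition field class, and the 
names and ids are made up for the example.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for a partition field; not the real class from the PR.
Field = namedtuple("Field", ["source_id", "name"])

# Two hidden partitions derived from the same source column (id 1).
fields = [Field(1, "ts_day"), Field(1, "ts_hour"), Field(2, "id_bucket")]

# Old behaviour: a plain dict keeps only the last field seen per source id.
by_id_old = {}
for field in fields:
    by_id_old[field.source_id] = field

# New behaviour: defaultdict(list) keeps every field per source id.
by_id_new = defaultdict(list)
for field in fields:
    by_id_new[field.source_id].append(field)
```

   With the plain dict, the `ts_day` entry is silently overwritten by 
`ts_hour`; with `defaultdict(list)`, both remain under source id 1.  Callers 
then receive a list rather than a single field.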

##########
File path: python/iceberg/api/partition_spec.py
##########
@@ -117,9 +118,9 @@ def compatible_with(self, other):
 
     def lazy_fields_by_source_id(self):
         if self.fields_by_source_id is None:
-            self.fields_by_source_id = dict()
+            self.fields_by_source_id = defaultdict(list)
             for field in self.fields:
-                self.fields_by_source_id[field.source_id] = field
+                self.fields_by_source_id[field.source_id].append(field)

Review comment:
       At the very least, I will add some test coverage, but it may be 
worthwhile to look at the change more critically in general.

##########
File path: python/iceberg/api/expressions/projections.py
##########
@@ -15,32 +15,81 @@
 # specific language governing permissions and limitations
 # under the License.
 
+from typing import TYPE_CHECKING
 
 from .expressions import Expressions, ExpressionVisitors, RewriteNot
 from .predicate import BoundPredicate, UnboundPredicate
 
+if TYPE_CHECKING:
+    from .expression import Expression
+    from .predicate import Predicate
+    from ..partition_spec import PartitionSpec
 
-def inclusive(spec, case_sensitive=True):
+

Review comment:
       Agreed, I'll update this.

##########
File path: python/iceberg/api/expressions/residual_evaluator.py
##########
@@ -115,3 +175,14 @@ def unbound_predicate(self, pred):
             return bound_residual
 
         return bound
+
+
+class UnpartitionedEvaluator(ResidualEvaluator):
+
+    def __init__(self, expr):
+        return super(UnpartitionedEvaluator, self).__init__(PartitionSpec.unpartitioned(),

Review comment:
       Yes, hmm... I'm not sure what I was doing there, but you are of course 
right.  Will update.

##########
File path: python/iceberg/api/expressions/residual_evaluator.py
##########
@@ -15,98 +15,158 @@
 # specific language governing permissions and limitations
 # under the License.
 
+from .expression import Expression
 from .expressions import Expressions, ExpressionVisitors
+from .literals import Literal
 from .predicate import BoundPredicate, Predicate, UnboundPredicate
+from .reference import BoundReference
+from ..partition_spec import PartitionSpec
+from ..struct_like import StructLike
 
 
 class ResidualEvaluator(object):
+    """
+    Finds the residuals for an {@link Expression} the partitions in the given {@link PartitionSpec}.
 
-    def __init__(self, spec, expr):
+    A residual expression is made by partially evaluating an expression using partition values. For
+    example, if a table is partitioned by day(utc_timestamp) and is read with a filter expression
+    utc_timestamp >= a and utc_timestamp <= b, then there are 4 possible residuals expressions
+    for the partition data, d:
+
+
+        + If d > day(a) and d < day(b), the residual is always true
+        + If d == day(a) and d != day(b), the residual is utc_timestamp >= b

Review comment:
       Yes, definitely right, since all values in the file will be greater 
than that lower bound.  This looks like a typo/mistake; will update.
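
       As a rough sketch of the residual cases the docstring describes, 
assuming the fix is to use the lower bound `a` in the second case (which is 
how I read the Java documentation).  `day_residual` is a hypothetical helper 
that returns strings for illustration only; the real evaluator works on bound 
expressions.  The out-of-range case is not listed in the docstring and is an 
assumption here.

```python
from datetime import date, datetime


def day_residual(d, a, b):
    """Describe the residual of `a <= utc_timestamp <= b` for partition day d."""
    day_a, day_b = a.date(), b.date()
    if day_a < d < day_b:
        # Every row in the partition lies strictly between the bounds.
        return "true"
    if d == day_a and d != day_b:
        # Only the lower bound can still exclude rows.
        return "utc_timestamp >= a"
    if d == day_b and d != day_a:
        # Only the upper bound can still exclude rows.
        return "utc_timestamp <= b"
    if d == day_a == day_b:
        # Both bounds fall within this partition's day.
        return "utc_timestamp >= a and utc_timestamp <= b"
    # Assumption: a partition outside the filter range matches nothing.
    return "false"


a = datetime(2021, 1, 10, 6, 0)
b = datetime(2021, 1, 12, 18, 0)
inner = day_residual(date(2021, 1, 11), a, b)
lower = day_residual(date(2021, 1, 10), a, b)
upper = day_residual(date(2021, 1, 12), a, b)
```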




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


