[GitHub] [iceberg] dramaticlly commented on a diff in pull request #4717: Python: Add PartitionSpec

GitBox Mon, 09 May 2022 20:11:32 -0700


dramaticlly commented on code in PR #4717:
URL: https://github.com/apache/iceberg/pull/4717#discussion_r868771265



##########
python/src/iceberg/table/partitioning.py:
##########
@@ -64,3 +67,100 @@ def __str__(self):
 
     def __repr__(self):
         return f"PartitionField(field_id={self.field_id}, name={self.name}, transform={repr(self.transform)}, source_id={self.source_id})"
+
+    def __hash__(self):
+        return hash((self.source_id, self.field_id, self.name, self.transform))
+
+
+class PartitionSpec:
+    """
+    PartitionSpec capture the transformation from table data to partition values
+
+    Attributes:
+        schema(Schema): the schema of data table
+        spec_id(int): any change to PartitionSpec will produce a new specId
+        fields(List[PartitionField): list of partition fields to produce partition values
+        last_assigned_field_id(int): auto-increment partition field id starting from PARTITION_DATA_ID_START
+    """
+
+    PARTITION_DATA_ID_START: int = 1000
+
+    def __init__(self, schema: Schema, spec_id: int, fields: Tuple[PartitionField], last_assigned_field_id: int):
+        self._schema = schema
+        self._spec_id = spec_id
+        self._fields = fields
+        self._last_assigned_field_id = last_assigned_field_id
+        # derived
+        self.fields_by_source_id: Dict[int, List[PartitionField]] = {}
+
+    @property
+    def schema(self) -> Schema:
+        return self._schema
+
+    @property
+    def spec_id(self) -> int:
+        return self._spec_id
+
+    @property
+    def fields(self) -> Tuple[PartitionField]:
+        return self._fields
+
+    @property
+    def last_assigned_field_id(self) -> int:
+        return self._last_assigned_field_id
+
+    def __eq__(self, other):
+        return self.spec_id == other.spec_id and self.fields == other.fields
+
+    def __str__(self):
+        if self.is_unpartitioned():

Review Comment:
   I think it's a bit awkward to construct the string exactly the way the Java implementation does, so the special casing here is essentially a simplification of 
   https://github.com/apache/iceberg/blob/master/api/src/main/java/org/apache/iceberg/PartitionSpec.java#L299.
   The unpartitioned case differs from the rest only in having one fewer `\n` right before the `]`.
   
   So here is an alternative way to construct it; I'm not sure how you like it compared to what I have right now.
   
   
   
   ```python
       def __str__(self):
           result_str = "["
           for partition_field in self.fields:
               result_str += f"\n  {str(partition_field)}"
           if self.is_unpartitioned():
               result_str += "]"
           else:
               result_str += "\n]"
           return result_str
   ```
   
   Python's `str.join(iterable)` has no way to attach a head and a tail, so I figured it might be easier to read when written as `f"{head}{delimiter.join(partition_fields_in_str)}{tail}"`.
   
   But I can quickly switch to the version above if that is what you prefer.
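   
   For comparison, here is a minimal standalone sketch of the head/join/tail form. The function name `format_partition_fields` and the concrete `head`, `delimiter`, and `tail` values are assumptions for illustration, not code from the PR; it accepts any iterable whose items have a `str()` form and should produce the same output as the loop version.

```python
from typing import Iterable


def format_partition_fields(fields: Iterable[object]) -> str:
    """Render fields as a bracketed block, one indented field per line.

    Hypothetical helper, not part of the PR: it illustrates the
    head + delimiter.join(...) + tail pattern discussed in the comment.
    """
    field_strs = [str(field) for field in fields]
    if not field_strs:
        # Unpartitioned case: no newline before the closing bracket.
        return "[]"
    head, delimiter, tail = "[\n  ", "\n  ", "\n]"
    return f"{head}{delimiter.join(field_strs)}{tail}"
```

   The empty case is handled up front because `join` on an empty list yields an empty string, which would otherwise leave the head and tail newlines in place.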



