rdblue commented on code in PR #5192:
URL: https://github.com/apache/iceberg/pull/5192#discussion_r920331948
##########
python/pyiceberg/table/partitioning.py:
##########
@@ -57,24 +81,33 @@ class PartitionSpec:
         last_assigned_field_id(int): auto-increment partition field id starting from PARTITION_DATA_ID_START
     """
-    schema: Schema
-    spec_id: int
-    fields: Tuple[PartitionField, ...]
-    last_assigned_field_id: int
-    source_id_to_fields_map: Dict[int, List[PartitionField]] = field(init=False, repr=False)
-
-    def __post_init__(self):
-        source_id_to_fields_map = {}
-        for partition_field in self.fields:
-            source_column = self.schema.find_column_name(partition_field.source_id)
-            if not source_column:
-                raise ValueError(f"Cannot find source column: {partition_field.source_id}")
-            existing = source_id_to_fields_map.get(partition_field.source_id, [])
-            existing.append(partition_field)
-            source_id_to_fields_map[partition_field.source_id] = existing
-        object.__setattr__(self, "source_id_to_fields_map", source_id_to_fields_map)
-
-    def __eq__(self, other):
+    spec_id: int = Field(alias="spec-id")
+    fields: Tuple[PartitionField, ...] = Field()
+    last_assigned_field_id: int = Field(alias="last-assigned-field-id", default=_PARTITION_DATA_ID_START)
Review Comment:
We probably don't need this as a stored field. It is just the high-water mark
of the field IDs in this spec, so we can define a property that computes it:
```python
@property
def last_assigned_field_id(self):
    return max(pf.field_id for pf in self.fields)
```
`last_assigned_field_id` isn't serialized with the partition spec.
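One thing to watch with the property approach: `max()` raises `ValueError` on an empty iterable, so a spec with no fields needs a fallback. A minimal sketch of how that might look, using plain dataclasses as simplified stand-ins for the real `PartitionField` and `PartitionSpec`. The value `1000` for `_PARTITION_DATA_ID_START` and the choice to return it for an empty spec are assumptions for illustration, not something this diff confirms:

```python
from dataclasses import dataclass
from typing import Tuple

# Assumed value for illustration; the diff only shows the constant's name.
_PARTITION_DATA_ID_START = 1000


@dataclass(frozen=True)
class PartitionField:
    # Simplified stand-in: the real class carries more attributes.
    source_id: int
    field_id: int
    name: str


@dataclass(frozen=True)
class PartitionSpec:
    spec_id: int
    fields: Tuple[PartitionField, ...] = ()

    @property
    def last_assigned_field_id(self) -> int:
        # Computed on access, so it is never stored or serialized.
        # The default covers a spec with no fields (assumption: report
        # the starting ID in that case).
        return max(
            (pf.field_id for pf in self.fields),
            default=_PARTITION_DATA_ID_START,
        )


spec = PartitionSpec(
    spec_id=0,
    fields=(
        PartitionField(source_id=1, field_id=1000, name="id_bucket"),
        PartitionField(source_id=2, field_id=1001, name="ts_day"),
    ),
)
empty_spec = PartitionSpec(spec_id=1)
```

Because the value is derived from `fields`, it cannot drift out of sync with them and adds nothing to the serialized form.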