rdblue commented on code in PR #6323:
URL: https://github.com/apache/iceberg/pull/6323#discussion_r1209504044
##########
python/pyiceberg/table/__init__.py:
##########
@@ -69,21 +72,288 @@
import ray
from duckdb import DuckDBPyConnection
+ from pyiceberg.catalog import Catalog
ALWAYS_TRUE = AlwaysTrue()
+class TableUpdates:
Review Comment:
I think the rest of this PR is about ready, my remaining concern is this
API. I don't think that we want to expose methods that directly set schema,
partitioning, or other table metadata. Schema is a great example because people
are going to really mess up the field IDs. We don't want to encourage that
through the API.
For example, this is going to break:
```python
catalog = load_catalog(...)
schema1 = Schema(
NestedField(field_id=0, name='id', field_type=LongType()),
schema_id=1
)
schema2 = Schema(
NestedField(field_id=0, name='id', field_type=LongType()),
NestedField(field_id=1, name='data', field_type=StringType()),
schema_id=1
)
table = catalog.create_table(identifier='test', schema=schema1)
table.set_schema(schema2).commit()
```
The table schema is incompatible with the second schema because the field
IDs are reassigned.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]