Fokko commented on PR #4858:
URL: https://github.com/apache/iceberg/pull/4858#issuecomment-1140056996
> Is this worth it?
I think it is! Allow me to elaborate on why I think it is the case.
> Many of the classes in the REST spec mirror structures from TableMetadata,
> but when we talked last I thought that we were going to create to_dict /
> from_dict methods to serialize and deserialize those. Having autogenerated
> classes for those seems like duplication.
That's correct. The hand-written serialization methods would be made
unnecessary by this PR, because we get them for free:
```python
➜ python git:(fd-open-api) python3
Python 3.9.13 (main, May 19 2022, 13:48:47)
[Clang 13.1.6 (clang-1316.0.21.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from iceberg.openapi.rest_catalog import PartitionSpec, PartitionField
>>> spec = PartitionSpec(
... spec_id=123,
... fields=[
... PartitionField(
... field_id=234,
... source_id=1,
... name='date',
... transform='identity'
... )
... ]
... )
>>> spec
PartitionSpec(spec_id=123, fields=[PartitionField(field_id=234, source_id=1,
name='date', transform=Transform(__root__='identity'))])
# Serialize
>>> spec.json()
'{"spec_id": 123, "fields": [{"field_id": 234, "source_id": 1, "name":
"date", "transform": "identity"}]}'
# Deserialize
>>> PartitionSpec.parse_raw(spec.json())
PartitionSpec(spec_id=123, fields=[PartitionField(field_id=234, source_id=1,
name='date', transform=Transform(__root__='identity'))])
```
I think this is much less error-prone than code like this:
https://github.com/apache/iceberg/pull/3677/files#diff-9d9a8492ccd85cbbddfc202b9a954b57a079f2cca33e96eb0a9106cf1a4a8130R37-R58
On top of that, the maintenance cost is much higher when we have to keep the
classes in sync by hand. If we generate them, we can check (using mypy) whether
a change breaks compatibility.
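For contrast, here is a minimal sketch of the kind of hand-written (de)serialization this PR would make unnecessary. The class and field names mirror the `PartitionField` example above; this is not the actual code from the linked PR, just an illustration of the boilerplate that has to be kept in sync with the spec by hand:

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class PartitionField:
    field_id: int
    source_id: int
    name: str
    transform: str

    def to_dict(self) -> Dict[str, Any]:
        # Every field (and its spec alias) has to be listed by hand and
        # kept in sync with the OpenAPI spec manually.
        return {
            "field-id": self.field_id,
            "source-id": self.source_id,
            "name": self.name,
            "transform": self.transform,
        }

    @classmethod
    def from_dict(cls, d: Dict[str, Any]) -> "PartitionField":
        # The reverse mapping duplicates the same knowledge a second time.
        return cls(
            field_id=d["field-id"],
            source_id=d["source-id"],
            name=d["name"],
            transform=d["transform"],
        )
```

With the generated pydantic models, both directions (plus the alias handling) come for free from the spec.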
> Also, I don't think we're getting much out of this. This makes classes
> that are simple and could easily be statically defined. And it requires an
> additional library that looks similar to what we get from @dataclass. I'd
> probably opt not to pull in the additional dependency.
This is correct, but on top of the data classes we also get validators.
Instead of using a builder pattern, we can validate the input directly using
decorators: https://pydantic-docs.helpmanual.io/usage/validators/
This looks like:
```python
>>> from iceberg.openapi.rest_catalog import PartitionSpec, PartitionField
>>> spec = PartitionSpec(
... spec_id=123,
... fields=[
... PartitionField(
... field_id=234,
... name='date',
... transform='identity'
... )
... ]
... )
Traceback (most recent call last):
File "<stdin>", line 4, in <module>
File "pydantic/main.py", line 341, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for
PartitionField
source-id
field required (type=value_error.missing)
```
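Besides the required-field check shown above, custom rules can be added per field. As a stdlib-only approximation (so it runs without pydantic installed), the sketch below shows the kind of check that a pydantic `@validator` method would express declaratively on the model; the field names again mirror the `PartitionField` example and are assumptions, not the generated code:

```python
from dataclasses import dataclass


@dataclass
class PartitionField:
    field_id: int
    source_id: int
    name: str
    transform: str

    def __post_init__(self) -> None:
        # Hand-rolled equivalent of what a pydantic @validator would do:
        # reject invalid input at construction time, no builder needed.
        if not self.name:
            raise ValueError("name may not be empty")
        if self.field_id < 0:
            raise ValueError("field_id must be non-negative")
```

With pydantic these checks live in decorated validator methods on the model, and failures surface as the same structured `ValidationError` shown in the traceback above.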
> I'm generally unsure about using the OpenAPI doc currently for generating
> classes, as it's presently intended more for documentation purposes than for
> usage for models and generating code. To that extent, it seems that the
> generated classes are at times not that helpful (like Updates).
I see your point, but it is a bit of a catch-22: until we start using it, it
won't mature. Instead of adding methods to hand-written Python classes, it is
just as simple to amend the OpenAPI spec and regenerate the code.
> I've also noticed a lot of variance in the generated code depending on
> which library is used to do it. I know one Python generation tool I used to
> generate code made all models just dictionaries... not even classes that were
> secretly dictionaries but just... dictionaries.
Yes, there is a variety of generators, and I found this one to be the best. It
also nicely generates all the aliases. I already ran into some issues with the
plain Python one:
https://github.com/openapi-generators/openapi-python-client/pull/618
> If we get to a place where the OpenAPI doc can be used as a drop-in, that
> would be great. But talking to some people who are very familiar with OpenAPI,
> my understanding is that some projects have behavior that's simply more complex
> than any of the OpenAPI generation tools can really account for. I've been told
> that either the code has to be written in a very specific style or that it can
> take years to get to such a place.
Great point. I think we will always need to extend the generated classes with
convenience methods or additional validation. An example of how to do this is
given in https://github.com/apache/iceberg/pull/4717#discussion_r882941190
> Is pydantic required if we were to move forward with this or is this just
> added in this PR for other reasons?
For this implementation, pydantic is required:
https://github.com/koxudaxi/datamodel-code-generator specifically generates
pydantic models (and brings in the validation as well). It fits our use case
perfectly:
https://python.plainenglish.io/an-introduction-to-the-pydantic-stack-9e490d606c8d
It comes out of the FastAPI ecosystem and is becoming increasingly popular.
Also, pydantic doesn't pull in the whole universe; it has just one dependency:
https://github.com/samuelcolvin/pydantic/blob/master/setup.py#L131-L133
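For reference, regenerating the models is a single CLI call. The file names below are illustrative assumptions, not taken from this PR:

```shell
pip install datamodel-code-generator
# Regenerate the pydantic models from the OpenAPI spec (paths assumed):
datamodel-codegen \
  --input rest-catalog-open-api.yaml \
  --input-file-type openapi \
  --output rest_catalog.py
```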
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]