Fokko commented on PR #7751:
URL: https://github.com/apache/iceberg/pull/7751#issuecomment-1577410577

   @rdblue The idea is that you'll have the changes in the open-api spec and 
also the changes in the Python code alongside. Small changes can have a 
significant impact on the code itself.
   
   Let's take https://github.com/apache/iceberg/pull/7710#discussion_r1216646858
   
   ```yaml
       ReportMetricsRequest:
         anyOf:
           - $ref: '#/components/schemas/ScanReport'
           - $ref: '#/components/schemas/CommitReport'
         required:
           - report-type
         properties:
           report-type:
             type: string
   ```
   
   Above is the the current code on master, and that generates:
   
   ```python
   class ReportMetricsRequest(ScanReport):
       report_type: str = Field(..., alias='report-type')
   
   
   class ReportMetricsRequest1(CommitReport):
       report_type: str = Field(..., alias='report-type')
   
   
   class ReportMetricsRequest2(BaseModel):
       __root__: Union[ReportMetricsRequest, ReportMetricsRequest1]
   ```
   
   
   This looks weird, and it is. `ReportMetricsRequest` is `anyOf` (validates 
the value against any (one or more) of the subschemas) of both `ScanReport` and 
`CommitReport`. It is actually the other way around. `ScanReport` and 
`CommitReport` is `allOf` `ReportMetricsRequest`. And based on the 
`report-type` it is `oneOf` the subtypes based on the `report-type`:
   
   ```yaml
       ReportMetricsRequest:
         type: object
         discriminator:
           propertyName: report-type
           mapping:
             commit-report: '#/components/schemas/CommitReport'
             scan-report: '#/components/schemas/ScanReport'
         required:
           - report-type
         properties:
           report-type:
             type: string
   ```
   
   The point that I'm trying to make here is that is very easy to get confused 
in the `allOf`, `anyOf`, and `oneOf`. And for me, it really helps to see the 
code alongside the PR. The idea is that we keep the Python code up to sync, and 
when you update the open-api code, you'll see the impact on the code. If you do 
it the other way around, you'll end up with:
   
   ```python
   class ReportMetricsRequest(BaseModel):
       report_type: str = Field(..., alias='report-type')
   
   
   class CommitReport(ReportMetricsRequest):
       report_type: Literal['commit-report'] = Field(..., alias='report-type')
       table_name: str = Field(..., alias='table-name')
       snapshot_id: int = Field(..., alias='snapshot-id')
       sequence_number: int = Field(..., alias='sequence-number')
       operation: str
       metrics: Metrics
       metadata: Optional[Dict[str, str]] = None
   
   
   class ScanReport(ReportMetricsRequest):
       report_type: Literal['scan-report'] = Field(..., alias='report-type')
       table_name: str = Field(..., alias='table-name')
       snapshot_id: int = Field(..., alias='snapshot-id')
       filter: Expression
       schema_id: int = Field(..., alias='schema-id')
       projected_field_ids: List[int] = Field(..., alias='projected-field-ids')
       projected_field_names: List[str] = Field(..., 
alias='projected-field-names')
       metrics: Metrics
       metadata: Optional[Dict[str, str]] = None
   ```
   
   If we want to generate clients in other languages as well, it helps to keep 
the open-API as close to the supposed implementation as possible.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to