atheendre130505 opened a new issue, #37575:
URL: https://github.com/apache/beam/issues/37575

   ### What happened?
   
   Description: Beam YAML's JSON schema compatibility validation for objects is 
effectively disabled due to a logic error in json_utils.py.
   
   The function _validate_compatible (used by row_validator) is meant to check 
whether a Beam schema is compatible with a provided JSON schema. However, it 
contains several simple bugs:
   
   - It compares the weak_schema dictionary directly to the string 'object' 
instead of checking its 'type' field. A dictionary never equals a string, so 
the object branch never runs.
   - It attempts to unpack key/value pairs while iterating over a dictionary 
without calling .items().
   - It uses improper string formatting for error messages, leading to 
unhelpful or crashing error reports.
   
   As a result, Validate transforms in Beam YAML may silently proceed even when 
schemas are fundamentally incompatible, or fail with distracting internal 
tracebacks.
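
For illustration, the three bug patterns described above and one possible fix for each can be sketched on plain JSON-schema dictionaries. This is a simplified stand-in and not the code in json_utils.py: the function name validate_compatible_sketch, its structure, and the error wording are all assumptions made for this example.

```python
def validate_compatible_sketch(weak_schema, strong_schema):
    """Hypothetical stand-in: raise ValueError if the schemas disagree."""
    weak_type = weak_schema.get('type')
    strong_type = strong_schema.get('type')
    if weak_type != strong_type:
        raise ValueError(
            f'Incompatible types: {weak_type!r} vs {strong_type!r}')

    # Pattern 1: `weak_schema == 'object'` compares a dict to a string and is
    # always False. Compare the 'type' field instead.
    if weak_type == 'object':
        strong_props = strong_schema.get('properties', {})
        # Pattern 2: iterating over a dict yields only its keys, so unpacking
        # into two names needs .items().
        for name, spec in weak_schema.get('properties', {}).items():
            if name in strong_props:
                try:
                    validate_compatible_sketch(spec, strong_props[name])
                except ValueError as exn:
                    # Pattern 3: interpolate the property name into the
                    # message and chain the underlying error.
                    raise ValueError(
                        f'Incompatible schema for property {name!r}') from exn


weak = {'type': 'object', 'properties': {'f': {'type': 'string'}}}
strong = {'type': 'object', 'properties': {'f': {'type': 'integer'}}}

try:
    validate_compatible_sketch(weak, strong)
except ValueError as exn:
    print(exn)  # prints: Incompatible schema for property 'f'
```

With the dict-to-string comparison in place of the 'type' check, the object branch is skipped and the same call returns without raising, which matches the silent success reported here.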
   
   Beam Version: 2.61.x (Python SDK)
   
   Steps to reproduce
   Run the following Python snippet. It should raise a ValueError about 
incompatible types ('string' vs 'integer'), but it currently completes 
successfully because the validation logic is skipped.
   
   
   from apache_beam.yaml import json_utils
   from apache_beam.portability.api import schema_pb2
   from apache_beam.typehints import schemas
   # A schema with a string field
   beam_schema = schema_pb2.Schema(fields=[
       schemas.schema_field('f', schema_pb2.STRING)
   ])
   # An incompatible JSON schema expecting an integer for the same field
   json_schema = {
       'type': 'object',
       'properties': {
           'f': {'type': 'integer'}
       }
   }
   # This SHOULD fail, but silently succeeds due to logic error in json_utils.py
   json_utils.row_validator(beam_schema, json_schema)
   
   ### Issue Priority
   
   Priority: 2 (default / most bugs should be filed as P2)
   
   ### Issue Components
   
   - [x] Component: Python SDK
   - [ ] Component: Java SDK
   - [ ] Component: Go SDK
   - [ ] Component: Typescript SDK
   - [ ] Component: IO connector
   - [x] Component: Beam YAML
   - [ ] Component: Beam examples
   - [ ] Component: Beam playground
   - [ ] Component: Beam katas
   - [ ] Component: Website
   - [ ] Component: Infrastructure
   - [ ] Component: Spark Runner
   - [ ] Component: Flink Runner
   - [ ] Component: Samza Runner
   - [ ] Component: Twister2 Runner
   - [ ] Component: Hazelcast Jet Runner
   - [ ] Component: Google Cloud Dataflow Runner

