atheendre130505 opened a new issue, #37576:
URL: https://github.com/apache/beam/issues/37576
### What happened?
The `_validate_compatible` function in `json_utils.py` fails to correctly
validate object-type schemas, leading to silent passes on incompatible data or
internal crashes.
Several issues exist in this utility:
- Silent Logic Failure: At line 321, `elif weak_schema == 'object':` compares a
  dictionary to a string, which is always false. The branch never runs, so the
  compatibility check for object properties is skipped entirely. The intended
  comparison is presumably against `weak_schema['type']`.
- Unpacking Crash: At line 325, `for name, spec in
  weak_schema.get('properties', {}):` iterates over the dictionary directly
  instead of calling `.items()`, so each iteration yields only a key. Unpacking
  that key string into `name, spec` raises a `ValueError` for most property
  names (for the one-character key `f`: `not enough values to unpack
  (expected 2, got 1)`).
- Mangled Error Messages: Error strings use incorrect formatting (e.g.,
  `ValueError('Expected object type, got {json_type}.')` is missing an `f`
  prefix, so the placeholder is printed literally).
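The three behaviors above can be reproduced in plain Python without Beam. The snippet below is only an illustration: `weak_schema` here is a hand-written stand-in, not a value taken from `json_utils.py`, and the variable names are made up for the demo.

```python
# Hand-written stand-in for the "weak" JSON schema (an assumption for this demo).
weak_schema = {'type': 'object', 'properties': {'f': {'type': 'string'}}}

# 1. A dict never compares equal to a string, so the branch is dead code.
branch_taken = weak_schema == 'object'
# Comparing the 'type' entry is what the check presumably intends.
intended_check = weak_schema['type'] == 'object'

# 2. Iterating a dict yields keys only; unpacking the key 'f' into two names fails.
try:
    for name, spec in weak_schema.get('properties', {}):
        pass
    unpack_error = None
except ValueError as exn:
    unpack_error = str(exn)

# With .items(), each iteration yields a (key, value) pair and unpacking works.
pairs = [(name, spec) for name, spec in weak_schema.get('properties', {}).items()]

# 3. Without the f prefix the placeholder is kept as literal text.
json_type = 'array'
plain = 'Expected object type, got {json_type}.'
formatted = f'Expected object type, got {json_type}.'

print(branch_taken, intended_check)
print(unpack_error)
print(pairs)
print(plain)
print(formatted)
```

Note that a two-character property name would unpack without raising and bind `name` and `spec` to single characters, so the crash is not guaranteed for every schema.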
Beam Version: 2.61.0 (Python SDK)
```
from apache_beam.yaml import json_utils
from apache_beam.portability.api import schema_pb2
from apache_beam.typehints import schemas

# A schema with a string field
beam_schema = schema_pb2.Schema(fields=[
    schemas.schema_field('f', schema_pb2.STRING)
])

# An incompatible JSON schema expecting an integer
json_schema = {
    'type': 'object',
    'properties': {
        'f': {'type': 'integer'}
    }
}

# 1. This SHOULD fail with a compatibility error, but it silently succeeds.
json_utils.row_validator(beam_schema, json_schema)

# 2. Reaching other paths (if logic fixed) causes:
# ValueError: not enough values to unpack (expected 2, got 1)
```
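For discussion, here is a rough sketch of what a corrected check might look like. It is not the code in `json_utils.py` and not a proposed patch: `validate_compatible_sketch` is a hypothetical name, and the sketch assumes the validator receives two plain JSON-schema dicts (a "weak" one derived from the Beam schema and a "strong" user-supplied one). It only covers the type, array, and object-property paths mentioned above.

```python
from typing import Any, Dict


def validate_compatible_sketch(
        weak: Dict[str, Any], strong: Dict[str, Any]) -> None:
    """Raise ValueError if `weak` conflicts with `strong` (hypothetical helper)."""
    if not weak:
        return  # An empty schema places no constraints.
    if weak.get('type') != strong.get('type'):
        raise ValueError(
            f"Incompatible types: {weak.get('type')!r} vs "
            f"{strong.get('type')!r}")
    if weak['type'] == 'array':
        validate_compatible_sketch(
            weak.get('items', {}), strong.get('items', {}))
    elif weak['type'] == 'object':  # Compare the 'type' entry, not the dict.
        strong_props = strong.get('properties', {})
        # .items() yields (name, spec) pairs, so the unpacking is valid.
        for name, spec in weak.get('properties', {}).items():
            if name in strong_props:
                try:
                    validate_compatible_sketch(spec, strong_props[name])
                except ValueError as exn:
                    # f-string so the property name appears in the message.
                    raise ValueError(
                        f'Incompatible schema for {name!r}') from exn


# Same shape as the repro: a string field checked against an integer property.
weak = {'type': 'object', 'properties': {'f': {'type': 'string'}}}
strong = {'type': 'object', 'properties': {'f': {'type': 'integer'}}}
try:
    validate_compatible_sketch(weak, strong)
    result = 'no error'
except ValueError as exn:
    result = str(exn)
print(result)
```

With these three changes the mismatched field in the repro is reported instead of passing silently. Handling of `required` and `additionalProperties` is left out of the sketch.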
### Issue Priority
Priority: 2 (default / most bugs should be filed as P2)
### Issue Components
- [x] Component: Python SDK
- [ ] Component: Java SDK
- [ ] Component: Go SDK
- [ ] Component: Typescript SDK
- [ ] Component: IO connector
- [x] Component: Beam YAML
- [ ] Component: Beam examples
- [ ] Component: Beam playground
- [ ] Component: Beam katas
- [ ] Component: Website
- [ ] Component: Infrastructure
- [ ] Component: Spark Runner
- [ ] Component: Flink Runner
- [ ] Component: Samza Runner
- [ ] Component: Twister2 Runner
- [ ] Component: Hazelcast Jet Runner
- [ ] Component: Google Cloud Dataflow Runner