Jakob Homan created AVRO-1795:
---------------------------------
Summary: Python2: Cannot parse nested schemas
Key: AVRO-1795
URL: https://issues.apache.org/jira/browse/AVRO-1795
Project: Avro
Issue Type: Bug
Components: python
Affects Versions: 1.8.0
Reporter: Jakob Homan
Assignee: Jakob Homan
In the Java client, one can parse nested schemas by loading the nested schema
before the nesting schema.
For example, a header can be defined in one file:
{code:javascript}{ "namespace": "python.avro",
  "type": "record",
  "name": "header",
  "fields": [
    { "name": "header_field", "type": "string" }
  ]
}{code}
and then included in another schema:
{code:javascript}{ "namespace": "python.avro",
  "type": "record",
  "name": "event",
  "fields": [
    { "name": "header", "type": "python.avro.header" },
    { "name": "event_field", "type": "string" }
  ]
}{code}
As long as one instantiates a single Parser and loads the header first, the
reference to python.avro.header in the event schema is resolved correctly.
However, the Python client does not support this. The {{parse}} method of the
{{schema.py}} file always instantiates a new Names object to hold the schemas:
{code}def parse(json_string):
  """Constructs the Schema from the JSON text."""
  # TODO(hammer): preserve stack trace from JSON parse
  # parse the JSON
  try:
    json_data = json.loads(json_string)
  except:
    raise SchemaParseException('Error parsing JSON: %s' % json_string)
  # Initialize the names object
  names = Names()
  # construct the Avro Schema object
  return make_avsc_object(json_data, names){code}
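To make the failure mode concrete, here is a minimal sketch that does not use the Avro library at all. {{toy_parse}}, {{UnknownTypeError}} and the plain dict standing in for Names are all hypothetical names invented for this illustration; the only thing taken from the real code is the pattern of creating a fresh registry on every call.

```python
import json

# Avro primitive type names; anything else is treated as a named type.
PRIMITIVES = frozenset(
    ["null", "boolean", "int", "long", "float", "double", "bytes", "string"])


class UnknownTypeError(Exception):
    """Hypothetical error: a field refers to a named type that is not registered."""


def toy_parse(json_string, names=None):
    """Toy stand-in for parse(); handles only records with string-typed fields."""
    if names is None:
        names = {}  # fresh registry per call, mirroring `names = Names()`
    data = json.loads(json_string)
    for field in data["fields"]:
        if field["type"] not in PRIMITIVES and field["type"] not in names:
            raise UnknownTypeError(field["type"])
    # Register this record under its full name for later lookups.
    names["%s.%s" % (data["namespace"], data["name"])] = data
    return data


HEADER = """{"namespace": "python.avro", "type": "record", "name": "header",
  "fields": [{"name": "header_field", "type": "string"}]}"""
EVENT = """{"namespace": "python.avro", "type": "record", "name": "event",
  "fields": [{"name": "header", "type": "python.avro.header"},
             {"name": "event_field", "type": "string"}]}"""

try:
    toy_parse(HEADER)
    toy_parse(EVENT)  # starts from an empty registry, so the header is unknown
    failed = False
except UnknownTypeError:
    failed = True
```

In this sketch the second call fails because nothing registered by the first call survives. Passing the same dict to both calls makes the event schema resolve, which is the behaviour the fixes below try to provide.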
Some possible fixes for this are:
1) Create a separate Parser class to mimic the Schema.Parser Java approach,
while deprecating the current parse method.
2) Make Names a module-level (global) object used by the parse method, allowing
multiple parse calls to populate the same namespace. This breaks current
behavior (at least one unit test depends on it), so it would not be backwards
compatible.
3) Create a new parse method that accepts a Names instance and returns both the
schema and that instance. This keeps the code nice and functional, but exposes
the Names class, which has not previously been particularly public.
I like the first approach.
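A rough sketch of what the first approach could look like, assuming a class that simply keeps its name registry between calls. This is not the Avro implementation: the {{Parser}} class, its methods and the dict used in place of Names are hypothetical, and the sketch only handles records whose field types are plain strings.

```python
import json

# Avro primitive type names; anything else is treated as a named type.
PRIMITIVES = frozenset(
    ["null", "boolean", "int", "long", "float", "double", "bytes", "string"])


class Parser(object):
    """Hypothetical parser that remembers named types across parse() calls."""

    def __init__(self):
        self._names = {}  # full name -> parsed schema; stands in for Names

    @staticmethod
    def _fullname(name, namespace):
        if "." in name or not namespace:
            return name
        return "%s.%s" % (namespace, name)

    def _resolve(self, type_name, namespace):
        """Return a primitive name as-is, or the previously parsed named schema."""
        if type_name in PRIMITIVES:
            return type_name
        fullname = self._fullname(type_name, namespace)
        if fullname not in self._names:
            raise ValueError("Unknown named type: %s" % fullname)
        return self._names[fullname]

    def parse(self, json_string):
        data = json.loads(json_string)
        namespace = data.get("namespace", "")
        fields = []
        for field in data["fields"]:
            resolved = dict(field)  # copy so the input is not mutated
            resolved["type"] = self._resolve(field["type"], namespace)
            fields.append(resolved)
        schema = dict(data, fields=fields)
        self._names[self._fullname(data["name"], namespace)] = schema
        return schema


HEADER = """{"namespace": "python.avro", "type": "record", "name": "header",
  "fields": [{"name": "header_field", "type": "string"}]}"""
EVENT = """{"namespace": "python.avro", "type": "record", "name": "event",
  "fields": [{"name": "header", "type": "python.avro.header"},
             {"name": "event_field", "type": "string"}]}"""

parser = Parser()
parser.parse(HEADER)          # load the nested schema first
event = parser.parse(EVENT)   # the header reference now resolves
```

Because the registry lives on the instance, callers who want the old isolated behaviour can create a new Parser per schema, and the existing module-level parse function could stay in place while it is deprecated.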