KitFieldhouse commented on issue #476:
URL: https://github.com/apache/avro-rs/issues/476#issuecomment-4009905613

   I think there might be a misunderstanding here about what the issue is. 
   
   My understanding of how the Java and Python tools work is that for the 
example in the original post:
   
   ```
    {
        "type": "record",
        "name": "ExampleEnum",
        "namespace": "com.schema",
        "fields": [
            {
                "name": "wrong_enum",
                "type": "enum",
                "symbols": ["INSERT", "UPDATE"]
            }
        ]
    }
   ```
   this is interpreted *not* as an attempt at a flattened type definition, 
but as saying that the field "wrong_enum" is an instance of a named type 
whose literal name is "enum". I believe the extra JSON field (in this case 
"symbols") is effectively ignored. 
   
   This stems from one of the unfortunate aspects (in my opinion) of the 
current Avro specification: it allows a named type whose name is equal to 
that of a complex built-in type (though it at least mandates that primitive 
names cannot be "redefined" in this way). Quoting [the 
spec](https://avro.apache.org/docs/++version++/specification/#names):
   
   > Primitive type names (null, boolean, int, long, float, double, bytes, 
string) have no namespace and their names may not be defined in any namespace.
   >
   >Complex types (record, enum, array, map, fixed) have no namespace, but 
their names (as well as union) are permitted to be reused as type names. This 
can be confusing to the human reader, but is always unambiguous for binary 
serialization. Due to the limitations of JSON encoding, it is a best practice 
to use a namespace when using these names.
   >
   >A schema or protocol may not contain multiple definitions of a fullname. 
Further, a name must be defined before it is used (“before” in the depth-first, 
left-to-right traversal of the JSON parse).
   
   For instance, we can make @Kriskras99's example work with the Python tooling 
by defining a named type with the name `"fixed"` earlier in the walk the parser 
takes. That is, instead of:
   ```
   {"type": "record", "name": "flattend", "fields": [
       {"name": "FixedType", "type": "fixed", "size": 12, "logicalType": 
"decimal", "precision": 28, "scale": 15}
   ]}
   ```
   we add another field where we have "defined" `"fixed"` like this:
   ```
   {"type": "record", "name": "flattend", "fields": [
       {"name": "FixedTypeDef", "type": {"name": "fixed", "type": "enum", 
"symbols": ["A", "B"]}},
       {"name": "FixedType", "type": "fixed", "size": 12, "logicalType": 
"decimal", "precision": 28, "scale": 15}
   ]}
   ```
   In my testing, Python does not reject this schema and treats the second 
field `"FixedType"` as the enum we defined in `"FixedTypeDef"`. The Java build 
tool that generates classes from Avro schemas appears to behave the same 
way. Here is how I tested the Python tooling:
   
   ```python
   import avro.schema
   
   schema_str = '''{
       "type": "record",
       "name": "flattened",
       "fields": [
           {
               "name": "FixedTypeDef",
               "type": {"name": "fixed", "type": "enum", "symbols": ["A", "B"]}
           },
           {
               "name": "FixedType",
               "type": "fixed",
               "size": 12,
               "logicalType": "decimal",
               "precision": 28,
               "scale": 15
           }
       ]
   }'''
   
   schema = avro.schema.parse(schema_str)
   
    # First field: the inline enum definition named "fixed"
    fixed_def_type_field = schema.fields[0]
    print(f"Field name: {fixed_def_type_field.name}")
    print(f"Field type: {fixed_def_type_field.type}")
    print(f"Field type's type property: {fixed_def_type_field.type.type}")
    print(f"Field type's name: {fixed_def_type_field.type.name}")
    
    # Second field: resolves by name to the enum defined above,
    # not to a built-in fixed type
    fixed_type_field = schema.fields[1]
    print(f"Field name: {fixed_type_field.name}")
    print(f"Field type: {fixed_type_field.type}")
    print(f"Field type's type property: {fixed_type_field.type.type}")
    print(f"Field type's name: {fixed_type_field.type.name}")
   ```
   
   So I think the issue in the original Python example is not that the schema 
is rejected as an attempt to "flatten" the definition; it is rejected because 
the parser cannot find a named schema with the name `"fixed"`.
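   To make that failure mode concrete without the avro library, here is a toy sketch (my own illustration with invented names, not code from avro-py or avro-rs) of the depth-first, left-to-right walk that registers named types as it encounters them and resolves bare-string references against the primitives plus the names defined so far:

```python
# Toy illustration of Avro's "a name must be defined before it is used"
# rule. This is NOT the real parser, just a sketch of the traversal.

PRIMITIVES = {"null", "boolean", "int", "long",
              "float", "double", "bytes", "string"}

def resolve_record_fields(record):
    """Walk a record's fields left to right, registering named types."""
    named = {}  # names defined so far in the traversal
    for field in record["fields"]:
        t = field["type"]
        if isinstance(t, dict):
            named[t["name"]] = t   # inline definition: register its name
        elif t in PRIMITIVES:
            pass                   # primitive reference, always fine
        elif t in named:
            pass                   # reference to a previously defined name
        else:
            raise ValueError(f"unknown named schema {t!r}")
    return named

# The original schema: "fixed" is used before anything named "fixed"
# has been defined, so resolution fails.
bad = {"type": "record", "name": "flattend", "fields": [
    {"name": "FixedType", "type": "fixed"},
]}

# With a named type literally called "fixed" defined first, the same
# reference now resolves to that enum.
good = {"type": "record", "name": "flattend", "fields": [
    {"name": "FixedTypeDef",
     "type": {"name": "fixed", "type": "enum", "symbols": ["A", "B"]}},
    {"name": "FixedType", "type": "fixed"},
]}

try:
    resolve_record_fields(bad)
except ValueError as e:
    print(e)  # unknown named schema 'fixed'

print(resolve_record_fields(good)["fixed"]["type"])  # enum
```

In this sketch, as in my Python test above, the second schema succeeds only because the definition comes earlier in the walk.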
   
   To resolve the ambiguity of what to do with the schema field `"type"`, we 
have to know where we are in the parse tree. Directly inside a schema, a 
`"type"` value equal to one of the complex type names indicates that 
we are defining a type of that form. Inside a record's `"fields"`, a 
`"type"` value instead refers to a named type with that name, refers to a 
primitive type with that name, or is a new nested type definition.
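   That context-dependent rule can be sketched as a small dispatch function (again hypothetical names of my own, not an API from any of the implementations):

```python
# Toy sketch: the meaning of a "type" value depends on position in the
# parse tree. "definition" = a position where a new type may be
# declared; "reference" = a bare string inside a record's "fields".

COMPLEX = {"record", "enum", "array", "map", "fixed"}
PRIMITIVES = {"null", "boolean", "int", "long",
              "float", "double", "bytes", "string"}

def interpret_type(value, position, named):
    if position == "definition" and value in COMPLEX:
        return ("define", value)      # defining a type of this form
    if value in PRIMITIVES:
        return ("primitive", value)   # built-in primitive reference
    if value in named:
        return ("named-ref", value)   # previously defined named type
    raise ValueError(f"unknown named schema {value!r}")

# At the top level of a schema, "enum" starts an enum definition.
print(interpret_type("enum", "definition", set()))    # ('define', 'enum')
# Inside "fields", the same string is a name lookup instead, and only
# succeeds if something named "enum" was defined earlier in the walk.
print(interpret_type("enum", "reference", {"enum"}))  # ('named-ref', 'enum')
```

This is the behavior I observed from the Python and Java tooling: the same string flips between "definition" and "name reference" purely based on where it appears.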
   
   

