[
https://issues.apache.org/jira/browse/AVRO-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13651288#comment-13651288
]
Scott Carey edited comment on AVRO-1325 at 5/7/13 9:10 PM:
-----------------------------------------------------------
Below are the limitations that concern me from AVRO-1274, in approximate
priority of my concern.
# Arbitrary properties are not supported. For example, {"type":"string",
"avro.java.string":"String"} cannot be built.
# SchemaBuilder.INT and other constants are public. Unfortunately, these are
mutable, so anyone can call addProp() on them and affect every other user.
# Scopes are confusing, it is not always obvious when a
# Does not chain to nested types. Although there is limited chaining for
record fields, nested calls to the builder are required, which prevents
supporting namespace nesting or any other passing of context from outer to
inner scopes.
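To illustrate the second limitation above, here is a minimal self-contained sketch. MiniSchema and SharedConstantHazard are hypothetical stand-ins, not the Avro Schema class or the AVRO-1274 builder; the sketch only shows why a public, shared, mutable instance is hazardous and how handing out a fresh instance per call would avoid the problem.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a schema object with mutable properties.
final class MiniSchema {
    private final String type;
    private final Map<String, String> props = new LinkedHashMap<>();

    MiniSchema(String type) { this.type = type; }

    // Mutates this instance, like adding a property to a shared schema.
    MiniSchema addProp(String key, String value) {
        props.put(key, value);
        return this;
    }

    String getProp(String key) { return props.get(key); }

    String getType() { return type; }
}

final class SharedConstantHazard {
    // Shared, public, and mutable: every caller sees the same instance.
    static final MiniSchema INT = new MiniSchema("int");

    // Safer alternative (an assumption, not the patch): a fresh instance per call.
    static MiniSchema intType() { return new MiniSchema("int"); }

    public static void main(String[] args) {
        // One caller decorates the shared constant...
        INT.addProp("my.prop", "x");
        // ...and an unrelated caller now observes that property.
        System.out.println(INT.getProp("my.prop"));       // prints "x"
        // A fresh instance is unaffected.
        System.out.println(intType().getProp("my.prop")); // prints "null"
    }
}
```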
I have a prototype patch that builds on the work in AVRO-1274. The major
changes are to how scopes are handled for fields and unions. Adding property
support on top of AVRO-1274 is not trivial, because it is often ambiguous what
a call to add a property would apply to (the field, or the type of the
field?).
The following schema:
{code}
{"type":"record","name":"HandshakeRequest","namespace":"org.apache.avro.ipc","fields":[
{"name":"clientHash","type":{"type":"fixed","name":"MD5","size":16}},
{"name":"clientProtocol","type":[
"null",
{"type":"string","avro.java.string":"String"}]},
{"name":"serverHash","type":"MD5"},
{"name":"meta","type":[
"null",
{"type":"map","values":"bytes","avro.java.string":"String"}]}
]}
{code}
looks like this in the builder:
{code}
Schema result = SchemaBuilder
  .recordType("HandshakeRequest").namespace("org.apache.avro.ipc").fields()
    .name("clientHash").type().fixed("MD5").size(16).noDefault()
    .name("clientProtocol").type().unionOf()
      .nullType().and()
      .stringWith().prop("avro.java.string", "String").endString()
      .endUnion().noDefault()
    .name("serverHash").type("MD5")
    .name("meta").type().unionOf()
      .nullType().and()
      .map().prop("avro.java.string", "String").values().bytesType()
      .endUnion().withDefault(null)
  .record();
{code}
It supports the same feature set that JSON schemas do:
* nesting of namespaces ("MD5" above automatically picks up the
"org.apache.avro.ipc" namespace)
* reference to named types by name (.type("MD5") above for serverHash)
And it enforces other rules:
* a union's default must match the first type in the union
* properties, doc(), namespace, and aliases work only in the contexts in which
they are supported.
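The union default rule can be sketched as a simple check. UnionDefaultCheck and matchesFirstBranch are hypothetical names, not part of Avro or of the patch, and only a handful of primitive type names are covered; treat this as an illustration of the rule rather than how the builder enforces it.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Avro or the patch.
final class UnionDefaultCheck {

    // Returns true if defaultValue is acceptable for the FIRST branch of the union.
    // Only a few primitive type names are handled in this sketch.
    static boolean matchesFirstBranch(List<String> branches, Object defaultValue) {
        if (branches.isEmpty()) {
            throw new IllegalArgumentException("union has no branches");
        }
        switch (branches.get(0)) {
            case "null":    return defaultValue == null;
            case "boolean": return defaultValue instanceof Boolean;
            case "int":     return defaultValue instanceof Integer;
            case "long":    return defaultValue instanceof Long || defaultValue instanceof Integer;
            case "string":  return defaultValue instanceof CharSequence;
            default:
                throw new IllegalArgumentException(
                    "type not covered by this sketch: " + branches.get(0));
        }
    }

    public static void main(String[] args) {
        List<String> optionalString = Arrays.asList("null", "string");
        // null matches the first branch ("null"), so it is a valid default.
        System.out.println(matchesFirstBranch(optionalString, null));  // prints "true"
        // A string only matches the second branch, so it is rejected.
        System.out.println(matchesFirstBranch(optionalString, "abc")); // prints "false"
    }
}
```

This matches the "meta" field in the example above, where "null" is the first branch and the default is null.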
Supported features are scoped with many internal nested types. For example, the
field assembler returned by the record builder's fields() method has only two
methods: name(String) and record(). name(String) returns a type builder for
the field, which has prop(String, String) for the field plus the available
types, such as map(). A call to map() returns a map builder, which again has
prop(String, String), this time for the map. values() ends the use of the map
builder, changing scope to the nested type and returning to the fields
assembler when that type is complete.
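The scoping idea can be sketched in miniature. Every class and method below (ScopedBuilderSketch, FieldAssembler, FieldTypeBuilder, MapBuilder, MapValuesBuilder, addField, complete) is hypothetical and much simpler than the prototype patch: it emits JSON text directly, does no escaping, and supports only bytes and map. What it shows is how each scope's return type limits the methods available, so prop() is never ambiguous.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical miniature of the scoping idea; these are not the patch's classes.
final class ScopedBuilderSketch {

    static FieldAssembler record(String name) { return new FieldAssembler(name); }

    // Field scope: callers can only start a field or finish the record.
    static final class FieldAssembler {
        private final String recordName;
        private final List<String> fields = new ArrayList<>();

        FieldAssembler(String recordName) { this.recordName = recordName; }

        FieldTypeBuilder name(String fieldName) {
            return new FieldTypeBuilder(this, fieldName);
        }

        String record() {
            return "{\"type\":\"record\",\"name\":\"" + recordName
                + "\",\"fields\":[" + String.join(",", fields) + "]}";
        }

        // Called by inner scopes once a field's type is complete.
        FieldAssembler addField(String fieldName, String fieldProps, String typeJson) {
            fields.add("{\"name\":\"" + fieldName + "\",\"type\":" + typeJson + fieldProps + "}");
            return this;
        }
    }

    // Type scope for one field: prop() here applies to the FIELD.
    static final class FieldTypeBuilder {
        private final FieldAssembler parent;
        private final String fieldName;
        private final StringBuilder props = new StringBuilder();

        FieldTypeBuilder(FieldAssembler parent, String fieldName) {
            this.parent = parent;
            this.fieldName = fieldName;
        }

        FieldTypeBuilder prop(String key, String value) {
            props.append(",\"").append(key).append("\":\"").append(value).append('"');
            return this;
        }

        FieldAssembler bytesType() { return complete("\"bytes\""); }

        MapBuilder map() { return new MapBuilder(this); }

        FieldAssembler complete(String typeJson) {
            return parent.addField(fieldName, props.toString(), typeJson);
        }
    }

    // Map scope: prop() here applies to the MAP; values() leaves this scope.
    static final class MapBuilder {
        private final FieldTypeBuilder field;
        private final StringBuilder props = new StringBuilder();

        MapBuilder(FieldTypeBuilder field) { this.field = field; }

        MapBuilder prop(String key, String value) {
            props.append(",\"").append(key).append("\":\"").append(value).append('"');
            return this;
        }

        MapValuesBuilder values() { return new MapValuesBuilder(field, props.toString()); }
    }

    // Value-type scope: choosing the type returns to the field assembler.
    static final class MapValuesBuilder {
        private final FieldTypeBuilder field;
        private final String mapProps;

        MapValuesBuilder(FieldTypeBuilder field, String mapProps) {
            this.field = field;
            this.mapProps = mapProps;
        }

        FieldAssembler bytesType() {
            return field.complete("{\"type\":\"map\",\"values\":\"bytes\"" + mapProps + "}");
        }
    }

    public static void main(String[] args) {
        String json = record("R")
            .name("meta").prop("a", "1")      // property on the field
              .map().prop("b", "2")           // property on the map
              .values().bytesType()           // back in field scope
            .record();
        System.out.println(json);
    }
}
```

Because MapBuilder has no name(String) or record() method, a call such as .map().record() does not compile; the compiler enforces the scope rather than a runtime check.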
h4. Remaining Work
* Not all primitive types are supported yet (trivial)
* Shortcut methods need to be added for common use cases such as an optional
field.
* Naming of some things needs review -- it would be easier if enum, int, long,
default, etc. were not reserved Java keywords :)
* Javadoc is nearly absent.
* There is some room for pushing more common work into parent types.
* Tests
* Attempt to replace the Schema.Parser logic with it, at minimum to test for
areas of improvement or missing features.
* No protocol support yet (e.g. error, protocol, request, response). It
probably makes sense to extend this to cover all Avro things, including fields
and protocols.
I want to checkpoint the work so far and gather feedback.
> Enhanced Schema Builder API
> ---------------------------
>
> Key: AVRO-1325
> URL: https://issues.apache.org/jira/browse/AVRO-1325
> Project: Avro
> Issue Type: Bug
> Reporter: Scott Carey
> Assignee: Scott Carey
> Fix For: 1.7.5
>
>
> The schema builder from AVRO-1274 has a few key limitations. I have proposed
> changes to make before it is released and the public API is locked in.