[jira] [Commented] (AVRO-1325) Enhanced Schema Builder API

Scott Carey (JIRA) Fri, 10 May 2013 14:19:18 -0700

    [ 
https://issues.apache.org/jira/browse/AVRO-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13654852#comment-13654852
 ]


Scott Carey commented on AVRO-1325:
-----------------------------------

I'll need to clean up the doc a little w.r.t. complexity in a couple places.  

BaseTypeBuilder has three overloads for type(), FieldBuilder has three more due 
to defaults.
What do you think would be confusing to the causal user?  What could not be 
satisfied with javadoc?  I expect users to have IDEs set up with Maven so that 
javadoc is available at autocomplete/suggestion.  Are you concerned that will 
not be the case for most users?
We could rename the string name reference variants for FieldBuilder's to 
typeReference().   I named all of them type() because when writing the doc, it 
was easy to say:
{code}
  /**
   * Builds a Field in the context of a {@link FieldAssembler}.
   * 
   * Usage is to first configure any of the optional parameters and then to 
call one
   * of the type methods to complete the field.  For example
   * <pre>
   *   .namespace("org.apache.example").orderDecending().type()
   * </pre>
   * Optional parameters for a field are namespace, doc, order, and aliases.
   */
{code}

We could change the name of the ones that select a name by reference to 
"typeRef", or remove the ones that select default values and instead force the 
user to call an additional noDefaults() or withDefault() method afterwards to 
reduce the number of methods named 'type'.

For a generic type builder, used by map and array, values() and items() returns 
a builder that has three variants of type() on it, these could be rolled in to 
the map and array instead:
{code}
  map().values().intType()
  map().values().type("MD5")
  map().values().type(someSchema)
{code}
we could have:
{code}
  map().values().intType()
  map().values("MD5")
  map().values(someSchema)
{code}

I did not do this because it shared more code and consistency --  SchemaBuilder 
itself has the same API.

{quote}
What is the difference between intWith() and intType()
{quote}
I need feedback/suggestions on naming and API here.
The javadoc for intType would say "select an int type without custom 
properties.  A shortcut for intWith().endInt()".  The javadoc for intWith would 
say "return an int type builder for creating an int with properties. If 
properties are not required use the #intType() shortcut".

intWith() exists only for the case where you need to add a property to the int, 
which is uncommon, so I wanted a shortcut for the common case.  I did not want 
to have the context for optional properties to bleed into the following context 
after type selection, since that adds extra methods to the later context and 
applies to doc() and namespace() as well.  After selecting a type, the context 
either returns to an earlier scope or to a field default selection.  In the 
former case, properties are ambiguous with the outer scope (a field, array, 
map, or record context).  In the latter case, having a prop() method in scope 
is not applicable to default value selection.

I decided to have the methods available in any scope be unambiguous and 
correspond with the JSON declaration and the spec.  This reduces how many 
methods are available in all contexts significantly from the current version in 
trunk and is intended to prevent user error.

Alternatively we could name "intWith() to "intBuilder()", or "intWithProps()", 
or change intType() to be the builder, and have intSimple() for the shortcut, 
but I wanted the common case to be the most obvious.  I think the naming and 
documentation here could certainly be improved.   Making it clear to a user 
which "intXYZ" (and the similar methods for all other primitive types) is 
critical.  The vast majority of the time users will not need to set properties 
on primitive types.


{quote}
I think this is easy to add to the existing SchemaBuilder by adding an 
addProp(String key, String value) method to the builders (RecordBuilder, 
FieldBuilder, ArrayBuilder). For FieldBuilder we could also have a 
addPropToType(String key, String value) method to distinguish between 
properties that are added to the underlying type, not the field itself.
{quote}
It is complicated and confusing to scope properties correctly.  The context for 
setting a property is easily ambiguous in all cases where it is not contained 
in its own scope.   For example:
{code}
{"type":"record", "name":"Rec", "recProp":"r", "fields": [
  {"name":"locations", "fieldProp":"f", "type": {"type":"map", "mapProp":"m", 
"values":{"type":"string", "valProp":"v"}}}
]}
{code}
What would this look like with the API in trunk now if extended to support 
properties?  I had trouble reasoning about making it obvious to the user when 
the field's type contexts bled into the record context.  If you add the ability 
to chain the builder to build a nested type in the map, it gets even more 
messy.  The scoping and nesting allows for chaining, and thus propagation of 
default namespace and references by name to schemas already defined by the 
builder.  I plan to try this out in Schema.Parser() to simplify that code and 
share logic related to Schemas and the spec.

In the proposal here (slightly tweaked locally), the schema above is:
{code}
SchemaBuilder.type().record("Rec").prop("recProp","r").fields()
  .name("locations").prop("fieldProp","f").type().map().prop("mapProp", 
"m").values().stringWith().prop("valProp","v").endString().noDefault()
  .endRecord();
{code}

This is very similar to the JSON.  I am hoping that it is very easy for 
developers familiar with the JSON spec to use.

It is not possible to call prop(String, String) in a context that is ambiguous, 
as it is only available immediately after declaring a type, before completing 
the required information for the type.
* For records, after record(String name) but before fields()
* For fields, after name(String name) but before one of the type() variants
* For map, after map() but before values()
* For types in general, after the creation of the type builder (stringWith(), 
map(), array(), record(), etc) but before its completion or delegation to 
another type builder (endString(), values(), items(), fields()).

The same principle is applied to namespace() doc(), order(), etc to prevent 
ambiguity and limit the number of methods available in any context.

                
> Enhanced Schema Builder API
> ---------------------------
>
>                 Key: AVRO-1325
>                 URL: https://issues.apache.org/jira/browse/AVRO-1325
>             Project: Avro
>          Issue Type: Bug
>            Reporter: Scott Carey
>            Assignee: Scott Carey
>             Fix For: 1.7.5
>
>         Attachments: AVRO-1325-preliminary.patch
>
>
> The schema builder from AVRO-1274 has a few key limitations.  I have proposed 
> changes to make before it is released and the public API is locked in.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (AVRO-1325) Enhanced Schema Builder API

Reply via email to