[ 
https://issues.apache.org/jira/browse/AVRO-839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13082917#comment-13082917
 ] 

James Baldassari commented on AVRO-839:
---------------------------------------

Sorry for the long delay in working on this.  I finally had a chance to get 
back to it this week, and I have another patch ready.  The main changes are:

* Moved Builder-related interfaces and base classes out of top-level package 
into org.apache.avro.data
* Getters/setters get/set fields directly rather than calling get()/put()
* In an effort to improve overall builder performance for specific/generated 
records, I moved most of the generic code out of RecordBuilderBase and pushed 
it down into the generated Builder code.  I also did quite a bit of profiling 
in an effort to reduce the biggest bottlenecks.
** There is now a lot more generated code, but it's faster.
** Testing indicates that without using default values (i.e. all fields in the 
Builder are set), the Builder is about one order of magnitude slower than 
directly creating/initializing a record.  However, a Builder instance can be 
reused, and if this is done rather than creating a new Builder each time there 
is almost no performance penalty for using the Builder.
** If the Builder has to resolve a default value from the schema, it's 
obviously going to be a little slower, but it's still only one order of 
magnitude slower than directly creating records.  In my testing the Builder was 
about 3-4 times slower when resolving default values, but I expect this will 
vary a lot depending on the schema.

So let me know what you think.  I'm happy to make more adjustments if necessary.

> Implement builder pattern in generated record classes that sets default 
> values when omitted
> -------------------------------------------------------------------------------------------
>
>                 Key: AVRO-839
>                 URL: https://issues.apache.org/jira/browse/AVRO-839
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>            Reporter: James Baldassari
>            Assignee: James Baldassari
>         Attachments: AVRO-839-v2.patch, AVRO-839-v3.patch, AVRO-839.patch
>
>
> This is an idea for an improvement to the SpecificCompiler-generated record 
> classes.  There are two main issues to address:
> # Default values specified in schemas are only used at read time, not when 
> writing/serializing records.  For example, a NullPointerException is thrown 
> when attempting to write a record that has an uninitialized array or string 
> type.  I'm sure this was done for good reasons, like giving users maximum 
> control and preventing unnecessary garbage collection, but I think it's also 
> somewhat confusing and unintuitive for new users (myself included).
> # Users have to create their own factory classes/methods for every record 
> type, both to ensure that all non-primitive members are initialized and to 
> facilitate the construction and initialization of record instances (i.e. 
> constructing and setting values in a single statement).
> These issues have been discussed previously here:
> * [http://search-hadoop.com/m/iDVTn1JVeSR1]
> * AVRO-726
> * AVRO-770
> * [http://search-hadoop.com/m/JuY1V16pwxh1]
> I'd like to propose a solution that is used by at least one other messaging 
> framework.  For each generated record class there will be a public static 
> inner class called Builder.  The Builder inner class has the same fields as 
> the record class, as well as accessors and mutators for each of these fields. 
>  Whenever a mutator method is called, the Builder sets a boolean flag 
> indicating that the field has been set.  All mutators return a reference to 
> 'this', so it's possible to chain a series of setter invocations, which makes 
> it really easy to construct records in a single statement.  The Builder also 
> has a build() method which constructs a record instance using the values that 
> were set in the Builder.  When the build() method is invoked, if there are 
> any fields that have not been set but have default values as defined in the 
> schema, the Builder will set the values of these fields using their defaults.
> One nice thing about implementing the builder pattern in a static inner 
> Builder class rather than in the record itself is that this enhancement will 
> be completely backwards-compatible with existing code.  The record class 
> itself would not change, and the public fields would still be there, so 
> existing code would still work.  Users would have the option to use the 
> Builder or continue constructing records manually.  Eventually the public 
> fields could be phased out, and the record would be made immutable.  All 
> changes would have to be done through the Builder.
> Here is an example of what this might look like:
> {code}
> // Person.newBuilder() returns a new Person.Builder instance
> // All Person.Builder setters return 'this' allowing us to chain set calls 
> together for convenience
> // Person.Builder.build() returns a Person instance after setting any 
> uninitialized values that have defaults
> Person me = 
> Person.newBuilder().setName("James").setCountry("US").setState("MA").build();
> // We still have direct access to Person's members, so the records are 
> backwards-compatible
> me.state = "CA";
> // Person has accessor methods now so that the public fields can be phased 
> out later
> System.out.println(me.getState());
> // No NPE here because the array<Person> field that stores this person's 
> friends has been automatically 
> // initialized by the Builder to a new java.util.ArrayList<Person> due to a 
> @java_class annotation in the IDL
> System.out.println(me.getFriends().size());
> {code}
> What do people think about this approach?  Any other ideas?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to