[
https://issues.apache.org/jira/browse/AVRO-839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13082917#comment-13082917
]
James Baldassari commented on AVRO-839:
---------------------------------------
Sorry for the long delay in working on this. I finally had a chance to get
back to it this week, and I have another patch ready. The main changes are:
* Moved Builder-related interfaces and base classes out of top-level package
into org.apache.avro.data
* Getters/setters get/set fields directly rather than calling get()/put()
* In an effort to improve overall builder performance for specific/generated
records, I moved most of the generic code out of RecordBuilderBase and pushed
it down into the generated Builder code. I also did quite a bit of profiling
in an effort to reduce the biggest bottlenecks.
** There is now a lot more generated code, but it's faster.
** Testing indicates that without using default values (i.e. all fields in the
Builder are set), the Builder is about one order of magnitude slower than
directly creating/initializing a record. However, a Builder instance can be
reused, and if this is done rather than creating a new Builder each time there
is almost no performance penalty for using the Builder.
** If the Builder has to resolve a default value from the schema, it's
obviously going to be a little slower, but it's still only one order of
magnitude slower than directly creating records. In my testing the Builder was
about 3-4 times slower when resolving default values, but I expect this will
vary a lot depending on the schema.
So let me know what you think. I'm happy to make more adjustments if necessary.
> Implement builder pattern in generated record classes that sets default
> values when omitted
> -------------------------------------------------------------------------------------------
>
> Key: AVRO-839
> URL: https://issues.apache.org/jira/browse/AVRO-839
> Project: Avro
> Issue Type: Improvement
> Components: java
> Reporter: James Baldassari
> Assignee: James Baldassari
> Attachments: AVRO-839-v2.patch, AVRO-839-v3.patch, AVRO-839.patch
>
>
> This is an idea for an improvement to the SpecificCompiler-generated record
> classes. There are two main issues to address:
> # Default values specified in schemas are only used at read time, not when
> writing/serializing records. For example, a NullPointerException is thrown
> when attempting to write a record that has an uninitialized array or string
> type. I'm sure this was done for good reasons, like giving users maximum
> control and preventing unnecessary garbage collection, but I think it's also
> somewhat confusing and unintuitive for new users (myself included).
> # Users have to create their own factory classes/methods for every record
> type, both to ensure that all non-primitive members are initialized and to
> facilitate the construction and initialization of record instances (i.e.
> constructing and setting values in a single statement).
> These issues have been discussed previously here:
> * [http://search-hadoop.com/m/iDVTn1JVeSR1]
> * AVRO-726
> * AVRO-770
> * [http://search-hadoop.com/m/JuY1V16pwxh1]
> I'd like to propose a solution that is used by at least one other messaging
> framework. For each generated record class there will be a public static
> inner class called Builder. The Builder inner class has the same fields as
> the record class, as well as accessors and mutators for each of these fields.
> Whenever a mutator method is called, the Builder sets a boolean flag
> indicating that the field has been set. All mutators return a reference to
> 'this', so it's possible to chain a series of setter invocations, which makes
> it really easy to construct records in a single statement. The Builder also
> has a build() method which constructs a record instance using the values that
> were set in the Builder. When the build() method is invoked, if there are
> any fields that have not been set but have default values as defined in the
> schema, the Builder will set the values of these fields using their defaults.
> One nice thing about implementing the builder pattern in a static inner
> Builder class rather than in the record itself is that this enhancement will
> be completely backwards-compatible with existing code. The record class
> itself would not change, and the public fields would still be there, so
> existing code would still work. Users would have the option to use the
> Builder or continue constructing records manually. Eventually the public
> fields could be phased out, and the record would be made immutable. All
> changes would have to be done through the Builder.
> Here is an example of what this might look like:
> {code}
> // Person.newBuilder() returns a new Person.Builder instance
> // All Person.Builder setters return 'this' allowing us to chain set calls
> together for convenience
> // Person.Builder.build() returns a Person instance after setting any
> uninitialized values that have defaults
> Person me =
> Person.newBuilder().setName("James").setCountry("US").setState("MA").build();
> // We still have direct access to Person's members, so the records are
> backwards-compatible
> me.state = "CA";
> // Person has accessor methods now so that the public fields can be phased
> out later
> System.out.println(me.getState());
> // No NPE here because the array<Person> field that stores this person's
> friends has been automatically
> // initialized by the Builder to a new java.util.ArrayList<Person> due to a
> @java_class annotation in the IDL
> System.out.println(me.getFriends().size());
> {code}
> What do people think about this approach? Any other ideas?
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira