[ 
https://issues.apache.org/jira/browse/AVRO-839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062829#comment-13062829
 ] 

James Baldassari commented on AVRO-839:
---------------------------------------

I have a new version of the patch ready.  Here is a summary of the most 
significant changes from the first version:
* deepCopy() moved from Schema to GenericData
* RecordBuilderBase#has(int),get(int),set(int) made protected
* Added clear[Field]() methods, which set the value of a field to null and 
clear the boolean indicating that the field has been set
* I wasn't satisfied with the performance of the Builder implementation, so I 
did quite a bit of profiling and addressed the major bottlenecks.  These were:
** Schema.hashCode(), as noted in AVRO-853.  I removed this bottleneck by 
changing the type of the default value cache from a 
Map<Schema,Map<Field,Object>> to a Map<String,Map<Integer,Object>>, where the 
String is the Schema's full name and the Integer is the field's position.
** Deep copying Utf8 instances was inefficient because it converted the Utf8 to 
a String and then back to a Utf8.  It turns out String.getBytes(Charset) is 
fairly expensive.  I fixed this issue by adding a copy constructor to Utf8 that 
copies the bytes field.  I also improved the performance of Utf8 slightly by 
initializing a Charset once as a static constant and using that throughout the 
class rather than the String "UTF-8".
** Generated SpecificBuilder classes no longer rely on reflection to create new 
record instances.  Instead, newRecord() is overridden in the generated Builder 
classes.

I wrote some performance tests in TestSpecificBuilder that can be run to 
compare the performance of the Builder, with and without default values, to 
manually creating record instances.  The Builder is about an order of magnitude 
slower than manually creating record instances, but I think it's acceptable.  
For example, building 1 billion records manually took about 8ms on my system 
vs. 217ms using the Builder.  After profiling this version of the Builder I 
don't see many opportunities for further optimization.

So I think the patch is in pretty good shape now, but let me know what you 
think.

> Implement builder pattern in generated record classes that sets default 
> values when omitted
> -------------------------------------------------------------------------------------------
>
>                 Key: AVRO-839
>                 URL: https://issues.apache.org/jira/browse/AVRO-839
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>            Reporter: James Baldassari
>            Assignee: James Baldassari
>         Attachments: AVRO-839-v2.patch, AVRO-839.patch
>
>
> This is an idea for an improvement to the SpecificCompiler-generated record 
> classes.  There are two main issues to address:
> # Default values specified in schemas are only used at read time, not when 
> writing/serializing records.  For example, a NullPointerException is thrown 
> when attempting to write a record that has an uninitialized array or string 
> type.  I'm sure this was done for good reasons, like giving users maximum 
> control and preventing unnecessary garbage collection, but I think it's also 
> somewhat confusing and unintuitive for new users (myself included).
> # Users have to create their own factory classes/methods for every record 
> type, both to ensure that all non-primitive members are initialized and to 
> facilitate the construction and initialization of record instances (i.e. 
> constructing and setting values in a single statement).
> These issues have been discussed previously here:
> * [http://search-hadoop.com/m/iDVTn1JVeSR1]
> * AVRO-726
> * AVRO-770
> * [http://search-hadoop.com/m/JuY1V16pwxh1]
> I'd like to propose a solution that is used by at least one other messaging 
> framework.  For each generated record class there will be a public static 
> inner class called Builder.  The Builder inner class has the same fields as 
> the record class, as well as accessors and mutators for each of these fields. 
>  Whenever a mutator method is called, the Builder sets a boolean flag 
> indicating that the field has been set.  All mutators return a reference to 
> 'this', so it's possible to chain a series of setter invocations, which makes 
> it really easy to construct records in a single statement.  The Builder also 
> has a build() method which constructs a record instance using the values that 
> were set in the Builder.  When the build() method is invoked, if there are 
> any fields that have not been set but have default values as defined in the 
> schema, the Builder will set the values of these fields using their defaults.
> One nice thing about implementing the builder pattern in a static inner 
> Builder class rather than in the record itself is that this enhancement will 
> be completely backwards-compatible with existing code.  The record class 
> itself would not change, and the public fields would still be there, so 
> existing code would still work.  Users would have the option to use the 
> Builder or continue constructing records manually.  Eventually the public 
> fields could be phased out, and the record would be made immutable.  All 
> changes would have to be done through the Builder.
> Here is an example of what this might look like:
> {code}
> // Person.newBuilder() returns a new Person.Builder instance
> // All Person.Builder setters return 'this' allowing us to chain set calls 
> together for convenience
> // Person.Builder.build() returns a Person instance after setting any 
> uninitialized values that have defaults
> Person me = 
> Person.newBuilder().setName("James").setCountry("US").setState("MA").build();
> // We still have direct access to Person's members, so the records are 
> backwards-compatible
> me.state = "CA";
> // Person has accessor methods now so that the public fields can be phased 
> out later
> System.out.println(me.getState());
> // No NPE here because the array<Person> field that stores this person's 
> friends has been automatically 
> // initialized by the Builder to a new java.util.ArrayList<Person> due to a 
> @java_class annotation in the IDL
> System.out.println(me.getFriends().size());
> {code}
> What do people think about this approach?  Any other ideas?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to