[ https://issues.apache.org/jira/browse/AVRO-964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13159411#comment-13159411 ]

James Baldassari commented on AVRO-964:
---------------------------------------

In that case there is one other option.  If you have a specific record, and you 
want to create a deep copy of it, you can use the Avro Builder API.  For 
example, let's say you have a specific record of type {{MyRecord}} that you 
want to clone.  You can accomplish that in the following way using the Builder 
API:

{code}
// Read in a MyRecord instance or initialize a new one:
MyRecord myRecord = ... 

// Copy myRecord by creating a new MyRecord.Builder, initializing it with the
// existing MyRecord instance, and then building it:
MyRecord myRecordCopy = MyRecord.newBuilder(myRecord).build();
{code}

Under the hood the Builder uses {{GenericData.deepCopy(Schema, Object)}}, so 
performance will be similar to your previous tests.  If performance is an issue 
then you may need to dig into the code and see if there are any ways to make 
this faster.  I did spend quite some time profiling the Builder code and 
{{GenericData.deepCopy(Schema, Object)}}, and I think most of the easy 
performance fixes have been implemented.  If I remember correctly, the biggest 
remaining bottleneck is the {{Schema.hashCode()}} method.  Please see 
{{org.apache.avro.specific.TestSpecificRecordBuilder}} in the {{avro-ipc}} 
project for more examples and some basic performance tests that you can 
run/profile.

It may be possible to generate a record-specific version of deepCopy for each 
record type that would be faster than the generic implementation.  I would 
encourage you to give it a try if you have some time to work on it.
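For illustration, a record-specific copy might look roughly like the minimal sketch below. {{SampleRecord}} and {{SampleAddress}} are hypothetical hand-written classes, not code emitted by the Avro compiler. The sketch also assumes string fields hold {{java.lang.String}}; Avro may represent strings as the mutable {{org.apache.avro.util.Utf8}}, which a real generator would have to copy as well.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical nested record type (stand-in for a generated specific class).
final class SampleAddress {
    String city;

    SampleAddress(String city) {
        this.city = city;
    }

    // Copier for the nested type; city is an immutable java.lang.String.
    static SampleAddress deepCopy(SampleAddress src) {
        return src == null ? null : new SampleAddress(src.city);
    }
}

// Hypothetical record with a string, an array of longs and a nested record.
final class SampleRecord {
    String name;
    List<Long> values;
    SampleAddress address;

    SampleRecord(String name, List<Long> values, SampleAddress address) {
        this.name = name;
        this.values = values;
        this.address = address;
    }

    // Record-specific deep copy: the copy logic for each field is fixed in code,
    // so nothing has to inspect a Schema at runtime.
    static SampleRecord deepCopy(SampleRecord src) {
        if (src == null) {
            return null;
        }
        // java.lang.String is immutable, so sharing the reference is safe.
        String name = src.name;
        // A new list; Long elements are immutable and can be shared.
        List<Long> values = src.values == null ? null : new ArrayList<>(src.values);
        // The nested record is copied with its own specific copier.
        SampleAddress address = SampleAddress.deepCopy(src.address);
        return new SampleRecord(name, values, address);
    }

    public static void main(String[] args) {
        List<Long> values = new ArrayList<>();
        values.add(1L);
        values.add(2L);
        SampleRecord original = new SampleRecord("a", values, new SampleAddress("Berlin"));
        SampleRecord copy = deepCopy(original);

        // Mutating the original afterwards must not affect the copy.
        original.values.add(3L);
        original.address.city = "Paris";
        System.out.println(copy.values + " " + copy.address.city); // prints "[1, 2] Berlin"
    }
}
```

Because the copy logic for every field is fixed when the code is written (or generated), no per-field Schema lookups happen at runtime. The cost is that the copier has to be regenerated whenever the schema changes, which is exactly what code generation would take care of.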
                
> Provide clone() method for generated avro-specific objects
> ----------------------------------------------------------
>
>                 Key: AVRO-964
>                 URL: https://issues.apache.org/jira/browse/AVRO-964
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>    Affects Versions: 1.7.0
>            Reporter: Vyacheslav Zholudev
>
> It would be great to provide a generated clone() method for avro-specific 
> objects, as is done for the equals(), toString() and hashCode() methods.
> Because Hadoop reuses objects, it is often necessary to clone them when they 
> have to be processed all together, e.g. after all of the Reducer input has 
> been read.
> Currently I see two ad-hoc options to deal with it:
> 1) Write potentially large amounts of tedious code to clone objects manually. 
> This method is error-prone, since it's easy to forget to clone some fields 
> after schema evolution.
> 2) Use DatumWriter and DatumReader to serialize/deserialize objects. This 
> method is extremely slow (in my experiments, 30-40 times slower than method 
> #1).
> So neither method is good enough; on the other hand, adding a generated 
> clone() method should not be that complicated.
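The object-reuse problem described in the issue can be reproduced without Hadoop. In the minimal sketch below, a hypothetical {{ReusingIterable}} mimics a reducer's value iterator by refilling one shared object on every call to {{next()}}: storing the references keeps only the last value, while copying each value first keeps all of them. {{Counter}}, {{ReusingIterable}} and {{ReuseDemo}} are illustrative names, not Hadoop or Avro APIs, and {{Counter.copy()}} merely plays the role of the requested clone().

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical mutable value type.
final class Counter {
    long value;

    // Field-by-field copy, playing the role of a generated clone().
    Counter copy() {
        Counter c = new Counter();
        c.value = value;
        return c;
    }
}

// Hypothetical stand-in for a reducer's value iterator: every call to next()
// refills the same Counter instance instead of allocating a new one.
final class ReusingIterable implements Iterable<Counter> {
    private final long[] data;

    ReusingIterable(long... data) {
        this.data = data;
    }

    @Override
    public Iterator<Counter> iterator() {
        final Counter shared = new Counter();
        return new Iterator<Counter>() {
            private int i = 0;

            @Override
            public boolean hasNext() {
                return i < data.length;
            }

            @Override
            public Counter next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                shared.value = data[i++]; // overwrite the shared instance in place
                return shared;
            }
        };
    }
}

final class ReuseDemo {
    // Stores references: all entries end up pointing at the single reused object.
    static List<Counter> collectWithoutCopy(Iterable<Counter> values) {
        List<Counter> out = new ArrayList<>();
        for (Counter c : values) {
            out.add(c);
        }
        return out;
    }

    // Copies each value before storing it, so earlier values survive reuse.
    static List<Counter> collectWithCopy(Iterable<Counter> values) {
        List<Counter> out = new ArrayList<>();
        for (Counter c : values) {
            out.add(c.copy());
        }
        return out;
    }

    public static void main(String[] args) {
        ReusingIterable input = new ReusingIterable(1L, 2L, 3L);
        for (Counter c : collectWithoutCopy(input)) {
            System.out.print(c.value + " "); // prints: 3 3 3
        }
        System.out.println();
        for (Counter c : collectWithCopy(input)) {
            System.out.print(c.value + " "); // prints: 1 2 3
        }
        System.out.println();
    }
}
```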

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
