[ 
https://issues.apache.org/jira/browse/AVRO-668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12912667#action_12912667
 ] 

Scott Carey commented on AVRO-668:
----------------------------------

{quote}
Did you mean to use 'Utf8.class.equals(charSequence.getClass())'? I think 
'charSequence instanceof Utf8' reads much better, and, as we discussed in 
AVRO-667, the performance improvement is perhaps not significant and this might 
be risky if Utf8 is not final. 
{quote}

At first I did this defensively.  The exact class check is not risky for a 
subclass: a subclass instance would still be written correctly, via the 
CharSequence contract on toString().

But thinking about it a bit more, treating a subclass as a Utf8 only requires 
that the subclass implement getByteLength() and getBytes() properly, and these 
are unlikely to be done improperly.  
That is unlike other places, where equals() and hashCode() have to remain 
consistent with Utf8 and it is easy for someone to write a subclass that is 
self-consistent with Object.equals() and Object.hashCode() but breaks Utf8's 
variations.  

Therefore the exact class check has little value from the defensive side here, 
and not much performance-wise either.  Writing is far more expensive than the 
difference between instanceof and class.equals() here.
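
To make the difference between the two checks concrete, here is a minimal 
sketch.  FakeUtf8 and TaggedUtf8 are hypothetical stand-ins written for this 
illustration, not Avro's real Utf8 class, and the two path methods only report 
which branch a writer might take; none of this is the code from the patch.

```java
import java.nio.charset.StandardCharsets;

public class TypeCheckSketch {

    // Hypothetical stand-in for Avro's Utf8; NOT the real class.
    static class FakeUtf8 implements CharSequence {
        private final byte[] bytes;

        FakeUtf8(String s) {
            this.bytes = s.getBytes(StandardCharsets.UTF_8);
        }

        public byte[] getBytes() { return bytes; }
        public int getByteLength() { return bytes.length; }

        @Override public int length() { return toString().length(); }
        @Override public char charAt(int i) { return toString().charAt(i); }
        @Override public CharSequence subSequence(int start, int end) {
            return toString().subSequence(start, end);
        }
        @Override public String toString() {
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }

    // Hypothetical subclass that inherits getBytes() and getByteLength() unchanged.
    static class TaggedUtf8 extends FakeUtf8 {
        TaggedUtf8(String s) { super(s); }
    }

    // Exact class check: a subclass falls through to the generic toString() path.
    static String pathWithClassEquals(CharSequence cs) {
        return FakeUtf8.class.equals(cs.getClass()) ? "bytes" : "toString";
    }

    // instanceof check: a subclass takes the byte-based path as well.
    static String pathWithInstanceof(CharSequence cs) {
        return (cs instanceof FakeUtf8) ? "bytes" : "toString";
    }

    public static void main(String[] args) {
        CharSequence exact = new FakeUtf8("abc");
        CharSequence sub = new TaggedUtf8("abc");
        CharSequence str = "abc";

        System.out.println(pathWithClassEquals(exact) + " / " + pathWithInstanceof(exact));
        System.out.println(pathWithClassEquals(sub) + " / " + pathWithInstanceof(sub));
        System.out.println(pathWithClassEquals(str) + " / " + pathWithInstanceof(str));
    }
}
```

Only the subclass case differs between the two checks: with class.equals() it 
is written through toString(), with instanceof it is written from its bytes.  
Both paths produce correct output as long as the subclass keeps getBytes() and 
getByteLength() consistent.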

{quote}
Did we also want to update GenericDatumWriter#writeString()? Maybe not
{quote}

Oops, I did intend to include that!

Updated patch on the way.

> Java: Streamline writing of Strings for Encoders and GenericDatumWriter
> -----------------------------------------------------------------------
>
>                 Key: AVRO-668
>                 URL: https://issues.apache.org/jira/browse/AVRO-668
>             Project: Avro
>          Issue Type: Improvement
>            Reporter: Scott Carey
>            Assignee: Scott Carey
>             Fix For: 1.4.1
>
>         Attachments: AVRO-668.patch, AVRO-668.patch, AVRO-668.patch
>
>
> We can streamline writing of strings to minimize object creation during 
> writes.
> We can avoid converting a String into Utf8 for Json output, and for Binary 
> output we can avoid a Utf8 (but still create a byte[]).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
