[
https://issues.apache.org/jira/browse/AVRO-1348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13690041#comment-13690041
]
Thiruvalluvan M. G. commented on AVRO-1348:
-------------------------------------------
The patch seems fine, but it introduces subtle bugs:
- The patch caches the string output in {{toString()}}. Since Utf8 exposes the
underlying byte array through {{getBytes()}}, any change made to the contents
of the array after the first invocation of {{toString()}} will not be reflected in
subsequent output of {{toString()}}. I don't think there is any simple way to intercept
changes to the byte array. One option is to: (a) not cache if {{getBytes()}} has
ever been called, (b) invalidate the cache if {{getBytes()}} is called later, and
(c) not cache if the Utf8 was constructed with {{Utf8(byte[] bytes)}}, since the
caller still holds a reference to that array. Hopefully, in the most common cases
the byte array is never exposed, so the cache would still be effective. If all this
seems too complicated, we can simply drop caching.
- Thread-safety: CharsetDecoder is not thread-safe. If two threads invoke
{{toString()}} simultaneously, the behavior is undefined, so thread-safety needs to
be addressed. I'm not sure how expensive {{Charset.newDecoder()}} is. Since we
need to serialize access to {{decode()}} anyway, we could keep a single static
CharsetDecoder instead of creating one per call and gain some additional performance.
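A minimal sketch of rules (a) to (c), assuming a simplified stand-in for Utf8 with a fixed length. The class name {{CachingUtf8}} is hypothetical, and it decodes with {{new String(bytes, 0, length, UTF_8)}}, which is an assumption rather than what the patch does:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical, simplified stand-in for Avro's Utf8; not the real class.
class CachingUtf8 {
  private final byte[] bytes;
  private final int length;
  private String cached;   // lazily cached toString() result, null until first use
  private boolean exposed; // true once a caller may hold a reference to bytes

  // Built from a String: this object owns the array, so caching is safe.
  CachingUtf8(String s) {
    this.bytes = s.getBytes(StandardCharsets.UTF_8);
    this.length = bytes.length;
  }

  // Rule (c): the caller keeps a reference to the array, so never cache.
  CachingUtf8(byte[] bytes) {
    this.bytes = bytes;
    this.length = bytes.length;
    this.exposed = true;
  }

  // Rules (a) and (b): handing out the array clears the cache and disables it.
  byte[] getBytes() {
    exposed = true;
    cached = null;
    return bytes;
  }

  @Override
  public String toString() {
    if (exposed) {
      // The array may have been modified externally, so always decode afresh.
      return new String(bytes, 0, length, StandardCharsets.UTF_8);
    }
    if (cached == null) {
      cached = new String(bytes, 0, length, StandardCharsets.UTF_8);
    }
    return cached;
  }
}
```

Once {{getBytes()}} has been called, every {{toString()}} decodes again, so writes through the returned array stay visible; instances built from a String whose array is never exposed keep reusing the cached value.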
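A sketch of the single-static-decoder idea, assuming a hypothetical helper class {{SharedUtf8Decoder}} (not part of Avro). The {{synchronized}} method serializes every call, so the shared CharsetDecoder is never used by two threads at once:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; the class and method names are illustrative only.
final class SharedUtf8Decoder {
  // One decoder shared by all callers; CharsetDecoder is stateful and not thread-safe.
  private static final CharsetDecoder DECODER = StandardCharsets.UTF_8.newDecoder();

  private SharedUtf8Decoder() {}

  // synchronized ensures only one thread uses DECODER at a time.
  static synchronized String decode(byte[] bytes, int length) {
    try {
      // The convenience decode(ByteBuffer) resets the decoder, decodes all input and flushes.
      CharBuffer chars = DECODER.decode(ByteBuffer.wrap(bytes, 0, length));
      return chars.toString();
    } catch (CharacterCodingException e) {
      // A fresh decoder reports malformed input instead of replacing it.
      throw new IllegalArgumentException("Malformed UTF-8 input", e);
    }
  }
}
```

The lock can become a contention point under heavy multithreaded use; a {{ThreadLocal<CharsetDecoder>}} would avoid that at the cost of one decoder per thread.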
Apart from these, there are some minor coding-style violations.
> Improve Utf8 to String conversion
> ---------------------------------
>
> Key: AVRO-1348
> URL: https://issues.apache.org/jira/browse/AVRO-1348
> Project: Avro
> Issue Type: Bug
> Reporter: Mark Wagner
> Assignee: Mohammad Kamrul Islam
> Attachments: AVRO1348v1.patch
>
>
> AVRO-1241 found that the existing method of creating Strings from Utf8 byte
> arrays could be made faster. The same method is being used in the
> Utf8.toString(), and could likely be sped up by doing the same thing.