[ 
https://issues.apache.org/jira/browse/AVRO-4065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor reassigned AVRO-4065:
------------------------------------


> Do Not Copy Array Contents when Expanding UTF-8 Arrays
> ------------------------------------------------------
>
>                 Key: AVRO-4065
>                 URL: https://issues.apache.org/jira/browse/AVRO-4065
>             Project: Apache Avro
>          Issue Type: Improvement
>          Components: java
>    Affects Versions: 1.12.0
>            Reporter: David Mollitor
>            Assignee: David Mollitor
>            Priority: Minor
>             Fix For: 1.13.0
>
>
> The following code snippet in Utf8.java shows that when the array is resized 
> to accept a new String, the contents of the original array are copied into 
> the new array.
> I'm not sure of the intent here, but my expectation is that the array is being 
> resized precisely because it is about to be overwritten, which is also why the 
> cached String is cleared.
> {code:java}
>   /**
>    * Set length in bytes. Should be called whenever byte content changes, even
>    * if the length does not change, as this also clears the cached String.
>    */
>   public Utf8 setByteLength(int newLength) {
>     SystemLimitException.checkMaxStringLength(newLength);
>     if (this.bytes.length < newLength) {
>       this.bytes = Arrays.copyOf(this.bytes, newLength);
>     }
>     this.length = newLength;
>     this.string = null;
>     this.hash = 0;
>     return this;
>   }
> {code}
> If we peek at the JDK code, Arrays.copyOf simply creates a new array and then 
> copies over the existing contents; any remaining tail stays zero-filled.
> {code:java}
>     public static byte[] copyOf(byte[] original, int newLength) {
>         byte[] copy = new byte[newLength];
>         System.arraycopy(original, 0, copy, 0,
>                          Math.min(original.length, newLength));
>         return copy;
>     }
> {code}
> This is problematic for two reasons. First, it wastes CPU cycles to copy the 
> data and then immediately overwrite it. Second, it is not well documented or 
> understood what the state of the String is after calling this method. It makes 
> sense that the value is a prefix when the String is truncated, but the behavior 
> when expanding the String (i.e., zero-padding) is unexpected unless you know 
> the underlying code.
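
The difference described above can be illustrated with a minimal sketch. It is not the actual Avro change: Utf8BufferSketch, setByteLengthCopying and setByteLengthNoCopy are hypothetical names, and the sketch assumes the caller overwrites the buffer right after growing it.

```java
import java.util.Arrays;

// Hypothetical stand-in for the buffer handling discussed above; not the real Avro Utf8 class.
final class Utf8BufferSketch {
  private byte[] bytes;
  private int length;

  Utf8BufferSketch(byte[] initial) {
    this.bytes = initial.clone();
    this.length = initial.length;
  }

  // Behavior quoted in the issue: growing copies the old contents and leaves a zero-filled tail.
  Utf8BufferSketch setByteLengthCopying(int newLength) {
    if (bytes.length < newLength) {
      bytes = Arrays.copyOf(bytes, newLength);
    }
    length = newLength;
    return this;
  }

  // Assumed alternative: growing allocates a fresh array and copies nothing,
  // on the expectation that the caller overwrites the contents anyway.
  Utf8BufferSketch setByteLengthNoCopy(int newLength) {
    if (bytes.length < newLength) {
      bytes = new byte[newLength];
    }
    length = newLength;
    return this;
  }

  byte[] getBytes() {
    return bytes;
  }

  int getByteLength() {
    return length;
  }

  public static void main(String[] args) {
    byte[] abc = {'a', 'b', 'c'};

    // Growing with a copy keeps the old bytes as a prefix.
    System.out.println(Arrays.toString(
        new Utf8BufferSketch(abc).setByteLengthCopying(5).getBytes())); // prints [97, 98, 99, 0, 0]

    // Growing without a copy yields an all-zero array.
    System.out.println(Arrays.toString(
        new Utf8BufferSketch(abc).setByteLengthNoCopy(5).getBytes())); // prints [0, 0, 0, 0, 0]
  }
}
```

In both variants, shrinking leaves the backing array untouched, so the truncated value remains a prefix of the old one. Only the growing case differs, and any caller that relies on the old bytes surviving a grow would break under the no-copy variant.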



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
