[ 
https://issues.apache.org/jira/browse/OOZIE-3250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16471018#comment-16471018
 ] 

Misha Dmitriev commented on OOZIE-3250:
---------------------------------------

Thank you for making this change, [~andras.piros]!

I was on the verge of approving it when I realized that interning arrays in 
the same way as non-array objects does not work as expected. That's because 
arrays inherit equals() from Object, which compares two arrays by identity 
('=='), rather than by contents. So you need some extra code to achieve what 
you want. See the test I wrote below; it should be self-explanatory.
{code:java}
import com.google.common.collect.Interner;
import com.google.common.collect.Interners;
import org.junit.Assert;
import org.junit.Test;

import java.util.Arrays;

public class TestByteArrayInterning {

  private static final Interner<byte[]> BYTE_ARRAY_INTERNER =
      Interners.newWeakInterner();

  private static final Interner<ByteArrayWrapper> BYTE_ARWR_INTERNER =
      Interners.newWeakInterner();

  @Test
  public void testByteArrayInterning() {
    int nArrs = 1000;
    int nEls = 100;
    byte[][] sameContentArrs = new byte[nArrs][];
    for (int i = 0; i < nArrs; i++) {
      byte[] b = new byte[nEls];
      sameContentArrs[i] = b;
      for (int j = 0; j < nEls; j++) b[j] = (byte) j;
    }

    byte[][] internedArrs = new byte[nArrs][];
    for (int i = 0; i < nArrs; i++) {
      internedArrs[i] = BYTE_ARRAY_INTERNER.intern(sameContentArrs[i]);
    }

    // Trying to intern byte[] arrays directly doesn't work, because in the
    // implementation of arrays the equals() method is the same as '=='.
    // It doesn't compare the arrays' _contents_.
    for (int i = 1; i < nArrs; i++) {
      Assert.assertFalse(internedArrs[i-1] == internedArrs[i]);
    }

    for (int i = 0; i < nArrs; i++) {
      internedArrs[i] =
          BYTE_ARWR_INTERNER.intern(new ByteArrayWrapper(sameContentArrs[i])).bytes;
    }

    for (int i = 1; i < nArrs; i++) {
      Assert.assertTrue(internedArrs[i-1] == internedArrs[i]);
    }
  }

  /**
   * A wrapper class, whose only purpose is to effectively override the wrapped
   * array's equals() method, so that we compare the contents of two arrays.
   * The default implementation of equals() in an array performs just an
   * identity check.
   *
   * If arrays are big and/or CPU performance of this code is critical, one may
   * consider caching the hashcode of the wrapped array after hashCode() is
   * called for the first time.
   */
  private static class ByteArrayWrapper {
    private final byte[] bytes;

    ByteArrayWrapper(byte[] bytes) {
      this.bytes = bytes;
    }

    @Override
    public int hashCode() {
      return Arrays.hashCode(bytes);
    }

    @Override
    public boolean equals(Object other) {
      if (!(other instanceof ByteArrayWrapper)) return false;
      return Arrays.equals(bytes, ((ByteArrayWrapper) other).bytes);
    }
  }
}{code}
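
The Javadoc above mentions caching the hash code of the wrapped array. A minimal sketch of that idea follows; it is not part of the patch. The names {{CachedHashByteArrayWrapper}} and {{SimpleByteArrayInterner}} are hypothetical. The hash is computed eagerly in the constructor rather than on the first hashCode() call, on the assumption that an interner hashes every wrapper immediately anyway. To stay free of dependencies, a plain HashMap stands in for the Guava interner; unlike Interners.newWeakInterner(), it holds strong references, so entries are never garbage collected.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical wrapper that compares arrays by content and hashes them once. */
final class CachedHashByteArrayWrapper {
  final byte[] bytes;
  // Computed once; only valid as long as the wrapped array is never mutated.
  private final int hash;

  CachedHashByteArrayWrapper(byte[] bytes) {
    this.bytes = bytes;
    this.hash = Arrays.hashCode(bytes);
  }

  @Override
  public int hashCode() {
    return hash;
  }

  @Override
  public boolean equals(Object other) {
    if (this == other) return true;
    if (!(other instanceof CachedHashByteArrayWrapper)) return false;
    CachedHashByteArrayWrapper that = (CachedHashByteArrayWrapper) other;
    // Cheap hash comparison first; full content comparison only on a match.
    return hash == that.hash && Arrays.equals(bytes, that.bytes);
  }
}

/** Hypothetical stand-in for an interner; uses strong references, not weak ones. */
final class SimpleByteArrayInterner {
  private final Map<CachedHashByteArrayWrapper, byte[]> canonical = new HashMap<>();

  /** Returns the first array seen with the same contents as the argument. */
  byte[] intern(byte[] bytes) {
    byte[] existing = canonical.putIfAbsent(new CachedHashByteArrayWrapper(bytes), bytes);
    return existing != null ? existing : bytes;
  }
}
```

With this sketch, interning two distinct arrays that hold the same bytes returns the same instance both times, while an array with different contents is returned unchanged.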

> Reduce heap waste by reducing duplicate byte[] count
> ----------------------------------------------------
>
>                 Key: OOZIE-3250
>                 URL: https://issues.apache.org/jira/browse/OOZIE-3250
>             Project: Oozie
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 5.0.0
>            Reporter: Andras Piros
>            Assignee: Andras Piros
>            Priority: Major
>         Attachments: OOZIE-3250.001.patch
>
>
> Similar to OOZIE-3232, we also need to intern the {{byte[]}} field values 
> within 
> [*{{BinaryBlob}}*|https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/BinaryBlob.java#L32-L33]
>  and 
> [*{{StringBlob}}*|https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/StringBlob.java#L34]
>  to reduce heap waste caused by duplicate {{byte[]}} entries.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
