[
https://issues.apache.org/jira/browse/OOZIE-3250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16471018#comment-16471018
]
Misha Dmitriev commented on OOZIE-3250:
---------------------------------------
Thank you for making this change, [~andras.piros]!
I was on the verge of approving it when I realized that interning arrays in
the same way as non-array objects does not work as expected. Arrays inherit
equals() and hashCode() from Object, so equals() compares two arrays with a
simple '==' rather than by contents, and an interner never treats two distinct
arrays with equal contents as duplicates. You need some more code to achieve
what you want. The test I wrote below should be self-explanatory.
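For a quick check that needs neither Guava nor JUnit, here is a minimal JDK-only sketch of the underlying behavior. The class and method names ({{ArrayEqualityDemo}}, {{identityEquals}}, {{contentEquals}}) are made up for illustration and are not part of the patch.

```java
import java.util.Arrays;

// Hypothetical demo class, not part of Oozie or the patch.
final class ArrayEqualityDemo {
    // Arrays inherit equals() from Object, so this is an identity check.
    static boolean identityEquals(byte[] a, byte[] b) {
        return a.equals(b);
    }

    // Arrays.equals() compares lengths and elements.
    static boolean contentEquals(byte[] a, byte[] b) {
        return Arrays.equals(a, b);
    }

    public static void main(String[] args) {
        byte[] a = {1, 2, 3};
        byte[] b = {1, 2, 3};
        System.out.println(identityEquals(a, b)); // false: two distinct objects
        System.out.println(contentEquals(a, b));  // true: same contents
        // Arrays.hashCode() is content-based, unlike byte[].hashCode().
        System.out.println(Arrays.hashCode(a) == Arrays.hashCode(b)); // true
    }
}
```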
{code:java}
import com.google.common.collect.Interner;
import com.google.common.collect.Interners;
import org.junit.Assert;
import org.junit.Test;

import java.util.Arrays;

public class TestByteArrayInterning {
    private static final Interner<byte[]> BYTE_ARRAY_INTERNER =
            Interners.newWeakInterner();
    private static final Interner<ByteArrayWrapper> BYTE_ARWR_INTERNER =
            Interners.newWeakInterner();

    @Test
    public void testByteArrayInterning() {
        int nArrs = 1000;
        int nEls = 100;

        // Create nArrs distinct arrays that all hold the same contents.
        byte[][] sameContentArrs = new byte[nArrs][];
        for (int i = 0; i < nArrs; i++) {
            byte[] b = new byte[nEls];
            sameContentArrs[i] = b;
            for (int j = 0; j < nEls; j++) b[j] = (byte) j;
        }

        byte[][] internedArrs = new byte[nArrs][];
        for (int i = 0; i < nArrs; i++) {
            internedArrs[i] = BYTE_ARRAY_INTERNER.intern(sameContentArrs[i]);
        }
        // Trying to intern byte[] arrays directly doesn't work, because in the
        // implementation of arrays the equals() method is the same as '=='.
        // It doesn't compare arrays' _contents_.
        for (int i = 1; i < nArrs; i++) {
            Assert.assertNotSame(internedArrs[i-1], internedArrs[i]);
        }

        // Interning through the wrapper deduplicates by contents, so every
        // entry ends up pointing to the same underlying array.
        for (int i = 0; i < nArrs; i++) {
            internedArrs[i] = BYTE_ARWR_INTERNER.intern(
                    new ByteArrayWrapper(sameContentArrs[i])).bytes;
        }
        for (int i = 1; i < nArrs; i++) {
            Assert.assertSame(internedArrs[i-1], internedArrs[i]);
        }
    }

    /**
     * A wrapper class whose only purpose is to effectively override the wrapped
     * array's equals() method, so that we compare the contents of two arrays.
     * The default implementation of equals() in an array performs just an
     * identity check.
     *
     * If arrays are big and/or CPU performance of this code is critical, one may
     * consider caching the hashcode of the wrapped array after hashCode() is
     * called for the first time.
     */
    private static class ByteArrayWrapper {
        private final byte[] bytes;

        ByteArrayWrapper(byte[] bytes) {
            this.bytes = bytes;
        }

        @Override
        public int hashCode() {
            return Arrays.hashCode(bytes);
        }

        @Override
        public boolean equals(Object other) {
            if (!(other instanceof ByteArrayWrapper)) return false;
            return Arrays.equals(bytes, ((ByteArrayWrapper) other).bytes);
        }
    }
}{code}
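The Javadoc of {{ByteArrayWrapper}} mentions caching the hashcode. A possible sketch of that idea follows. Both class names ({{CachedHashByteArray}}, {{SimpleByteArrayPool}}) are hypothetical. For simplicity the hash is computed eagerly in the constructor rather than lazily on the first hashCode() call, and a plain HashMap with strong references stands in for Guava's weak interner only so that the sketch runs on the JDK alone. It assumes the wrapped arrays are never modified after being interned.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical variant of ByteArrayWrapper that computes the hash only once.
final class CachedHashByteArray {
    final byte[] bytes;
    private final int hash;

    CachedHashByteArray(byte[] bytes) {
        this.bytes = bytes;
        // Valid only as long as the wrapped array is not mutated afterwards.
        this.hash = Arrays.hashCode(bytes);
    }

    @Override
    public int hashCode() {
        return hash;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof CachedHashByteArray)) return false;
        CachedHashByteArray that = (CachedHashByteArray) other;
        // Cheap hash comparison first, full content comparison only on a match.
        return hash == that.hash && Arrays.equals(bytes, that.bytes);
    }
}

// Hypothetical stand-in for an interner. Unlike a weak interner, it holds
// strong references, so entries are never garbage collected.
final class SimpleByteArrayPool {
    private final Map<CachedHashByteArray, byte[]> pool = new HashMap<>();

    synchronized byte[] intern(byte[] candidate) {
        byte[] existing = pool.putIfAbsent(new CachedHashByteArray(candidate), candidate);
        return existing != null ? existing : candidate;
    }
}
```

With this pool, interning two distinct arrays that hold the same bytes returns the first array both times, which is the behavior the wrapper-based interner in the test above verifies.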
> Reduce heap waste by reducing duplicate byte[] count
> ----------------------------------------------------
>
> Key: OOZIE-3250
> URL: https://issues.apache.org/jira/browse/OOZIE-3250
> Project: Oozie
> Issue Type: Improvement
> Components: core
> Affects Versions: 5.0.0
> Reporter: Andras Piros
> Assignee: Andras Piros
> Priority: Major
> Attachments: OOZIE-3250.001.patch
>
>
> Similar to OOZIE-3232, we also need to intern the {{byte[]}} field values
> within
> [*{{BinaryBlob}}*|https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/BinaryBlob.java#L32-L33]
> and
> [*{{StringBlob}}*|https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/StringBlob.java#L34]
> to reduce heap waste caused by duplicate {{byte[]}} entries.