[
https://issues.apache.org/jira/browse/HBASE-23279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17022602#comment-17022602
]
Viraj Jasani edited comment on HBASE-23279 at 1/24/20 12:17 AM:
----------------------------------------------------------------
We have this function in ByteBufferUtils:
{code:java}
public static void copyFromBufferToArray(byte[] out, ByteBuffer in,
    int sourceOffset, int destinationOffset, int length) {
  if (in.hasArray()) {
    System.arraycopy(in.array(), sourceOffset + in.arrayOffset(), out,
      destinationOffset, length);
  } else if (UNSAFE_AVAIL) {
    UnsafeAccess.copy(in, sourceOffset, out, destinationOffset, length);
  } else {
    ByteBuffer inDup = in.duplicate();
    inDup.position(sourceOffset);
    inDup.get(out, destinationOffset, length);
  }
}
{code}
This method copies content from the ByteBuff's current position into a byte
array; the call path runs from TestHFileWriterV3 through SingleByteBuff down to
ByteBufferUtils while copying keys and values.
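For context, the read side of the test assumes plain (unencoded) cells. Paraphrased (the readCells method name is mine, not the test's), the loop looks roughly like this:
{code:java}
import org.apache.hadoop.hbase.nio.ByteBuff;

// Paraphrase of the read loop in TestHFileWriterV3.writeDataAndReadFromHFile:
// it treats the block body as a plain [keyLen][valueLen][key][value] stream,
// which only holds when the block is not encoded.
static void readCells(ByteBuff buf) {
  while (buf.hasRemaining()) {
    int keyLen = buf.getInt();    // with ROW_INDEX_V1 this reads encoded bytes,
    int valueLen = buf.getInt();  // not a real key length
    byte[] key = new byte[keyLen];
    buf.get(key);                 // -> SingleByteBuff.get -> copyFromBufferToArray
    byte[] value = new byte[valueLen];
    buf.get(value);
    // ... assertions on key/value ...
  }
}
{code}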
Comparing NONE with ROW_INDEX_V1, the largest "byte[] out" length shows up for
ROW_INDEX_V1:
NONE:
{code:java}
out.len: 32768 sourceOffset: 0 destinationOffset: 0 length: 2408
{code}
ROW_INDEX_V1:
{code:java}
out.len: 458752 sourceOffset: 8 destinationOffset: 0 length: 458752
{code}
For ROW_INDEX_V1, the copy throws ArrayIndexOutOfBoundsException with length
458752. The average keyLen read with NONE encoding is ~60, whereas with
ROW_INDEX_V1 it comes out as 458752 (suggesting the test is reading encoded
block bytes as if they were a plain key length).
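The failing arraycopy can be reproduced in isolation. A minimal sketch (the 32768-byte source size is an assumption; the actual block buffer size isn't shown above) demonstrating that the destination fits exactly while the source range overruns:
{code:java}
// Destination fits exactly (0 + 458752 <= 458752), but the source range
// (8 + 458752) overruns the source array, so arraycopy throws
// ArrayIndexOutOfBoundsException, matching the stacktrace below.
public class CopyBoundsDemo {
  public static void main(String[] args) {
    byte[] src = new byte[32768];   // assumed size, for illustration only
    byte[] out = new byte[458752];
    System.arraycopy(src, 8, out, 0, 458752); // throws AIOOBE
  }
}
{code}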
I tried updating some encoding-based conditions in HFileBlock.unpack(), but no
luck.
Exception stacktrace:
{code:java}
java.lang.ArrayIndexOutOfBoundsException
  at java.lang.System.arraycopy(Native Method)
  at org.apache.hadoop.hbase.util.ByteBufferUtils.copyFromBufferToArray(ByteBufferUtils.java:1151)
  at org.apache.hadoop.hbase.nio.SingleByteBuff.get(SingleByteBuff.java:216)
  at org.apache.hadoop.hbase.nio.SingleByteBuff.get(SingleByteBuff.java:228)
  at org.apache.hadoop.hbase.io.hfile.TestHFileWriterV3.writeDataAndReadFromHFile(TestHFileWriterV3.java:255)
  at org.apache.hadoop.hbase.io.hfile.TestHFileWriterV3.testHFileFormatV3Internals(TestHFileWriterV3.java:109)
  at org.apache.hadoop.hbase.io.hfile.TestHFileWriterV3.testHFileFormatV3(TestHFileWriterV3.java:102)
{code}
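As an aside, an explicit precondition in copyFromBufferToArray would surface this as a descriptive error instead of a bare AIOOBE. A hypothetical sketch, not present in the current code:
{code:java}
// Hypothetical guard, not in ByteBufferUtils today: validate both ranges
// up front so a bad length fails with a message naming the offending values.
public static void copyFromBufferToArray(byte[] out, ByteBuffer in,
    int sourceOffset, int destinationOffset, int length) {
  if (sourceOffset + length > in.limit()
      || destinationOffset + length > out.length) {
    throw new IllegalArgumentException("Copy out of bounds: sourceOffset="
        + sourceOffset + ", destinationOffset=" + destinationOffset
        + ", length=" + length + ", in.limit()=" + in.limit()
        + ", out.length=" + out.length);
  }
  // ... existing copy paths (arraycopy / UnsafeAccess / duplicate-and-get) ...
}
{code}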
[~stack] [~ram_krish] We have many HFile write tests, right? Is this the only
test that directly deals with the ByteBuff interface? This is the only test
that fails.
> Switch default block encoding to ROW_INDEX_V1
> ---------------------------------------------
>
> Key: HBASE-23279
> URL: https://issues.apache.org/jira/browse/HBASE-23279
> Project: HBase
> Issue Type: Wish
> Affects Versions: 3.0.0, 2.3.0
> Reporter: Lars Hofhansl
> Assignee: Viraj Jasani
> Priority: Minor
> Fix For: 3.0.0, 2.3.0
>
> Attachments: HBASE-23279.master.000.patch,
> HBASE-23279.master.001.patch, HBASE-23279.master.002.patch,
> HBASE-23279.master.003.patch, HBASE-23279.master.004.patch
>
>
> Currently we set both block encoding and compression to NONE.
> ROW_INDEX_V1 has many advantages and (almost) no disadvantages (the HFiles
> are slightly larger, about 3% or so). I think that would be a better default
> than NONE.
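For comparison, the encoding under discussion can already be enabled per column family today. A sketch using the public client API ("cf" is a placeholder family name):
{code:java}
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptor;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.io.encoding.DataBlockEncoding;
import org.apache.hadoop.hbase.util.Bytes;

// Opting a column family into ROW_INDEX_V1 explicitly, which is what this
// issue proposes making the default.
ColumnFamilyDescriptor cfd = ColumnFamilyDescriptorBuilder
    .newBuilder(Bytes.toBytes("cf"))
    .setDataBlockEncoding(DataBlockEncoding.ROW_INDEX_V1)
    .build();
{code}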
--
This message was sent by Atlassian Jira
(v8.3.4#803005)