[
https://issues.apache.org/jira/browse/HIVE-25190?focusedWorklogId=613518&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-613518
]
ASF GitHub Bot logged work on HIVE-25190:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 22/Jun/21 14:28
Start Date: 22/Jun/21 14:28
Worklog Time Spent: 10m
Work Description: pgaref commented on a change in pull request #2408:
URL: https://github.com/apache/hive/pull/2408#discussion_r656277121
##########
File path:
storage-api/src/test/org/apache/hadoop/hive/ql/exec/vector/TestBytesColumnVector.java
##########
@@ -74,16 +88,97 @@ public void testSmallBufferReuse() {
// All small writes to the first buffer should be in contiguous memory
for (int i = 0; i < bytesWrittenToBytes1; ++i) {
- assertEquals((byte) 1, bytes1[i]);
+ assertEquals((byte) (i / smallWriteSize + 1), bytes1[i]);
+ }
+ }
+
+ /**
+ * Test the setVal, setConcat, and StringExpr.padRight methods.
+ */
+ @Test
+ public void testConcatAndPadding() {
+ BytesColumnVector col = new BytesColumnVector();
+ col.reset();
+ byte[] prefix = "緑".getBytes(StandardCharsets.UTF_8);
+
+ // fill the column with '緑'
+ for(int row=0; row < col.vector.length; ++row) {
+ col.setVal(row, prefix, 0, prefix.length);
+ }
+ for(int row=0; row < col.vector.length; ++row) {
+ assertEquals("row " + row, "緑", col.toString(row));
+ }
+
+ // pad out to 6 characters
+ for(int row=0; row < col.vector.length; ++row) {
+ StringExpr.padRight(col, row, col.vector[row], col.start[row],
+ col.length[row], 6);
+ }
+ for(int row=0; row < col.vector.length; ++row) {
+ assertEquals("row " + row, "緑 ", col.toString(row));
+ }
+
+ // concat the row digits
+ for(int row=0; row < col.vector.length; ++row) {
+ byte[] rowStr = Integer.toString(row).getBytes(StandardCharsets.UTF_8);
+ col.setConcat(row, col.vector[row], col.start[row], col.length[row],
+ rowStr, 0, rowStr.length);
+ }
+ for(int row=0; row < col.vector.length; ++row) {
+ assertEquals("row " + row, "緑 " + row, col.toString(row));
+ }
+
+ // We end up allocating 20k, so we should have expanded the small buffer
+ assertEquals(32 * 1024, col.bufferSize());
+ }
+
+ @Test
+ public void testBufferOverflow() {
+ BytesColumnVector col = new BytesColumnVector(2048);
+ col.reset();
+ assertEquals(BytesColumnVector.DEFAULT_BUFFER_SIZE, col.bufferSize());
+
+ // pick a size below 1m so that we use the small buffer;
+ final int size = BytesColumnVector.MAX_SIZE_FOR_SMALL_ITEM - 1024;
+
+ // run through once to expand the small value buffer
+ for(int row=0; row < col.vector.length; ++row) {
+ writeToBytesColumnVector(row, col, size, row);
+ }
+ // it should have resized a bunch of times
+ byte[] smallBuffer = col.getValPreallocatedBytes();
+ assertNotSame(smallBuffer, col.vector[0]);
+ assertSame(smallBuffer, col.vector[1024]);
+
+ // reset the column, but make sure the buffer isn't reallocated
+ col.reset();
+ assertEquals(BytesColumnVector.MAX_SIZE_FOR_SMALL_BUFFER,
col.bufferSize());
+
+ // fill up the vector now with the large buffer
+ for(int row=0; row < col.vector.length; ++row) {
+ writeToBytesColumnVector(row, col, size, row);
+ }
+ assertEquals(BytesColumnVector.MAX_SIZE_FOR_SMALL_BUFFER,
col.bufferSize());
+ // now the first 1025 rows should all be the small buffer
+ for(int row=0; row < 1025; ++row) {
+ assertSame("row " + row, smallBuffer, col.vector[row]);
+ assertEquals("row " + row, row * size, col.start[row]);
+ assertEquals("row " + row, size, col.length[row]);
+ }
+ // the rest should be custom buffers
+ for(int row=1025; row < col.vector.length; ++row) {
+ assertNotSame("row " + row, smallBuffer, col.vector[row]);
+ assertEquals("row " + row, 0, col.start[row]);
+ assertEquals("row " + row, size, col.length[row]);
}
}
- // Write a value to the column vector, and return back the byte buffer used.
- private static byte[] writeToBytesColumnVector(int rowIdx, BytesColumnVector
col, int writeSize, byte val) {
+ // Write a value to the column vector, and return the byte buffer used.
Review comment:
space not needed?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
Issue Time Tracking
-------------------
Worklog Id: (was: 613518)
Time Spent: 40m (was: 0.5h)
> BytesColumnVector fails when the aggregate size is > 1gb
> --------------------------------------------------------
>
> Key: HIVE-25190
> URL: https://issues.apache.org/jira/browse/HIVE-25190
> Project: Hive
> Issue Type: Bug
> Reporter: Owen O'Malley
> Assignee: Owen O'Malley
> Priority: Major
> Labels: pull-request-available
> Time Spent: 40m
> Remaining Estimate: 0h
>
> Currently, BytesColumnVector will allocate a buffer for small values (< 1mb),
> but fail with:
> {code:java}
> new RuntimeException("Overflow of newLength. smallBuffer.length="
> + smallBuffer.length + ", nextElemLength=" + nextElemLength);
> {code}
> if the aggregate size of the buffer crosses over 1gb.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)