[ https://issues.apache.org/jira/browse/ARROW-10153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17208238#comment-17208238 ]
Samarth Jain commented on ARROW-10153:
--------------------------------------

Ah! Thanks, [~emkornfi...@gmail.com]! Looks like this was recently added. Are there any perf implications of using LargeVarCharVector by default? Alternatively, is there a way to detect that a regular VarCharVector has run out of capacity and that we need to copy its contents over to a LargeVarCharVector?

[~bryanc], [~liyafan] - maybe one of you knows?

> [Java] Adding values to VarCharVector beyond 2GB results in IndexOutOfBoundsException
> -------------------------------------------------------------------------------------
>
> Key: ARROW-10153
> URL: https://issues.apache.org/jira/browse/ARROW-10153
> Project: Apache Arrow
> Issue Type: Bug
> Components: Java
> Affects Versions: 1.0.0
> Reporter: Samarth Jain
> Priority: Major
>
> On executing the below test case, one can see that on adding the 2049th
> string of size 1MB, it fails.
> {code:java}
> int length = 1024 * 1024;
> StringBuilder sb = new StringBuilder(length);
> for (int i = 0; i < length; i++) {
>   sb.append("a");
> }
> byte[] str = sb.toString().getBytes();
> VarCharVector vector = new VarCharVector("v", new RootAllocator(Long.MAX_VALUE));
> vector.allocateNew(3000);
> for (int i = 0; i < 3000; i++) {
>   vector.setSafe(i, str);
> }{code}
>
> {code:java}
> Exception in thread "main" java.lang.IndexOutOfBoundsException: index: -2147483648, length: 1048576 (expected: range(0, 2147483648))
>   at org.apache.arrow.memory.ArrowBuf.checkIndex(ArrowBuf.java:699)
>   at org.apache.arrow.memory.ArrowBuf.setBytes(ArrowBuf.java:762)
>   at org.apache.arrow.vector.BaseVariableWidthVector.setBytes(BaseVariableWidthVector.java:1212)
>   at org.apache.arrow.vector.BaseVariableWidthVector.setSafe(BaseVariableWidthVector.java:1011)
> {code}
> Stepping through the code,
> [https://github.com/apache/arrow/blob/master/java/memory/memory-core/src/main/java/org/apache/arrow/memory/ArrowBuf.java#L425]
> returns the negative index `-2147483648`
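For reference, the negative index in the stack trace matches plain 32-bit overflow: 2048 values of 1 MB each put the start of the 2049th value at byte 2^31, which wraps to `-2147483648` as an `int`. The sketch below only reproduces that arithmetic and does not call Arrow. The class name and the `fitsInIntOffsets` helper are hypothetical, and the bounds check is just one possible way a caller might decide up front that a LargeVarCharVector is needed, not something the library is known to expose.

```java
// Arithmetic-only illustration; no Arrow classes are used here.
class OffsetOverflowSketch {
    // 1 MB per value, as in the test case from the report.
    static final int VALUE_LENGTH = 1024 * 1024;

    // Start offset of the value at `index`, computed in 32-bit arithmetic.
    static int intOffset(int index) {
        return index * VALUE_LENGTH; // silently wraps once the product exceeds Integer.MAX_VALUE
    }

    // Same offset computed in 64-bit arithmetic, which does not wrap here.
    static long longOffset(int index) {
        return (long) index * VALUE_LENGTH;
    }

    // Hypothetical helper: would the end offset of the next value still fit in an int?
    static boolean fitsInIntOffsets(long currentEnd, int nextLength) {
        return currentEnd + nextLength <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(intOffset(2047));  // 2146435072
        System.out.println(intOffset(2048));  // -2147483648
        System.out.println(longOffset(2048)); // 2147483648
        System.out.println(fitsInIntOffsets(longOffset(2047), VALUE_LENGTH)); // false
    }
}
```

A caller tracking the running byte total as a `long` could apply a check like `fitsInIntOffsets` before each write and switch vector types when it returns false, at the cost of copying the data written so far.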