[
https://issues.apache.org/jira/browse/ARROW-10153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17208238#comment-17208238
]
Samarth Jain commented on ARROW-10153:
--------------------------------------
Ah! Thanks, [[email protected]] ! Looks like this was recently added.
Are there any perf implications of using LargeVarCharVector by default?
Alternatively, is there a way to detect that a regular VarCharVector has run
out of capacity, so that we can copy its contents over to a
LargeVarCharVector?
[~bryanc], [~liyafan] - maybe one of you knows?
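One possible approach to the detection question, as a minimal sketch: keep a running byte count in a long and check, before each write, whether the end offset of the next value would still fit in a 32-bit offset. The class and method names below are hypothetical and no Arrow API is called; the code only models the offset arithmetic, assuming VarCharVector uses 32-bit offsets as the 2GB limit in this issue suggests.

```java
public final class VarCharCapacityGuard {
    /** Largest end offset that a signed 32-bit offset can represent. */
    public static final long MAX_OFFSET = Integer.MAX_VALUE;

    /**
     * Returns true if appending valueLength bytes after bytesWritten bytes
     * would produce an end offset that no longer fits in a signed 32-bit int.
     */
    public static boolean wouldOverflow(long bytesWritten, int valueLength) {
        return bytesWritten + valueLength > MAX_OFFSET;
    }

    /**
     * Simulates writing up to `attempts` values of valueLength bytes each and
     * returns how many fit before the guard reports an overflow.
     */
    public static int countAccepted(int valueLength, int attempts) {
        long bytesWritten = 0; // tracked as long so the counter itself cannot wrap
        int accepted = 0;
        for (int i = 0; i < attempts; i++) {
            if (wouldOverflow(bytesWritten, valueLength)) {
                break; // a caller could switch to a 64-bit-offset vector here
            }
            bytesWritten += valueLength;
            accepted++;
        }
        return accepted;
    }

    public static void main(String[] args) {
        int oneMiB = 1024 * 1024;
        System.out.println("values accepted: " + countAccepted(oneMiB, 3000)); // prints 2047
    }
}
```

For the 1 MiB values in the test case this stops after 2047 values, because the end offset of the 2048th value (2^31) already exceeds Integer.MAX_VALUE. At that point a caller could copy the contents into a LargeVarCharVector and continue there.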
> [Java] Adding values to VarCharVector beyond 2GB results in
> IndexOutOfBoundsException
> -------------------------------------------------------------------------------------
>
> Key: ARROW-10153
> URL: https://issues.apache.org/jira/browse/ARROW-10153
> Project: Apache Arrow
> Issue Type: Bug
> Components: Java
> Affects Versions: 1.0.0
> Reporter: Samarth Jain
> Priority: Major
>
> Running the test case below fails when the 2049th string of size 1MB is
> added.
> {code:java}
> // Build a 1MB string of 'a' characters.
> int length = 1024 * 1024;
> StringBuilder sb = new StringBuilder(length);
> for (int i = 0; i < length; i++) {
>   sb.append("a");
> }
> byte[] str = sb.toString().getBytes();
> VarCharVector vector = new VarCharVector("v", new RootAllocator(Long.MAX_VALUE));
> vector.allocateNew(3000);
> // Write 3000 x 1MB, which is more than 2GB in total.
> for (int i = 0; i < 3000; i++) {
>   vector.setSafe(i, str);
> }
> {code}
>
> {code:java}
> Exception in thread "main" java.lang.IndexOutOfBoundsException: index: -2147483648, length: 1048576 (expected: range(0, 2147483648))
>   at org.apache.arrow.memory.ArrowBuf.checkIndex(ArrowBuf.java:699)
>   at org.apache.arrow.memory.ArrowBuf.setBytes(ArrowBuf.java:762)
>   at org.apache.arrow.vector.BaseVariableWidthVector.setBytes(BaseVariableWidthVector.java:1212)
>   at org.apache.arrow.vector.BaseVariableWidthVector.setSafe(BaseVariableWidthVector.java:1011)
> {code}
> Stepping through the code,
>
> [https://github.com/apache/arrow/blob/master/java/memory/memory-core/src/main/java/org/apache/arrow/memory/ArrowBuf.java#L425]
> returns the negative index `-2147483648`.
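The negative index matches plain 32-bit overflow arithmetic: 2048 values of 1MB each add up to exactly 2^31 bytes, one more than Integer.MAX_VALUE. The sketch below illustrates only that arithmetic; the class and method names are hypothetical and it is not the Arrow code path itself.

```java
public final class OffsetOverflowDemo {
    /** Start offset of the value at `index`, computed in 32-bit int arithmetic (wraps silently on overflow). */
    public static int startOffset32(int index, int valueLength) {
        return index * valueLength;
    }

    /** The same offset computed in 64-bit long arithmetic, which does not wrap for these sizes. */
    public static long startOffset64(int index, int valueLength) {
        return (long) index * valueLength;
    }

    public static void main(String[] args) {
        int oneMiB = 1024 * 1024;
        // Index 2047 is the 2048th value: its start offset still fits in an int.
        System.out.println(startOffset32(2047, oneMiB)); // 2146435072
        // Index 2048 is the 2049th value: 2^31 wraps to Integer.MIN_VALUE.
        System.out.println(startOffset32(2048, oneMiB)); // -2147483648
        // With 64-bit offsets the same position is representable.
        System.out.println(startOffset64(2048, oneMiB)); // 2147483648
    }
}
```

This is consistent with the failure appearing on the 2049th write, whose start offset is the first one that cannot be represented as a non-negative int.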