[ 
https://issues.apache.org/jira/browse/ARROW-10153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17208238#comment-17208238
 ] 

Samarth Jain commented on ARROW-10153:
--------------------------------------

Ah! Thanks, [~emkornfi...@gmail.com] ! Looks like this was recently added.

 

Are there any perf implications of using LargeVarCharVector by default? 
Alternatively, is there a way to detect that a regular VarCharVector has run 
out of capacity, so that we can copy its contents over to a 
LargeVarCharVector? 

[~bryanc], [~liyafan] - maybe one of you knows? 
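
To make the second question concrete, here is a minimal sketch of the kind of check I have in mind. It assumes the caller tracks the number of bytes written so far itself, and relies only on the fact that VarCharVector uses 32-bit signed offsets. `OffsetGuard`, `fits` and `countFitting` are hypothetical names for illustration, not Arrow APIs.

```java
public class OffsetGuard {

    // Hypothetical helper, not part of Arrow. A value fits only if its end
    // offset in the data buffer can still be represented as a signed 32-bit int.
    public static boolean fits(long bytesWritten, int nextLength) {
        return bytesWritten + nextLength <= Integer.MAX_VALUE;
    }

    // Counts how many values of the given length fit before the guard trips.
    public static int countFitting(int valueLength) {
        long bytesWritten = 0;
        int count = 0;
        while (fits(bytesWritten, valueLength)) {
            bytesWritten += valueLength;
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        int length = 1024 * 1024; // 1MB values, as in the test case
        System.out.println("values that fit: " + countFitting(length));
        // The end offset of the 2048th value is 2^31, which wraps when
        // narrowed to int. This matches the index in the reported exception.
        System.out.println("wrapped offset: " + (int) (2048L * length));
    }
}
```

With 1MB values this guard trips on the 2048th value, one earlier than the exception in the report, presumably because the end offset of the 2048th value (2^31) already wraps to -2147483648 and the failure only surfaces on the next write. Whether Arrow exposes such a check itself is what I am asking about.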

> [Java] Adding values to VarCharVector beyond 2GB results in 
> IndexOutOfBoundsException
> -------------------------------------------------------------------------------------
>
>                 Key: ARROW-10153
>                 URL: https://issues.apache.org/jira/browse/ARROW-10153
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Java
>    Affects Versions: 1.0.0
>            Reporter: Samarth Jain
>            Priority: Major
>
> Running the test case below fails when adding the 2049th string of size 
> 1MB.  
> {code:java}
> int length = 1024 * 1024;
> StringBuilder sb = new StringBuilder(length);
> for (int i = 0; i < length; i++) {
>  sb.append("a");
> }
> byte[] str = sb.toString().getBytes();
> VarCharVector vector = new VarCharVector("v", new RootAllocator(Long.MAX_VALUE));
> vector.allocateNew(3000);
> for (int i = 0; i < 3000; i++) {
>  vector.setSafe(i, str);
> }{code}
>  
> {code:java}
> Exception in thread "main" java.lang.IndexOutOfBoundsException: index: -2147483648, length: 1048576 (expected: range(0, 2147483648))
>   at org.apache.arrow.memory.ArrowBuf.checkIndex(ArrowBuf.java:699)
>   at org.apache.arrow.memory.ArrowBuf.setBytes(ArrowBuf.java:762)
>   at org.apache.arrow.vector.BaseVariableWidthVector.setBytes(BaseVariableWidthVector.java:1212)
>   at org.apache.arrow.vector.BaseVariableWidthVector.setSafe(BaseVariableWidthVector.java:1011)
> {code}
> Stepping through the code, 
>  
> [https://github.com/apache/arrow/blob/master/java/memory/memory-core/src/main/java/org/apache/arrow/memory/ArrowBuf.java#L425]
> returns the negative index `-2147483648`



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
