lz19970205 commented on issue #10776:
URL: https://github.com/apache/arrow/issues/10776#issuecomment-885354992


   > List arrays and string arrays cannot hold more than 2GB of values. This is 
because they are represented as two arrays: a values array and an offsets array.
   > 
   > ```
   >         0  1  2  3  4  5  6  7  8  9  10 11 12 13       
   > Values: s  t  r  i  n  g  1  s  t  r  i  n  g  2
   > Offsets: 0, 7, 14
   > ```
   > 
   > The offsets point to the beginning (and end) of each string. Since the 
offsets array is int32, the maximum offset is about 2GB (2^31 - 1), so the 
values array cannot hold more than 2GB of data.
   > 
   > Normally, when this limit is hit, a good workaround is to split your data 
into smaller record batches (you can still represent it as a single table), but 
it will depend on what you are trying to do.
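
To make the quoted layout concrete, here is a minimal pure-Python sketch of the values/offsets representation. It is not Arrow's implementation; `build_offsets` and `fits_int32` are hypothetical helper names used only for illustration.

```python
# Illustrative only: mimics the values + offsets layout described above.
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit offset can hold


def build_offsets(strings):
    """Concatenate UTF-8 bytes and record where each string starts and ends."""
    values = b"".join(s.encode("utf-8") for s in strings)
    offsets = [0]
    for s in strings:
        offsets.append(offsets[-1] + len(s.encode("utf-8")))
    return values, offsets


def fits_int32(offsets):
    """True if the last offset (the total byte count) fits in an int32."""
    return offsets[-1] <= INT32_MAX


values, offsets = build_offsets(["string1", "string2"])
print(values, offsets)
```

The last offset equals the total number of bytes in the values array, which is why the int32 range caps the size of the whole array rather than the size of any single string.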
   
   I see. So you mean there is a huge array in my data?
   But I have already converted all array types to string types, and I thought 
I had removed the big string.
   I checked the maximum string length across all string columns in my data and 
found that it is only 36000. This is far from reaching the 2GB limit.
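
One possible explanation, going by the quoted description: the limit applies to the total bytes in the values array of a single array, not to the longest individual string. The sketch below is plain arithmetic under that assumption; `rows_until_overflow` and `total_value_bytes` are hypothetical names, and 36000 is used as a worst-case bytes-per-row figure.

```python
# Illustrative arithmetic: how many rows of a given size fit under an int32 offset.
INT32_MAX = 2**31 - 1


def total_value_bytes(strings):
    """Total UTF-8 bytes that would land in a single values array."""
    return sum(len(s.encode("utf-8")) for s in strings)


def rows_until_overflow(bytes_per_row):
    """Number of rows of this size whose combined bytes still fit in int32."""
    return INT32_MAX // bytes_per_row


max_rows = rows_until_overflow(36000)
print(max_rows)
```

So a column whose strings are each far below 2GB can still exceed the limit once enough rows are stored in one array.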
   
   I will split the data into smaller pieces and try again.
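
A minimal sketch of one way to do the splitting, in pure Python and independent of any Arrow API: group rows so that each group's total byte count stays under a chosen budget. `split_by_bytes` and `max_bytes` are hypothetical names; in practice each group would then be turned into its own record batch.

```python
# Illustrative only: greedy grouping of strings by total UTF-8 byte size.
def split_by_bytes(strings, max_bytes):
    """Yield lists of strings whose combined UTF-8 size is at most max_bytes.

    A single string larger than max_bytes is still yielded alone in its own group.
    """
    batch, batch_bytes = [], 0
    for s in strings:
        size = len(s.encode("utf-8"))
        # Start a new group if adding this string would exceed the budget.
        if batch and batch_bytes + size > max_bytes:
            yield batch
            batch, batch_bytes = [], 0
        batch.append(s)
        batch_bytes += size
    if batch:
        yield batch


batches = list(split_by_bytes(["string1", "string2", "s3"], max_bytes=10))
print(batches)
```

For real data the budget would be set somewhere below 2^31 - 1 bytes per string column, leaving some headroom.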



