lz19970205 commented on issue #10776: URL: https://github.com/apache/arrow/issues/10776#issuecomment-885354992
> List arrays and string arrays cannot have more than 2GB. This is because they are represented as two arrays. A values array and an offsets array.
>
> ```
>           0  1  2  3  4  5  6  7  8  9 10 11 12 13
> Values:   s  t  r  i  n  g  1  s  t  r  i  n  g  2
> Offsets:  0, 7, 14
> ```
>
> The offsets point to the beginning (and end) of each string. Since the offsets array is int32 the maximum offset is 2GB and so the values array cannot have more than 2GB bytes of values.
>
> Normally, when this limit is hit, a good workaround is to split your data into smaller record batches (you can still represent it as a single table) but it will depend on what you are trying to do.

I see. So you mean there is a huge array in my data? But I have already converted all array types to string types and, I thought, removed the big string. I checked the maximum length across all string columns in my data and found that it is only 36000. This is far from reaching the 2GB limit.

I will split the data into smaller pieces and try again.
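
For illustration, here is a minimal plain-Python sketch of the offsets layout described in the quoted explanation. It does not use pyarrow; `build_offsets` and `split_by_byte_budget` are hypothetical helper names, and the 36000-byte row size is taken from the maximum length mentioned above. It assumes the limit applies to the last offset, that is, to the total bytes of all values stored in one array, so many short strings can add up to the limit even when no single string is large.

```python
INT32_MAX = 2**31 - 1  # largest value an int32 offset can hold


def build_offsets(strings):
    """Return offsets for strings laid out back to back as UTF-8 bytes."""
    offsets = [0]
    for s in strings:
        offsets.append(offsets[-1] + len(s.encode("utf-8")))
    return offsets


def split_by_byte_budget(strings, budget=INT32_MAX):
    """Greedily group strings into batches whose total bytes fit the budget."""
    batches, current, used = [], [], 0
    for s in strings:
        size = len(s.encode("utf-8"))
        if size > budget:
            raise ValueError("a single value exceeds the budget")
        # Start a new batch when adding this value would pass the budget.
        if current and used + size > budget:
            batches.append(current)
            current, used = [], 0
        current.append(s)
        used += size
    if current:
        batches.append(current)
    return batches


# Same example as in the quoted diagram.
offsets = build_offsets(["string1", "string2"])
print(offsets)  # [0, 7, 14]

# Number of 36000-byte rows that fit before the last offset passes int32.
max_rows = INT32_MAX // 36000
print(max_rows)  # 59652
```

Under these assumptions, a column whose strings are all 36000 bytes long would reach the limit after roughly 60 thousand rows in a single array, which is why splitting the data into smaller batches helps.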