leprechaunt33 commented on issue #33049: URL: https://github.com/apache/arrow/issues/33049#issuecomment-1443703100
Current workaround I've developed for vaex in general for this pyarrow-related error, on dataframes where the technique mentioned above does not work (materialising a pandas array from a joined, multi-file dataframe where I was unable to set the Arrow data type on the column):

- Catch the `ArrowInvalid` exception, create a blank pandas dataframe with the required columns, and iterate over the columns of the vaex dataframe to materialise them one by one into the pandas df.
- If `ArrowInvalid` is caught again, evaluate the rogue column with `evaluate_iterator()` using prefetch and a `chunk_size` small enough not to exceed the bounds of the string array (working off the maximum expected record size), and collate the resulting pyarrow `StringArray`/`ChunkedArray` data.
- Continue iterating over the columns; typically only one or two columns will need the chunked treatment.

A rough sketch of the approach is below.
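This is a minimal, hedged sketch of the fallback described above, not the exact code I use. It assumes a vaex 4.x DataFrame `df`, that `pa.ArrowInvalid` is the exception raised, and that `evaluate_iterator()` accepts `chunk_size`, `prefetch`, and `array_type="arrow"`; `CHUNK_SIZE` is a placeholder you would tune from your maximum expected record size.

```python
import pandas as pd
import pyarrow as pa
import vaex

CHUNK_SIZE = 100_000  # assumption: pick so chunk_size * max record size stays in bounds


def to_pandas_with_fallback(df: vaex.dataframe.DataFrame) -> pd.DataFrame:
    try:
        # Fast path: let vaex materialise the whole frame at once.
        return df.to_pandas_df()
    except pa.ArrowInvalid:
        pass  # fall back to per-column materialisation

    out = pd.DataFrame()
    for name in df.get_column_names():
        try:
            # Materialise one column at a time into the pandas frame.
            out[name] = df.evaluate(name)
        except pa.ArrowInvalid:
            # Rogue column: evaluate in chunks small enough to avoid the
            # string-offset overflow, then collate the Arrow chunks.
            chunks = []
            for _i1, _i2, chunk in df.evaluate_iterator(
                name, chunk_size=CHUNK_SIZE, prefetch=True, array_type="arrow"
            ):
                # Chunks may arrive as pa.Array or pa.ChunkedArray.
                if isinstance(chunk, pa.ChunkedArray):
                    chunks.extend(chunk.chunks)
                else:
                    chunks.append(chunk)
            out[name] = pa.chunked_array(chunks).to_pandas()
    return out
```

In practice only the one or two oversized string columns hit the chunked branch, so the overhead over a plain `to_pandas_df()` is small.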
