leprechaunt33 commented on issue #33049: URL: https://github.com/apache/arrow/issues/33049#issuecomment-1443703100
Current workaround I've developed for vaex in general for this pyarrow-related error, on dataframes where the technique mentioned above does not work (materialising a pandas array from a joined, multi-file dataframe where I was unable to set the Arrow data type on the column):

- Catch the `ArrowInvalid` exception, create a blank pandas dataframe with the required columns, and iterate over the columns of the vaex dataframe to materialise them one by one into the pandas df.
- If `ArrowInvalid` is caught again, evaluate the rogue column with `evaluate_iterator()` using prefetch and a `chunk_size` small enough not to exceed the bounds of the string array (working off the maximum expected record size), and collate the resulting pyarrow `StringArray`/`ChunkedArray` data.
- Continue iterating over the columns; typically only one or two columns will need the chunked treatment.

A rough sketch of the approach is below.
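This is a minimal, hedged sketch of the fallback described above, not the exact code I use. It assumes a vaex 4.x DataFrame `df`, that `pa.ArrowInvalid` is the exception raised, and that `evaluate_iterator()` accepts `chunk_size`, `prefetch`, and `array_type="arrow"`; `CHUNK_SIZE` is a placeholder you would tune from your maximum expected record size.

```python
import pandas as pd
import pyarrow as pa
import vaex

CHUNK_SIZE = 100_000  # assumption: pick so chunk_size * max record size stays in bounds


def to_pandas_with_fallback(df: vaex.dataframe.DataFrame) -> pd.DataFrame:
    try:
        # Fast path: let vaex materialise the whole frame at once.
        return df.to_pandas_df()
    except pa.ArrowInvalid:
        pass  # fall back to per-column materialisation

    out = pd.DataFrame()
    for name in df.get_column_names():
        try:
            # Materialise one column at a time into the pandas frame.
            out[name] = df.evaluate(name)
        except pa.ArrowInvalid:
            # Rogue column: evaluate in chunks small enough to avoid the
            # string-offset overflow, then collate the Arrow chunks.
            chunks = []
            for _i1, _i2, chunk in df.evaluate_iterator(
                name, chunk_size=CHUNK_SIZE, prefetch=True, array_type="arrow"
            ):
                # Chunks may arrive as pa.Array or pa.ChunkedArray.
                if isinstance(chunk, pa.ChunkedArray):
                    chunks.extend(chunk.chunks)
                else:
                    chunks.append(chunk)
            out[name] = pa.chunked_array(chunks).to_pandas()
    return out
```

In practice only the one or two oversized string columns hit the chunked branch, so the overhead over a plain `to_pandas_df()` is small.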
