[GitHub] [arrow] leprechaunt33 commented on issue #33049: [C++][Python] Large strings cause ArrowInvalid: offset overflow while concatenating arrays

via GitHub Tue, 07 Mar 2023 06:20:38 -0800


leprechaunt33 commented on issue #33049:
URL: https://github.com/apache/arrow/issues/33049#issuecomment-1458257349


   @Ben-Epstein thanks for the heads up. I'm still digesting the code. In my 
use case this workaround was successful, but only fully once I ensured that the 
iterator was always chunked.  
   
![image](https://user-images.githubusercontent.com/121342696/223433425-4f161fb7-0d6a-471c-892c-118d9ca67c18.png)
   Not the most efficient, but the only reason I mention it is this:
   The bug arose for me with a single string column in a dataset spanning 
multiple HDF5 files (33 files), with a maximum string length of 32000 (though 
in reality the strings are much shorter). I observed the iterator above fail 
when attempting to combine the chunks of a single filtered dataset of fewer 
than 100 records. Given that the iterator runs on a filtered dataset that had 
a df.take done on unfiltered indices, I wonder if there might be more going on 
than just a take issue (unless that memory explosion really is that bad). I'll 
take a closer look at where mine was failing some time this week and see if 
those changes fix things.
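
   For anyone following along, here is a rough sketch of why combining chunks 
can overflow at all. Arrow's `string` type stores offsets as 32-bit signed 
integers, so one contiguous array can hold at most 2**31 - 1 bytes of string 
data (`large_string` uses 64-bit offsets instead). The per-file byte counts and 
the helper names below are hypothetical, chosen only to illustrate the 
arithmetic; this is not the code from the screenshot and it does not call 
pyarrow.

```python
INT32_MAX = 2**31 - 1  # largest byte offset a 32-bit-offset string array can hold


def fits_in_int32_offsets(chunk_byte_sizes):
    """Return True if all chunks could be concatenated into one `string` array."""
    return sum(chunk_byte_sizes) <= INT32_MAX


def group_chunks(chunk_byte_sizes, limit=INT32_MAX):
    """Greedily group consecutive chunk indices so each group stays within `limit`."""
    groups, current, current_bytes = [], [], 0
    for index, size in enumerate(chunk_byte_sizes):
        if size > limit:
            raise ValueError(f"chunk {index} alone exceeds the offset limit")
        # Start a new group when adding this chunk would overflow the offsets.
        if current and current_bytes + size > limit:
            groups.append(current)
            current, current_bytes = [], 0
        current.append(index)
        current_bytes += size
    if current:
        groups.append(current)
    return groups


# Hypothetical example: 33 files, each contributing 100 MB of string data.
sizes = [100_000_000] * 33
can_combine = fits_in_int32_offsets(sizes)  # 3.3e9 bytes exceeds 2**31 - 1
groups = group_chunks(sizes)                # chunks that are safe to combine together
```

   Under these assumed sizes the data cannot be combined into one array, but 
it can be processed as two groups of chunks, which is the idea behind keeping 
the iterator chunked rather than concatenating everything.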


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
