leprechaunt33 commented on issue #33049: URL: https://github.com/apache/arrow/issues/33049#issuecomment-1458257349
@Ben-Epstein thanks for the heads-up. I'm still digesting the code. In my use case this workaround was successful, but only once I made sure the iterator was always chunked. Not the most efficient, but... The only reason I mention this: the bug arose for me with a single string column of a dataset spanning multiple HDF5 files (33 of them), with a maximum string length of 32000 (though in practice the strings are much shorter). I saw the iterator above fail when combining the chunks of a filtered dataset with fewer than 100 records. Since the iterator runs over a filtered dataset on which a df.take had been applied with unfiltered indices, I wonder whether more is going on than just the take issue (unless that memory explosion really is that bad). I'll take a closer look at where mine was failing sometime this week and see whether those changes fix things.
