anjakefala commented on issue #40301: URL: https://github.com/apache/arrow/issues/40301#issuecomment-1974015365
@felipecrv Is modifying `ThreadPool` better than an option where we use an approach similar to [the SplitBlockCreator class](https://github.com/apache/arrow/blob/a6e577d031d20a1a7d3dd01536b9a77db5d1bff8/python/pyarrow/src/arrow/python/arrow_to_pandas.cc#L2422) for tables under a certain size?

That's more along the lines of what I was thinking of. However, if you think `work-stealing` would be the most robust solution, and one that other functions would also benefit from, I'd be game for approaching this.

I prefer the work-stealing approach because, ideally, we wouldn't require the user to know that there is an option to set. Folks might not know that the memory usage is tied to the spawning of individual threads. They might not even know why `to_pandas` spawns multiple threads.
