westonpace commented on pull request #11145: URL: https://github.com/apache/arrow/pull/11145#issuecomment-920204721
I think my approval still stands. > This fragment, when scanned, unconditionally uses a BackgroundIterator and transfers to the CPU executor. Hence under stress, this could cause both the main thread and CPU threads to be used. I think this needs to be fixed I'm not sure this is a bad thing. The rules for scanners are pretty clear. A ScanOptions has an I/O executor which fragments are welcome to use and, if they do so, they should transfer back to the CPU thread pool. So OneShotFragment is following the rules here. The fact that it causes an indexing error is a consequence of ARROW-13741. If that latter issue were resolved then it shouldn't matter at all what the scanner is doing. I think the only thing that could be changed is adding a CPU executor to scan options instead of threads grabbing the internal CPU thread pool. I think right now we hack around this with "SafeExecute". > Furthermore I think it would be useful if, during a non-parallel scan, we could abort or otherwise enforce that the CPU thread pool is not being used. A "non-parallel" scan means that the CPU and I/O executors are a serial executor. So the internal code shouldn't change either way and there should be no possible way to use the CPU thread. If we put the CPU executor into the scan options then it might be good to catch places we do internal::GetCpuThreadPool instead of using the options executor. > Actually, given the talks about tracing, it would be interesting if we could somehow capture executor usage in the traces - particularly spawning thread. Agreed. Thread pool tasks should be a "span" and the name of the thread pool (also, thread pools should be named), and possibly a worker index/id, should be properties of the span. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
