Hi Weston,
Note: I'm on vacation, so won't be able to look at this before ~2 weeks. For information, there's a micro-benchmark of thread pools and task groups in src/arrow/util/thread_pool_benchmark.cc. It should allow you isolate performance concerns a bit better. Regards Antoine. Le 26/10/2020 à 16:48, Weston Pace a écrit : > Hi all, > > I've completed the initial composable futures API and iterator work. > The CSV reader portion is still WIP. > > First, I'm interested in getting any feedback on the futures API. In > particular Future<T>.Then in future.h (and the type erased > Composable.Compose). The actual implementation can probably be > cleaned up with regards to DRY (the 10 specializations of the Continue > function) which I plan to do at the end. > > This approach is a little different than my earlier prototype. In the > prototype it would always submit continuations on the thread pool as > new tasks. Instead I've changed it so continuations will run > synchronously when the future is marked complete. If there is a > desire to move the continuation into a thread pool task it can be done > with Executor.Transfer. As an example usage this is done in > AsyncForEachHelper so that the applied for-each function is not run on > the reader thread. > > Second, and perhaps what I'm more interested in, I've switched > ThreadedTaskGroup to using futures (e.g. using Compose to add a > callback that calls OneTaskDone instead of making a wrapper lambda > function). In theory this should be more or less the exact same work > as the previous task group implementation. However, I am seeing > noticeable overhead in arrow-thread-pool-benchmark for small tasks. > The benchmark runs on my system at ~950k items/s for no task group, > ~890k items/s with the old task group implementation, and ~450k > items/s with the futures based implementation. The change is isolated > to one method in task_group.cc so if you replace the method at line > 102 with the commented out version at line 127 the original > performance returns. I've verified that the task is not getting > copied. There are a few extra moves and function calls and futures > have to be created and copied around so it is possible that is the > cause of it but I'm curious if a second eye could see some other cause > for the degradation that I am missing. I'll also be seeing if I can > get gprof running later in hopes that can provide some insight. > However, I probably won't spend too much more time on it before > finishing up the CSV reader work and checking the performance of the > CSV reader. > > If I can't figure out the cause of the performance I can always allow > task group to keep the implementation it has for Append(task) while > using future for Append(future). I suspect that the CSV reader tasks > are long enough tasks that the overhead won't be an issue. > > Code: > https://github.com/apache/arrow/compare/master...westonpace:feature/arrow-10183?expand=1 > > -Weston >