Hi all, I've completed the initial composable futures API and iterator work. The CSV reader portion is still WIP.
First, I'm interested in getting any feedback on the futures API. In particular Future<T>.Then in future.h (and the type erased Composable.Compose). The actual implementation can probably be cleaned up with regards to DRY (the 10 specializations of the Continue function) which I plan to do at the end. This approach is a little different than my earlier prototype. In the prototype it would always submit continuations on the thread pool as new tasks. Instead I've changed it so continuations will run synchronously when the future is marked complete. If there is a desire to move the continuation into a thread pool task it can be done with Executor.Transfer. As an example usage this is done in AsyncForEachHelper so that the applied for-each function is not run on the reader thread. Second, and perhaps what I'm more interested in, I've switched ThreadedTaskGroup to using futures (e.g. using Compose to add a callback that calls OneTaskDone instead of making a wrapper lambda function). In theory this should be more or less the exact same work as the previous task group implementation. However, I am seeing noticeable overhead in arrow-thread-pool-benchmark for small tasks. The benchmark runs on my system at ~950k items/s for no task group, ~890k items/s with the old task group implementation, and ~450k items/s with the futures based implementation. The change is isolated to one method in task_group.cc so if you replace the method at line 102 with the commented out version at line 127 the original performance returns. I've verified that the task is not getting copied. There are a few extra moves and function calls and futures have to be created and copied around so it is possible that is the cause of it but I'm curious if a second eye could see some other cause for the degradation that I am missing. I'll also be seeing if I can get gprof running later in hopes that can provide some insight. However, I probably won't spend too much more time on it before finishing up the CSV reader work and checking the performance of the CSV reader. If I can't figure out the cause of the performance I can always allow task group to keep the implementation it has for Append(task) while using future for Append(future). I suspect that the CSV reader tasks are long enough tasks that the overhead won't be an issue. Code: https://github.com/apache/arrow/compare/master...westonpace:feature/arrow-10183?expand=1 -Weston