[c++] Futures API review & help understanding benchmark result

Weston Pace Mon, 26 Oct 2020 08:49:32 -0700

Hi all,

I've completed the initial composable futures API and iterator work.
The CSV reader portion is still WIP.


First, I'm interested in getting any feedback on the futures API.  In
particular Future<T>.Then in future.h (and the type erased
Composable.Compose).  The actual implementation can probably be
cleaned up with regards to DRY (the 10 specializations of the Continue
function) which I plan to do at the end.

This approach is a little different than my earlier prototype.  In the
prototype it would always submit continuations on the thread pool as
new tasks.  Instead I've changed it so continuations will run
synchronously when the future is marked complete.  If there is a
desire to move the continuation into a thread pool task it can be done
with Executor.Transfer.  As an example usage this is done in
AsyncForEachHelper so that the applied for-each function is not run on
the reader thread.

Second, and perhaps what I'm more interested in, I've switched
ThreadedTaskGroup to using futures (e.g. using Compose to add a
callback that calls OneTaskDone instead of making a wrapper lambda
function).  In theory this should be more or less the exact same work
as the previous task group implementation.  However, I am seeing
noticeable overhead in arrow-thread-pool-benchmark for small tasks.
The benchmark runs on my system at ~950k items/s for no task group,
~890k items/s with the old task group implementation, and ~450k
items/s with the futures based implementation.  The change is isolated
to one method in task_group.cc so if you replace the method at line
102 with the commented out version at line 127 the original
performance returns.  I've verified that the task is not getting
copied.  There are a few extra moves and function calls and futures
have to be created and copied around so it is possible that is the
cause of it but I'm curious if a second eye could see some other cause
for the degradation that I am missing.  I'll also be seeing if I can
get gprof running later in hopes that can provide some insight.
However, I probably won't spend too much more time on it before
finishing up the CSV reader work and checking the performance of the
CSV reader.

If I can't figure out the cause of the performance I can always allow
task group to keep the implementation it has for Append(task) while
using future for Append(future).  I suspect that the CSV reader tasks
are long enough tasks that the overhead won't be an issue.

Code: 
https://github.com/apache/arrow/compare/master...westonpace:feature/arrow-10183?expand=1

-Weston

[c++] Futures API review & help understanding benchmark result

Reply via email to