Hi Weston,

Note: I'm on vacation, so I won't be able to look at this for ~2 weeks.

For your information, there's a micro-benchmark of thread pools and task
groups in src/arrow/util/thread_pool_benchmark.cc.  It should allow you
to isolate performance concerns a bit better.

Regards

Antoine.


On 26/10/2020 16:48, Weston Pace wrote:
> Hi all,
> 
> I've completed the initial composable futures API and iterator work.
> The CSV reader portion is still WIP.
> 
> First, I'm interested in getting any feedback on the futures API, in
> particular Future<T>.Then in future.h (and the type-erased
> Composable.Compose).  The actual implementation can probably be
> cleaned up for DRY (the 10 specializations of the Continue function),
> which I plan to do at the end.
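> 
> To make the shape of the API concrete, here is a deliberately
> simplified, self-contained sketch of the Then() chaining pattern.
> This ToyFuture is illustrative only (not the real future.h types) and
> only models the case where the future is already complete, so the
> continuations run synchronously:
> 
>   // toy_future.cc -- illustrative sketch only, not arrow::Future.
>   #include <iostream>
>   #include <string>
>   #include <utility>
> 
>   template <typename T>
>   class ToyFuture {
>    public:
>     explicit ToyFuture(T value) : value_(std::move(value)) {}
> 
>     // Apply `fn` immediately (this toy is always "complete") and wrap
>     // the result in a new ToyFuture, mirroring the synchronous case.
>     template <typename Fn>
>     auto Then(Fn&& fn) -> ToyFuture<decltype(fn(std::declval<T>()))> {
>       using R = decltype(fn(std::declval<T>()));
>       return ToyFuture<R>(fn(value_));
>     }
> 
>     const T& result() const { return value_; }
> 
>    private:
>     T value_;
>   };
> 
>   int main() {
>     // Chain two continuations; each runs as soon as the previous
>     // result is available.
>     auto fut = ToyFuture<int>(21)
>                    .Then([](int x) { return x * 2; })
>                    .Then([](int x) { return std::to_string(x) + " items"; });
>     std::cout << fut.result() << std::endl;  // prints "42 items"
>     return 0;
>   }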
> 
> This approach is a little different from my earlier prototype.  In the
> prototype, continuations were always submitted to the thread pool as
> new tasks.  I've changed it so that continuations now run
> synchronously when the future is marked complete.  If there is a
> desire to move a continuation into a thread pool task, it can be done
> with Executor.Transfer.  As an example, this is done in
> AsyncForEachHelper so that the applied for-each function does not run
> on the reader thread.
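> 
> A rough illustration of the Transfer idea (the ToyExecutor below is a
> stand-in using std::async, not the real Executor/Transfer API; the
> point is just that the continuation body runs off the completing
> thread):
> 
>   // transfer_sketch.cc -- illustrative only.
>   #include <future>
>   #include <iostream>
>   #include <utility>
> 
>   struct ToyExecutor {
>     // Run the callable on another thread; the real executor would
>     // enqueue onto a shared thread pool instead of spawning.
>     template <typename Fn>
>     auto Spawn(Fn&& fn) {
>       return std::async(std::launch::async, std::forward<Fn>(fn));
>     }
>   };
> 
>   int main() {
>     ToyExecutor executor;
>     int num_rows = 1000;  // pretend this came from the reader thread
> 
>     // "Transferred" continuation: the for-each body executes on the
>     // executor's thread, so the reader thread stays free.
>     auto done = executor.Spawn([num_rows] {
>       std::cout << "processing " << num_rows
>                 << " rows off the reader thread\n";
>     });
>     done.wait();
>     return 0;
>   }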
> 
> Second, and perhaps what I'm more interested in, I've switched
> ThreadedTaskGroup to using futures (e.g. using Compose to add a
> callback that calls OneTaskDone instead of making a wrapper lambda
> function).  In theory this should be more or less the exact same work
> as the previous task group implementation.  However, I am seeing
> noticeable overhead in arrow-thread-pool-benchmark for small tasks.
> The benchmark runs on my system at ~950k items/s with no task group,
> ~890k items/s with the old task group implementation, and ~450k
> items/s with the futures-based implementation.  The change is isolated
> to one method in task_group.cc, so if you replace the method at line
> 102 with the commented-out version at line 127, the original
> performance returns.  I've verified that the task is not getting
> copied.  There are a few extra moves and function calls, and futures
> have to be created and copied around, so that may well be the cause,
> but I'm curious whether a second pair of eyes could spot some other
> cause of the degradation that I am missing.  I'll also see if I can
> get gprof running later, in the hope that it can provide some insight.
> However, I probably won't spend too much more time on it before
> finishing up the CSV reader work and checking the CSV reader's
> performance.
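> 
> For reference, here is a minimal sketch of the two shapes being
> compared (the names OneTaskDone / AddCallback are illustrative, not
> the actual task_group.cc code, and everything runs inline instead of
> on a thread pool):
> 
>   // append_shapes.cc -- illustrative only.
>   #include <atomic>
>   #include <functional>
>   #include <iostream>
>   #include <utility>
> 
>   std::atomic<int> pending{0};
> 
>   void OneTaskDone() { --pending; }
> 
>   // Old shape: wrap the task so the bookkeeping happens inside a
>   // single lambda that the pool would run.
>   void AppendWrapped(std::function<void()> task) {
>     ++pending;
>     auto wrapped = [task = std::move(task)] {
>       task();
>       OneTaskDone();
>     };
>     wrapped();  // stand-in for submitting to the thread pool
>   }
> 
>   // New shape: run the task, get a completion handle, and attach the
>   // bookkeeping as a separate callback.  The extra future creation,
>   // moves, and std::function calls on this path are my main suspects
>   // for the overhead.
>   struct ToyCompletion {
>     std::function<void()> task;
>     void AddCallback(std::function<void()> cb) {
>       task();  // run the task ...
>       cb();    // ... then the callback, as if the future completed
>     }
>   };
> 
>   void AppendViaFuture(std::function<void()> task) {
>     ++pending;
>     ToyCompletion completion{std::move(task)};
>     completion.AddCallback(OneTaskDone);
>   }
> 
>   int main() {
>     AppendWrapped([] { std::cout << "wrapped task\n"; });
>     AppendViaFuture([] { std::cout << "future-based task\n"; });
>     std::cout << "pending after both: " << pending.load() << "\n";  // 0
>     return 0;
>   }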
> 
> If I can't figure out the cause of the performance drop, I can always
> let the task group keep its existing implementation for Append(task)
> while using futures for Append(future).  I suspect that the CSV reader
> tasks are long enough that the overhead won't be an issue.
> 
> Code: 
> https://github.com/apache/arrow/compare/master...westonpace:feature/arrow-10183?expand=1
> 
> -Weston
> 
