Hi Weston,

On 25/09/2020 at 23:21, Weston Pace wrote:
> 
> * The current thread pool implementation deadlocks when used in a
> "nested" case, an asynchronous solution can work around this

If required, it may be possible to hack around this.  For example, AFAIR
TBB has a simple heuristic that allows reentrant calls into the thread
pool up to a hardcoded recursion depth.
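
To illustrate the kind of heuristic I mean, here is a minimal Python
sketch.  It is not TBB's actual implementation and not Arrow code: the
ReentrantPool class, its run() method and the depth limit of 8 are all
hypothetical names and values chosen for the example.  A call made from
inside a worker runs inline in that worker instead of blocking on
another one, until a fixed nesting depth is reached.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

MAX_INLINE_DEPTH = 8  # hypothetical limit, not a value taken from TBB


class ReentrantPool:
    """Hypothetical pool wrapper that tolerates nested blocking calls."""

    def __init__(self, max_workers, max_depth=MAX_INLINE_DEPTH):
        self._executor = ThreadPoolExecutor(max_workers=max_workers)
        self._max_depth = max_depth
        self._local = threading.local()  # per-thread nesting depth

    def _depth(self):
        return getattr(self._local, "depth", 0)

    def _enter(self, fn, args):
        # Run fn one nesting level deeper, restoring the depth afterwards.
        depth = self._depth()
        self._local.depth = depth + 1
        try:
            return fn(*args)
        finally:
            self._local.depth = depth

    def run(self, fn, *args):
        """Run fn(*args) and block until its result is available."""
        depth = self._depth()
        if depth == 0:
            # Called from outside the pool: blocking on a worker is safe.
            return self._executor.submit(self._enter, fn, args).result()
        if depth >= self._max_depth:
            raise RecursionError("nested pool calls exceeded the depth limit")
        # Called from a worker: waiting on another worker could deadlock
        # if all workers are busy, so run the task inline instead.
        return self._enter(fn, args)


# Usage: with a single worker, a naive blocking nested submit would
# deadlock; here the inner call runs inline on the same worker.
pool = ReentrantPool(max_workers=1)


def inner(x):
    return x * 2


def outer(x):
    return pool.run(inner, x) + 1


print(pool.run(outer, 20))
```

The obvious cost is that inlined work is no longer spread across
workers, so this only avoids the deadlock; it does not recover the lost
parallelism.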

> The asynchronous reader performed about 10% less
> than the multithreaded reader and about 10% better than the serial
> reader.  

A 10% slowdown doesn't seem dramatic, but I agree the causes should be
investigated.  Also, once you have a more polished implementation, other
people can try to benchmark it on larger machines.

As a first intuition, I'd say that processing 20 CSV files at once on a
2-core machine will probably not be very CPU cache-friendly.  Parsing
and converting CSV is CPU-intensive.

> If people think there is merit in this kind of approach I'd be happy
> to clean up my continuations API (simplify the API, reduce excess
> copies, more automated tests) and investigate further why the
> asynchronous reader is slower than the threaded read and, if I can fix
> those issues, potentially add it as the default threaded reader. 

I think the continuations API can have merit regardless of the exact
benchmark results.  If the tasks are sized properly, the overhead should
ideally be negligible.
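
As a rough back-of-the-envelope illustration (the 5 microsecond
per-task cost below is a made-up number, not a measurement of your
continuations API), the fraction of time lost to scheduling shrinks
quickly as tasks get larger:

```python
def overhead_fraction(task_seconds, per_task_overhead_seconds):
    """Fraction of total time spent on scheduling rather than real work."""
    total = task_seconds + per_task_overhead_seconds
    return per_task_overhead_seconds / total


# Hypothetical fixed cost of scheduling one continuation: 5 microseconds.
OVERHEAD_S = 5e-6

for task_us in (10, 100, 1000, 10000):
    frac = overhead_fraction(task_us * 1e-6, OVERHEAD_S)
    print(f"{task_us:>6} us task: {frac:.2%} scheduling overhead")
```

So the interesting question is mostly what a reasonable task size is
for CSV blocks, rather than the absolute cost of one continuation.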

> I'd guess it would take about two months for me to find enough free
> time blocks to finish all this work.

If you are able to split this into smaller chunks of work, other people
may be able to help at some point.  Also, feel free to open the required
JIRA issues (you can create sub-tasks in JIRA).

Regards

Antoine.
