Hi All,
I have been looking at how we currently use the scanner. It looks like it
should not be too difficult to inject a parallel scanner in place of the
default serial scanner, since in many use cases we don't care about the
order of the data retrieved.
Key question: do we sometimes take advantage of the ordering (to do things
like merges), or are the merges that require sorted input always done at
the ESP level anyway?
The point is to know whether we should offer both a serial scanner and a
parallel scanner as options (the first preserving sort order, the other
not), or whether we could always enable the parallel scanner.
On implementation details, we can either use a sophisticated algorithm that
conserves thread resources and auto-scales the parallelism based on how fast
the code calling next() consumes rows, or simply always use as many threads
as there are regions to scan, accepting that some threads will wait() if
the client code calling next() is not consuming fast enough.
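To make the simple option concrete, here is a minimal sketch, not tied to
our actual scanner code. It assumes each region can be represented by a
plain java.util.Iterator (a hypothetical stand-in for a per-region scanner)
and that next() is called from a single consumer thread. The class name
ParallelScannerSketch and the END marker are made up for illustration. Rows
come back in arbitrary order across regions, and a bounded queue makes the
region threads block when the consumer is slow.

```java
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch: one producer thread per region, unordered results.
class ParallelScannerSketch<T> {
    // Marker a producer enqueues once its region is exhausted.
    private static final Object END = new Object();

    private final BlockingQueue<Object> queue;
    private int liveProducers; // only touched by the consumer thread

    ParallelScannerSketch(List<Iterator<T>> regionScanners, int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
        this.liveProducers = regionScanners.size();
        for (Iterator<T> region : regionScanners) {
            Thread t = new Thread(() -> {
                try {
                    while (region.hasNext()) {
                        // put() blocks while the queue is full, i.e. while
                        // the consumer is not calling next() fast enough.
                        queue.put(region.next());
                    }
                    queue.put(END);
                } catch (InterruptedException e) {
                    // Sketch only: an interrupted producer never sends END;
                    // a real implementation would have to handle that.
                    Thread.currentThread().interrupt();
                }
            });
            t.setDaemon(true);
            t.start();
        }
    }

    // Returns the next row from any region, or null once all are exhausted.
    @SuppressWarnings("unchecked")
    T next() {
        try {
            while (liveProducers > 0) {
                Object item = queue.take();
                if (item == END) {
                    liveProducers--;
                } else {
                    return (T) item;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return null;
    }
}
```

The queue capacity is the only tuning knob here: it bounds how far the
region threads can run ahead of the consumer. An auto-scaling version
would replace the fixed thread-per-region loop with a pool whose size
changes with the consumption rate.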
I can prototype the simple one first, then move on to auto-scaling the
threads once that is done.
The reason I need to know whether we should keep the serial scanner path is
to decide whether I should create a whole new wiring for the parallel
scanner, or whether I can just replace the serial scanner with the parallel
one (enabling one or the other at config time for benchmarking purposes
only).
Is anybody working on this already, or should I give it a try?
Regards,
Eric