Craig and Pavel: thanks to you both for the responses.

Craig, this is for my PhD work, so there's no commercial interest at this
point. However, I'm pushing hard in various communities for funding/support
for a Postgres-based implementation of an EHR repository that will hopefully
benefit from my PhD efforts. I'll certainly raise the option of funding some
key work in those discussions; it actually fits very well with the model
we've been discussing at the university for some time.

Kind regards
Seref


On Wed, Aug 22, 2012 at 4:24 AM, Craig Ringer <ring...@ringerc.id.au> wrote:

> On 08/21/2012 04:45 PM, Seref Arikan wrote:
>
>> Parallel software frameworks such as Erlang's OTP or Scala's Akka do
>> help a lot, but it would be a lot better if I could feed those
>> frameworks with data faster. So, what options do I have to execute
>> queries in parallel, assuming a transactional system running on
>> postgresql?
>>
>
> AFAIK, native support for parallelisation of query execution is currently
> almost non-existent in Pg. You generally have to break your queries up into
> smaller queries that do part of the work, run them in parallel, and
> integrate the results together client-side.
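The client-side fan-out Craig describes can be sketched roughly as follows. This is a hypothetical illustration, not code from the thread: `make_subqueries` and `run_query` are stand-in names, and `run_query` is a stub — a real version would execute each sub-query on its own connection (e.g. via psycopg2), so each one gets its own Pg backend process.

```python
from concurrent.futures import ThreadPoolExecutor

def make_subqueries(table, key, lo, hi, parts):
    # Partition a large scan by key range into `parts` sub-queries.
    step = (hi - lo) // parts
    bounds = [lo + i * step for i in range(parts)] + [hi]
    return [
        f"SELECT * FROM {table} WHERE {key} >= {bounds[i]} AND {key} < {bounds[i + 1]}"
        for i in range(parts)
    ]

def run_query(sql):
    # Stub: a real implementation would open its own connection and run
    # `sql` there, e.g. with psycopg2. Here we just echo the SQL text so
    # the sketch is runnable standalone.
    return [sql]

def parallel_fetch(table, key, lo, hi, parts=4):
    # Run the sub-queries concurrently and merge the results client-side.
    with ThreadPoolExecutor(max_workers=parts) as pool:
        chunks = pool.map(run_query, make_subqueries(table, key, lo, hi, parts))
    return [row for chunk in chunks for row in chunk]

# With a real run_query, `rows` would hold the merged results of all
# four range-partitioned sub-queries.
rows = parallel_fetch("ehr_entries", "id", 0, 1_000_000, parts=4)
```

The partitioning column should ideally be indexed (or the table clustered on it) so each sub-query scans a distinct slice rather than four full scans.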
>
> There are some tools that can help with this. For example, I think
> PgPool-II has some parallelisation features, though I've never used them.
> Discussion I've seen on this list suggests that many people handle it in
> their code directly.
>
> Note that Pg is *very* good at concurrently running many queries, with
> features like synchronized scans. The whole DB is written around fast
> concurrent execution of queries, and it'll happily use every CPU and I/O
> resource you have. However, individual queries cannot use multiple CPUs or
> I/O "threads", you need many queries running in parallel to use the
> hardware's resources fully.
>
>
> As far as I know the only native in-query parallelisation Pg offers is via
> effective_io_concurrency, and currently that only affects bitmap heap scans:
>
>     http://archives.postgresql.org/pgsql-general/2009-10/msg00671.php
>
> ... not seqscans or other access methods.
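For reference, `effective_io_concurrency` is just a configuration setting; a sketch of enabling it per-session (the value of 8 is an assumption — it should roughly match the number of drives or I/O channels backing the tablespace):

```sql
-- Per-session; at present this only benefits bitmap heap scans.
SET effective_io_concurrency = 8;

-- Or set it persistently in postgresql.conf:
-- effective_io_concurrency = 8
```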
>
> Execution of each query is done by a single process running a single
> thread, so there's no CPU parallelism beyond what the compiler can
> introduce behind the scenes - which isn't much. Nor is I/O parallelised
> across invocations of nested loops, by splitting seqscans into chunks,
> and so on.
>
> There are some upsides to this limitation, though:
>
> - The Pg code is easier to understand, maintain, and fix
>
> - It's easier to add features
>
> - It's easier to get right, so it's less buggy and more
>   reliable.
>
>
> As the world goes more and more parallel, Pg is likely to follow at some
> point, but it's going to be a mammoth job. I don't see anyone volunteering
> the many months of their free time required, there's nobody being funded to
> work on it, and I don't see any of the commercial Pg forks that've added
> parallel features trying to merge their work back into mainline.
>
> If you have a commercial need, perhaps you can find time to fund work on
> something that'd help you out, like honouring effective_io_concurrency for
> sequential scans?
>
> --
> Craig Ringer
>
