Allocating batches of tiles to each thread and dynamically re-allocating
them should a thread finish early will complicate the code while providing
only a marginal speed improvement.  (Each thread still has to obtain a
lock on a work queue to get its next tile either way, but the proposed
scheme has a slightly lower chance of lock contention since there are
effectively multiple queues.)
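
To make the trade-off concrete, here is a minimal sketch of the single
shared work queue every thread must lock.  The names, and the use of
plain pthreads rather than GIMP's actual thread code, are assumptions
for illustration only:

#include <pthread.h>

typedef struct
{
  pthread_mutex_t lock;       /* e.g. PTHREAD_MUTEX_INITIALIZER */
  int             next_tile;  /* next unclaimed tile index      */
  int             n_tiles;    /* total tiles in the drawable    */
} TileQueue;

/* Every call takes the one global lock, so all workers contend on
   it.  Returns a tile index, or -1 when no work is left. */
static int
tile_queue_pop (TileQueue *q)
{
  int tile;

  pthread_mutex_lock (&q->lock);
  tile = (q->next_tile < q->n_tiles) ? q->next_tile++ : -1;
  pthread_mutex_unlock (&q->lock);

  return tile;
}

With per-thread batches, each thread would pop from its own such queue
and only touch a shared one once its batch runs out.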

As far as task sequencing goes, that all depends on how tile memory is
allocated.  To maximize locality of reference and minimize paging, the
goal should be to have all threads working on contiguous areas of memory
at the same time where possible.
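
A sketch of that contiguous assignment, again with made-up names:
carve the drawable's tile list into one contiguous run per thread.

/* Purely illustrative: one contiguous tile range per thread, so the
   threads touch disjoint, contiguous memory. */
static void
assign_contiguous_ranges (int n_tiles, int n_threads,
                          int *start, int *end)  /* arrays[n_threads] */
{
  int chunk = (n_tiles + n_threads - 1) / n_threads;  /* ceiling */
  int t;

  for (t = 0; t < n_threads; t++)
    {
      start[t] = (t * chunk < n_tiles) ? t * chunk : n_tiles;
      end[t]   = ((t + 1) * chunk < n_tiles) ? (t + 1) * chunk : n_tiles;
    }
}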


-Dave

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Tino
Schwarze
Sent: Monday, 21 February 2005 6:14 PM
To: gimp-developer@lists.xcf.berkeley.edu
Subject: Re: [Gimp-developer] GIMP and multiple processors

On Sun, Feb 20, 2005 at 11:52:16PM +0000, Adam D. Moss wrote:

> >I can force it to use both CPUs now, but even with
> >200% utilization it is 2s slower to run this stupid
> >ubenchmark than on 1 CPU without threads.
> 
> Just a vague guess, but the multiprocessor GIMP pixel
> work scheduler might* farm alternating tiles to alternating
> CPUs.  These are reasonably likely to have been allocated
> together and thus sit close together in memory, causing
> memory contention between CPUs.

So it might make sense not to interleave tiles, but to let each thread
start in a different region of the image, like this (for 4 threads):

1xxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxx
2xxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxx
3xxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxx
4xxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxx

So each thread works on "its own" memory at first, and only when it has
finished its region will it get tiles from the remaining work.  A
binary-search-like algorithm could be used, like this (a simplified C
sketch follows the list):

- figure out the largest remaining part of the image (at start, it's
  the whole image)
- let x be the number of idle threads
- if starting up,
      divide into x regions
      assign each thread the start of its region
   else
      divide into x+1 regions
      assign the idle threads the (x+1)-st, x-th, (x-1)-st, etc.
        regions (this leaves the first region to the thread currently
        processing it)
- start processing the regions
- if any thread runs out of work, restart the algorithm
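
Something like this could implement the splitting step.  It is a
simplified sketch with made-up names: it halves the largest region for
one idle thread at a time instead of doing the full (x+1)-way
re-division, and it assumes the caller holds a scheduler lock.

typedef struct
{
  int next;  /* next tile the region's owner will claim */
  int end;   /* one past the region's last tile         */
} Region;

/* regions[] holds one entry per thread.  Returns the index of the
   region that was split, or -1 if no region is worth splitting. */
static int
split_largest_region (Region *regions, int n_threads, int idle_thread)
{
  int t, best = -1, best_size = 0;

  for (t = 0; t < n_threads; t++)
    {
      int size = regions[t].end - regions[t].next;

      if (size > best_size)
        {
          best      = t;
          best_size = size;
        }
    }

  if (best < 0 || best_size < 2)
    return -1;

  /* The idle thread takes the upper half; the current owner keeps
     the lower half it is already working through. */
  regions[idle_thread].next = regions[best].next + best_size / 2;
  regions[idle_thread].end  = regions[best].end;
  regions[best].end         = regions[idle_thread].next;

  return best;
}

Since there is one Region per thread, the whole scheduler state fits
in a fixed-size array, which is what keeps the bookkeeping simple.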

This should improve memory locality while distributing work across all
threads, and it ensures that non-uniform workloads are balanced as well.

Consider the above example where thread 4 has finished its tiles:

1111xxxxxxxxxxxx
xxxxxxxxxxxxxxxx
22222222xxxxxxxx
xxxxxxxxxxxxxxxx
333333333333333x
xxxxxxxxxxxxxxxx
4444444444444444
4444444444444444

It looks like 1 will need the longest time to finish its region, so we
jump in there and assign part of its work to 4 (marked y):

1111xxxxxxxxxxxx
xx4yyyyyyyyyyyyy
22222222xxxxxxxx
xxxxxxxxxxxxxxxx
333333333333333x
xxxxxxxxxxxxxxxx
4444444444444444
4444444444444444

Later, 3 finishes:

11111111xxxxxxxx
xx44444yyyyyyyyy
2222222222222222
xxxxxxxxxxxxxxxx
3333333333333333
3333333333333333
4444444444444444
4444444444444444

Now, 2 has the largest region left, so half of it gets assigned to 3
(marked z):

11111111xxxxxxxx
xx44444yyyyyyyyy
2222222222222222
xxxxxxxx3zzzzzzz
3333333333333333
3333333333333333
4444444444444444
4444444444444444

The bookkeeping stays simple, since there are never more regions than
threads.

HTH! Tino.

PS: I don't know whether this algorithm works well in practice; I
actually got the idea while replying to Adam's message.

_______________________________________________
Gimp-developer mailing list
Gimp-developer@lists.xcf.berkeley.edu
http://lists.xcf.berkeley.edu/mailman/listinfo/gimp-developer
