On Mon, 21 Feb 2005 02:00:39 +0100, Sven Neumann <[EMAIL PROTECTED]> wrote:
> > It sounds like the granularity of parallelism is too fine.  That is,
> > each "task" is too short and the overhead of task dispatching (your
> > task queue processing, the kernel's thread context switching, any IPC
> > required, etc.) is longer than the duration of a single task.
> 
> The task is not a single pixel but a single tile (that is usually a
> region of 64x64 pixels). GIMP processes pixel regions by iterating
> over the tiles. The multi-threaded pixel processor uses a configurable
> number of threads. Each thread obtains a lock on the pixel-region,
> takes a pointer to the next tile from the queue, releases the lock,
> processes the tile and starts over.

I maintain a threaded image processing library called VIPS. 

  http://www.vips.ecs.soton.ac.uk/

We looked at granularity a while ago and the 'sweet spot' at which the
thread start/stop time became insignificant seemed to be around 50x50
pixels. I've pasted some numbers at the end of the mail in case anyone
is interested. I realise GIMP is using a very different evaluation
strategy, but the point (maybe) is that thread manipulation is rather
quick, so its overhead is probably not what you're seeing with 64x64
pixel tiles.

FWIW, vips works by having a thread pool (rather than a tile queue)
and a simple for(;;) loop over tiles. At each tile, the for() loop
waits for a thread to become free, then assigns it a tile to work on.
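
To make the dispatch loop concrete, here is a rough sketch in Python
rather than the real VIPS C code. The names (process_tiles, work, the
per-worker inbox queues) are made up for illustration and are not the
VIPS API; error handling is left out, so a work function that raises
would hang the loop.

```python
import queue
import threading


def process_tiles(width, height, tile, n_threads, work):
    """Run work(x, y, w, h) once per tile using a fixed pool of threads."""
    free = queue.Queue()              # inboxes of workers that are idle
    results = []
    results_lock = threading.Lock()

    def worker(inbox):
        while True:
            job = inbox.get()
            if job is None:           # shutdown marker
                return
            out = work(*job)
            with results_lock:
                results.append((job, out))
            free.put(inbox)           # report back as idle

    threads = []
    for _ in range(n_threads):
        inbox = queue.Queue(maxsize=1)
        t = threading.Thread(target=worker, args=(inbox,))
        t.start()
        threads.append(t)
        free.put(inbox)

    # The dispatcher: a plain loop over tiles.
    for y in range(0, height, tile):
        for x in range(0, width, tile):
            w = min(tile, width - x)  # clip tiles at the right edge
            h = min(tile, height - y) # clip tiles at the bottom edge
            inbox = free.get()        # wait for a thread to become free
            inbox.put((x, y, w, h))   # hand it the next tile

    # Each worker goes idle exactly once more; tell it to exit then.
    for _ in range(n_threads):
        free.get().put(None)
    for t in threads:
        t.join()
    return results
```

The only synchronisation per tile is one get and one put on a queue,
which is the cost that has to stay small compared with the work done
on a tile.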

The benchmark is a 45 degree rotate of a 4000 by 4000 pixel image.
Look for the point at which real time stops falling. The two numbers
after the file names in each "try" command are the tile size. The
timings on the more recent machines are really too short to be
accurate :-( but the benchmark is 10 years old and took minutes back
then: oh well. I'm supposed to be getting a quad Opteron soon, which
will be interesting. Kernel 2.6 would no doubt help too.
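
For a sense of scale, the number of tiles dispatched per run can be
estimated from the tile size. This is only a rough sketch: it assumes
the tiled area is the 4000 by 4000 image itself with edge tiles
clipped, whereas the rotated output may well be larger, and tile_count
is a helper made up for this mail.

```python
def tile_count(width, height, tile_w, tile_h):
    """Number of tiles needed to cover the image, rounding up at the edges."""
    across = -(-width // tile_w)    # ceiling division
    down = -(-height // tile_h)
    return across * down


for size in range(10, 100, 10):
    print(size, tile_count(4000, 4000, size, size))
```

Under that assumption 10x10 tiles means 160000 dispatches, while 50x50
tiles means 6400, so the per-tile overhead is paid 25 times less often.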

Run on cima, 1 CPU UltraSPARC
        ./try huysum.hr.v fred.v 10 10 1 20
        real       30.1
        user       24.5
        ./try huysum.hr.v fred.v 20 20 1 20
        real       19.2
        user       16.9
        ./try huysum.hr.v fred.v 30 30 1 20
        real       17.8
        user       15.4
        ./try huysum.hr.v fred.v 40 40 1 20
        real       17.1
        user       15.1
        ./try huysum.hr.v fred.v 50 50 1 20
        real       16.9
        user       15.1
        ./try huysum.hr.v fred.v 60 60 1 20
        real       16.6
        user       15.0
        ./try huysum.hr.v fred.v 70 70 1 20
        real       17.2
        user       15.2
        ./try huysum.hr.v fred.v 80 80 1 20
        real       17.3
        user       15.1
        ./try huysum.hr.v fred.v 90 90 1 20
        real       17.4
        user       15.3

Run on perugino, 2 CPU SuperSPARC
        ./try huysum.hr.v fred.v 10 10 1 20
        real    0m51.123s
        user    1m7.623s
        ./try huysum.hr.v fred.v 20 20 1 20
        real    0m24.601s
        user    0m41.133s
        ./try huysum.hr.v fred.v 30 30 1 20
        real    0m21.931s
        user    0m38.393s
        ./try huysum.hr.v fred.v 40 40 1 20
        real    0m20.208s 
        user    0m35.653s
        ./try huysum.hr.v fred.v 50 50 1 20
        real    0m20.109s
        user    0m35.283s 
        ./try huysum.hr.v fred.v 60 60 1 20
        real    0m19.501s
        user    0m34.513s
        ./try huysum.hr.v fred.v 70 70 1 20
        real    0m20.435s
        user    0m34.813s
        ./try huysum.hr.v fred.v 80 80 1 20
        real    0m20.558s
        user    0m35.293s
        ./try huysum.hr.v fred.v 90 90 1 20
        real    0m20.785s
        user    0m35.313s

Run on furini, 2 CPU 450MHz PII Xeon, kernel 2.4.4, vips-7.7.19, gcc 2.95.3
        ./try huysum.hr.v fred.v 10 10 1 20
        real    0m4.542s
        user    0m4.350s
        sys     0m3.800s
        ./try huysum.hr.v fred.v 20 20 1 20
        real    0m2.206s
        user    0m2.750s
        sys     0m1.250s
        ./try huysum.hr.v fred.v 30 30 1 20
        real    0m1.678s
        user    0m2.610s
        sys     0m0.580s
        ./try huysum.hr.v fred.v 40 40 1 20
        real    0m1.483s
        user    0m2.460s
        sys     0m0.410s
        ./try huysum.hr.v fred.v 50 50 1 20
        real    0m1.443s
        user    0m2.330s
        sys     0m0.350s
        ./try huysum.hr.v fred.v 60 60 1 20
        real    0m1.385s
        user    0m2.390s
        sys     0m0.220s
        ./try huysum.hr.v fred.v 70 70 1 20
        real    0m1.394s
        user    0m2.460s
        sys     0m0.150s
        ./try huysum.hr.v fred.v 80 80 1 20
        real    0m1.365s
        user    0m2.360s
        sys     0m0.200s
        ./try huysum.hr.v fred.v 90 90 1 20
        real    0m1.393s
        user    0m2.450s
        sys     0m0.180s

Run on manet, 2 CPU 2.5GHz P4 Xeon, kernel 2.4.18, vips-7.8.5, gcc 2.95.3
        ./try huysum.hr.v fred.v 10 10 1 20
        real    0m1.582s
        user    0m1.640s
        sys     0m1.470s
        ./try huysum.hr.v fred.v 20 20 1 20
        real    0m0.691s
        user    0m0.970s
        sys     0m0.410s
        ./try huysum.hr.v fred.v 30 30 1 20
        real    0m0.548s
        user    0m0.790s
        sys     0m0.230s
        ./try huysum.hr.v fred.v 40 40 1 20
        real    0m0.489s
        user    0m0.790s
        sys     0m0.160s
        ./try huysum.hr.v fred.v 50 50 1 20
        real    0m0.465s
        user    0m0.610s
        sys     0m0.180s
        ./try huysum.hr.v fred.v 60 60 1 20
        real    0m0.454s
        user    0m0.740s
        sys     0m0.030s
        ./try huysum.hr.v fred.v 70 70 1 20
        real    0m0.505s
        user    0m0.820s
        sys     0m0.120s
        ./try huysum.hr.v fred.v 80 80 1 20
        real    0m0.479s
        user    0m0.840s
        sys     0m0.090s
        ./try huysum.hr.v fred.v 90 90 1 20
        real    0m0.436s
        user    0m0.650s
        sys     0m0.040s

Run on constable, 2 CPU 2.5GHz P4 Xeon, kernel 2.4.21, vips-7.10.8, gcc 3.3.1
        ./try huysum.hr.v fred.v 10 10 1 20
        real    0m1.544s
        user    0m1.420s
        sys     0m1.422s
        ./try huysum.hr.v fred.v 20 20 1 20
        real    0m0.690s
        user    0m0.834s
        sys     0m0.441s
        ./try huysum.hr.v fred.v 30 30 1 20
        real    0m0.494s
        user    0m0.658s
        sys     0m0.244s
        ./try huysum.hr.v fred.v 40 40 1 20
        real    0m0.450s
        user    0m0.657s
        sys     0m0.174s
        ./try huysum.hr.v fred.v 50 50 1 20
        real    0m0.397s
        user    0m0.579s
        sys     0m0.144s
        ./try huysum.hr.v fred.v 60 60 1 20
        real    0m0.507s
        user    0m0.813s
        sys     0m0.123s
        ./try huysum.hr.v fred.v 70 70 1 20
        real    0m0.381s
        user    0m0.573s
        sys     0m0.115s
        ./try huysum.hr.v fred.v 80 80 1 20
        real    0m0.357s
        user    0m0.530s
        sys     0m0.101s
        ./try huysum.hr.v fred.v 90 90 1 20
        real    0m0.528s
        user    0m0.877s
        sys     0m0.103s