[hugin-ptx] Re: clustered hugin

kevin Sun, 16 Jan 2011 06:10:49 -0800

I've got three machines connected on a LAN and have been working on
spreading the stitching programs across them to see how much of a
speedup I can get.  All my tests use a 500 Mpixel final stitch made
from 174 images, which are stacks of 3 bracketed images.  I picked
this size because it doesn't swap at all.  I know that at 750 Mpixel
enblend gets big enough that it needs some disk space to work, and
once you start hitting the disk like that, it's going to slow down a
lot.


The specs for the three machines are:

A - Core i7 2.6GHz, 4 cores, 24GB memory, Nvidia GTX460, Slackware 64-bit
B - Core 2  2.4GHz, 4 cores,  8GB memory, Nvidia 9600GT, Slackware 64-bit
C - Core 2  2.0GHz, 2 cores,  4GB memory, Nvidia 8400GS, Slackware 64-bit

Here's what I've learnt so far.  Bear in mind this applies to how I
stitch; someone else would probably get different numbers.  I have
hugin produce both the "Exposure fused from stacks" and "Exposure
fused from any arrangement" outputs.

There are five stitching stages:

1 - nona for layer images
2 - enfuse for ldr images
3 - enblend for exposure images (needs layer images)
4 - enfuse for blended_fused image (needs exposure images)
5 - enblend for fused image (needs ldr images)
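The dependencies in this list form a small graph, so one way to see which stages could overlap is to group them into "waves", where each wave only needs outputs from earlier waves.  A minimal Python sketch; the wave grouping is just an illustration, and one dependency is an assumption not stated in the list: stage 2's enfuse fuses the remapped layer images, so it is assumed to wait on stage 1 as well.

```python
# Stage dependencies from the list above.
# ASSUMPTION (not in the list): stage 2 also needs the layer images
# produced by nona in stage 1.
DEPENDS_ON = {
    1: set(),   # nona: layer images
    2: {1},     # enfuse: ldr images (assumed to need layer images)
    3: {1},     # enblend: exposure images need layer images
    4: {3},     # enfuse: blended_fused image needs exposure images
    5: {2},     # enblend: fused image needs ldr images
}

def waves(deps):
    """Group stages into waves; every stage in a wave depends only on
    stages from earlier waves, so stages in one wave may run concurrently."""
    done, result = set(), []
    remaining = dict(deps)
    while remaining:
        ready = sorted(s for s, d in remaining.items() if d <= done)
        if not ready:
            raise ValueError("dependency cycle")
        result.append(ready)
        done.update(ready)
        for s in ready:
            del remaining[s]
    return result

print(waves(DEPENDS_ON))  # [[1], [2, 3], [4, 5]]
```

Under that assumption stages 2 and 3 could overlap, as could 4 and 5, memory permitting.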


nona - the single biggest speedup for nona comes from using the GPU.
I've also found that although nona is threaded, I get the fastest
times by running as many nona processes as the machine has cores, all
set to use the GPU.  So a 4-core machine runs 4 nonas at a time, all
with the -g option, and a 2-core machine runs 2 nonas at a time, both
using the GPU.
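A minimal sketch of the "one GPU nona per core" approach, assuming a Python driver script.  The -g flag comes from the post; the -m TIFF_m, -o and -i options (output format, output prefix, single image index) are assumptions about nona's command line, so check `nona --help` before relying on them.  The project name and image count in the usage comment are hypothetical.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor

def nona_cmd(project, prefix, index):
    """Build a nona command that remaps one image on the GPU (-g).
    ASSUMPTION: -m TIFF_m, -o and -i select multi-file TIFF output, the
    output prefix and a single image index in your nona version."""
    return ["nona", "-g", "-m", "TIFF_m", "-o", prefix, "-i", str(index), project]

def run_parallel(cmds, workers=None, runner=subprocess.run):
    """Run commands with at most `workers` at a time (default: CPU count).
    Threads are enough here because each one just waits on a child process."""
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda c: runner(c, check=True), cmds))

# Usage (hypothetical project name and image count):
#   run_parallel([nona_cmd("pano.pto", "pano", i) for i in range(174)])
```

Keeping the process count at the core count matches what was found to be fastest here; the `runner` parameter only exists so the scheduling can be tried without nona installed.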

enfuse - this one is similar to nona: it's threaded, but I get the
fastest times by running as many enfuse processes as the machine has
cores.
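The same pattern applies to enfuse, one process per bracketed stack.  A sketch, assuming the remapped layer images are listed in order so that each stack of 3 is consecutive; the output naming (`prefix_stack_ldr_0000.tif`) is a hypothetical choice, and `enfuse -o OUTPUT INPUTS...` is the only enfuse syntax used.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor

def stacks(files, size=3):
    """Split an ordered list of layer images into consecutive brackets.
    ASSUMPTION: each stack's exposures are adjacent in the list."""
    if len(files) % size:
        raise ValueError("image count is not a multiple of the stack size")
    return [files[i:i + size] for i in range(0, len(files), size)]

def enfuse_cmd(stack, out):
    """enfuse -o OUTPUT INPUT1 INPUT2 ..."""
    return ["enfuse", "-o", out] + list(stack)

def fuse_all(layers, prefix, workers=None, runner=subprocess.run):
    """Fuse every stack, running at most one enfuse per core at a time."""
    cmds = [enfuse_cmd(s, f"{prefix}_stack_ldr_{n:04d}.tif")  # hypothetical names
            for n, s in enumerate(stacks(layers))]
    with ThreadPoolExecutor(max_workers=workers or os.cpu_count() or 1) as pool:
        list(pool.map(lambda c: runner(c, check=True), cmds))
    return cmds
```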

enblend - the biggest speedup I've gotten from enblend comes from the
'-a' option.  I've tried --gpu, but it always crashes at some point,
with or without '-a'.

For stages 3, 4 and 5, the images they produce are large (500 Mpixel),
so I have to run them on machine A; otherwise they start swapping.  I
can turn off -a for enblend, which lets me run stage 3 on the 8GB
machine B.  However, my tests showed no huge benefit; more on that
below.  If I had other machines with 24GB of memory, I could spread
these stages out.

Here are the numbers; hopefully this comes out looking OK.
1st column - stage label (1-5), plus the total (t)
2nd column - time (min/sec) using all 3 machines
3rd column - percentage of the col 2 total
4th column - time (min/sec) running locally only
5th column - percentage of the col 4 total

Columns 4 and 5 are the times you'd get if you just pressed the
Stitch button in hugin.

1    3m29.959s    7.89%         21m37.362s    27.51%
2    4m06.984s    9.29%         20m16.820s    25.80%
3   24m37.791s   55.57%         24m37.791s    31.33%
4    4m50.763s   10.93%          4m50.763s     6.16%
5    7m13.954s   16.32%          7m13.954s     9.20%
t   44m19.451s                  78m36.690s
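The percentage columns and the overall ratio can be recomputed from the raw times as a sanity check.  A small Python sketch that parses the time strings (in the `24m37.791s` style the shell's `time` prints) and works out each stage's share of its column total:

```python
def seconds(t):
    """Convert a time string such as '24m37.791s' to seconds."""
    minutes, rest = t.rstrip("s").split("m")
    return int(minutes) * 60 + float(rest)

cluster = ["3m29.959s", "4m06.984s", "24m37.791s", "4m50.763s", "7m13.954s"]
local = ["21m37.362s", "20m16.820s", "24m37.791s", "4m50.763s", "7m13.954s"]

def shares(times):
    """Return (total seconds, per-stage percentage of that total)."""
    secs = [seconds(t) for t in times]
    total = sum(secs)
    return total, [round(100 * s / total, 2) for s in secs]

t_cluster, pct_cluster = shares(cluster)
t_local, pct_local = shares(local)
print(pct_cluster)                          # [7.89, 9.29, 55.57, 10.93, 16.32]
print(pct_local)                            # [27.51, 25.8, 31.33, 6.16, 9.2]
print(round(100 * t_cluster / t_local, 1))  # 56.4
```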


As you can see, using two extra machines cut the total time to about
56% of the original (44m19s versus 78m37s).  Stages 1 and 2 are
trivial to run remotely in parallel, and they speed up very nicely.
These processes also don't grow very large while they run, so you
don't need a machine with a lot of main memory to run them.
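One way to spread stages 1 and 2 over the three machines is a shared job queue with one worker thread per core on each machine, so faster machines simply take more jobs.  A sketch only: it assumes passwordless ssh and a shared filesystem (e.g. NFS) so every machine sees the same project and output files; the post doesn't say how the jobs were actually dispatched, and the host names are hypothetical.

```python
import queue
import shlex
import subprocess
import threading

# Hypothetical host names; slot counts follow the core counts above.
HOSTS = {"machineA": 4, "machineB": 4, "machineC": 2}

def run_remote(host, cmd):
    """Run one job on `host` over ssh.  ssh hands the remote side a single
    shell string, so the argument list is quoted with shlex.join."""
    return subprocess.run(["ssh", host, shlex.join(cmd)], check=True)

def distribute(jobs, hosts=HOSTS, runner=run_remote):
    """Start one worker per core slot; each pulls jobs from a shared queue
    until it is empty.  A failing job raises inside its worker thread, which
    stops that worker only; a real script would collect and retry failures."""
    q = queue.Queue()
    for job in jobs:
        q.put(job)
    log, lock = [], threading.Lock()

    def worker(host):
        while True:
            try:
                job = q.get_nowait()
            except queue.Empty:
                return
            runner(host, job)
            with lock:
                log.append((host, job))

    threads = [threading.Thread(target=worker, args=(h,))
               for h, slots in hosts.items() for _ in range(slots)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log  # (host, job) pairs in completion order
```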

Stages 3-5 are a different story, though.  For my test stitch, stage 3
runs enblend 3 times to create three different exposure images.  Each
exposure image is made from a mutually exclusive set of layer images
from stage 1.  Looking at this problem, you'd think you could just run
them in parallel and be done with it, but you can't.  Since the final
image is big, the images produced in this stage are big too.  When I
ran one of these enblends on machine B, the process grew to 12GB, far
more than machine B's main memory, so it started swapping.  Without
the -a option, though, I could run it on machine B, because enblend
then only grew to about 5-6GB.  Running all three serially on machine
A with -a took 24m37.791s.  Running two serially on machine A with -a
and one on machine B without -a took 24m5.534s.  That's not a huge
speedup; -a makes enblend run faster, if you have the memory for it.
And for a 750 Mpixel stitch, I'm sure the run on machine B would take
even longer.
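The memory trade-off above can be written down as a simple rule per machine.  A sketch using the peak sizes reported here for one exposure image of the 500 Mpixel stitch (about 12GB with -a, about 5-6GB without, rounded up to 6) and each machine's RAM; scaling those peaks linearly with the final image size is an assumption, not a measurement.

```python
# Peak enblend sizes reported for one exposure image of a 500 Mpixel
# stitch: ~12GB with -a, ~5-6GB without (rounded up to 6).
PEAK_GB_AT_500MP = {"with_a": 12.0, "without_a": 6.0}
HOST_RAM_GB = {"A": 24, "B": 8, "C": 4}

def enblend_plan(host, mpixels=500):
    """Return the enblend flags to use on `host`, or None if even the
    low-memory mode would swap.  -a is faster, so it is preferred when it
    fits.  ASSUMPTION: peak memory scales linearly with output size."""
    scale = mpixels / 500
    ram = HOST_RAM_GB[host]
    if PEAK_GB_AT_500MP["with_a"] * scale <= ram:
        return ["-a"]
    if PEAK_GB_AT_500MP["without_a"] * scale <= ram:
        return []
    return None

for host in HOST_RAM_GB:
    print(host, enblend_plan(host), enblend_plan(host, mpixels=750))
# A ['-a'] ['-a']
# B [] None
# C None None
```

The rule looks at one job per machine and ignores the OS's own memory use; two -a jobs at once on machine A would already hit its 24GB, which is consistent with running them serially there.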


What I've determined is that as my stitches get larger, they will
always take more time; it's not possible to keep the time constant
while increasing the final image size just by throwing more machines
at the problem.  The reason is that while more machines will always
cut the time for stages 1 & 2, stages 3-5 need a machine whose main
memory grows along with the final stitch size.  For a real speedup,
enblend would need to be designed so that the problem it's solving can
be broken into pieces that don't require information about the entire
image.  Then, when remote machines work on those pieces, they wouldn't
all need a large amount of main memory.  But I don't know whether
enblend could even be designed that way.
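The limit described here is essentially Amdahl's law: however many machines share stages 1 and 2, the stitch can't finish faster than stages 3-5 alone.  A short Python sketch using the table's times, treating stages 3-5 as the serial part (an idealisation that ignores dispatch and file-transfer overhead):

```python
# Stage times from the table, converted to seconds.
LOCAL = [1297.362, 1216.820, 1477.791, 290.763, 433.954]
CLUSTER = [209.959, 246.984, 1477.791, 290.763, 433.954]

def amdahl_limit(times, serial_stages=(3, 4, 5)):
    """If every stage not listed as serial were made infinitely fast by
    adding machines, the serial stages alone set the floor (Amdahl's law).
    Returns (floor in seconds, best possible speedup over `times`)."""
    serial = sum(t for stage, t in enumerate(times, start=1)
                 if stage in serial_stages)
    return serial, sum(times) / serial

floor, speedup_local = amdahl_limit(LOCAL)
_, speedup_cluster = amdahl_limit(CLUSTER)
print(round(floor / 60, 1))       # 36.7
print(round(speedup_local, 2))    # 2.14
print(round(speedup_cluster, 2))  # 1.21
```

With these numbers the floor is about 36.7 minutes: at most about 2.14x faster than a plain local stitch, and only about 1.21x faster than the 3-machine run, and the floor itself rises as stages 3-5 grow with the final image size.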

-- 
You received this message because you are subscribed to the Google Groups 
"Hugin and other free panoramic software" group.
A list of frequently asked questions is available at: 
http://wiki.panotools.org/Hugin_FAQ
For more options, visit this group at http://groups.google.com/group/hugin-ptx
