Hi all,

I spent a couple of hours trying to find out how image processing in the
various dt pixelpipes is distributed between the available OpenCL devices
(one GPU device in my case) and the non-OpenCL CPU device in darkroom mode.
The reason for this investigation was that I saw in the -d opencl -d perf
output that the non-OpenCL CPU device (i.e. OpenCL device -1) is used far
too often for processing the full pixelpipe, both when opening an image
from the filmstrip and when editing it. I decided to investigate and to
contribute to improving this.

First I opened one image after another from the filmstrip in the darkroom
and carefully checked the console output. In this image-opening scenario I
found that the pixelpipes always run in the following sequence:
1) thumbnail pixelpipe of the current image being closed (only if it
has any edits)
2) full pixelpipe of the next image being opened (this pixelpipe starts
before the previous one finishes)
3) preview pixelpipe of the next image being opened (this pixelpipe starts
after the previous one finishes)

The important thing here is that steps 1 and 2 run in parallel but 2 and 3
run in sequence. When I excluded the GPU OpenCL device from processing the
thumbnail pixelpipe using the opencl_device_priority config parameter, it
finally ran as I would expect, i.e. 1) CPU, 2) & 3) GPU. So far, so good.
But there is one major drawback: the same opencl_device_priority setting
applies to thumbnail processing in the lighttable as well. Therefore
opening any new image directory is processed on the CPU only and is now
painfully slow in the lighttable.

The other scenario I focused on was editing the image in the darkroom.
Here I found that when I make any edit to the image, the full and preview
pixelpipes surprisingly run in the opposite order compared to the previous
scenario, i.e.:
1) preview pixelpipe
2) full pixelpipe (this pixelpipe starts before the previous one finishes)

The other important thing is that they no longer run in sequence but in
parallel. This means they cannot both run on the same OpenCL device, which
is a big problem for system responsiveness during image editing, at least
if only one fast OpenCL device is available. In this situation it seems a
good idea to exclude the GPU OpenCL device from processing the preview
pixelpipe and thus let it process the full pixelpipe only. This is again
easily achieved using the opencl_device_priority config parameter. Mine now
reads "*//*/", i.e. the GPU OpenCL device is excluded from processing the
preview and thumbnail pixelpipes.
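
To make the layout of that setting explicit, here is a minimal Python
sketch (not darktable code) that splits a priority string into its four
slash-separated fields. It assumes, as far as I understand the darktable
documentation, that the fields come in the order full image / preview /
export / thumbnail and that an empty field means no OpenCL device may serve
that pixelpipe. The function names and dictionary keys are made up for
illustration.

```python
# Illustrative only: split an opencl_device_priority string into its fields.
# ASSUMPTION: field order is full image / preview / export / thumbnail.
PIPE_TYPES = ("full", "preview", "export", "thumbnail")  # hypothetical labels


def split_priority(setting):
    """Map each pixelpipe type to its raw device list from the setting."""
    fields = setting.split("/")
    if len(fields) != len(PIPE_TYPES):
        raise ValueError("expected 4 '/'-separated fields, got %d" % len(fields))
    return dict(zip(PIPE_TYPES, fields))


def cpu_only_pipes(setting):
    """Return the pixelpipe types whose field is empty (assumed: CPU only)."""
    return [name for name, devs in split_priority(setting).items()
            if not devs.strip()]


if __name__ == "__main__":
    # The setting quoted above; lists the pipes left without an OpenCL device.
    print(cpu_only_pipes("*//*/"))
```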

But even after this tweak the situation is still far from good. According
to my measurements, processing the full pixelpipe on the GPU OpenCL device
takes no more than 0.4s. But processing the much smaller version of the
same image in the preview pixelpipe on the non-OpenCL CPU device usually
takes more than 1s, quite often up to 1.5s. This clearly means that system
responsiveness during image editing is determined by the processing of the
not very useful small preview image, which I think is a great pity. It
would be much faster, and the system more responsive, if the preview and
full pixelpipes were processed in sequence on the same OpenCL device. By my
estimate, the total time needed to process the image after a single edit
would drop from about 1 - 1.5s to about 0.4 - 0.7s, which would be almost
instant. Another possible solution would be to allow switching the preview
image off and thus avoid preview pixelpipe processing altogether. I do not
know how much other users use the preview image, but I have found that I do
not need it at all.
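
The estimate above can be written out as simple arithmetic. The Python
sketch below is only a back-of-the-envelope model: the 0.4s and 1 - 1.5s
figures are the measurements quoted above, while the GPU time for the
preview pixelpipe (0 to 0.3s) was not measured and is just the range
implied by the 0.4 - 0.7s estimate. Pipes running in parallel on different
devices finish when the slower one does; pipes run in sequence on one
device add up.

```python
# Back-of-the-envelope model of latency after a single edit (seconds).
FULL_GPU = 0.4             # full pixelpipe on the GPU (upper bound quoted above)
PREVIEW_CPU = (1.0, 1.5)   # preview pixelpipe on the CPU (range quoted above)
PREVIEW_GPU = (0.0, 0.3)   # ASSUMPTION: preview on the GPU, not measured


def parallel_latency(a, b):
    """Two pipes on different devices: done when the slower one finishes."""
    return max(a, b)


def sequential_latency(a, b):
    """Two pipes one after another on the same device: times add up."""
    return a + b


if __name__ == "__main__":
    now = [parallel_latency(FULL_GPU, t) for t in PREVIEW_CPU]
    proposed = [sequential_latency(FULL_GPU, t) for t in PREVIEW_GPU]
    print("parallel, preview on CPU: %.1f - %.1f s" % tuple(now))
    print("sequential on the GPU:    %.1f - %.1f s" % tuple(proposed))
```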

To wrap up, I propose the features below, which should improve the current
OpenCL implementation in dt:
1) Allow the user to set the OpenCL priority for thumbnails in lighttable
and in darkroom independently
2) Allow the user to switch the preview image in darkroom off and so stop
processing of the preview pixelpipe
3) Change processing of the preview and full pixelpipes during image editing
from parallel to sequential if OpenCL is on and only one OpenCL device is
available to dt (with two available OpenCL devices, it might be a good
idea to keep processing these pixelpipes in parallel as they are now)
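
Proposal 3 boils down to a small decision rule, sketched below in Python
purely for illustration. This is not how darktable is implemented; the
function and parameter names are invented, and a real change would have to
live in darktable's own code.

```python
# Hypothetical decision rule for proposal 3; all names are invented.

def pipe_schedule(opencl_enabled, num_opencl_devices):
    """Return how the preview and full pixelpipes would run after an edit."""
    if opencl_enabled and num_opencl_devices == 1:
        # One fast device: run both pipes on it, one after the other,
        # instead of pushing one of them to the slower CPU path.
        return "sequential"
    # No OpenCL, or two or more devices: keep the current parallel behaviour.
    return "parallel"


if __name__ == "__main__":
    for enabled, count in [(True, 1), (True, 2), (False, 0)]:
        print(enabled, count, pipe_schedule(enabled, count))
```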

I did my testing using dt 1.4.rc1 on Ubuntu 13.04 64-bit with the AMD
binary GPU driver.

Thank you in advance for thinking over these findings and proposals, and
sorry for the rather long post...

Suni
_______________________________________________
darktable-devel mailing list
darktable-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/darktable-devel
