Hi all,
I spent a couple of hours trying to find out how image processing in the
various dt pixelpipes is distributed between the available OpenCL devices
(one in my case) and the non-OpenCL CPU device in darkroom mode. The reason
for this investigation was that the -d opencl -d perf output showed the
non-OpenCL CPU device (i.e. OpenCL device -1) being used far too often for
processing the full pixelpipe, both when opening an image from the
filmstrip and when editing it. I decided to investigate and to contribute
to improving this.
First I opened one image after another from the filmstrip in the darkroom
and carefully checked the console output. In this image-opening scenario I
found that the pixelpipes always run in the following sequence:
1) thumbnail pixelpipe of the image being closed (only if it has any
edits)
2) full pixelpipe of the image being opened (this pixelpipe starts before
the previous one finishes)
3) preview pixelpipe of the image being opened (this pixelpipe starts after
the previous one finishes)
The important thing here is that steps 1 and 2 run in parallel, but steps 2
and 3 run in sequence. When I excluded the GPU OpenCL device from
processing the thumbnail pixelpipe via the opencl_device_priority config
parameter, it finally ran as I would expect, i.e. 1) CPU, 2)&3) GPU. So far
so good. But there is one major drawback. The same opencl_device_priority
setting applies to thumbnail processing in the lighttable as well.
Therefore, whenever a new image directory is opened, its thumbnails are
processed on the CPU only, which is now painfully slow in lighttable.
The other scenario I focused on was editing an image in darkroom. Here I
found that whenever I make an edit to the image, the full and preview
pixelpipes surprisingly run in the opposite order to the previous
scenario, i.e.:
1) preview pixelpipe
2) full pixelpipe (this pixelpipe starts before the previous one finishes)
The other important thing is that they no longer run in sequence but in
parallel. This means they cannot both run on the same OpenCL device, which
is a big problem for system responsiveness during image editing. In this
situation it seems a good idea to exclude the GPU OpenCL device from
processing the preview pixelpipe and thus let it process the full
pixelpipe only. This is again easily achievable using the
opencl_device_priority config parameter. Mine now reads as follows:
"*//*/", i.e. the GPU OpenCL device is excluded from processing the preview
and thumbnail pixelpipes.
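To make the "*//*/" setting easier to read, here is a small Python sketch
that splits such a string into its slash-separated fields. It assumes the
field order is image (full) / preview / export / thumbnail and that an
empty field means no OpenCL device may serve that pixelpipe, which is how I
understand the parameter. The helper name is made up for this example and
is not part of dt.

```python
# Hypothetical helper, not dt code: splits an opencl_device_priority string
# into one field per pixelpipe type.
# ASSUMPTION: field order is image (full) / preview / export / thumbnail.
PIPE_TYPES = ("image", "preview", "export", "thumbnail")


def parse_device_priority(value):
    """Return a dict mapping each pixelpipe type to its list of device specs."""
    fields = value.split("/")
    if len(fields) != len(PIPE_TYPES):
        raise ValueError(
            "expected %d fields, got %d" % (len(PIPE_TYPES), len(fields))
        )
    # ASSUMPTION: an empty field means "no OpenCL device for this pipe",
    # so that pipe falls back to the CPU.
    return {
        pipe: (field.split(",") if field else [])
        for pipe, field in zip(PIPE_TYPES, fields)
    }


if __name__ == "__main__":
    for pipe, devices in parse_device_priority("*//*/").items():
        print(pipe, "->", devices if devices else "CPU only")
```

With my setting this reports the preview and thumbnail pixelpipes as CPU
only, and the image and export pixelpipes as open to any OpenCL device.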
But even after this tweak the situation is still far from good. According
to my measurements, processing the full pixelpipe on the GPU OpenCL device
takes no more than 0.4s, while processing the much smaller version of the
same image in the preview pixelpipe on the non-OpenCL CPU device usually
takes more than 1s, quite often around 1.5s. This clearly means that system
responsiveness during image editing is determined by processing of the not
very useful small preview image. I think that is a great pity. The system
would be much faster and more responsive if the preview and full pixelpipes
were processed in sequence on the same OpenCL device, if one is available.
By my estimate, the total time needed to process the image after a single
edit would drop from about 1 - 1.5s to about 0.3 - 0.7s. That would be
almost instant. Another possible solution would be to allow switching the
preview image off and thus avoid preview pixelpipe processing altogether. I
do not know how much others use the preview image, but I have found that I
do not need it at all.
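The reasoning behind that estimate can be written down as a rough latency
model: when the two pixelpipes run in parallel on different devices, the
slower one determines the wait, and when they run in sequence on one
device, the times add up. The sketch below uses the timings quoted above.
The preview time on the GPU is NOT something I measured. It is a guess for
illustration only, and the function names are made up.

```python
# Rough latency model for a single edit; illustrative only, not dt code.

def parallel_latency(full_s, preview_s):
    # Both pipes run at once on different devices: the slower one dominates.
    return max(full_s, preview_s)


def sequential_latency(full_s, preview_s):
    # Both pipes share one device and run one after the other.
    return full_s + preview_s


if __name__ == "__main__":
    full_gpu = 0.4     # upper bound observed for the full pipe on the GPU
    preview_cpu = 1.5  # frequent worst case for the preview pipe on the CPU
    preview_gpu = 0.3  # ASSUMPTION: not measured, guessed for illustration

    print("parallel, GPU + CPU:", parallel_latency(full_gpu, preview_cpu))
    print("sequential, GPU only:", sequential_latency(full_gpu, preview_gpu))
```

Under that guess the sequential case lands at the upper end of my
estimate, while the parallel case is stuck at the CPU preview time.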
Finally, here are the feature requests I propose to improve the current
OpenCL implementation in dt:
1) Allow the user to set the OpenCL priority for thumbnails in lighttable
and in darkroom independently
2) Allow the user to switch the preview image in darkroom off and to stop
processing of the preview pixelpipe
3) Change processing of the preview and full pixelpipes during image
editing from parallel to sequential if OpenCL is ON and only one OpenCL
device is available to dt (with two available OpenCL devices, it might be
a good idea to keep processing these pixelpipes in parallel as they are
now)
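The rule in request 3 could be sketched like this. It is only an
illustration of the proposed decision, not how dt schedules pixelpipes
today, and the function and argument names are made up.

```python
# Hypothetical sketch of request 3; not existing dt behaviour.

def pipe_schedule(opencl_enabled, num_opencl_devices):
    """Decide how the preview and full pixelpipes should run after an edit."""
    if opencl_enabled and num_opencl_devices == 1:
        # One GPU: run both pipes on it, one after the other.
        return "sequential"
    # No OpenCL, or several devices: keep the current parallel behaviour.
    return "parallel"


if __name__ == "__main__":
    print(pipe_schedule(True, 1))
    print(pipe_schedule(True, 2))
    print(pipe_schedule(False, 0))
```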
I did my testing using dt 1.4.rc1 on Ubuntu 13.04 64-bit with the AMD
binary GPU driver.
Thank you in advance for considering these proposals, and sorry for the
rather long post...
Suni
_______________________________________________
darktable-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/darktable-devel