I am attaching my logs

this is darktable 2.0.6
copyright (c) 2009-2016 johannes hanika

compile options:
  bit depth is 64 bit
  normal build
  OpenMP support enabled
  OpenCL support enabled
  Lua support enabled, API version 3.0.0
  Colord support enabled
  gPhoto2 support enabled
  GraphicsMagick support enabled

CPU~Quad core Intel Core i7-2630QM (-HT-MCP-) speed/max~800/2900 MHz Kernel~4.4.0-36-generic x86_64 Up~3 days Mem~3038.2/7877.1MB HDD~500.1GB(73.3% used) Procs~289 Client~Shell inxi~2.2.35

Graphics: Card-1: Intel 2nd Generation Core Processor Family Integrated Graphics Controller
           Card-2: NVIDIA GF108M [GeForce GT 525M]
Display Server: X.Org 1.18.3 driver: nvidia Resolution: 1920x1080@60.00hz, 1366x768@60.06hz GLX Renderer: GeForce GT 525M/PCIe/SSE2 GLX Version: 4.5.0 NVIDIA 361.42


Timed results for exporting 50 images:

OpenCL on, pinned = true: 10 min 30 s
OpenCL on, pinned = false: 9 min 13 s
OpenCL off: 8 min 15 s
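
To make the three runs easier to compare, here is a small sketch that converts the timings above into seconds per image and a ratio against the fastest run. The labels and the helper name `summarize` are only illustrative; the numbers are the ones reported in this message.

```python
# Export timings reported above for 50 images, converted to seconds.
IMAGES = 50
timings_s = {
    "opencl on, pinned=true": 10 * 60 + 30,
    "opencl on, pinned=false": 9 * 60 + 13,
    "opencl off": 8 * 60 + 15,
}


def summarize(timings, images):
    """Return {label: (seconds per image, ratio to the fastest run)}."""
    fastest = min(timings.values())
    return {
        label: (total / images, total / fastest)
        for label, total in timings.items()
    }


summary = summarize(timings_s, IMAGES)
for label, (per_image, ratio) in summary.items():
    print(f"{label:24s} {per_image:5.2f} s/image  {ratio:.2f}x")
```

This is a single timed run per setting, so the ratios are indicative at best.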

For me, it appears that *OpenCL off* is the fastest. I have no explanation for why I perceived pinned=true as faster. It certainly "looked" faster to me when in darktable mode, but the numbers are above. I "think" what happened is that I increased the number of modules I activate, and this somehow convinced me that DT had slowed down. In fact, my images became more complex, and it simply takes DT more time to deal with them.

Hope this info is useful.

Thank you,


On 2016-09-16 01:08 PM, Michael Below wrote:

Another example from me. As far as I can see, pinning performs slightly
worse than the default.

My system:
CPU~Quad core AMD Phenom II X4 810 (-MCP-) speed~2600 MHz (max)
Kernel~4.6.0-1-amd64 x86_64 Up~3:45 Mem~2389.4/5956.0MB
HDD~3250.7GB(17.9% used) Procs~300 Client~Shell inxi~2.3.1

I think it would improve my use case most if the "atrous" module would
run on GPU. There seems to be some issue with tile size that makes the
equalizer module take e.g. 13 seconds on some images.


On Fri 16 Sep 2016 07:37:45 CEST,
Ulrich Pegelow <ulrich.pege...@tongareva.de> wrote:

Thanks for sharing. Yours is a good example of an OpenCL system that
is not limited by host<->device memory transfers. In a typical export
job your system spends about 30% of its time in memory transfer, the
rest is pure computing. That's a very good situation, one in which pinned
memory gives no advantage and may even slow things down a bit.

Others have systems which are purely limited by memory transfer. We
have reports of insane cases where over 95% of the OpenCL pixelpipe
time is spent on memory transfers. Those are the ones where
opencl_use_pinned_memory makes a real difference.
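
A rough way to estimate that transfer share from a log is sketched below. It relies only on the `[opencl_profiling] spent ... seconds in [...]` line format quoted further down in this thread. Which event names count as host<->device transfer (`TRANSFER_HINTS`) is an assumption made here, and the second sample line uses a made-up event name, so treat this as a starting point rather than darktable's own accounting.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   [opencl_profiling] spent  0,3774 seconds in [Map Buffer]
# The decimal separator may be ',' or '.' depending on the locale.
LINE_RE = re.compile(
    r"\[opencl_profiling\]\s+spent\s+([0-9]+[.,][0-9]+)\s+seconds in\s+(.+?)\s*$"
)

# ASSUMPTION: event names containing these substrings are memory transfers.
TRANSFER_HINTS = ("map buffer", "read image", "write image")


def transfer_share(lines):
    """Return (transfer fraction of profiled time, seconds per event name)."""
    per_event = defaultdict(float)
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # lines that do not fit the pattern are skipped
        seconds = float(match.group(1).replace(",", "."))
        per_event[match.group(2).strip("[]")] += seconds
    total = sum(per_event.values())
    transfer = sum(
        secs
        for name, secs in per_event.items()
        if any(hint in name.lower() for hint in TRANSFER_HINTS)
    )
    share = transfer / total if total else 0.0
    return share, dict(per_event)


# First line is the format seen in this thread; the second is hypothetical.
sample = [
    "[opencl_profiling] spent  0,3774 seconds in [Map Buffer]",
    "[opencl_profiling] spent  1.1322 seconds in [hypothetical_kernel]",
    "some unrelated log line",
]
share, per_event = transfer_share(sample)
print(f"transfer share: {share:.0%}")
```

To use it on a real log, pass the file's lines instead of `sample`, for example `transfer_share(open(path, encoding="utf-8"))` with `path` pointing at your own log file.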


On 15.09.2016 at 22:11, KOVÁCS István wrote:

Core2-Duo E6550 @ 2.33GHz +Nvidia GeForce GTX 650 / 2 GB, driver
361.42, OpenCL 1.2 CUDA, darktable 2.0.6 from PPA.
With pinned memory, performance is slightly (about 10%?) worse.
There are lines like
[opencl_profiling] spent  0,3774 seconds in [Map Buffer]
that are only seen in the 'pinned' log.
One notable difference after exporting 114 photos:
pinned = false gives
[opencl_summary_statistics] device 'GeForce GTX 650': 8960 out of
8960 events were successful and 0 events lost

pinned = true gives
[opencl_summary_statistics] device 'GeForce GTX 650': 9933 out of
9933 events were successful and 0 events lost

as one of the last lines in the output.
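
For scale, the gap between the two event counts can be worked out as below. This is only arithmetic on the figures quoted above; it quantifies the difference but does not show what the additional events are.

```python
# Event counts from the two [opencl_summary_statistics] lines above,
# both taken after exporting the same 114 photos.
PHOTOS = 114
events = {"pinned=false": 8960, "pinned=true": 9933}

extra = events["pinned=true"] - events["pinned=false"]
relative = extra / events["pinned=false"]
per_photo = {label: count / PHOTOS for label, count in events.items()}

print(f"extra events with pinned=true: {extra} ({relative:.1%} more)")
for label, value in per_photo.items():
    print(f"{label}: {value:.1f} events per photo")
```

So the pinned run issued 973 more events over the same export.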
My opencl-related darktablerc entries:

The logs are at:

darktable user mailing list
to unsubscribe send a mail to darktable-user+unsubscr...@lists.darktable.org
