Thanks. As in another case discussed earlier, your OpenCL system does not
seem to be limited by memory transfer. Host<->device transfers account for
less than 50% of the time spent in the pixelpipe, and I regard this as a
healthy value. Changing the memory transfer method doesn't help here, and
you are probably already getting the maximum you can expect from that device.
(On a side note: for OpenCL benchmarking please run 'darktable -d opencl
-d perf' rather than 'darktable -d all', because the latter produces too
much junk output.)
On 17.09.2016 at 03:21, I. Ivanov wrote:
I am attaching my logs
this is darktable 2.0.6
copyright (c) 2009-2016 johannes hanika
bit depth is 64 bit
OpenMP support enabled
OpenCL support enabled
Lua support enabled, API version 3.0.0
Colord support enabled
gPhoto2 support enabled
GraphicsMagick support enabled
CPU~Quad core Intel Core i7-2630QM (-HT-MCP-) speed/max~800/2900 MHz
Kernel~4.4.0-36-generic x86_64 Up~3 days Mem~3038.2/7877.1MB
HDD~500.1GB(73.3% used) Procs~289 Client~Shell inxi~2.2.35
Graphics: Card-1: Intel 2nd Generation Core Processor Family Integrated
Card-2: NVIDIA GF108M [GeForce GT 525M]
Display Server: X.Org 1.18.3 driver: nvidia Resolution:
GLX Renderer: GeForce GT 525M/PCIe/SSE2 GLX Version: 4.5.0
Clocked results for exporting 50 images:
OpenCL on, pinned = true: 10 min 30 s
OpenCL on, pinned = false: 9 min 13 s
OpenCL off: 8 min 15 s
For me, it appears that *OpenCL off* is the fastest. I have no
explanation for why I perceived pinned=true as faster. It certainly
"looked" faster to me when in darktable mode. But the numbers are above.
I "think" what happened is that I increased the number of modules I
activate, and this somehow convinced me that DT had slowed down. In
fact, my images became more complex and it simply takes more time for
DT to deal with them.
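
For comparison, the three export timings above can be put in relative terms. This is only a quick sketch of the arithmetic; the function name and the labels are made up for illustration, and the only real data are the three timings and the image count from this message.

```python
# Export timings for 50 images, copied from the message above (in seconds).
timings_s = {
    "opencl on, pinned=true": 10 * 60 + 30,
    "opencl on, pinned=false": 9 * 60 + 13,
    "opencl off": 8 * 60 + 15,
}
IMAGES = 50


def summarize(timings, images):
    """Return (label, seconds per image, % slower than the fastest run)."""
    baseline = min(timings.values())
    rows = []
    for label, total in timings.items():
        per_image = total / images
        overhead_pct = 100.0 * (total - baseline) / baseline
        rows.append((label, per_image, overhead_pct))
    return rows


for label, per_image, overhead_pct in summarize(timings_s, IMAGES):
    print(f"{label:24s} {per_image:5.2f} s/image  +{overhead_pct:4.1f}% vs fastest")
```

On these numbers, the pinned=true run is roughly 27% slower than the CPU-only run, and pinned=false roughly 12% slower.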
Hope this info is useful.
On 2016-09-16 01:08 PM, Michael Below wrote:
Another example from me. As far as I can see, pinning performs slightly
worse than the default.
CPU~Quad core AMD Phenom II X4 810 (-MCP-) speed~2600 MHz (max)
Kernel~4.6.0-1-amd64 x86_64 Up~3:45 Mem~2389.4/5956.0MB
HDD~3250.7GB(17.9% used) Procs~300 Client~Shell inxi~2.3.1
I think it would improve my use case most if the "atrous" module would
run on GPU. There seems to be some issue with tile size that makes the
equalizer module take e.g. 13 seconds on some images.
On Fri 16 Sep 2016 07:37:45 CEST,
Ulrich Pegelow <ulrich.pege...@tongareva.de> wrote:
Thanks for sharing. Yours is a good example of an OpenCL system that
is not limited by host<->device memory transfers. In a typical export
job your system spends about 30% of its time in memory transfer, the
rest is pure computing. That's a very good situation, one in which pinned
memory gives no advantage and may even slow things down a bit.
Others have systems which are purely limited by memory transfer. We
have reports of insane cases where over 95% of the OpenCL pixelpipe
time is spent on memory transfers. Those are the ones where
opencl_use_pinned_memory makes a real difference.
On 15.09.2016 at 22:11, KOVÁCS István wrote:
Core2-Duo E6550 @ 2.33GHz + Nvidia GeForce GTX 650 / 2 GB, driver
361.42, OpenCL 1.2 CUDA, darktable 2.0.6 from PPA.
With pinned memory, performance is slightly (about 10%?) worse.
There are lines like
[opencl_profiling] spent 0,3774 seconds in [Map Buffer]
that are only seen in the 'pinned' log.
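
Lines of this form can be summed per label to estimate how much of the profiled time goes to a given set of operations. The following is a rough sketch, not a darktable tool: the regular expression is modelled only on the single line quoted above, the sample log is partly invented (the `hypothetical_kernel` label does not come from any real log), and which labels count as "memory transfer" is left to the caller as an assumption.

```python
import re
from collections import defaultdict

# Modelled on the line quoted above:
#   [opencl_profiling] spent 0,3774 seconds in [Map Buffer]
# The decimal separator depends on the locale, so ',' and '.' are both accepted.
PROFILE_RE = re.compile(
    r"\[opencl_profiling\]\s+spent\s+(\d+[.,]\d+)\s+seconds\s+in\s+\[(.+?)\]\s*$"
)


def seconds_per_label(log_text):
    """Sum the reported seconds for every bracketed label in the log."""
    totals = defaultdict(float)
    for line in log_text.splitlines():
        match = PROFILE_RE.search(line)
        if match:
            totals[match.group(2)] += float(match.group(1).replace(",", "."))
    return dict(totals)


def transfer_share(totals, transfer_labels):
    """Fraction of the summed time spent in the labels the caller names."""
    total = sum(totals.values())
    if total == 0:
        return 0.0
    spent = sum(t for label, t in totals.items() if label in transfer_labels)
    return spent / total


# Sample input: the first line is from this thread, the second is invented.
sample = """\
[opencl_profiling] spent 0,3774 seconds in [Map Buffer]
[opencl_profiling] spent 1,1322 seconds in [hypothetical_kernel]
some unrelated log line
"""

totals = seconds_per_label(sample)
# Assumption: treat 'Map Buffer' as transfer-related for this illustration.
print(f"transfer share: {transfer_share(totals, {'Map Buffer'}):.0%}")
```

Running the same function over a 'pinned' and a 'non-pinned' log and comparing the shares would give a number to set against the 30%, 50% and 95% figures mentioned elsewhere in this thread.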
One notable difference after exporting 114 photos:
pinned = false gives
[opencl_summary_statistics] device 'GeForce GTX 650': 8960 out of
8960 events were successful and 0 events lost
pinned = true gives
[opencl_summary_statistics] device 'GeForce GTX 650': 9933 out of
9933 events were successful and 0 events lost
as one of the last lines in the output.
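
The difference between the two event counts works out as below. This is plain arithmetic on the figures quoted above; it does not say which operations the additional events are.

```python
# Event counts from the two summary lines above, after exporting 114 photos.
events_pinned_false = 8960
events_pinned_true = 9933
photos = 114

extra_events = events_pinned_true - events_pinned_false
extra_per_photo = extra_events / photos
extra_pct = 100.0 * extra_events / events_pinned_false

print(f"{extra_events} extra events with pinned = true")
print(f"about {extra_per_photo:.1f} extra events per photo ({extra_pct:.1f}% more)")
```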
My opencl-related darktablerc entries:
The logs are at:
darktable user mailing list
to unsubscribe send a mail to darktable-user+unsubscr...@lists.darktable.org