Re: [darktable-user] open CL 2.0.6

Ulrich Pegelow Fri, 16 Sep 2016 23:02:48 -0700

Thanks. As in another case discussed before, your OpenCL system does not seem to be limited by memory transfer. Host<->device transfers account for less than 50% of the time spent in the pixelpipe, and I regard this as a healthy value. Changes in the memory transfer method don't help here, and you probably already get the maximum you can expect from that device.

(On a side note: for OpenCL benchmarking please run 'darktable -d opencl -d perf' rather than 'darktable -d all', because the latter produces too much junk output.)


Ulrich



On 17.09.2016 at 03:21, I. Ivanov wrote:

I am attaching my logs

this is darktable 2.0.6
copyright (c) 2009-2016 johannes hanika
[email protected]

compile options:
  bit depth is 64 bit
  normal build
  OpenMP support enabled
  OpenCL support enabled
  Lua support enabled, API version 3.0.0
  Colord support enabled
  gPhoto2 support enabled
  GraphicsMagick support enabled

CPU~Quad core Intel Core i7-2630QM (-HT-MCP-) speed/max~800/2900 MHz
Kernel~4.4.0-36-generic x86_64 Up~3 days Mem~3038.2/7877.1MB
HDD~500.1GB(73.3% used) Procs~289 Client~Shell inxi~2.2.35

Graphics:  Card-1: Intel 2nd Generation Core Processor Family Integrated
Graphics Controller
           Card-2: NVIDIA GF108M [GeForce GT 525M]
           Display Server: X.Org 1.18.3 driver: nvidia Resolution:
[email protected], [email protected]
           GLX Renderer: GeForce GT 525M/PCIe/SSE2 GLX Version: 4.5.0
NVIDIA 361.42

https://drive.google.com/open?id=0B-ibE69DzumKSXh2VmtyQUJRV2c

Clocked results for exporting 50 images:

OpenCL on, pinned = true - 10 min 30 s
OpenCL on, pinned = false - 9 min 13 s
OpenCL off - 8 min 15 s
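
To make the three wall-clock figures above easier to compare, here is a small sketch that converts them to seconds, per-image time, and a ratio against the fastest run. The helper names (to_seconds, summarize) are made up for illustration; only the timings and the image count come from the message.

```python
# Wall-clock export times (minutes, seconds) for 50 images, as reported above.
IMAGES = 50
timings = {
    "opencl on, pinned=true": (10, 30),
    "opencl on, pinned=false": (9, 13),
    "opencl off": (8, 15),
}


def to_seconds(minutes, seconds):
    """Convert a (minutes, seconds) pair to plain seconds."""
    return minutes * 60 + seconds


def summarize(timings, images):
    """Return total seconds, seconds per image and ratio to the fastest run."""
    totals = {name: to_seconds(*t) for name, t in timings.items()}
    fastest = min(totals.values())
    return {
        name: {
            "total_s": total,
            "per_image_s": total / images,
            "vs_fastest": total / fastest,
        }
        for name, total in totals.items()
    }


summary = summarize(timings, IMAGES)
for name, row in summary.items():
    print(
        f"{name:24s} {row['total_s']:4d} s  "
        f"{row['per_image_s']:5.2f} s/image  x{row['vs_fastest']:.2f}"
    )
```

A single run per setting cannot separate a real difference from run-to-run noise, so the ratios are best read as a rough indication.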

For me, it appears that *OpenCL off* is the fastest. I have no
explanation for why I perceived pinned=true to be faster. It certainly
"looked" faster to me when in darktable mode. But the numbers are above.
I "think" what happened is that I increased the number of modules I
activate, and this somehow convinced me that DT had slowed down. In
fact, what happened is that my images became more complex, and it simply
takes DT more time to deal with them.

Hope this info is useful.

Thank you,

B





On 2016-09-16 01:08 PM, Michael Below wrote:

Hi,

Another example from me. As far as I can see, pinning performs slightly
worse than the default.

My system:
CPU~Quad core AMD Phenom II X4 810 (-MCP-) speed~2600 MHz (max)
Kernel~4.6.0-1-amd64 x86_64 Up~3:45 Mem~2389.4/5956.0MB
HDD~3250.7GB(17.9% used) Procs~300 Client~Shell inxi~2.3.1

I think it would improve my use case most if the "atrous" module would
run on GPU. There seems to be some issue with tile size that makes the
equalizer module take e.g. 13 seconds on some images.

Cheers
Michael


On Fri 16 Sep 2016 07:37:45 CEST,
Ulrich Pegelow <[email protected]> wrote:

Thanks for sharing. Yours is a good example of an OpenCL system that
is not limited by host<->device memory transfers. In a typical export
job your system spends about 30% of its time in memory transfer, the
rest is pure computing. That's a very good situation, one in which pinned
memory gives no advantage and may even slow things down a bit.

Others have systems which are purely limited by memory transfer. We
have reports of insane cases where over 95% of the OpenCL pixelpipe
time is spent on memory transfers. Those are the ones where
opencl_use_pinned_memory makes a real difference.
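
The share of time spent in host<->device transfers can be estimated from the '[opencl_profiling] spent ... seconds in ...' lines of a debug log. What follows is a minimal sketch, not darktable's own accounting. Which labels count as "transfer" is an assumption (TRANSFER_HINTS below), and the second sample line is invented for illustration; only the [Map Buffer] line is taken from this thread.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   [opencl_profiling] spent  0,3774 seconds in [Map Buffer]
# The decimal separator follows the locale, so both "," and "." are accepted.
PROFILE_RE = re.compile(
    r"\[opencl_profiling\] spent\s+(\d+[.,]\d+) seconds in (.+?)\s*$"
)

# ASSUMPTION: substrings that mark an entry as a host<->device transfer.
# Adjust this tuple to the labels that actually appear in your own log.
TRANSFER_HINTS = ("map buffer", "from host", "to host")


def parse_profile(lines):
    """Sum the reported seconds per label over all matching log lines."""
    totals = defaultdict(float)
    for line in lines:
        match = PROFILE_RE.search(line)
        if match:
            seconds = float(match.group(1).replace(",", "."))
            totals[match.group(2)] += seconds
    return dict(totals)


def transfer_share(totals, hints=TRANSFER_HINTS):
    """Fraction of the summed time whose label looks like a transfer."""
    total = sum(totals.values())
    if total == 0:
        return 0.0
    transfer = sum(
        seconds
        for label, seconds in totals.items()
        if any(hint in label.lower() for hint in hints)
    )
    return transfer / total


sample = [
    "[opencl_profiling] spent  0,3774 seconds in [Map Buffer]",
    # Invented line, not real darktable output:
    "[opencl_profiling] spent  1,1322 seconds in [Hypothetical Kernel]",
]
print(f"transfer share: {transfer_share(parse_profile(sample)):.0%}")
```

To use it on a real log, pass the open file object to parse_profile instead of the sample list.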

Ulrich

On 15.09.2016 at 22:11, KOVÁCS István wrote:

Hi,

Core2-Duo E6550 @ 2.33GHz + Nvidia GeForce GTX 650 / 2 GB, driver
361.42, OpenCL 1.2 CUDA, darktable 2.0.6 from PPA.
With pinned memory, performance is slightly (about 10%?) worse.
There are lines like
[opencl_profiling] spent  0,3774 seconds in [Map Buffer]
that are only seen in the 'pinned' log.
One notable difference after exporting 114 photos:
pinned = false gives
[opencl_summary_statistics] device 'GeForce GTX 650': 8960 out of
8960 events were successful and 0 events lost

pinned = true gives
[opencl_summary_statistics] device 'GeForce GTX 650': 9933 out of
9933 events were successful and 0 events lost

as one of the last lines in the output.
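
The two summary lines quoted above can be compared mechanically. This is a small sketch that assumes the summary always has the shape shown in this message; the function name parse_summary is made up for illustration.

```python
import re

# Pattern built from the two summary lines quoted in this message.
SUMMARY_RE = re.compile(
    r"\[opencl_summary_statistics\] device '([^']+)': (\d+) out of "
    r"(\d+) events were successful and (\d+) events lost"
)


def parse_summary(text):
    """Extract device name and event counts from a summary line."""
    # Collapse whitespace so a line wrapped by the mail client still matches.
    match = SUMMARY_RE.search(" ".join(text.split()))
    if match is None:
        return None
    device, ok, total, lost = match.groups()
    return {
        "device": device,
        "successful": int(ok),
        "total": int(total),
        "lost": int(lost),
    }


unpinned = parse_summary(
    "[opencl_summary_statistics] device 'GeForce GTX 650': 8960 out of\n"
    "8960 events were successful and 0 events lost"
)
pinned = parse_summary(
    "[opencl_summary_statistics] device 'GeForce GTX 650': 9933 out of\n"
    "9933 events were successful and 0 events lost"
)

PHOTOS = 114  # number of exported photos reported above
extra_events = pinned["total"] - unpinned["total"]
print(extra_events, "extra events with pinned memory,",
      round(extra_events / PHOTOS, 1), "per photo")
```

With the numbers above, the pinned run logs 973 more events, roughly 8.5 per exported photo, which fits the extra [Map Buffer] entries seen only in the pinned log.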
My opencl-related darktablerc entries:
opencl=TRUE
opencl_async_pixelpipe=false
opencl_avoid_atomics=false
opencl_checksum=2684983341
opencl_device_priority=*/!0,*/*/*
opencl_library=
opencl_memory_headroom=300
opencl_memory_requirement=768
opencl_micro_nap=1000
opencl_number_event_handles=25
opencl_omit_whitebalance=
opencl_size_roundup=16
opencl_synch_cache=false
opencl_use_cpu_devices=false
opencl_use_pinned_memory=false
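
For anyone who wants to post the same list of settings, here is a small sketch that pulls the opencl* entries out of darktablerc-style key=value lines. The function name opencl_settings is made up for illustration, and the sample is a subset of the entries above. On Linux the file normally lives at ~/.config/darktable/darktablerc.

```python
def opencl_settings(lines):
    """Return the opencl* key=value entries from darktablerc-style lines."""
    settings = {}
    for line in lines:
        line = line.strip()
        if not line.startswith("opencl") or "=" not in line:
            continue
        # Split on the first "=" only; values may be empty.
        key, _, value = line.partition("=")
        settings[key] = value
    return settings


# A few of the entries listed above, used here as sample input.
sample = """\
opencl=TRUE
opencl_device_priority=*/!0,*/*/*
opencl_library=
opencl_memory_headroom=300
opencl_use_pinned_memory=false
"""

settings = opencl_settings(sample.splitlines())
for key in sorted(settings):
    print(f"{key}={settings[key]}")

# To read the real file instead (path assumed to be the Linux default):
#   import os
#   with open(os.path.expanduser("~/.config/darktable/darktablerc")) as rc:
#       settings = opencl_settings(rc)
```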

The logs are at:
http://tech.kovacs-telekes.org/files/darktable-opencl-pinned-memory/

Thanks,
Kofa

____________________________________________________________________________
darktable user mailing list
to unsubscribe send a mail to
[email protected]


