Hi List

I bought a used MacBook Pro (13-inch, 2017, Two Thunderbolt 3 ports) and
wanted to compare its speed to my old laptop, and of course to try my new
eGPU, which is intended for video editing, but why not also use it for
raw processing? Specs:
CPU: 2.3 GHz Intel Core i5
RAM: 16GB
GPU: Intel Iris Plus Graphics 640 1536 MB
eGPU: Blackmagic eGPU (AMD Radeon Pro 580 8GB)

Since I haven't found any information on how to get the Blackmagic eGPU
going with darktable, I figured that dumping my experience onto the mailing
list might help other macOS users get it running (potentially also with
other eGPUs).

I found a benchmark on the mailing list that applies a boatload (pun
intended) of filters to a RAW file called bench.SRW (which shows a boat, in
case someone wonders about the pun).

First, after attaching the eGPU, run 'darktable -d opencl' multiple times,
watching its stdout until there are no errors and all OpenCL kernels for the
new GPU are successfully compiled (btw, that's true for any OpenCL use of
darktable on macOS).

Figuring out how to get the eGPU actually used was not easy, so I ran
benchmarks with different config settings to compare to my old laptop and
to figure out when it was used and when not. The gist of it: it makes sense
to use the eGPU.

runtimes:
CPU only:          36.5 sec
eGPU:               9.5 sec
Intel Iris okay:   26.2 sec
Intel Iris broken: 59.0 sec
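
To put the runtimes above into relation, here is a small Python sketch that
computes the speedup of each configuration relative to CPU only. The numbers
are the ones listed above; the dictionary and variable names are only for
illustration.

```python
# Runtimes in seconds, copied from the list above.
runtimes = {
    "CPU only": 36.5,
    "eGPU": 9.5,
    "Intel Iris okay": 26.2,
    "Intel Iris broken": 59.0,
}

baseline = runtimes["CPU only"]

# Speedup > 1 means faster than CPU only, < 1 means slower.
speedups = {name: baseline / secs for name, secs in runtimes.items()}

for name, factor in speedups.items():
    print(f"{name:18s} {runtimes[name]:5.1f} sec  {factor:.2f}x")
```

On these numbers the eGPU is roughly 3.8 times faster than CPU only, while
the "broken" Intel Iris runs take about 1.6 times as long as CPU only.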

To use the eGPU you have to set a custom opencl_device_priority in the
darktablerc file and use opencl_scheduling_profile=default. I might be
misreading the docs, but it seems that only with this profile is
opencl_device_priority actually considered. Also, I used a silly priority,
'1/!0,1/1,0/1', which was intended to make sure GPU 0 (= Intel Iris) is
ignored. That is probably not what you want for real-life usage, as it might
have side effects when the laptop is detached from the eGPU.
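
As an illustration of the settings described above, here is a minimal Python
sketch of how the two keys could be set in a darktablerc file. It is not the
change_config.py used for the benchmark; the function name
set_darktablerc_options is made up for this example, and it assumes that
darktablerc consists of plain key=value lines. The example input is
illustrative, not a copy of a real config.

```python
def set_darktablerc_options(text, options):
    """Return darktablerc text with the given key=value options set.

    Existing keys are replaced in place; missing keys are appended.
    Assumes one key=value pair per line.
    """
    remaining = dict(options)
    out = []
    for line in text.splitlines():
        key, sep, _ = line.partition("=")
        if sep and key in remaining:
            out.append(f"{key}={remaining.pop(key)}")
        else:
            out.append(line)
    # Append any option that was not present in the file yet.
    out.extend(f"{key}={value}" for key, value in remaining.items())
    return "\n".join(out) + "\n"


# Example with the values used for the eGPU run in this mail.
example = "opencl=TRUE\nopencl_scheduling_profile=very fast GPU\n"
updated = set_darktablerc_options(
    example,
    {
        "opencl_device_priority": "1/!0,1/1,0/1",
        "opencl_scheduling_profile": "default",
    },
)
print(updated, end="")
```

To apply this to a real config you would read the darktablerc file, pass its
content through the function and write the result back, ideally while
darktable is not running and after making a backup copy.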

Funny observation: using the built-in Intel Iris via OpenCL is sometimes
much slower than even CPU only. I have no idea yet how to reproduce that or
what is causing it, but it seems to happen only on the first runs (after
a system reboot, for example). After a few runs it's back to its
faster-than-CPU-only speed of ~26 sec, with the same config that had just
taken 59 sec before...!?
Sometimes darktable-cli just dies, which is also not reproducible. After
the crash, however, the Intel Iris is fast again. Maybe that's what is
needed to speed the Intel GPU up? (You can see all of that in the second and
third run_bench.sh outputs below.)

The script run_bench.sh is attached to this email to give you an idea of
what was done.

I'll happily take feedback on what could have been done better and what
shouldn't be done this way ;)

br
 mike

the output of my runs:

Running Benchmark with
this is darktable-cli 2.4.4
copyright (c) 2012-2018 johannes hanika, tobias ellinghaus
Darwin videostar.local 18.0.0 Darwin Kernel Version 18.0.0: Wed Aug 22
20:13:40 PDT 2018; root:xnu-4903.201.2~1/RELEASE_X86_64 x86_64
---
USING {'opencl_device_priority': '1/!0,1/1,0/1',
'opencl_scheduling_profile': 'default'}
11.185796 [opencl_summary_statistics] device 'Intel(R) Iris(TM) Plus
Graphics 640' (0): NOT utilized
11.193656 [opencl_summary_statistics] device 'AMD Radeon Pro 580 Compute
Engine' (1): 551 out of 551 events were successful and 0 events lost
10.411295 [dev_process_export] pixel pipeline processing took 9.217 secs
(5.738 CPU)
---
USING {'opencl_device_priority': '1/!0,1/1,0/1',
'opencl_scheduling_profile': 'multiple GPUs'}
28.196287 [opencl_summary_statistics] device 'Intel(R) Iris(TM) Plus
Graphics 640' (0): 631 out of 631 events were successful and 0 events lost
28.201236 [opencl_summary_statistics] device 'AMD Radeon Pro 580 Compute
Engine' (1): NOT utilized
27.397242 [dev_process_export] pixel pipeline processing took 26.205 secs
(5.826 CPU)
---
USING {'opencl_device_priority': '1/!0,1/1,0/1',
'opencl_scheduling_profile': 'very fast GPU'}
28.180851 [opencl_summary_statistics] device 'Intel(R) Iris(TM) Plus
Graphics 640' (0): 631 out of 631 events were successful and 0 events lost
28.185697 [opencl_summary_statistics] device 'AMD Radeon Pro 580 Compute
Engine' (1): NOT utilized
27.387412 [dev_process_export] pixel pipeline processing took 25.968 secs
(5.816 CPU)
---
USING: no opencl
37.383207 [dev_process_export] pixel pipeline processing took 36.252 secs
(123.297 CPU)

---
Running Benchmark with
this is darktable-cli 2.4.4
copyright (c) 2012-2018 johannes hanika, tobias ellinghaus
Darwin videostar.local 18.0.0 Darwin Kernel Version 18.0.0: Wed Aug 22
20:13:40 PDT 2018; root:xnu-4903.201.2~1/RELEASE_X86_64 x86_64
---
USING {'opencl_device_priority': '*/!0,*/*/*', 'opencl_scheduling_profile':
'default'}
60.424940 [opencl_summary_statistics] device 'Intel(R) Iris(TM) Plus
Graphics 640' (0): 631 out of 631 events were successful and 0 events lost
60.429111 [opencl_summary_statistics] device 'AMD Radeon Pro 580 Compute
Engine' (1): NOT utilized
59.634131 [dev_process_export] pixel pipeline processing took 58.446 secs
(5.850 CPU)
---
USING {'opencl_device_priority': '*/!0,*/*/*', 'opencl_scheduling_profile':
'multiple GPUs'}
60.874240 [opencl_summary_statistics] device 'Intel(R) Iris(TM) Plus
Graphics 640' (0): 631 out of 631 events were successful and 0 events lost
60.879945 [opencl_summary_statistics] device 'AMD Radeon Pro 580 Compute
Engine' (1): NOT utilized
60.089620 [dev_process_export] pixel pipeline processing took 58.890 secs
(5.847 CPU)
---
USING {'opencl_device_priority': '*/!0,*/*/*', 'opencl_scheduling_profile':
'very fast GPU'}
60.940268 [opencl_summary_statistics] device 'Intel(R) Iris(TM) Plus
Graphics 640' (0): 631 out of 631 events were successful and 0 events lost
60.944265 [opencl_summary_statistics] device 'AMD Radeon Pro 580 Compute
Engine' (1): NOT utilized
60.150698 [dev_process_export] pixel pipeline processing took 58.968 secs
(5.857 CPU)
---
USING: no opencl
37.733091 [dev_process_export] pixel pipeline processing took 36.605 secs
(124.547 CPU)

---
Running Benchmark with
this is darktable-cli 2.4.4
copyright (c) 2012-2018 johannes hanika, tobias ellinghaus
Darwin videostar.local 18.0.0 Darwin Kernel Version 18.0.0: Wed Aug 22
20:13:40 PDT 2018; root:xnu-4903.201.2~1/RELEASE_X86_64 x86_64
---
USING {'opencl_device_priority': '*/!0,*/*/*', 'opencl_scheduling_profile':
'default'}
60.046220 [opencl_summary_statistics] device 'Intel(R) Iris(TM) Plus
Graphics 640' (0): 631 out of 631 events were successful and 0 events lost
60.049728 [opencl_summary_statistics] device 'AMD Radeon Pro 580 Compute
Engine' (1): NOT utilized
59.274210 [dev_process_export] pixel pipeline processing took 58.080 secs
(5.876 CPU)
---
USING {'opencl_device_priority': '*/!0,*/*/*', 'opencl_scheduling_profile':
'multiple GPUs'}
./run_bench.sh: line 14:  1312 Abort trap: 6           $DARKTABLE bench.SRW
test.jpg --core -d perf -d opencl > bench.stdout
---
USING {'opencl_device_priority': '*/!0,*/*/*', 'opencl_scheduling_profile':
'very fast GPU'}
27.871595 [opencl_summary_statistics] device 'Intel(R) Iris(TM) Plus
Graphics 640' (0): 631 out of 631 events were successful and 0 events lost
27.876068 [opencl_summary_statistics] device 'AMD Radeon Pro 580 Compute
Engine' (1): NOT utilized
27.099700 [dev_process_export] pixel pipeline processing took 25.848 secs
(5.915 CPU)
---
USING: no opencl
37.295200 [dev_process_export] pixel pipeline processing took 36.172 secs
(122.860 CPU)
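
To compare runs without reading the logs by eye, the relevant numbers can be
pulled out with a small Python sketch like the following. It is based only on
the line formats visible in the output above; the function name
parse_bench_output is made up, and since the mail client wrapped long lines,
whitespace is normalised before matching.

```python
import re

# Patterns based on the '-d perf -d opencl' lines shown in this mail.
PIPELINE_RE = re.compile(
    r"pixel pipeline processing took ([\d.]+) secs \(([\d.]+) CPU\)"
)
DEVICE_RE = re.compile(
    r"\[opencl_summary_statistics\] device '([^']+)' \((\d+)\): "
    r"(NOT utilized|\d+ out of \d+ events were successful)"
)


def parse_bench_output(text):
    """Extract pipeline timing and per-device usage from darktable-cli output."""
    flat = " ".join(text.split())  # undo line wrapping
    timing = PIPELINE_RE.search(flat)
    devices = {
        int(idx): {"name": name, "used": status != "NOT utilized"}
        for name, idx, status in DEVICE_RE.findall(flat)
    }
    return {
        "wall_secs": float(timing.group(1)) if timing else None,
        "cpu_secs": float(timing.group(2)) if timing else None,
        "devices": devices,
    }


# First block of the first run above, including the mail's line wrapping.
sample = """11.185796 [opencl_summary_statistics] device 'Intel(R) Iris(TM) Plus
Graphics 640' (0): NOT utilized
11.193656 [opencl_summary_statistics] device 'AMD Radeon Pro 580 Compute
Engine' (1): 551 out of 551 events were successful and 0 events lost
10.411295 [dev_process_export] pixel pipeline processing took 9.217 secs
(5.738 CPU)"""

result = parse_bench_output(sample)
print(result["wall_secs"], result["cpu_secs"])
for idx, dev in sorted(result["devices"].items()):
    print(idx, dev["name"], "used" if dev["used"] else "not used")
```

If darktable-cli aborts, as in the third run, the timing line is missing and
both timing fields come back as None instead of raising an error.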

____________________________________________________________________________
#!/bin/bash

# Based on a post to the darktable-user list in 2016

DARKTABLE=/Applications/darktable.app/Contents/MacOS/darktable-cli
export LC_CTYPE=C

echo "---"
echo "Running Benchmark with"
"$DARKTABLE" --version
uname -a

echo "---"
for value in {0..2}
do
	# with OpenCL; change_config.py (not shown) sets the darktablerc options for this run
	rm -f test.jpg
	./change_config.py "$value"
	"$DARKTABLE" bench.SRW test.jpg --core -d perf -d opencl > bench.stdout
	grep opencl_summary_statistics bench.stdout
	grep dev_process_export bench.stdout
	echo "---"
done
# without OpenCL
echo "USING: no opencl"
rm -f test.jpg
"$DARKTABLE" bench.SRW test.jpg --core --disable-opencl -d perf > bench.stdout
grep dev_process_export bench.stdout
