Ok, I ran those tests on a machine with an AMD FX 8350 CPU and an NVIDIA
GTX TITAN GPU.
Results attached. The GPU is up to 2x faster than the CPU, but the speedup
is smaller in goliath.g, I think because of the depth complexity of the
scene.
We basically need to do a breadth-first rather than a depth-first ray tree
traversal like the ANSI C code in trunk/ does if we want to improve
render performance in these high depth complexity scenes.
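
To make the two traversal orders concrete, here is a toy sketch in Python. It is not BRL-CAD code: the ray names, the CHILDREN table and the spawn() helper are all hypothetical, and a real tracer would compute secondary rays from intersections instead of looking them up.

```python
# Hypothetical toy scene: each ray id maps to the secondary rays it spawns.
CHILDREN = {
    "p0": ["p0.r", "p0.t"],  # primary ray 0 spawns two secondary rays
    "p0.r": ["p0.r.r"],      # one of them spawns a third-generation ray
    "p1": ["p1.r"],          # primary ray 1 spawns one secondary ray
}


def spawn(ray):
    """Return the secondary rays produced by tracing `ray` (toy lookup)."""
    return CHILDREN.get(ray, [])


def depth_first(primaries):
    """Finish the whole subtree of one ray before starting the next ray."""
    order = []

    def visit(ray):
        order.append(ray)
        for child in spawn(ray):
            visit(child)

    for ray in primaries:
        visit(ray)
    return order


def breadth_first(primaries):
    """Trace one whole generation of rays per pass, then the next one."""
    order = []
    wave = list(primaries)
    while wave:
        order.extend(wave)
        # The next wave is every ray spawned by the current wave.
        wave = [child for ray in wave for child in spawn(ray)]
    return order


print(depth_first(["p0", "p1"]))
print(breadth_first(["p0", "p1"]))
```

Both functions visit the same rays, only in a different order. In the breadth-first version each wave is a batch of independent rays, which is the shape of work that could be handed to the GPU as one batch per generation.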
On Fri, Aug 11, 2017 at 11:33 PM, Marco Domingues <
marcodomingue...@gmail.com> wrote:
> Hello,
>
> Today I gathered some new times to compare the ANSI C boolean evaluation
> with the OpenCL implementation, and also to identify possible new
> optimizations to make on the code. Now using release builds and increasing
> the ray complexity (-s1024).
>
> I couldn’t figure out a way to track the time spent in each kernel yet,
> but I will keep looking for a way to do this. In the attached document,
> there are some comparisons between the current ANSI C code, and some
> variations of it (tracing the ray till the end, and without performing
> boolean operations), and also with the OpenCL implementations when running
> the code over the AMD/Intel OpenCL SDK on the CPU.
>
> I’ve also added side by side image comparisons in the document to show the
> current state of the OpenCL boolean implementation. There are still some
> shading differences, but the geometry seems correct (you can also notice
> missing primitives; this is the case for primitives that are not supported
> in OpenCL yet, e.g. pipes in goliath.g).
>
> In the document you can see that the current OpenCL implementation is
> slower than the ANSI C code, when running on the same hardware. But to be
> fair, the OpenCL version calculates intersections for the entire ray, and
> some major changes to the rendering loop had to be done to replicate the
> current behaviour of the ANSI C code, where ray intersections and boolean
> evaluations are done in partial fashion.
>
> Finally, I’ve also committed the changes to the bool_eval() function that
> follows the behaviour of the current bool_eval() function in the trunk.
> Here you can see a comparison between the previous code (bool_eval() using
> the RPN tree) and with the new tree representation:
> https://brlcad.org/wiki/User:Marco-domingues/GSoC17/Log#10_August
>
> Cheers!
> Marco
>
>
>
>
> ------------------------------------------------------------
> ------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> _______________________________________________
> BRL-CAD Developer mailing list
> brlcad-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/brlcad-devel
>
>
--
Vasco Alexandre da Silva Costa
PhD in Computer Engineering (Computer Graphics)
Instituto Superior Técnico/University of Lisbon, Portugal
=== hardware ===
$ cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 21
model : 2
model name : AMD FX(tm)-8350 Eight-Core Processor
stepping : 0
microcode : 0x600084f
cpu MHz : 4013.421
cache size : 2048 KB
physical id : 0
siblings : 8
core id : 0
cpu cores : 4
apicid : 16
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb
rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf pni
pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c
lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch
osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext perfctr_core
perfctr_nb cpb hw_pstate vmmcall bmi1 arat npt lbrv svm_lock nrip_save
tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold
bugs : fxsave_leak sysret_ss_attrs
bogomips : 8026.84
TLB size : 1536 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm 100mhzsteps hwpstate cpb eff_freq_ro
[processors 1-7 omitted: identical to processor 0 except that processor,
core id and initial apicid run from 1 to 7, and apicid from 17 to 23]
$ clinfo
Number of platforms: 2
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 1.2 CUDA 8.0.0
Platform Name: NVIDIA CUDA
Platform Vendor: NVIDIA Corporation
Platform Extensions:
cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics
cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64
cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing
cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll
cl_nv_copy_opts cl_khr_gl_event cl_nv_create_buffer
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 1.2 AMD-APP (1445.5)
Platform Name: AMD Accelerated Parallel Processing
Platform Vendor: Advanced Micro Devices, Inc.
Platform Extensions: cl_khr_icd
cl_amd_event_callback cl_amd_offline_devices cl_amd_hsa
Platform Name: NVIDIA CUDA
Number of devices: 1
Device Type: CL_DEVICE_TYPE_GPU
Vendor ID: 10deh
Max compute units: 14
Max work items dimensions: 3
Max work items[0]: 1024
Max work items[1]: 1024
Max work items[2]: 64
Max work group size: 1024
Preferred vector width char: 1
Preferred vector width short: 1
Preferred vector width int: 1
Preferred vector width long: 1
Preferred vector width float: 1
Preferred vector width double: 1
Native vector width char: 1
Native vector width short: 1
Native vector width int: 1
Native vector width long: 1
Native vector width float: 1
Native vector width double: 1
Max clock frequency: 875Mhz
Address bits: 64
Max memory allocation: 1594343424
Image support: Yes
Max number of images read arguments: 256
Max number of images write arguments: 16
Max image 2D width: 16384
Max image 2D height: 16384
Max image 3D width: 4096
Max image 3D height: 4096
Max image 3D depth: 4096
Max samplers within kernel: 32
Max size of kernel argument: 4352
Alignment (bits) of base address: 4096
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: Yes
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: Yes
Cache type: Read/Write
Cache line size: 128
Cache size: 229376
Global memory size: 6377373696
Constant buffer size: 65536
Max number of constant args: 9
Local memory type: Scratchpad
Local memory size: 49152
Kernel Preferred work group size multiple: 32
Error correction support: 0
Unified memory for Host and Device: 0
Profiling timer resolution: 1000
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: No
Queue properties:
Out-of-Order: Yes
Profiling : Yes
Platform ID: 0x0000000000d09110
Name: GeForce GTX TITAN
Vendor: NVIDIA Corporation
Device OpenCL C version: OpenCL C 1.2
Driver version: 375.66
Profile: FULL_PROFILE
Version: OpenCL 1.2 CUDA
Extensions:
cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics
cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64
cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing
cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll
cl_nv_copy_opts cl_khr_gl_event cl_nv_create_buffer
...
===
az35,el25
=== operators.g ===
mged> e operators
mged> rt -s1024
SHOT: cpu = 0.872 sec, elapsed = 0.12597 sec
parent: 0.8user 0.0sys 0:00real 725% 0i+0d 33188maxrss 0+77pf 73+82csw
children: 0.0user 0.0sys 0:00real 0% 0i+0d 0maxrss 0+0pf 0+0csw
Additional #malloc=1182, #free=0, #realloc=10 (1182 retained)
1518991 solid/ray intersections: 425233 hits + 1093758 miss
pruned 28.0%: 410025 model RPP, 793343 dups skipped, 223310 solid RPP
Frame 0: 1048576 pixels in 0.11 sec = 9619963.30 pixels/sec
Frame 0: 900096 rays in 0.11 sec = 8257761.47 rays/sec (RTFM)
Frame 0: 900096 rays in 0.87 sec = 1032220.18 rays/CPU_sec
Frame 0: 900096 rays in 0.13 sec = 7145320.31 rays/sec (wallclock)
mged> rt -z1 -l5 -s1024
SHOT: opencl
SHOT: cpu = 0.048 sec, elapsed = 0.062195 sec
parent: 0.0user 0.0sys 0:00real 83% 0i+0d 134990maxrss 0+2962pf 7+0csw
children: 0.0user 0.0sys 0:00real 0% 0i+0d 0maxrss 0+0pf 0+0csw
Additional #malloc=68, #free=0, #realloc=11 (68 retained)
0 solid/ray intersections: 0 hits + 0 miss
pruned 100.0%: 0 model RPP, 0 dups skipped, 0 solid RPP
Frame 0: 1048576 pixels in 0.01 sec = 174762666.67 pixels/sec
Frame 0: 0 rays in 0.01 sec = 0.00 rays/sec (RTFM)
Frame 0: 0 rays in 0.05 sec = 0.00 rays/CPU_sec
Frame 0: 0 rays in 0.06 sec = 0.00 rays/sec (wallclock)
=== boolean_ops.g ===
mged> e all
mged> rt -s1024
SHOT: cpu = 0.764 sec, elapsed = 0.1073 sec
parent: 0.7user 0.0sys 0:00real 760% 0i+0d 33152maxrss 0+107pf 101+207csw
children: 0.0user 0.0sys 0:00real 0% 0i+0d 0maxrss 0+0pf 0+0csw
Additional #malloc=1654, #free=0, #realloc=10 (1654 retained)
947527 solid/ray intersections: 312946 hits + 634581 miss
pruned 33.0%: 317291 model RPP, 676926 dups skipped, 97941 solid RPP
Frame 0: 1048576 pixels in 0.10 sec = 10979853.40 pixels/sec
Frame 0: 916480 rays in 0.10 sec = 9596649.21 rays/sec (RTFM)
Frame 0: 916480 rays in 0.76 sec = 1199581.15 rays/CPU_sec
Frame 0: 916480 rays in 0.11 sec = 8541286.11 rays/sec (wallclock)
mged> rt -z1 -l5 -s1024
SHOT: opencl
SHOT: cpu = 0.032 sec, elapsed = 0.05063 sec
parent: 0.0user 0.0sys 0:00real 80% 0i+0d 136640maxrss 0+1050pf 0+0csw
children: 0.0user 0.0sys 0:00real 0% 0i+0d 0maxrss 0+0pf 0+0csw
Additional #malloc=68, #free=0, #realloc=11 (68 retained)
0 solid/ray intersections: 0 hits + 0 miss
pruned 100.0%: 0 model RPP, 0 dups skipped, 0 solid RPP
Frame 0: 1048576 pixels in 0.00 sec = 262144000.00 pixels/sec
Frame 0: 0 rays in 0.00 sec = 0.00 rays/sec (RTFM)
Frame 0: 0 rays in 0.03 sec = 0.00 rays/CPU_sec
Frame 0: 0 rays in 0.05 sec = 0.00 rays/sec (wallclock)
=== truck.g ===
mged> e g4
mged> rt -s1024
SHOT: cpu = 0.524 sec, elapsed = 0.071049 sec
parent: 0.5user 0.0sys 0:00real 742% 0i+0d 33646maxrss 0+87pf 59+71csw
children: 0.0user 0.0sys 0:00real 0% 0i+0d 0maxrss 0+0pf 0+0csw
Additional #malloc=1296, #free=0, #realloc=12 (1296 retained)
237205 solid/ray intersections: 119538 hits + 117667 miss
pruned 50.4%: 833005 model RPP, 138816 dups skipped, 44030 solid RPP
Frame 0: 1048576 pixels in 0.07 sec = 16008793.89 pixels/sec
Frame 0: 904192 rays in 0.07 sec = 13804458.02 rays/sec (RTFM)
Frame 0: 904192 rays in 0.52 sec = 1725557.25 rays/CPU_sec
Frame 0: 904192 rays in 0.07 sec = 12726315.64 rays/sec (wallclock)
mged> rt -z1 -l5 -s1024
SHOT: opencl
SHOT: cpu = 0.048 sec, elapsed = 0.068098 sec
parent: 0.0user 0.0sys 0:00real 100% 0i+0d 147490maxrss 0+2574pf 8+0csw
children: 0.0user 0.0sys 0:00real 0% 0i+0d 0maxrss 0+0pf 0+0csw
Additional #malloc=68, #free=0, #realloc=12 (68 retained)
0 solid/ray intersections: 0 hits + 0 miss
pruned 100.0%: 0 model RPP, 0 dups skipped, 0 solid RPP
Frame 0: 1048576 pixels in 0.01 sec = 174762666.67 pixels/sec
Frame 0: 0 rays in 0.01 sec = 0.00 rays/sec (RTFM)
Frame 0: 0 rays in 0.05 sec = 0.00 rays/CPU_sec
Frame 0: 0 rays in 0.07 sec = 0.00 rays/sec (wallclock)
=== tank_car.g ===
mged> e tank_car
mged> rt -s1024
SHOT: cpu = 1.152 sec, elapsed = 0.17831 sec
parent: 1.1user 0.0sys 0:00real 676% 0i+0d 91430maxrss 0+190pf 38+157csw
children: 0.0user 0.0sys 0:00real 0% 0i+0d 0maxrss 0+0pf 0+0csw
Additional #malloc=1748, #free=0, #realloc=120 (1748 retained)
1038921 solid/ray intersections: 479173 hits + 559748 miss
pruned 46.1%: 904065 model RPP, 1481105 dups skipped, 386886 solid RPP
Frame 0: 1048576 pixels in 0.14 sec = 7281777.78 pixels/sec
Frame 0: 1047719 rays in 0.14 sec = 7275826.39 rays/sec (RTFM)
Frame 0: 1047719 rays in 1.15 sec = 909478.30 rays/CPU_sec
Frame 0: 1047719 rays in 0.18 sec = 5875828.61 rays/sec (wallclock)
mged> rt -z1 -l5 -s1024
SHOT: opencl
SHOT: cpu = 0.084 sec, elapsed = 0.136295 sec
parent: 0.0user 0.0sys 0:00real 100% 0i+0d 143506maxrss 0+3079pf 3+0csw
children: 0.0user 0.0sys 0:00real 0% 0i+0d 0maxrss 0+0pf 0+0csw
Additional #malloc=68, #free=0, #realloc=12 (68 retained)
0 solid/ray intersections: 0 hits + 0 miss
pruned 100.0%: 0 model RPP, 0 dups skipped, 0 solid RPP
Frame 0: 1048576 pixels in 0.01 sec = 99864380.95 pixels/sec
Frame 0: 0 rays in 0.01 sec = 0.00 rays/sec (RTFM)
Frame 0: 0 rays in 0.08 sec = 0.00 rays/CPU_sec
Frame 0: 0 rays in 0.14 sec = 0.00 rays/sec (wallclock)
=== goliath.g ===
mged> e Goliath.c
mged> rt -s1024
SHOT: cpu = 2.548 sec, elapsed = 0.389415 sec
parent: 2.5user 0.0sys 0:00real 668% 0i+0d 89572maxrss 0+244pf 98+414csw
children: 0.0user 0.0sys 0:00real 0% 0i+0d 0maxrss 0+0pf 0+0csw
Additional #malloc=2877, #free=0, #realloc=114 (2877 retained)
3524016 solid/ray intersections: 1567527 hits + 1956489 miss
pruned 44.5%: 727441 model RPP, 3035766 dups skipped, 1651079 solid RPP
Frame 0: 1048576 pixels in 0.32 sec = 3292232.34 pixels/sec
Frame 0: 1246392 rays in 0.32 sec = 3913318.68 rays/sec (RTFM)
Frame 0: 1246392 rays in 2.55 sec = 489164.84 rays/CPU_sec
Frame 0: 1246392 rays in 0.39 sec = 3200677.94 rays/sec (wallclock)
mged> rt -z1 -l5 -s1024
SHOT: opencl
SHOT: cpu = 0.232 sec, elapsed = 0.355876 sec
parent: 0.2user 0.1sys 0:00real 97% 0i+0d 324494maxrss 0+2806pf 6+2csw
children: 0.0user 0.0sys 0:00real 0% 0i+0d 0maxrss 0+0pf 0+0csw
Additional #malloc=68, #free=0, #realloc=12 (68 retained)
0 solid/ray intersections: 0 hits + 0 miss
pruned 100.0%: 0 model RPP, 0 dups skipped, 0 solid RPP
Frame 0: 1048576 pixels in 0.03 sec = 36157793.10 pixels/sec
Frame 0: 0 rays in 0.03 sec = 0.00 rays/sec (RTFM)
Frame 0: 0 rays in 0.23 sec = 0.00 rays/CPU_sec
Frame 0: 0 rays in 0.36 sec = 0.00 rays/sec (wallclock)
=== havoc.g ===
mged> e havoc
mged> rt -s1024
SHOT: cpu = 5.188 sec, elapsed = 0.657286 sec
parent: 5.1user 0.0sys 0:00real 784% 0i+0d 35446maxrss 0+232pf 58+52csw
children: 0.0user 0.0sys 0:00real 0% 0i+0d 0maxrss 0+0pf 0+0csw
Additional #malloc=2243, #free=0, #realloc=67 (2243 retained)
1719392 solid/ray intersections: 500752 hits + 1218640 miss
pruned 29.1%: 495126 model RPP, 6854415 dups skipped, 1175750 solid RPP
Frame 0: 1048576 pixels in 0.65 sec = 1616925.21 pixels/sec
Frame 0: 935500 rays in 0.65 sec = 1442559.75 rays/sec (RTFM)
Frame 0: 935500 rays in 5.19 sec = 180319.97 rays/CPU_sec
Frame 0: 935500 rays in 0.66 sec = 1423276.93 rays/sec (wallclock)
mged> rt -z1 -l5 -s1024
SHOT: opencl
SHOT: cpu = 0.404 sec, elapsed = 0.538892 sec
parent: 0.4user 0.1sys 0:00real 101% 0i+0d 183538maxrss 0+3788pf 7+1csw
children: 0.0user 0.0sys 0:00real 0% 0i+0d 0maxrss 0+0pf 0+0csw
Additional #malloc=68, #free=0, #realloc=12 (68 retained)
0 solid/ray intersections: 0 hits + 0 miss
pruned 100.0%: 0 model RPP, 0 dups skipped, 0 solid RPP
Frame 0: 1048576 pixels in 0.05 sec = 20763881.19 pixels/sec
Frame 0: 0 rays in 0.05 sec = 0.00 rays/sec (RTFM)
Frame 0: 0 rays in 0.40 sec = 0.00 rays/CPU_sec
Frame 0: 0 rays in 0.54 sec = 0.00 rays/sec (wallclock)
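
As a rough summary of the runs above, the short Python sketch below divides the ANSI C elapsed time by the OpenCL elapsed time for each scene. The numbers are copied from the "SHOT: ... elapsed =" lines in this log; the ELAPSED table and the speedups() helper are names made up for this illustration. Note that the ANSI C runs were multi-threaded on the CPU (roughly 670-780% CPU in the "parent:" lines), so this compares wall-clock time only.

```python
# Elapsed wall-clock seconds from the "SHOT:" lines in the log:
# (ANSI C `rt -s1024`, OpenCL `rt -z1 -l5 -s1024`)
ELAPSED = {
    "operators.g":   (0.12597,  0.062195),
    "boolean_ops.g": (0.1073,   0.05063),
    "truck.g":       (0.071049, 0.068098),
    "tank_car.g":    (0.17831,  0.136295),
    "goliath.g":     (0.389415, 0.355876),
    "havoc.g":       (0.657286, 0.538892),
}


def speedups(elapsed):
    """Return ANSI C elapsed time divided by OpenCL elapsed time per scene."""
    return {scene: ansi_c / opencl for scene, (ansi_c, opencl) in elapsed.items()}


for scene, ratio in speedups(ELAPSED).items():
    print(f"{scene:14s} {ratio:.2f}x")
```

The ratios come out at roughly 2x for operators.g and boolean_ops.g, and fall to between about 1.0x and 1.3x for the other four scenes, including goliath.g.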