Re: [brlcad-devel] bool_eval()

Vasco Alexandre da Silva Costa Mon, 14 Aug 2017 07:46:47 -0700

Ok, I ran those tests on a machine with an AMD FX 8350 CPU and an NVIDIA
GTX TITAN GPU.


Results attached. The GPU is up to 2x faster than the CPU, but the speedup
is much smaller in goliath.g, I think because of the depth complexity of
the scene.
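
As a rough check of those ratios, the elapsed times from the attached log
can be compared with a few lines of Python. This is only a sketch and not
part of BRL-CAD: the dictionary and function names are made up for
illustration, the numbers are copied by hand from the "SHOT: ... elapsed"
lines, and I am assuming the plain "rt -s1024" runs are the ANSI C path
and the "rt -z1 -l5 -s1024" runs are the OpenCL path on the GPU.

```python
# Elapsed wall-clock seconds copied by hand from the "SHOT: ... elapsed = ..."
# lines of the attached log, as (ANSI C run, OpenCL run) per scene.
# The names ELAPSED and speedups are illustrative, not BRL-CAD identifiers.
ELAPSED = {
    "operators.g":   (0.12597, 0.062195),
    "boolean_ops.g": (0.1073, 0.05063),
    "truck.g":       (0.071049, 0.068098),
    "tank_car.g":    (0.17831, 0.136295),
    "goliath.g":     (0.389415, 0.355876),
    "havoc.g":       (0.657286, 0.538892),
}


def speedups(elapsed):
    """Return {scene: ansi_c_time / opencl_time} for each scene."""
    return {scene: cpu / gpu for scene, (cpu, gpu) in elapsed.items()}


if __name__ == "__main__":
    for scene, ratio in speedups(ELAPSED).items():
        print(f"{scene:14s} {ratio:.2f}x")
```

On these numbers the ratio is a little over 2x for operators.g and
boolean_ops.g, and only about 1.1x for goliath.g.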

We basically need to do a breadth-first rather than a depth-first ray tree
traversal like the ANSI C code in trunk/ does if we want to have improved
render performance in these high depth complexity scenes.

On Fri, Aug 11, 2017 at 11:33 PM, Marco Domingues <
marcodomingue...@gmail.com> wrote:

> Hello,
>
> Today I gathered some new times to compare the ANSI C boolean evaluation
> with the OpenCL implementation, and also to identify possible new
> optimizations to make on the code. Now using release builds and increasing
> the ray complexity (-s1024).
>
> I couldn’t figure out a way to track the time spent in each kernel yet,
> but I will keep looking for a way to do this. In the attached document,
> there are some comparisons between the current ANSI C code, and some
> variations of it (tracing the ray till the end, and without performing
> boolean operations), and also with the OpenCL implementations when running
> the code over the AMD/Intel OpenCL SDK on the CPU.
>
> I’ve also added side by side image comparisons in the document to show the
> current state of the OpenCL boolean implementation. There are still some
> shading differences, but the geometry seems correct. You may also notice
> missing primitives; these are primitives that are not supported in OpenCL
> yet, e.g. pipes in goliath.g.
>
> In the document you can see that the current OpenCL implementation is
> slower than the ANSI C code, when running on the same hardware. But to be
> fair, the OpenCL version calculates intersections for the entire ray, and
> some major changes to the rendering loop had to be done to replicate the
> current behaviour of the ANSI C code, where ray intersections and boolean
> evaluations are done in a partial fashion.
>
> Finally, I’ve also committed the changes to the bool_eval() function that
> follows the behaviour of the current bool_eval() function in the trunk.
> Here you can see a comparison between the previous code (bool_eval() using
> the RPN tree) and the new tree representation:
> https://brlcad.org/wiki/User:Marco-domingues/GSoC17/Log#10_August
>
> Cheers!
> Marco
>
>
>
>
> ------------------------------------------------------------------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> _______________________________________________
> BRL-CAD Developer mailing list
> brlcad-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/brlcad-devel
>
>


-- 
Vasco Alexandre da Silva Costa
PhD in Computer Engineering (Computer Graphics)
Instituto Superior Técnico/University of Lisbon, Portugal

=== hardware ===
$ cat /proc/cpuinfo 
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 21
model           : 2
model name      : AMD FX(tm)-8350 Eight-Core Processor
stepping        : 0
microcode       : 0x600084f
cpu MHz         : 4013.421
cache size      : 2048 KB
physical id     : 0
siblings        : 8
core id         : 0
cpu cores       : 4
apicid          : 16
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb 
rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf pni 
pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c 
lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch 
osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext perfctr_core 
perfctr_nb cpb hw_pstate vmmcall bmi1 arat npt lbrv svm_lock nrip_save 
tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold
bugs            : fxsave_leak sysret_ss_attrs
bogomips        : 8026.84
TLB size        : 1536 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management: ts ttp tm 100mhzsteps hwpstate cpb eff_freq_ro

processor       : 1
vendor_id       : AuthenticAMD
cpu family      : 21
model           : 2
model name      : AMD FX(tm)-8350 Eight-Core Processor
stepping        : 0
microcode       : 0x600084f
cpu MHz         : 4013.421
cache size      : 2048 KB
physical id     : 0
siblings        : 8
core id         : 1
cpu cores       : 4
apicid          : 17
initial apicid  : 1
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb 
rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf pni 
pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c 
lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch 
osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext perfctr_core 
perfctr_nb cpb hw_pstate vmmcall bmi1 arat npt lbrv svm_lock nrip_save 
tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold
bugs            : fxsave_leak sysret_ss_attrs
bogomips        : 8026.84
TLB size        : 1536 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management: ts ttp tm 100mhzsteps hwpstate cpb eff_freq_ro

processor       : 2
vendor_id       : AuthenticAMD
cpu family      : 21
model           : 2
model name      : AMD FX(tm)-8350 Eight-Core Processor
stepping        : 0
microcode       : 0x600084f
cpu MHz         : 4013.421
cache size      : 2048 KB
physical id     : 0
siblings        : 8
core id         : 2
cpu cores       : 4
apicid          : 18
initial apicid  : 2
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb 
rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf pni 
pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c 
lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch 
osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext perfctr_core 
perfctr_nb cpb hw_pstate vmmcall bmi1 arat npt lbrv svm_lock nrip_save 
tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold
bugs            : fxsave_leak sysret_ss_attrs
bogomips        : 8026.84
TLB size        : 1536 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management: ts ttp tm 100mhzsteps hwpstate cpb eff_freq_ro

processor       : 3
vendor_id       : AuthenticAMD
cpu family      : 21
model           : 2
model name      : AMD FX(tm)-8350 Eight-Core Processor
stepping        : 0
microcode       : 0x600084f
cpu MHz         : 4013.421
cache size      : 2048 KB
physical id     : 0
siblings        : 8
core id         : 3
cpu cores       : 4
apicid          : 19
initial apicid  : 3
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb 
rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf pni 
pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c 
lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch 
osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext perfctr_core 
perfctr_nb cpb hw_pstate vmmcall bmi1 arat npt lbrv svm_lock nrip_save 
tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold
bugs            : fxsave_leak sysret_ss_attrs
bogomips        : 8026.84
TLB size        : 1536 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management: ts ttp tm 100mhzsteps hwpstate cpb eff_freq_ro

processor       : 4
vendor_id       : AuthenticAMD
cpu family      : 21
model           : 2
model name      : AMD FX(tm)-8350 Eight-Core Processor
stepping        : 0
microcode       : 0x600084f
cpu MHz         : 4013.421
cache size      : 2048 KB
physical id     : 0
siblings        : 8
core id         : 4
cpu cores       : 4
apicid          : 20
initial apicid  : 4
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb 
rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf pni 
pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c 
lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch 
osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext perfctr_core 
perfctr_nb cpb hw_pstate vmmcall bmi1 arat npt lbrv svm_lock nrip_save 
tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold
bugs            : fxsave_leak sysret_ss_attrs
bogomips        : 8026.84
TLB size        : 1536 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management: ts ttp tm 100mhzsteps hwpstate cpb eff_freq_ro

processor       : 5
vendor_id       : AuthenticAMD
cpu family      : 21
model           : 2
model name      : AMD FX(tm)-8350 Eight-Core Processor
stepping        : 0
microcode       : 0x600084f
cpu MHz         : 4013.421
cache size      : 2048 KB
physical id     : 0
siblings        : 8
core id         : 5
cpu cores       : 4
apicid          : 21
initial apicid  : 5
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb 
rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf pni 
pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c 
lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch 
osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext perfctr_core 
perfctr_nb cpb hw_pstate vmmcall bmi1 arat npt lbrv svm_lock nrip_save 
tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold
bugs            : fxsave_leak sysret_ss_attrs
bogomips        : 8026.84
TLB size        : 1536 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management: ts ttp tm 100mhzsteps hwpstate cpb eff_freq_ro

processor       : 6
vendor_id       : AuthenticAMD
cpu family      : 21
model           : 2
model name      : AMD FX(tm)-8350 Eight-Core Processor
stepping        : 0
microcode       : 0x600084f
cpu MHz         : 4013.421
cache size      : 2048 KB
physical id     : 0
siblings        : 8
core id         : 6
cpu cores       : 4
apicid          : 22
initial apicid  : 6
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb 
rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf pni 
pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c 
lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch 
osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext perfctr_core 
perfctr_nb cpb hw_pstate vmmcall bmi1 arat npt lbrv svm_lock nrip_save 
tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold
bugs            : fxsave_leak sysret_ss_attrs
bogomips        : 8026.84
TLB size        : 1536 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management: ts ttp tm 100mhzsteps hwpstate cpb eff_freq_ro

processor       : 7
vendor_id       : AuthenticAMD
cpu family      : 21
model           : 2
model name      : AMD FX(tm)-8350 Eight-Core Processor
stepping        : 0
microcode       : 0x600084f
cpu MHz         : 4013.421
cache size      : 2048 KB
physical id     : 0
siblings        : 8
core id         : 7
cpu cores       : 4
apicid          : 23
initial apicid  : 7
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb 
rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf pni 
pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c 
lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch 
osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext perfctr_core 
perfctr_nb cpb hw_pstate vmmcall bmi1 arat npt lbrv svm_lock nrip_save 
tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold
bugs            : fxsave_leak sysret_ss_attrs
bogomips        : 8026.84
TLB size        : 1536 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management: ts ttp tm 100mhzsteps hwpstate cpb eff_freq_ro

$ clinfo
Number of platforms:                             2
  Platform Profile:                              FULL_PROFILE
  Platform Version:                              OpenCL 1.2 CUDA 8.0.0
  Platform Name:                                 NVIDIA CUDA
  Platform Vendor:                               NVIDIA Corporation
  Platform Extensions:                           
cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics 
cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 
cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing 
cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll 
cl_nv_copy_opts cl_khr_gl_event cl_nv_create_buffer
  Platform Profile:                              FULL_PROFILE
  Platform Version:                              OpenCL 1.2 AMD-APP (1445.5)
  Platform Name:                                 AMD Accelerated Parallel Processing
  Platform Vendor:                               Advanced Micro Devices, Inc.
  Platform Extensions:                           cl_khr_icd 
cl_amd_event_callback cl_amd_offline_devices cl_amd_hsa 


  Platform Name:                                 NVIDIA CUDA
Number of devices:                               1
  Device Type:                                   CL_DEVICE_TYPE_GPU
  Vendor ID:                                     10deh
  Max compute units:                             14
  Max work items dimensions:                     3
    Max work items[0]:                           1024
    Max work items[1]:                           1024
    Max work items[2]:                           64
  Max work group size:                           1024
  Preferred vector width char:                   1
  Preferred vector width short:                  1
  Preferred vector width int:                    1
  Preferred vector width long:                   1
  Preferred vector width float:                  1
  Preferred vector width double:                 1
  Native vector width char:                      1
  Native vector width short:                     1
  Native vector width int:                       1
  Native vector width long:                      1
  Native vector width float:                     1
  Native vector width double:                    1
  Max clock frequency:                           875Mhz
  Address bits:                                  64
  Max memory allocation:                         1594343424
  Image support:                                 Yes
  Max number of images read arguments:           256
  Max number of images write arguments:          16
  Max image 2D width:                            16384
  Max image 2D height:                           16384
  Max image 3D width:                            4096
  Max image 3D height:                           4096
  Max image 3D depth:                            4096
  Max samplers within kernel:                    32
  Max size of kernel argument:                   4352
  Alignment (bits) of base address:              4096
  Minimum alignment (bytes) for any datatype:    128
  Single precision floating point capability
    Denorms:                                     Yes
    Quiet NaNs:                                  Yes
    Round to nearest even:                       Yes
    Round to zero:                               Yes
    Round to +ve and infinity:                   Yes
    IEEE754-2008 fused multiply-add:             Yes
  Cache type:                                    Read/Write
  Cache line size:                               128
  Cache size:                                    229376
  Global memory size:                            6377373696
  Constant buffer size:                          65536
  Max number of constant args:                   9
  Local memory type:                             Scratchpad
  Local memory size:                             49152
  Kernel Preferred work group size multiple:     32
  Error correction support:                      0
  Unified memory for Host and Device:            0
  Profiling timer resolution:                    1000
  Device endianess:                              Little
  Available:                                     Yes
  Compiler available:                            Yes
  Execution capabilities:                                
    Execute OpenCL kernels:                      Yes
    Execute native function:                     No
  Queue properties:                              
    Out-of-Order:                                Yes
    Profiling :                                  Yes
  Platform ID:                                   0x0000000000d09110
  Name:                                          GeForce GTX TITAN
  Vendor:                                        NVIDIA Corporation
  Device OpenCL C version:                       OpenCL C 1.2 
  Driver version:                                375.66
  Profile:                                       FULL_PROFILE
  Version:                                       OpenCL 1.2 CUDA
  Extensions:                                    
cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics 
cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 
cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing 
cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll 
cl_nv_copy_opts cl_khr_gl_event cl_nv_create_buffer
...

===

az35,el25

=== operators.g ===
mged> e operators

mged> rt -s1024
SHOT: cpu = 0.872 sec, elapsed = 0.12597 sec
    parent: 0.8user 0.0sys 0:00real 725% 0i+0d 33188maxrss 0+77pf 73+82csw
  children: 0.0user 0.0sys 0:00real 0% 0i+0d 0maxrss 0+0pf 0+0csw
Additional #malloc=1182, #free=0, #realloc=10 (1182 retained)
1518991 solid/ray intersections: 425233 hits + 1093758 miss
pruned 28.0%:  410025 model RPP, 793343 dups skipped, 223310 solid RPP
Frame  0:    1048576 pixels in      0.11 sec =   9619963.30 pixels/sec
Frame  0:     900096 rays   in      0.11 sec =   8257761.47 rays/sec (RTFM)
Frame  0:     900096 rays   in      0.87 sec =   1032220.18 rays/CPU_sec
Frame  0:     900096 rays   in      0.13 sec =   7145320.31 rays/sec (wallclock)

mged> rt -z1 -l5 -s1024
SHOT: opencl
SHOT: cpu = 0.048 sec, elapsed = 0.062195 sec
    parent: 0.0user 0.0sys 0:00real 83% 0i+0d 134990maxrss 0+2962pf 7+0csw
  children: 0.0user 0.0sys 0:00real 0% 0i+0d 0maxrss 0+0pf 0+0csw
Additional #malloc=68, #free=0, #realloc=11 (68 retained)
0 solid/ray intersections: 0 hits + 0 miss
pruned 100.0%:  0 model RPP, 0 dups skipped, 0 solid RPP
Frame  0:    1048576 pixels in      0.01 sec = 174762666.67 pixels/sec
Frame  0:          0 rays   in      0.01 sec =         0.00 rays/sec (RTFM)
Frame  0:          0 rays   in      0.05 sec =         0.00 rays/CPU_sec
Frame  0:          0 rays   in      0.06 sec =         0.00 rays/sec (wallclock)


=== boolean_ops.g ===
mged> e all

mged> rt -s1024
SHOT: cpu = 0.764 sec, elapsed = 0.1073 sec
    parent: 0.7user 0.0sys 0:00real 760% 0i+0d 33152maxrss 0+107pf 101+207csw
  children: 0.0user 0.0sys 0:00real 0% 0i+0d 0maxrss 0+0pf 0+0csw
Additional #malloc=1654, #free=0, #realloc=10 (1654 retained)
947527 solid/ray intersections: 312946 hits + 634581 miss
pruned 33.0%:  317291 model RPP, 676926 dups skipped, 97941 solid RPP
Frame  0:    1048576 pixels in      0.10 sec =  10979853.40 pixels/sec
Frame  0:     916480 rays   in      0.10 sec =   9596649.21 rays/sec (RTFM)
Frame  0:     916480 rays   in      0.76 sec =   1199581.15 rays/CPU_sec
Frame  0:     916480 rays   in      0.11 sec =   8541286.11 rays/sec (wallclock)

mged> rt -z1 -l5 -s1024
SHOT: opencl
SHOT: cpu = 0.032 sec, elapsed = 0.05063 sec
    parent: 0.0user 0.0sys 0:00real 80% 0i+0d 136640maxrss 0+1050pf 0+0csw
  children: 0.0user 0.0sys 0:00real 0% 0i+0d 0maxrss 0+0pf 0+0csw
Additional #malloc=68, #free=0, #realloc=11 (68 retained)
0 solid/ray intersections: 0 hits + 0 miss
pruned 100.0%:  0 model RPP, 0 dups skipped, 0 solid RPP
Frame  0:    1048576 pixels in      0.00 sec = 262144000.00 pixels/sec
Frame  0:          0 rays   in      0.00 sec =         0.00 rays/sec (RTFM)
Frame  0:          0 rays   in      0.03 sec =         0.00 rays/CPU_sec
Frame  0:          0 rays   in      0.05 sec =         0.00 rays/sec (wallclock)


=== truck.g ===
mged> e g4

mged> rt -s1024
SHOT: cpu = 0.524 sec, elapsed = 0.071049 sec
    parent: 0.5user 0.0sys 0:00real 742% 0i+0d 33646maxrss 0+87pf 59+71csw
  children: 0.0user 0.0sys 0:00real 0% 0i+0d 0maxrss 0+0pf 0+0csw
Additional #malloc=1296, #free=0, #realloc=12 (1296 retained)
237205 solid/ray intersections: 119538 hits + 117667 miss
pruned 50.4%:  833005 model RPP, 138816 dups skipped, 44030 solid RPP
Frame  0:    1048576 pixels in      0.07 sec =  16008793.89 pixels/sec
Frame  0:     904192 rays   in      0.07 sec =  13804458.02 rays/sec (RTFM)
Frame  0:     904192 rays   in      0.52 sec =   1725557.25 rays/CPU_sec
Frame  0:     904192 rays   in      0.07 sec =  12726315.64 rays/sec (wallclock)

mged> rt -z1 -l5 -s1024
SHOT: opencl
SHOT: cpu = 0.048 sec, elapsed = 0.068098 sec
    parent: 0.0user 0.0sys 0:00real 100% 0i+0d 147490maxrss 0+2574pf 8+0csw
  children: 0.0user 0.0sys 0:00real 0% 0i+0d 0maxrss 0+0pf 0+0csw
Additional #malloc=68, #free=0, #realloc=12 (68 retained)
0 solid/ray intersections: 0 hits + 0 miss
pruned 100.0%:  0 model RPP, 0 dups skipped, 0 solid RPP
Frame  0:    1048576 pixels in      0.01 sec = 174762666.67 pixels/sec
Frame  0:          0 rays   in      0.01 sec =         0.00 rays/sec (RTFM)
Frame  0:          0 rays   in      0.05 sec =         0.00 rays/CPU_sec
Frame  0:          0 rays   in      0.07 sec =         0.00 rays/sec (wallclock)


=== tank_car.g ===
mged> e tank_car

mged> rt -s1024
SHOT: cpu = 1.152 sec, elapsed = 0.17831 sec
    parent: 1.1user 0.0sys 0:00real 676% 0i+0d 91430maxrss 0+190pf 38+157csw
  children: 0.0user 0.0sys 0:00real 0% 0i+0d 0maxrss 0+0pf 0+0csw
Additional #malloc=1748, #free=0, #realloc=120 (1748 retained)
1038921 solid/ray intersections: 479173 hits + 559748 miss
pruned 46.1%:  904065 model RPP, 1481105 dups skipped, 386886 solid RPP
Frame  0:    1048576 pixels in      0.14 sec =   7281777.78 pixels/sec
Frame  0:    1047719 rays   in      0.14 sec =   7275826.39 rays/sec (RTFM)
Frame  0:    1047719 rays   in      1.15 sec =    909478.30 rays/CPU_sec
Frame  0:    1047719 rays   in      0.18 sec =   5875828.61 rays/sec (wallclock)

mged> rt -z1 -l5 -s1024 
SHOT: opencl
SHOT: cpu = 0.084 sec, elapsed = 0.136295 sec
    parent: 0.0user 0.0sys 0:00real 100% 0i+0d 143506maxrss 0+3079pf 3+0csw
  children: 0.0user 0.0sys 0:00real 0% 0i+0d 0maxrss 0+0pf 0+0csw
Additional #malloc=68, #free=0, #realloc=12 (68 retained)
0 solid/ray intersections: 0 hits + 0 miss
pruned 100.0%:  0 model RPP, 0 dups skipped, 0 solid RPP
Frame  0:    1048576 pixels in      0.01 sec =  99864380.95 pixels/sec
Frame  0:          0 rays   in      0.01 sec =         0.00 rays/sec (RTFM)
Frame  0:          0 rays   in      0.08 sec =         0.00 rays/CPU_sec
Frame  0:          0 rays   in      0.14 sec =         0.00 rays/sec (wallclock)


=== goliath.g ===
mged> e Goliath.c

mged> rt -s1024
SHOT: cpu = 2.548 sec, elapsed = 0.389415 sec
    parent: 2.5user 0.0sys 0:00real 668% 0i+0d 89572maxrss 0+244pf 98+414csw
  children: 0.0user 0.0sys 0:00real 0% 0i+0d 0maxrss 0+0pf 0+0csw
Additional #malloc=2877, #free=0, #realloc=114 (2877 retained)
3524016 solid/ray intersections: 1567527 hits + 1956489 miss
pruned 44.5%:  727441 model RPP, 3035766 dups skipped, 1651079 solid RPP
Frame  0:    1048576 pixels in      0.32 sec =   3292232.34 pixels/sec
Frame  0:    1246392 rays   in      0.32 sec =   3913318.68 rays/sec (RTFM)
Frame  0:    1246392 rays   in      2.55 sec =    489164.84 rays/CPU_sec
Frame  0:    1246392 rays   in      0.39 sec =   3200677.94 rays/sec (wallclock)

mged> rt -z1 -l5 -s1024 
SHOT: opencl
SHOT: cpu = 0.232 sec, elapsed = 0.355876 sec
    parent: 0.2user 0.1sys 0:00real 97% 0i+0d 324494maxrss 0+2806pf 6+2csw
  children: 0.0user 0.0sys 0:00real 0% 0i+0d 0maxrss 0+0pf 0+0csw
Additional #malloc=68, #free=0, #realloc=12 (68 retained)
0 solid/ray intersections: 0 hits + 0 miss
pruned 100.0%:  0 model RPP, 0 dups skipped, 0 solid RPP
Frame  0:    1048576 pixels in      0.03 sec =  36157793.10 pixels/sec
Frame  0:          0 rays   in      0.03 sec =         0.00 rays/sec (RTFM)
Frame  0:          0 rays   in      0.23 sec =         0.00 rays/CPU_sec
Frame  0:          0 rays   in      0.36 sec =         0.00 rays/sec (wallclock)


=== havoc.g ===
mged> e havoc

mged> rt -s1024 
SHOT: cpu = 5.188 sec, elapsed = 0.657286 sec
    parent: 5.1user 0.0sys 0:00real 784% 0i+0d 35446maxrss 0+232pf 58+52csw
  children: 0.0user 0.0sys 0:00real 0% 0i+0d 0maxrss 0+0pf 0+0csw
Additional #malloc=2243, #free=0, #realloc=67 (2243 retained)
1719392 solid/ray intersections: 500752 hits + 1218640 miss
pruned 29.1%:  495126 model RPP, 6854415 dups skipped, 1175750 solid RPP
Frame  0:    1048576 pixels in      0.65 sec =   1616925.21 pixels/sec
Frame  0:     935500 rays   in      0.65 sec =   1442559.75 rays/sec (RTFM)
Frame  0:     935500 rays   in      5.19 sec =    180319.97 rays/CPU_sec
Frame  0:     935500 rays   in      0.66 sec =   1423276.93 rays/sec (wallclock)

mged> rt -z1 -l5 -s1024
SHOT: opencl
SHOT: cpu = 0.404 sec, elapsed = 0.538892 sec
    parent: 0.4user 0.1sys 0:00real 101% 0i+0d 183538maxrss 0+3788pf 7+1csw
  children: 0.0user 0.0sys 0:00real 0% 0i+0d 0maxrss 0+0pf 0+0csw
Additional #malloc=68, #free=0, #realloc=12 (68 retained)
0 solid/ray intersections: 0 hits + 0 miss
pruned 100.0%:  0 model RPP, 0 dups skipped, 0 solid RPP
Frame  0:    1048576 pixels in      0.05 sec =  20763881.19 pixels/sec
Frame  0:          0 rays   in      0.05 sec =         0.00 rays/sec (RTFM)
Frame  0:          0 rays   in      0.40 sec =         0.00 rays/CPU_sec
Frame  0:          0 rays   in      0.54 sec =         0.00 rays/sec (wallclock)
