On Sat, Jul 29, 2017 at 2:39 AM, Christopher Sean Morrison <brl...@mac.com>
wrote:
> Rewrites of the goto-happy C version to have fewer jumps has been
> attempted many times, all resulting in slower code or different (incorrect)
> results. Of course none of them really had coherency in mind.
>
Well the major issue with the code I wrote is that it always evaluates the
entire expression tree. If can't, like, short-circuit things in an
INTERSECT when one of the sides is known FALSE for example. Fix that and
the performance should go back up. The other questions are: does it need
more than a pointer to the parent and the sibling, and does rewriting XOR
as GUARD and XNOP plus SUBTRACT as NOT improve performance or this just a
side of a legacy code path? The tree rewrites complicate the code a lot.
I originally wrote the tree in RPN format because it was more compact.
However we can just as well make a linear infix tree that would be easily
able to short-circuit operators. The operator rewriting would be more
involved but I think it can be done.
Also if we really want to we can just use loops and branches instead of the
gotos we can't use in OpenCL...
i.e. *exactly* replicate the existing code... Like:
=== bool_eval()
stack:
switch () {
case OP_NOP:
case OP_SOLID:
goto pop;
case OP_UNION:
case OP_INTERSECT:
case OP_SUBTRACT:
case OP_XOR:
goto stack;
default:
return TRUE;
}
pop:
if ()
return ret;
switch () {
case OP_SOLID:
return TRUE;
case OP_UNION:
case OP_INTERSECT:
case OP_SUBTRACT:
if ()
goto pop;
else
goto stack;
case OP_NOT:
goto pop;
case OP_XOR:
goto stack;
case OP_GUARD:
if ()
return -1;
goto pop;
case OP_XNOP:
goto pop;
default:
return TRUE;
}
===
becomes:
=== bool_eval() without gotos
for (;;) {
for (;;) {
switch () {
case OP_NOP:
case OP_SOLID:
break;
case OP_UNION:
case OP_INTERSECT:
case OP_SUBTRACT:
case OP_XOR:
continue;
default:
return TRUE;
}
break;
}
for (;;) {
if ()
return ret;
switch () {
case OP_SOLID:
return TRUE;
case OP_UNION:
case OP_INTERSECT:
case OP_SUBTRACT:
if ()
continue;
else
break;
case OP_NOT:
continue;
case OP_XOR:
break;
case OP_GUARD:
if ()
return -1;
continue;
case OP_XNOP:
continue;
default:
return TRUE;
}
break;
}
}
===
This should be 100% equivalent in terms of branches and any half decent
compiler should be able to do optimizations to short circuit the branches
so it should have the same speed. In fact I might just make a patch to do
exactly that. :-)
One interesting one was a rewrite in java, which circumvented the entire
> approach with a simpler direct evaluation of the Boolean expression (see
> Tree::evaluate() and the Partition class operators):
>
> https://svn.code.sourceforge.net/p/brlcad/code/jbrlcad/
> trunk/src/main/java/org/brlcad/geometry/Tree.java
> https://svn.code.sourceforge.net/p/brlcad/code/jbrlcad/
> trunk/src/main/java/org/brlcad/geometry/Partition.java
>
> This not only works but fixed a bug in the C boolweave code where wrong
> normals might get used because it makes decisions without regard for the
> CSG operation. I’ve always thought it would be really interesting to
> re-transcode the Java version back to C to see what the performance impact
> would be. Almost certainly more incoherent, but probably much easier to
> optimize (and maintain).
>
Yeah but it seems to use a lot of dynamic memory allocation which is a big
issue on GPU programming.
> Figures. It would be too good to be true otherwise. We will have to
> prototype a new tree storage structure and bool_eval() in ANSI C.
> Then port that over to OpenCL. This is a chance to make a contribution to
> the state of the art. We'll precompute all the jumps and store them in the
> tree.
>
>
> ++1
>
> For that table, it would be informative to see a comparison with rt -l5
> -s1024 (i.e., without -z1). What’s the distinction of the first two
> columns (e.g., havoc 1.208s vs 2.198s)? That is, what change resulted in
> havoc being 1.9x faster?? That’s unexpected.
>
Ah... well the -l5 is a hack to enable multi-hit mode in the OpenCL mode (I
didn't want to add another command line switch) but this is the default in
ANSI C mode. So please ignore it it's supposed to be the same lighting
mode. Or would be if the shader and lighting code was finished. In fact now
I remember this could have been in the -z switch instead... Hmmm...
> Also, what’s the VGR number on your test machine? My laptop (~7K VGR)
> renders havoc -s1024 in 0.39s (2.4M rays/s) …
>
--
Vasco Alexandre da Silva Costa
PhD in Computer Engineering (Computer Graphics)
Instituto Superior Técnico/University of Lisbon, Portugal
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
BRL-CAD Developer mailing list
brlcad-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/brlcad-devel