On Sat, Jul 29, 2017 at 2:39 AM, Christopher Sean Morrison <brl...@mac.com>
wrote:

> Rewrites of the goto-happy C version to have fewer jumps has been
> attempted many times, all resulting in slower code or different (incorrect)
> results.  Of course none of them really had coherency in mind.
>

Well the major issue with the code I wrote is that it always evaluates the
entire expression tree. If can't, like, short-circuit things in an
INTERSECT when one of the sides is known FALSE for example. Fix that and
the performance should go back up. The other questions are: does it need
more than a pointer to the parent and the sibling, and does rewriting XOR
as GUARD and XNOP plus SUBTRACT as NOT improve performance or this just a
side of a legacy code path? The tree rewrites complicate the code a lot.

I originally wrote the tree in RPN format because it was more compact.
However we can just as well make a linear infix tree that would be easily
able to short-circuit operators. The operator rewriting would be more
involved but I think it can be done.
Also if we really want to we can just use loops and branches instead of the
gotos we can't use in OpenCL...
i.e. *exactly* replicate the existing code... Like:

=== bool_eval()
stack:
  switch () {
  case OP_NOP:
  case OP_SOLID:
    goto pop;
  case OP_UNION:
  case OP_INTERSECT:
  case OP_SUBTRACT:
  case OP_XOR:
    goto stack;
  default:
    return TRUE;
  }
pop:
  if ()
    return ret;
  switch () {
  case OP_SOLID:
    return TRUE;
  case OP_UNION:
  case OP_INTERSECT:
  case OP_SUBTRACT:
    if ()
      goto pop;
    else
      goto stack;
  case OP_NOT:
    goto pop;
  case OP_XOR:
    goto stack;
  case OP_GUARD:
    if ()
      return -1;
    goto pop;
  case OP_XNOP:
    goto pop;
  default:
    return TRUE;
  }
===

becomes:

=== bool_eval() without gotos
for (;;) {
  for (;;) {
    switch () {
    case OP_NOP:
    case OP_SOLID:
      break;
    case OP_UNION:
    case OP_INTERSECT:
    case OP_SUBTRACT:
    case OP_XOR:
      continue;
    default:
      return TRUE;
    }
    break;
  }

  for (;;) {
    if ()
      return ret;
    switch () {
    case OP_SOLID:
      return TRUE;
    case OP_UNION:
    case OP_INTERSECT:
    case OP_SUBTRACT:
      if ()
        continue;
      else
        break;
    case OP_NOT:
      continue;
    case OP_XOR:
      break;
    case OP_GUARD:
      if ()
        return -1;
      continue;
    case OP_XNOP:
      continue;
    default:
      return TRUE;
    }
    break;
  }
}
===

This should be 100% equivalent in terms of branches and any half decent
compiler should be able to do optimizations to short circuit the branches
so it should have the same speed. In fact I might just make a patch to do
exactly that. :-)

One interesting one was a rewrite in java, which circumvented the entire
> approach with a simpler direct evaluation of the Boolean expression (see
> Tree::evaluate() and the Partition class operators):
>
> https://svn.code.sourceforge.net/p/brlcad/code/jbrlcad/
> trunk/src/main/java/org/brlcad/geometry/Tree.java
> https://svn.code.sourceforge.net/p/brlcad/code/jbrlcad/
> trunk/src/main/java/org/brlcad/geometry/Partition.java
>
> This not only works but fixed a bug in the C boolweave code where wrong
> normals might get used because it makes decisions without regard for the
> CSG operation.  I’ve always thought it would be really interesting to
> re-transcode the Java version back to C to see what the performance impact
> would be.  Almost certainly more incoherent, but probably much easier to
> optimize (and maintain).
>

Yeah but it seems to use a lot of dynamic memory allocation which is a big
issue on GPU programming.


> Figures. It would be too good to be true otherwise. We will have to
> prototype a new tree storage structure and bool_eval() in ANSI C.
> Then port that over to OpenCL. This is a chance to make a contribution to
> the state of the art. We'll precompute all the jumps and store them in the
> tree.
>
>
> ++1
>
> For that table, it would be informative to see a comparison with rt -l5
> -s1024 (i.e., without -z1).  What’s the distinction of the first two
> columns (e.g., havoc 1.208s vs 2.198s)?  That is, what change resulted in
> havoc being 1.9x faster??  That’s unexpected.
>

Ah... well the -l5 is a hack to enable multi-hit mode in the OpenCL mode (I
didn't want to add another command line switch) but this is the default in
ANSI C mode. So please ignore it it's supposed to be the same lighting
mode. Or would be if the shader and lighting code was finished. In fact now
I remember this could have been in the -z switch instead... Hmmm...


> Also, what’s the VGR number on your test machine?  My laptop (~7K VGR)
> renders havoc -s1024 in 0.39s (2.4M rays/s) …
>

-- 
Vasco Alexandre da Silva Costa
PhD in Computer Engineering (Computer Graphics)
Instituto Superior Técnico/University of Lisbon, Portugal
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
BRL-CAD Developer mailing list
brlcad-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/brlcad-devel

Reply via email to