Re: value range propagation for _bitwise_ OR

Don, Tue, 13 Apr 2010 09:15:20 -0700

Adam D. Ruppe wrote:

On Tue, Apr 13, 2010 at 11:10:24AM -0400, Clemens wrote:

That's strange. Looking at src/backend/cod4.c, function cdbscan, in the dmd 
sources, bsr seems to be implemented in terms of the bsr opcode [1] (which I 
guess is the reason it's an intrinsic in the first place). I would have 
expected this to be much, much faster than a user function. Anyone care enough 
to check the generated assembly?


The opcode is fairly slow anyway (as far as opcodes go) - odds are the
implementation inside the processor is similar to Jerome's method, and
the main savings come from it loading fewer bytes into the pipeline.

I remember a line from a blog (IIRC by the author of the C++ FQA) saying
that hardware and software are pretty much the same thing: moving an
instruction into hardware doesn't mean it will be any faster, since it is
the same algorithm, just done in processor microcode instead of user
opcodes.
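To make the "same algorithm either way" point concrete, here is a minimal C sketch of what a bit scan reverse computes. This is an illustration only. It is not Jerome's method, which isn't shown in this post, and it says nothing about how Intel or AMD actually implement the opcode internally. `bsr_soft` is a hypothetical portable version that binary-searches for the highest set bit. `bsr_builtin` (also a hypothetical name) wraps GCC/Clang's `__builtin_clz`, which on x86 usually compiles to a single `bsr` or `lzcnt` instruction. Both assume a nonzero 32-bit input, matching the fact that `bsr` leaves its result undefined for zero.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical portable bit scan reverse: returns the index (0..31) of the
   highest set bit of a nonzero 32-bit value by halving the search range. */
static int bsr_soft(uint32_t x)
{
    assert(x != 0);              /* like the bsr opcode, undefined for 0 */
    int r = 0;
    if (x & 0xFFFF0000u) { x >>= 16; r += 16; }
    if (x & 0x0000FF00u) { x >>= 8;  r += 8;  }
    if (x & 0x000000F0u) { x >>= 4;  r += 4;  }
    if (x & 0x0000000Cu) { x >>= 2;  r += 2;  }
    if (x & 0x00000002u) {           r += 1;  }
    return r;
}

/* Hypothetical wrapper around the GCC/Clang builtin; the compiler normally
   emits a single instruction for it. __builtin_clz(0) is undefined. */
static int bsr_builtin(uint32_t x)
{
    assert(x != 0);
    return 31 - __builtin_clz(x);
}
```

Both functions return the same answer for any nonzero input. Only a benchmark on the target CPU can show whether the single instruction beats the five-step software version, and per the reply below, the answer differs between Intel and AMD.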

It's fast on Intel, slow on AMD. I bet the speed difference comes from inlining max().

