Thank you for the explanation!

But my tests give different results:

void* SSE_sobelRow(ubyte* src, ubyte* dst, size_t srcStride){ asm{
  push RDI;

  mov RAX, 0; mov RDX, 0; mov RCX, 0; //clear 'parameter' registers

  mov RAX, src;
  mov RDI, dst;

  //gen
  movups XMM0,[RAX];
  movaps XMM1,XMM0;
  pslldq XMM0,1;
  movaps XMM2,XMM1;
  psrldq XMM1,1;
  pavgb XMM1,XMM0;
  pavgb XMM1,XMM2;
  movups [RDI],XMM1;
  //gen end

  pop RDI;
}}

When I clear the volatile registers that are used for register-based parameter passing, I still get good results. However, when I put "mov [RBP+8], 0" into the code, it generates an access violation, which is why I think the parameters are on the stack.
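As a way to check what "good results" means here, a scalar reference for the 16 bytes processed between "//gen" and "//gen end" could look like the sketch below. It is only an illustration: sobelRowRef is a made-up name, and it assumes src points to at least 16 readable bytes and dst to 16 writable bytes.

```d
// Scalar reference for the 16 bytes handled between "//gen" and "//gen end".
// sobelRowRef is a hypothetical helper, not part of the original routine.
void sobelRowRef(const(ubyte)* src, ubyte* dst)
{
    // pavgb averages unsigned bytes with rounding up: (a + b + 1) >> 1
    static ubyte avg(uint a, uint b)
    {
        return cast(ubyte)((a + b + 1) >> 1);
    }

    foreach (i; 0 .. 16)
    {
        // pslldq / psrldq shift in zero bytes at the ends of the register
        uint left  = i > 0  ? src[i - 1] : 0;
        uint right = i < 15 ? src[i + 1] : 0;

        // first pavgb: average of the two neighbours,
        // second pavgb: average of that with the centre byte
        dst[i] = avg(avg(right, left), src[i]);
    }
}
```

Comparing the 16 bytes written by the asm version against this output would show whether the inputs really arrive in the registers the routine reads them from.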

What I'm really unsure about is which registers I HAVE TO save in my asm routine. Currently I think I'm only allowed to trash the contents of RAX, RCX, RDX, XMM0..XMM5, based on the Microsoft calling model. But I'm not sure what the actual case is with LDC2 on Win64.

If my code is surrounded by the SSE optimizations of the LDC2 compiler and I can't satisfy its requirements, I will have random errors in the future. I'd better avoid those.

On the 32-bit target the rule is simple: you can do what you want with all the XMM regs and a, c, d. Now on 64-bit I'm quite unsure. :S
