Pavel, I understand that a native implementation can actually do the same. My point was to propose a way to avoid the heavy overhead of the JNI transition that Aleksey mentioned earlier.
I was just a bit unfamiliar with the LightJNI concept, but using LightJNI now seems to me the best solution.

2007/11/7, Pavel Ozhdikhin <[EMAIL PROTECTED]>:
> On 11/7/07, Maksim Ananjev <[EMAIL PROTECTED]> wrote:
> >
> > Btw, why not to implement api magic that would substitute sqrt(x) by
> > hardware sse sqrt instruction? "sqrtpd", afair
> >
> > It might be faster than JNI call.
>
>
> The native part of the API may have the same native code as the JIT would
> produce if it implements sqrt using magics. Let's see our performance on the
> test using LightJNI calls. BTW, we can estimate our potential speedup limit
> by writing the same micro-benchmark on C using FDLIBM or another sqrt
> implementation.
>
> Thanks,
> Pavel
>
> > Just an idea. I am not sure if such JIT hack looks pretty from the
> > point of view of design.
> >
> >
> > 2007/11/7, Egor Pasko <[EMAIL PROTECTED]>:
> > > On the 0x387 day of Apache Harmony Aleksey Shipilev wrote:
> > > > On 07 Nov 2007 00:27:25 +0300, Egor Pasko <[EMAIL PROTECTED]> wrote:
> > > > > Vladimir, guess what? :) I actually mixed several things
> > > > > altogether.
> > > > <giggly>(sigh) As usual, miracle does not happen :) I dreamed to see
> > > > software sqrt() implementation that could be faster than hardware one.
> > > > </giggly>
> > > >
> > > > > So, we are left with SSE asm that can be inlined by JIT and AFAI can
> > > > > see it is not as fast as HotSpot? Weird :)
> > > > No, for now we have just the intrinsic in native code, so we also have
> > > > overheads for JNI transition (and it is heavy!), parameter passing,
> > > > chains of calls, etc. I believe that sqrt() magic will lead that NBody
> > > > performance very close to RI.
> > >
> > > Ha! I thought, you made it without JNI. Then we have a chance! If we
> > > do not get it through magics we still have another chance to implement
> > > 1/sqrt() :)
> > >
> > > > BTW, in my thought this benchmark is like the top of the iceberg
> > > > called "FP performance problems".
> > >
> > > Yes, I know, DRLVM has not been actively optimized for FP calc.
> > >
> > > --
> > > Egor Pasko
> > >
> >
> > --
> > Maksim
>

-- 
Maksim
