There are two performance issues:
1. 11!:3xxx calls vs cd calls
2. rendering with glaRC/glaSwapBuffers vs pixel map and hardware accelerators

The 1st may have introduced performance penalties. If so, they can and will (eventually) be resolved by making any necessary changes to cd. The 11!:3xxx calls went from j.dll code to j.exe code, where the 3xxx is handled by a large select statement and results in the same data conversion and OpenGL call as is done by cd. The main difference is that 11!:3xxx worked with unboxed lists whereas cd needs each argument boxed. A secondary difference is that cd currently needs to look up the OpenGL function name in the OpenGL API on every call, and with the very large API this may be a non-trivial overhead for trivial functions. Appropriate changes to cd will make it as fast as or faster than 11!:3xxx, and these changes will benefit all other cd users.
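
To make the boxing difference concrete, here is a minimal sketch. The name boxargs is only illustrative and is not part of the J library; the declaration string is the one shown for glVertex3d in the j6 definition quoted further down. The actual cd call is left commented out because it needs a current OpenGL rendering context.

```j
NB. Sketch only: boxargs and args are illustrative names.
boxargs=: <"0          NB. box each scalar, which is what cd requires
args=: boxargs 1 2 3   NB. the unboxed list 1 2 3 becomes three boxes
$ args                 NB. shape of the boxed list, one box per argument
NB. 'opengl32.dll glVertex3d n d d d' 15!:0 args   NB. needs a current OpenGL context
```

With 11!:3xxx the list 1 2 3 was passed as is; with cd the boxing step runs on every call.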

The 2nd is more problematic and may require changes and tighter integration in Windows between OpenGL rendering and 'normal' isigraph rendering. I assumed that since the 11!:3xxx OpenGL rendering was double buffered, it was essentially rendering to a pixel map and would not be any different from what is done in the new cd support code. Experience and feedback such as yours will show what needs to be done. The worst case is that we add the 11!:3xxx calls for glaRC, glaMakeCurrent, and glaSwapBuffers back into Windows so the rendering is done the same as before rather than to a pixel buffer.

I am confident the steps we have taken are in the right direction. There may well be additional steps required to get back the same or better level of performance. Right now our focus is on functionality and cross platform support. When that has settled we can focus on performance.

More detailed timing comparisons such as you have done can perhaps clarify where the bottlenecks are, particularly with regard to hardware acceleration.
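
One way such a comparison might be split up, as a rough sketch: time the boxing step alone and compare it with the full call. The names data, tbox and tall are assumptions for illustration; glVertex_jgl3_ is the wrapper from the timings quoted below, and the second measurement assumes the OpenGL lab is loaded and a rendering context is current.

```j
NB. Sketch only: data, tbox and tall are illustrative names.
data=: 0.5 + i.100 4                 NB. 100 vertices, 4 floating coordinates each
tbox=: 6!:2 '<&>"1 data'             NB. seconds to box each scalar, no DLL call
tall=: 6!:2 'glVertex_jgl3_"1 data'  NB. boxing + name lookup + OpenGL call
smoutput tbox , tall , tbox % tall   NB. last number: share of time spent boxing
```

If the last number is small, boxing is not the main cost and the name lookup or the call itself deserves the attention.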

----- Original Message ----- From: "Oleg Kobchenko" <[EMAIL PROTECTED]>
To: "Beta forum" <[email protected]>
Sent: Tuesday, April 04, 2006 11:34 PM
Subject: Re: [Jbeta] 601 j beta for windows now available.


OpenGL looks better in Java. However, performance
in both Windows and Java is like it used to be
in Java.

More details below. But the bottom line is that
still it would be hard to make cd and DC/RC perform
the way it used to under Windows. OpenGL was a good
showcase how fast J can perform in critical conditions.

The call for generalized OpenGL was to be extensible
with more OpenGL functions. One possibility would be
to leave the implementation as it was for
most of the functions, especially the drawing primitives,
which are called frequently, and then allow calling
other API functions, such as configuration.


In the "cube" OpenGL Lab Demo with window size
about 1000x1000, I compared

  smoutput 6!:2 'drawbox $0'

It's ~x3 slower in new Win OpenGL.

drawbox consists only of pure J and OpenGL calls
(no bitmap swapping etc.), which are different so

  glVertex_jgl3_  NB. j5
11!:3149
  6!:2 'glVertex_jgl3_"1 i.100 4'
0.000495594

  glVertex_jgl3_  NB. j6
[EMAIL PROTECTED](4 = {:@$)
  glVertex3d_jgl3_
'opengl32.dll glVertex3d n d d d' 15!:0"1 <&>

  6!:2 'glVertex_jgl3_"1 i.100 4'
0.00466568

This is ~x10 in pure calls. This is the most
compelling argument that cd cannot match fc call
in performance. And boxing may not have much to do
with it:

  6!:2 'glVertex_jgl3_"1 i.100 4'   NB. j6
0.00428658
  6!:2 'glMatrixMode_jgl3_"0 i.100'
0.00116244



The performance issue may also lie in the way
OpenGL is applied to the window.

The previous implementation of Windows OpenGL
worked with the overlay delegating rendering
and refreshing to the underlying systems, possibly
relying on the GPU support. Whereas it looks like
this implementation redirects OpenGL to a bitmap,
fetches its pixels and then copies them to the
real DC. So there is no way to gain the benefits
of the GPU support.

However, it is possible to draw directly to the
window DC from J code with cd. On the 'cube' demo at
full screen this gives a visible speed gain of about x2-3.

Surely, it's an implementation improvement for Java,
but maybe it would be possible to get Java window peer DC
and also let OpenGL draw directly without bitmap.


NB. ===================================[jzopengl_win32.ijs]====

alloc=: 3 : 0
free''
hdc=: 0{::GetDC hwnd   NB. hwnd: y{2 or from 'create' ctor
pfd=: mpfd''
if. pixfmt=0 do.
 pixfmt_jzopengl_=: >{.ChoosePixelFormat hdc;pfd
end.
r=. a.i.>{.SetPixelFormat hdc;pixfmt;pfd

'pixelformat failed' oglassert r~:0
hglrc=: >{.wglCreateContext hdc
if. (0=hglrc)*.0=ONETIME do.
 ONETIME_jzopengl_=: 1
 alloc y
end.
'context failed' oglassert hglrc
r=. >{.wglMakeCurrent hdc;hglrc
'makecurrent failed' oglassert r~:0
wh=: y
i.0 0
)

free=: 3 : 0
wh=: 0 0
pad=: 0
if. hglrc do.
 wglMakeCurrent 0;0
 wglDeleteContext hglrc
 hglrc=: 0
end.
if. hdc do.
 SwapBuffers hdc
 ValidateRect hwnd;0
 ReleaseDC hwnd;hdc
 hdc=: 0
end.
i.0 0
)

PFD_DOUBLEBUFFER          =:  16b00000001
PFD_DRAW_TO_WINDOW        =:  16b00000004
PFD_DRAW_TO_BITMAP        =:  16b00000008
PFD_SUPPORT_OPENGL        =:  16b00000020

mpfd=: 3 : 0
a=. 40{.40 00 01 00,(PFD_DOUBLEBUFFER+PFD_DRAW_TO_WINDOW+PFD_SUPPORT_OPENGL),00 00 00 00 32
a=. 32 (23})a NB. cDepthBits
_2(3!:4)40{.a{a.
)

SwapBuffers=: 'opengl32.dll wglSwapBuffers i i'&(15!:0)
GetDC=: 'user32 GetDC i i'&(15!:0)
ReleaseDC=: 'user32 ReleaseDC i i i'&(15!:0)
ValidateRect=: 'user32 ValidateRect i i i'&(15!:0)


--- Eric Iverson <[EMAIL PROTECTED]> wrote:

You raise an interesting point I have been thinking about. Currently 15!:0
targets arguments of various types and always requires boxing each. This has
never been an issue. However, the OpenGL API has a large class of calls, where
performance may be an issue, that have scalar args of the same type. It
would be fairly easy to introduce a new 15!:x that took a list of scalars.
I'm not sure if this is worth it, but would consider it if presented with
arguments.

----- Original Message ----- From: "Miller, Raul D" <[EMAIL PROTECTED]>
To: "Beta forum" <[email protected]>
Sent: Tuesday, April 04, 2006 6:48 PM
Subject: RE: [Jbeta] 601 j beta for windows now available.


A few more notes, working through the opengl lab:

> -- (18 of 42) Isigraph Control (ctd) ------------------------
> Calls with all scalar args are defined to box each scalar (as
> required by cd) and to be rank 1. This allows calls with a
> simple scalar list and allows application to tables.

Wouldn't unboxed right arguments with 15!:0 be radically more
efficient for these cases?  (Not that efficiency should be
a primary concern, but in this case it also seems like it
should be simpler.)

> -- (19 of 42) Isigraph Control (ctd) ------------------------
...
> modifiers) is often added for the common case. For
> glVertex... the added definition is: glVertex=: glVertex4d

That's not completely accurate.  (Not that I'm complaining,
I'm talking about minor rough edges here, and releasing even
a small piece of software can involve a lot of work and time.)

   glVertex__ogl
[EMAIL PROTECTED](4 = {:@$)

I'd also be inclined to use
[:`[:[EMAIL PROTECTED]({:@$)

> -- (32 of 42) Cube (ctd) ------------------------------------

Here, I got an error:

error in: jijs_jctrl_fkey

domain error: gluDeleteQuadric
   gluDeleteQuadric obj

Also, when I restarted the lab without creating a fresh J
session, no opengl rendering window would appear (though
the buffer manipulations without a window seemed to work
just fine).

When I restarted the lab in a fresh J session, section 32
seemed to work just fine, so I'm suspecting a race condition.

FYI,

--
Raul


----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm