The maniac plan for world domination

Brian S. Julin Tue, 14 Aug 2001 16:42:32 -0700

Heya GGI wizards, mages, scribes, and lurking critters.

Now that we have pushed the bird out of the nest, it is time to move
on.  As promised, here is a descriptive summary of the "maniac plan
for world domination" that has been bouncing around private mail,
mostly between myself and Christoph.  Consider this an RFC for a core
set of LibGGI extensions, i.e. what will be stored in the "lowlevel"
modules.

First let's start by identifying some of the obstacles that have been
in our way, and have until now prevented a coherent roadmap for the
GGI project to move on into the areas of accelerated 2D/3D, video
overlays, etc.

Obstacle #1: Resource allocation.  Contention between different
extensions for resources offered by a target.  Primarily we are
talking about video card FIFOs and VRAM, but even other targets can
have internal resources that must be kept track of.  Prospective
extension authors are stumped by the lack of a standard way to
find/claim such resources for use by the extension.

Obstacle #2: Combinatorial expansion of the number of required
rendering sublibs.  Having optimized renderers for all bitdepths, and
in the case of crossblits, combinations of source/destination
bitdepths is something that is good, but requires a lot of code.  We
have not developed many tools or organizational methods to mitigate
the volume of the needed code by increasing the reusability of the
code, beyond what was done for the original LibGGI renderers.

Obstacle #3: Depreciation of the value of LibGGI's architectural
advantages.  Target independence and portability are good things, but
we need more selling points where LibGGI is superior in order to draw
in more interested developers.  (Not having Blt+rops and alpha has
hurt us badly, for example.)  We must give LibGGI an architecture
which promises to *beat* the "competition" in the areas of
functionality and speed.

Obstacle #4: Failure to deliver hardware acceleration.  Other graphics
packages are currently providing much more hardware acceleration
support than LibGGI.

Most of this e-mail will deal with Obstacles #2 and #3.  Obstacle #1
is solved by LibGAlloc, which, though far from complete, at least is
to the point where we are confident that it will do what we need it to
do, and what the interface to it will look like.  LibGAlloc docs are
available on the GGI Project webpages, though the code is for the
moment awaiting a medium-level rewrite.  Developers needing specific
features are not likely to benefit immediately from being involved
with LibGAlloc development, and should probably concentrate on writing
renderers instead (there are shortcuts by which we will be able to use
new renderers before LibGAlloc is complete, and gracefully graft
LibGAlloc on afterwards).  Developing for LibGAlloc is more a task for
people who are interested in the LibGGI architecture itself.  There
are a number of interesting sub-projects to be worked on in LibGAlloc,
though, if anyone is looking for some challenging algorithm work.

Obstacle #4 has a simple solution: encourage KGI development, and in
the meantime, bumper-surf off DirectFB, DRI, and DirectX by writing
targets to use them or their drivers.  (To this end, I have committed 
the work I have started on a DirectFB renderer for display-fbdev.)

Now, down to business.  There are three critical libraries that have
been started to some extent: LibBuf, LibBlt, and LibOvl.  A quick
summary:

LibBuf provides raw directbuffer-like access to generic buffers --
these being any kind of stored data for any kind of feature that uses
pixel-like data, e.g. sprite pattern data, Blt sources, Z buffers,
Alpha side-buffers, etc.  More critically, it provides a mechanism to
build compound buffers, e.g. associate a Z buffer with another buffer.
As a special case, it also provides a way to tie buffers such as Z
buffers or Alpha buffers or channels to a visual, activating such
functionality on the visual by overloading the LibGGI primitives.

LibBlt provides an API to raster ops.  A key feature of LibBlt is that
it is designed to think of a Blt object not as a piece of data, but
rather as a process of copying data, and such a process can consist of
multiple operations which can be queued and set in motion.  This is a
basic architectural advantage over other APIs that require invocation
of an API function for each Blt operation.  (Please note, the version
in CVS right now is not up to date, so this is not evident in the
headers.)

LibOvl provides support for positioning various features that follow
the model of moving some sort of area around the screen which either
affects the display of data underneath the area, or causes different
data to be displayed over (or somehow combined with) the data in the
main framebuffer.  This includes YUV viewports, hw video sources, and
sprites.

To address Obstacles #2 and #3, a concept called a "batchop" has been
shopped around a bit to a few people.  The batchop URL given below
only deals with the generic (non-GGI-specific) portion of batchops, so
before I offer it, let's talk about them a bit from a GGI perspective.
Batchops offer to solve, to some degree, the following design
problems:

1) The code we get when we write optimized algorithms for a particular
purpose will often not be reusable, because of many hard-coded values,
assumptions about the size/format of input and output data, or
behaviors which must vary slightly.  Making what are now constant
values into parameters of a function call, or providing flags to alter
behavior, in order to make the function more generic, is an
unattractive solution for various reasons (I won't delve, for
brevity.)  Often it is the case that an algorithm will be just as
fast, or only marginally slower, if some of these hard-coded values or
behaviors can be altered by the user/user-space-library.  In this
respect, batchops offer a way for simple differences in format of
input or output data, or minor differences in input or output
processing to be glossed over a bit, easing the amount of duplicate
code to maintain, and without drastically decreasing efficiency.

2) LibGGI's rather aggressive subdivision of operations into very
basic primitives is fine for the limited set of graphics operations it
supports.  However, this results in Obstacle #2 when we bring the
large number of operations in the 2D arena into the mix.  Batchops
provide a nexus where the user can throw their data and commands, and
the target can build pipelines to drain them out.  They break open the
internal structure of renderers to make them more free-form, so
targets can divide up their code in the most reusable fashion, on a
per-target or per-target-family basis, rather than being obligated to
fill a function table of drawops.

3) Often we will want to send a lot of Blts, 3dops, or even regular
GGI ops in a row.  Sometimes we will want to vary the coordinates from
one operation in the batchop to the next, but keep the same
source/color/whatever.  Sometimes we will want to change the source as
well, or other stuff like Z slope.  If we were programming "to the
metal", we'd load the values that don't change into the drawing
engine, and then fill an array with the varying parameters and send
those to the drawing engine, one set for each op.  Using anything but
a simple array to contain the values consumes space/CPU (as an aside,
in some cases linked lists are just as efficient CPU-wise);
especially, calling a function for each operation is way too
expensive.  Batchops offer a compromise somewhere between the
to-the-metal approach, and the full-blown API approach which speed
freaks find to be bloated.

4) Many graphics chipsets allow the batch execution of commands, or
the batch loading of data into drawing engine registers to rapidly
draw multiple objects without CPU assistance.  Some can even DMA the
commands from the system or take them from video RAM.  But when doing
this, it is often necessary to interleave commands with data.  Since
batchops encourage the use of a different model, where an object
represents a process which is repeated N times while paging through
arrays (as in LibBlt), it more easily meshes with the accel FIFO
"modus operandi".  In addition, because of certain features of the
design of batchops, it will often be possible to reuse a generic
batchop function such that only a data structure is needed to refit a
GGI drawing operation to a new command queue format.
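
To make points 3 and 4 concrete, here is a rough sketch of how a
single generic packing loop could be refitted to a different command
queue format by swapping only a data structure.  Everything in it
(the names, the FIFO layout, the command words) is invented for
illustration and is not taken from the batchop docs or from any real
chipset.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical parameter slots for a box-fill operation. */
enum parm { PARM_X, PARM_Y, PARM_W, PARM_H, PARM_COUNT };

/* Describes one imaginary command queue format: a command word,
 * followed by the parameters in a chip-specific order. */
struct fifo_layout {
    uint32_t cmd_word;
    enum parm order[PARM_COUNT];
};

/* Generic packer: interleaves the command word with per-op data taken
 * from one array per parameter.  Returns the number of words written,
 * or 0 if the FIFO buffer is too small to hold all ops. */
size_t pack_ops(const struct fifo_layout *fmt,
                const int *const parms[PARM_COUNT], size_t nops,
                uint32_t *fifo, size_t fifo_words)
{
    size_t need = nops * (1 + PARM_COUNT);
    size_t out = 0;

    if (need > fifo_words)
        return 0;
    for (size_t i = 0; i < nops; i++) {
        fifo[out++] = fmt->cmd_word;
        for (size_t k = 0; k < PARM_COUNT; k++)
            fifo[out++] = (uint32_t)parms[fmt->order[k]][i];
    }
    return out;
}
```

Supporting a second imaginary chip that wants width/height before the
coordinates would then only need another struct fifo_layout, not
another loop.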

In order to glue GGI+extensions to batchops, we will define a set of
opcodes which represent drawing primitives.  In addition, we will
define a set of "parmtypes" which are commonly used parameters like
"top left corner coord" or "width" or "colorkey".  There will be an
in-header registration system for these opcodes and parmtypes to
prevent collision between extensions, unify them across extensions,
and provide for reserved areas for private/in-development extensions.
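
As a sketch only, an in-header registry along these lines might hand
each extension a fixed block of opcode numbers and set one block aside
for private work.  The SK_ names, the block size and the numeric
values below are all assumptions made up for this example; nothing
here is an agreed GGI allocation.

```c
/* Hypothetical opcode blocks: 256 opcodes per extension. */
#define SK_OPBLOCK_SIZE    0x0100u
#define SK_OPBASE_GGI      0x0000u  /* core LibGGI primitives */
#define SK_OPBASE_BLT      0x0100u  /* LibBlt */
#define SK_OPBASE_PRIVATE  0xF000u  /* reserved: private/in-development */

enum sk_opcode {
    SK_OP_HLINE    = SK_OPBASE_GGI + 0,
    SK_OP_BOX      = SK_OPBASE_GGI + 1,
    SK_OP_BLT_COPY = SK_OPBASE_BLT + 0
};

/* Hypothetical parmtypes shared by all extensions. */
enum sk_parmtype {
    SK_PARM_OPCODE,
    SK_PARM_TOPLEFT_X,
    SK_PARM_TOPLEFT_Y,
    SK_PARM_WIDTH,
    SK_PARM_HEIGHT,
    SK_PARM_COLORKEY
};

/* Returns the base of the block an opcode was registered in. */
unsigned sk_op_owner(unsigned op)
{
    return op & ~(SK_OPBLOCK_SIZE - 1u);
}

/* Nonzero if the opcode lies in the private/in-development area. */
int sk_op_is_private(unsigned op)
{
    return op >= SK_OPBASE_PRIVATE;
}
```

Because every extension draws from its own block, two extensions
cannot collide by accident, while a shared parmtype such as "width"
keeps one number everywhere.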

With the above in mind, I think it is now time to mention that
preemptive documentation for the batchop concept is available at
http://mojo.calyx.net/~bri/projects/GGI/galloc/libmmutils-bo.html
You'll want to read that before continuing with this e-mail.

Now, we are going to describe a very crude API and extension design
using batchops, and trace the flow through it to the back end.  There
are better ways to design such APIs, and we'll get into that below,
but it's simpler to explain if we do it wrongly :-) Suppose you wanted
a function that drew many boxes in an extension API, e.g.

int ggiDrawLotsOfBoxes(ggi_visual_t vis, int numboxes, ggi_color *color,
                       int *x, int *y, int *w, int *h);

When the extension containing this function is attached, two batchops
to support calls to this function are created.  One is the source
batchop.  This batchop has one constant batchparm which contains an
"opcode" as mentioned above.  The other batchparms in the source
batchop are the color and coordinates.

The second batchop is gotten from the target through an internal-use
API function.  When the target sublib is loaded, it is matched to the
source batchop, so that the match operation does not need to be
performed each time the API function is called.

When the API function is called, the block addresses inside the source
batchop are set to point to the parameters sent by the user, the
batchop counters are initialized, and the bo_go function is set in
motion to draw the boxes.  After it is done the API function returns.
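
The flow just described might look roughly like the following.  This
is a toy model, not the libmmutils batchop structures: the sk_ types,
the one-byte-per-pixel framebuffer and sk_bo_go (a stand-in for the
real bo_go) are all assumptions, and the target side is folded into
the same function for brevity.

```c
#include <stddef.h>

enum { SK_OP_BOX = 1 };

/* A batchparm is either one constant or a block stepped once per op. */
struct sk_parm {
    const int *block;   /* NULL: use `constant` for every op */
    int constant;
};

struct sk_batchop {
    struct sk_parm opcode, color, x, y, w, h;
    size_t index, count;    /* batchop counters */
};

struct sk_fb { unsigned char *pix; int width, height; };

static int sk_fetch(const struct sk_parm *p, size_t i)
{
    return p->block ? p->block[i] : p->constant;
}

/* Done once, at attach time: only the opcode parm is constant. */
void sk_attach(struct sk_batchop *bo)
{
    *bo = (struct sk_batchop){ .index = 0 };   /* zero everything */
    bo->opcode.constant = SK_OP_BOX;
}

/* Stand-in for bo_go: drains the batchop, clipping to the framebuffer. */
size_t sk_bo_go(struct sk_batchop *bo, struct sk_fb *fb)
{
    size_t drawn = 0;

    for (; bo->index < bo->count; bo->index++) {
        size_t i = bo->index;
        if (sk_fetch(&bo->opcode, i) != SK_OP_BOX)
            continue;
        int x0 = sk_fetch(&bo->x, i), y0 = sk_fetch(&bo->y, i);
        int x1 = x0 + sk_fetch(&bo->w, i), y1 = y0 + sk_fetch(&bo->h, i);
        for (int yy = y0; yy < y1; yy++)
            for (int xx = x0; xx < x1; xx++)
                if (xx >= 0 && yy >= 0 && xx < fb->width && yy < fb->height)
                    fb->pix[yy * fb->width + xx] =
                        (unsigned char)sk_fetch(&bo->color, i);
        drawn++;
    }
    return drawn;
}

/* The crude API call: point the blocks at user data, reset the
 * counters, and set the batchop in motion. */
size_t sk_draw_lots_of_boxes(struct sk_batchop *bo, struct sk_fb *fb,
                             size_t numboxes, const int *color,
                             const int *x, const int *y,
                             const int *w, const int *h)
{
    bo->color.block = color;
    bo->x.block = x;  bo->y.block = y;
    bo->w.block = w;  bo->h.block = h;
    bo->index = 0;
    bo->count = numboxes;
    return sk_bo_go(bo, fb);
}
```

Note that the per-call work is only a handful of pointer stores; the
user's arrays are never copied.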

Now, let's go inside the target.  When the extension is first
attached, it will iterate through any loaded target dl's twisting
doorknobs to find an extension that will give it a batchop for a given
source batchop.  For each dl, a __create_batchop function inside is
called.  The extension target has the choice at this point to either
implement a generic destination batchop which supports a large group
of opcodes, or break out single opcodes or subsets into their own
implementation.  This is done to offer flexibility in deciding how to
modularize the rendering code.  A new batchop is created and handed
back to the extension API, which matches the source to the
destination, and an optimized rendering path is thus found.
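
A minimal sketch of that doorknob-twisting, under the assumption that
each dl exposes one creation hook that either accepts or refuses a
given opcode.  The sk_ names and the hook signature are hypothetical;
the real __create_batchop would take a source batchop, not a bare
opcode.

```c
#include <stddef.h>

enum { SK_OP_HLINE = 1, SK_OP_BOX = 2, SK_OP_BLT = 3 };

/* What a target hands back: here just a label and the opcode served. */
struct sk_dest_batchop {
    const char *provider;
    int opcode;
};

/* Stand-in for a dl's __create_batchop: fills *out and returns 0 if
 * the target can serve the opcode, -1 otherwise. */
typedef int (*sk_create_fn)(int opcode, struct sk_dest_batchop *out);

/* A target that breaks out a single opcode into its own path. */
int sk_accel_create(int opcode, struct sk_dest_batchop *out)
{
    if (opcode != SK_OP_BOX)
        return -1;
    out->provider = "accel";
    out->opcode = opcode;
    return 0;
}

/* A target with one generic batchop covering a group of opcodes. */
int sk_generic_create(int opcode, struct sk_dest_batchop *out)
{
    if (opcode != SK_OP_HLINE && opcode != SK_OP_BOX)
        return -1;
    out->provider = "generic";
    out->opcode = opcode;
    return 0;
}

/* Done once at attach time: the first dl that accepts wins. */
int sk_match(const sk_create_fn *dls, size_t ndls, int opcode,
             struct sk_dest_batchop *out)
{
    for (size_t i = 0; i < ndls; i++)
        if (dls[i](opcode, out) == 0)
            return 0;
    return -1;
}
```

With the dls ordered { sk_accel_create, sk_generic_create }, boxes go
to the accelerated path and hlines fall through to the generic one.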

Suppose our back-end is KGI.  In this case, the renderer need only
translate the data in the source batchparm blocks into the format
needed for the KGI accel queue, or even do direct register loads if
there happens to be very well behaved hardware present.  This will be
a very efficient operation indeed.

We said above that this implementation is crude.  This is because the
way in which batchops are essentially multiplexed by opcode would
allow a target to offer a batchop capable of mixing primitives in the
same, well, what amounts to a "command queue."  A more advanced API
would not have a distinct batchop for each API function, but rather
would create batchops that can mix primitives, and offer the user an
API function to add primitives to the batchop, rather than execute
them on the spot, and another to cause the execution of the queued
commands to be started.  Allowing the user to create any number of
source batchops for a sublib also allows a more asynchronous API to be
offered.  This is the way LibBlt's proposed API does it.
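
For illustration, a queue-style API of this kind could be as small as
the sketch below: one call adds a primitive, another starts execution
of everything queued.  The names and the fixed-size queue are
assumptions for the example and are not LibBlt's proposed interface.

```c
#include <stddef.h>

enum sk_opcode { SK_OP_HLINE, SK_OP_BOX };
enum { SK_QUEUE_MAX = 32 };

struct sk_cmd {
    enum sk_opcode op;
    int x, y, w, h;         /* h is ignored for SK_OP_HLINE */
    unsigned char color;
};

struct sk_queue {
    struct sk_cmd cmds[SK_QUEUE_MAX];
    size_t n;
};

/* Adds a primitive without drawing it.  Returns -1 if the queue is
 * full, 0 otherwise. */
int sk_queue_add(struct sk_queue *q, struct sk_cmd c)
{
    if (q->n >= SK_QUEUE_MAX)
        return -1;
    q->cmds[q->n++] = c;
    return 0;
}

/* Executes and empties the queue.  fb has fb_w one-byte pixels per
 * row; there is no clipping, so commands must stay in bounds. */
size_t sk_queue_go(struct sk_queue *q, unsigned char *fb, int fb_w)
{
    size_t done = 0;

    for (size_t i = 0; i < q->n; i++) {
        const struct sk_cmd *c = &q->cmds[i];
        int rows = 0;

        switch (c->op) {
        case SK_OP_HLINE: rows = 1;    break;
        case SK_OP_BOX:   rows = c->h; break;
        }
        for (int yy = 0; yy < rows; yy++)
            for (int xx = 0; xx < c->w; xx++)
                fb[(c->y + yy) * fb_w + (c->x + xx)] = c->color;
        done++;
    }
    q->n = 0;
    return done;
}
```

Since each struct sk_queue is independent, a caller could fill one
while another is being drained, which is where the asynchronous
flavour would come from.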

I've accumulated some of the private mail Christoph and I exchanged on
the LibBlt and LibBuf APIs and posted it at
http://mojo.calyx.net/~bri/projects/GGI/pm_snips.txt

Finally, looking down the road, we have the option of providing a
mechanism by which all available batchops can be collected by LibGGI's
extension mechanism, and put into a structure that is traversed by a
fallback mechanism to deal with batchops containing multiple opcodes.
This is all much like getapi, and will probably use getapi, but it
allows shortcutting the function call overhead when the target elects
to merge batchops for different opcodes.

Note that the way this is done means that target-sublib code can serve
more than one extension, if it likes.  We just alias ext1-fbdev to the
same .so as ext2-fbdev and voila.

Nice plan?  I hope so, and welcome comments.  But, for those chomping
at the bit, where to code from here?  The workload can be divided up
in such a way that different parts can be worked on independently.
Mainly, and most importantly, there is no reason why default rendering
code cannot be written right away.  Development of the new lowlevel
library APIs, batchop infrastructure, and LibGAlloc won't cause this
code to need major reworking, as long as a few considerations are
taken into account.  For example, coding Alpha-aware LibGGI primitive
renderers can proceed using the same structure as the current ones.

The first consideration is to put a good amount of thought into how
to sub-divide the drawing operation family you decide to work on, with
a focus on reusing code between mostly similar operations.

The second consideration has to do with what happens when we switch to
new style batchop renderers -- each op will not be in its own
self-contained function.  So, the code inside the renderers needs to
be easily moved from the way it will look at first:

linear_r8g8b8a8_hline (/* args */) {
    /* code */
}

To:

 /* args loaded from batchop */
 switch(opcode) {
 /* ... */
 case GGI_OP_HLINE:
   /* code */
   break;
 /* ... */
 }

Basically, it will help if the author chooses the same name for any
temporary variables/args that have similar meaning, wherever possible,
and does not use the args as lvalues, since they may in fact refer
*directly* to user data which should not be altered.
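
As a hedged example of that last point, a case body can copy whatever
it needs to modify into locals and treat the args as read-only.  The
struct layout and names below are invented for the sketch; only the
switch-on-opcode shape comes from the text above.

```c
#include <stddef.h>

enum { SK_OP_HLINE = 1 };

/* Args as they might arrive from a batchop: pointers straight into
 * the user's arrays, hence const. */
struct sk_args {
    const int *x, *y, *w;
    const unsigned char *color;
};

/* Renders op number i into fb (fb_w one-byte pixels per row). */
void sk_render(int opcode, const struct sk_args *a, size_t i,
               unsigned char *fb, int fb_w)
{
    switch (opcode) {
    case SK_OP_HLINE: {
        /* Work on local copies; never write through a->x or a->w. */
        int x = a->x[i];
        int w = a->w[i];
        unsigned char *p = fb + a->y[i] * fb_w + x;

        while (w-- > 0)
            *p++ = a->color[i];
        break;
    }
    default:
        break;
    }
}
```

Decrementing the local w is harmless, whereas decrementing the arg in
place would have zeroed the caller's width array.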

--
Brian