Greets,

Boilerplater is currently implemented in KS using the design that Dave and I hashed out here. The post below explores a potential mod to that design.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/


Begin forwarded message:

From: Marvin Humphrey <[EMAIL PROTECTED]>
Date: October 27, 2007 7:43:35 PM PDT
To: KinoSearch discussion forum <[EMAIL PROTECTED]>
Subject: [KinoSearch] Towards a stable C API... via indirect dispatch
Reply-To: KinoSearch discussion forum <[EMAIL PROTECTED]>

Greets,

In order to present a useful public C API for KS, we need to make method calls available -- not just functions. But in KS, inheritance is implemented using vtables -- structs with function pointer members -- and once those vtables are part of the public API, you can't change the vtable struct layout without wrecking binary compatibility. Here is an excellent explanation of the problem:

  <http://www.usenix.org/events/javavm02/yu/yu_html/node5.html>

Freezing the vtables would severely cramp our ability to develop KS. However, if we are unable to guarantee binary compatibility, outside developers will be severely limited in their ability to extend KS from C, so I've been looking for a way around this problem for a while... Happily, I think I've found one:

  "Supporting Binary Compatibility with Static Compilation"[1]
  Dachuan Yu, Zhong Shao, and Valery Trifonov
  <http://www.usenix.org/events/javavm02/yu/yu_html/index.html>

Right now, KS virtual method invocations look something like this:

  object->vtable->method_name(object)

Here's the actual pound-define for KinoSearch::Index::Term's destroy () method, which overrides a method inherited from KinoSearch::Util::Obj:

  #define Kino_Term_Destroy(self) \
      (self)->_->destroy((kino_Obj*)self)

self->_ is the vtable; "destroy" is a member.

Under the indirect dispatch system, the vtable becomes an array of function pointers rather than a struct with function pointer members, and method invocation changes to something like this:

  object->vtable[offset](object)

Here's how an actual pound-define might look:

  #define Kino_Term_Destroy(self) \
((kino_Obj_destroy_t)((self)->_[kino_Term_destroy_OFFSET]) ((kino_Obj*)self))

What this allows us to do is define the vtable layout and the offsets dynamically during a bootstrap operation. The payoff is that a method macro so defined retains binary compatibility even as the composition of the vtable changes with subsequent releases.

Stated another way: if we make the layout of the current vtables part of the public API, externally compiled code will assume that a method like "destroy" is located at a fixed location in the vtable -- and if the layout of the vtable changes, the externally compiled code will jump into the wrong method. (BAD!) However, if we make that offset a variable and set it at runtime, the externally compiled code will always find the correct method to jump into.

Therefore, someone could write another XS library extending KS, and upgrading KS itself wouldn't cause breakage.

There's a cost in CPU cycles for this flexibility: one extra array look-up operation. However, GCJ uses this design, and the performance penalty is apparently only around 2% on average:

  <http://www.usenix.org/events/javavm02/yu/yu_html/node29.html>

That might seem mild, but it actually makes sense to me, at least. On a modern, pipelining processor chip, that extra op just isn't a big deal. When I changed InStream and OutStream into "final" classes, so that heavily used methods like OutStream_Write_VInt() resolved directly to function addresses and no longer needed to be resolved via vtable double dereference, the benchmark barely budged:

  <http://xrl.us/7rty> (Link to mail-archives.apache.org)

A note about type safety:

The array of function pointers will have to be implemented as an array of void*, since we won't know which functions go where in the vtable until runtime. This would seem to be a drawback, since in theory we lose a certain amount of compile-time checking. However, we aren't really losing much, if anything. The current system doesn't perform real type checking; the first argument is always cast (in this example, to kino_Obj*):

  #define Kino_Term_Destroy(self) \
      (self)->_->destroy((kino_Obj*)self)

However, at present, there *will* be a compile time error if the vtable doesn't contain a method with the appropriate name:

   /* compile-time error */
   kino_Obj_destroy_t destroy_meth = self->_->destro;

We will continue to enjoy a similar level of safety because the name of the offset variable will have to be resolved by the dynamic loader. Say we remove the Kino_Term_Destroy method... then this code will crash at run-time, because the kino_Term_destroy_OFFSET symbol cannot be resolved:

   destroy_meth = self->_[kino_Term_destroy_OFFSET];

Of course a run-time crash would be bad -- but that just means that we can't redact public methods -- which we wouldn't be doing anyway.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/

[1] The technique I've out differs slightly from what's described in the paper. For us the offsets can be stored in individual variables, but Yu et al put them in an "otable" array which is initialized by the Java class loader.


_______________________________________________
KinoSearch mailing list
[EMAIL PROTECTED]
http://www.rectangular.com/mailman/listinfo/kinosearch






Reply via email to