Towards a stable C API... via indirect dispatch

Marvin Humphrey Sun, 28 Oct 2007 07:22:59 -0800

Greets,

Boilerplater is currently implemented in KS using the design that Dave and I hashed out here. The post below explores a potential mod to that design.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/

Begin forwarded message:

From: Marvin Humphrey <[EMAIL PROTECTED]>
Date: October 27, 2007 7:43:35 PM PDT
To: KinoSearch discussion forum <[EMAIL PROTECTED]>
Subject: [KinoSearch] Towards a stable C API... via indirect dispatch
Reply-To: KinoSearch discussion forum <[EMAIL PROTECTED]>

Greets,

In order to present a useful public C API for KS, we need to make method calls available -- not just functions. But in KS, inheritance is implemented using vtables -- structs with function pointer members -- and once those vtables are part of the public API, you can't change the vtable struct layout without wrecking binary compatibility. Here is an excellent explanation of the problem:


  <http://www.usenix.org/events/javavm02/yu/yu_html/node5.html>

Freezing the vtables would severely cramp our ability to develop KS. However, if we are unable to guarantee binary compatibility, outside developers will be severely limited in their ability to extend KS from C, so I've been looking for a way around this problem for a while... Happily, I think I've found one:


  "Supporting Binary Compatibility with Static Compilation"[1]
  Dachuan Yu, Zhong Shao, and Valery Trifonov
  <http://www.usenix.org/events/javavm02/yu/yu_html/index.html>

Right now, KS virtual method invocations look something like this:

  object->vtable->method_name(object)

Here's the actual pound-define for KinoSearch::Index::Term's destroy() method, which overrides a method inherited from KinoSearch::Util::Obj:


  #define Kino_Term_Destroy(self) \
      (self)->_->destroy((kino_Obj*)self)

self->_ is the vtable; "destroy" is a member.
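To make the fragility of the struct layout concrete, here is a minimal sketch. All demo_* names are hypothetical and are not taken from the KS headers. It models two releases of a vtable struct, where the second release adds a member ahead of "destroy", and shows that the byte offset a caller would have compiled in no longer matches.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the real KinoSearch types. */
typedef struct demo_Obj demo_Obj;
typedef void (*demo_destroy_t)(demo_Obj *self);
typedef int  (*demo_equals_t)(demo_Obj *self, demo_Obj *other);

/* "Release 1" vtable: destroy is the first member. */
typedef struct {
    demo_destroy_t destroy;
} demo_VTable_v1;

/* "Release 2" vtable: a new method was added ahead of destroy. */
typedef struct {
    demo_equals_t  equals;
    demo_destroy_t destroy;
} demo_VTable_v2;

/* The offset that code compiled against release 1 has baked in. */
size_t demo_destroy_offset_v1(void)
{
    return offsetof(demo_VTable_v1, destroy);
}

/* The offset where destroy actually lives in release 2. */
size_t demo_destroy_offset_v2(void)
{
    return offsetof(demo_VTable_v2, destroy);
}

/* Returns 1 if old binaries would look for destroy in the wrong place. */
int demo_layout_changed(void)
{
    /* The first member of a struct always sits at offset 0. */
    assert(demo_destroy_offset_v1() == 0);
    return demo_destroy_offset_v1() != demo_destroy_offset_v2();
}
```

Code compiled against the first struct would read whatever now sits at offset 0, which in the second layout is "equals" rather than "destroy".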

Under the indirect dispatch system, the vtable becomes an array of function pointers rather than a struct with function pointer members, and method invocation changes to something like this:


  object->vtable[offset](object)

Here's how an actual pound-define might look:

  #define Kino_Term_Destroy(self) \
      ((kino_Obj_destroy_t)(self)->_[kino_Term_destroy_OFFSET])((kino_Obj*)(self))

What this allows us to do is define the vtable layout and the offsets dynamically during a bootstrap operation. The payoff is that a method macro so defined retains binary compatibility even as the composition of the vtable changes with subsequent releases.

Stated another way: if we make the layout of the current vtables part of the public API, externally compiled code will assume that a method like "destroy" is located at a fixed location in the vtable -- and if the layout of the vtable changes, the externally compiled code will jump into the wrong method. (BAD!) However, if we make that offset a variable and set it at runtime, the externally compiled code will always find the correct method to jump into.
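A minimal sketch of the idea, under stated assumptions: every demo_* name is hypothetical, the "release" argument merely simulates two library versions that order the vtable slots differently, and the slots use a generic function-pointer type rather than void* because ISO C does not guarantee conversions between object pointers and function pointers. The caller's macro is identical for both layouts.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; not the real KinoSearch headers. */
typedef struct demo_Obj demo_Obj;
typedef void (*demo_method_t)(void);              /* generic vtable slot */
typedef int  (*demo_Obj_get_id_t)(demo_Obj *self);
typedef int  (*demo_Obj_double_id_t)(demo_Obj *self);

struct demo_Obj {
    demo_method_t *vtable;   /* array of slots, laid out at bootstrap */
    int            id;
};

/* Offsets are variables set at runtime, not compile-time constants. */
size_t demo_Obj_get_id_OFFSET;
size_t demo_Obj_double_id_OFFSET;

/* Method macros: look up the slot, cast it back, then call it. */
#define Demo_Obj_Get_Id(self) \
    ((demo_Obj_get_id_t)(self)->vtable[demo_Obj_get_id_OFFSET])((demo_Obj*)(self))
#define Demo_Obj_Double_Id(self) \
    ((demo_Obj_double_id_t)(self)->vtable[demo_Obj_double_id_OFFSET])((demo_Obj*)(self))

static int get_id_impl(demo_Obj *self)    { return self->id; }
static int double_id_impl(demo_Obj *self) { return self->id * 2; }

static demo_method_t demo_vtable[2];

/* Simulate two library releases that order the slots differently. */
void demo_bootstrap(int release)
{
    demo_Obj_get_id_OFFSET    = (release == 1) ? 0 : 1;
    demo_Obj_double_id_OFFSET = (release == 1) ? 1 : 0;
    assert(demo_Obj_get_id_OFFSET != demo_Obj_double_id_OFFSET);
    demo_vtable[demo_Obj_get_id_OFFSET]    = (demo_method_t)get_id_impl;
    demo_vtable[demo_Obj_double_id_OFFSET] = (demo_method_t)double_id_impl;
}

/* Stand-ins for externally compiled callers: they only use the macros. */
int demo_call_get_id(int release, int id)
{
    demo_Obj obj;
    demo_bootstrap(release);
    obj.vtable = demo_vtable;
    obj.id     = id;
    return Demo_Obj_Get_Id(&obj);
}

int demo_call_double_id(int release, int id)
{
    demo_Obj obj;
    demo_bootstrap(release);
    obj.vtable = demo_vtable;
    obj.id     = id;
    return Demo_Obj_Double_Id(&obj);
}
```

Whichever layout the bootstrap picks, the callers reach the intended method, because the offset is read from a variable at call time instead of being baked into the compiled code.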

Therefore, someone could write another XS library extending KS, and upgrading KS itself wouldn't cause breakage.

There's a cost in CPU cycles for this flexibility: one extra array look-up operation. However, GCJ uses this design, and the performance penalty is apparently only around 2% on average:


  <http://www.usenix.org/events/javavm02/yu/yu_html/node29.html>

That might seem mild, but it actually makes sense to me, at least. On a modern, pipelining processor chip, that extra op just isn't a big deal. When I changed InStream and OutStream into "final" classes, so that heavily used methods like OutStream_Write_VInt() resolved directly to function addresses and no longer needed to be resolved via vtable double dereference, the benchmark barely budged:


  <http://xrl.us/7rty> (Link to mail-archives.apache.org)

A note about type safety:

The array of function pointers will have to be implemented as an array of void*, since we won't know which functions go where in the vtable until runtime. This would seem to be a drawback, since in theory we lose a certain amount of compile-time checking. However, we aren't really losing much, if anything. The current system doesn't perform real type checking; the first argument is always cast (in this example, to kino_Obj*):


  #define Kino_Term_Destroy(self) \
      (self)->_->destroy((kino_Obj*)self)

However, at present, there *will* be a compile-time error if the vtable doesn't contain a method with the appropriate name:


   /* compile-time error */
   kino_Obj_destroy_t destroy_meth = self->_->destro;

We will continue to enjoy a similar level of safety because the name of the offset variable will have to be resolved by the dynamic loader. Say we remove the Kino_Term_Destroy method... then this code will crash at run-time, because the kino_Term_destroy_OFFSET symbol cannot be resolved:


   destroy_meth = self->_[kino_Term_destroy_OFFSET];

Of course a run-time crash would be bad -- but that just means that we can't redact public methods -- which we wouldn't be doing anyway.


Marvin Humphrey
Rectangular Research
http://www.rectangular.com/

[1] The technique I've laid out differs slightly from what's described in the paper. For us the offsets can be stored in individual variables, but Yu et al put them in an "otable" array which is initialized by the Java class loader.
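The difference between the two ways of storing offsets can be sketched as follows. This is loosely modelled on the description in the footnote, not on the paper's actual data structures, and every demo_* name is hypothetical. Variant A keeps one offset variable per method; variant B keeps all offsets in a single table indexed by a stable per-method index.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names; a sketch only. */
typedef struct demo_Obj demo_Obj;
typedef void (*demo_method_t)(void);           /* generic vtable slot */
typedef int  (*demo_get_id_t)(demo_Obj *self);

struct demo_Obj {
    demo_method_t *vtable;
    int            id;
};

/* Stable indices into the offset table (variant B). */
enum { DEMO_GET_ID = 0, DEMO_NEGATE_ID = 1, DEMO_NUM_METHODS = 2 };

/* Variant A: one offset variable per method. */
size_t demo_get_id_OFFSET;

/* Variant B: a single offset table filled in at load time. */
size_t demo_otable[DEMO_NUM_METHODS];

static int get_id_impl(demo_Obj *self)    { return self->id; }
static int negate_id_impl(demo_Obj *self) { return -self->id; }

static demo_method_t demo_vtable[DEMO_NUM_METHODS];

/* Pick a vtable layout at load time and record it both ways. */
void demo_load(void)
{
    demo_otable[DEMO_NEGATE_ID] = 0;
    demo_otable[DEMO_GET_ID]    = 1;
    demo_get_id_OFFSET          = demo_otable[DEMO_GET_ID];
    assert(demo_otable[DEMO_GET_ID] < (size_t)DEMO_NUM_METHODS);

    demo_vtable[0] = (demo_method_t)negate_id_impl;
    demo_vtable[1] = (demo_method_t)get_id_impl;
}

/* Variant A dispatch: the offset comes from its own variable. */
int demo_get_id_via_variable(demo_Obj *self)
{
    return ((demo_get_id_t)self->vtable[demo_get_id_OFFSET])(self);
}

/* Variant B dispatch: the offset comes from the shared table. */
int demo_get_id_via_otable(demo_Obj *self)
{
    return ((demo_get_id_t)self->vtable[demo_otable[DEMO_GET_ID]])(self);
}

/* Returns 1 if both variants reach the same method and return id. */
int demo_compare_variants(int id)
{
    demo_Obj obj;
    demo_load();
    obj.vtable = demo_vtable;
    obj.id     = id;
    return demo_get_id_via_variable(&obj) == demo_get_id_via_otable(&obj)
        && demo_get_id_via_otable(&obj) == id;
}
```

In this sketch the table variant only needs one exported symbol, but the indices into it become part of the ABI; the per-variable variant exports one symbol per method and leaves name resolution to the dynamic loader.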



