Native callbacks

Marvin Humphrey Mon, 21 Apr 2008 22:02:07 -0700

Greets,

The VTable-based method dispatch system devised for Lucy is much, much faster than the hash-based dispatch systems typical of dynamic languages. However, since it is a parallel system, overriding an internal Lucy method using the host language's native OO mechanism presents problems.


  * How do we design a generic callback mechanism for invoking
    object methods which is portable across multiple host environments?
  * How does Lucy know that the user wants to override a method?
  * Will the native method prove unacceptably slow for inner loop code?

We can't do anything about the last problem, but it only affects performance-critical inner loops, and even then a system that doesn't scale well may still be useful for smaller collections or rapid prototyping. As for the first two problems, I believe I have at least partial solutions devised.

If we implement Lucy-level abstract methods as functions which call back to the host language using instance-method semantics, then Lucy will "see" the correct method even if it has been overridden multiple times at the host-language level.


  /* Create the appropriate wrapper around [self], call "get_doc_num"
   * on the wrapper, convert the callback's return value to an integer,
   * and return that integer to the C-level invocant.
   */
  u32_t
  Scorer_get_doc_num(Scorer *self)
  {
      return Native_callback_i(self, "get_doc_num", 0);
  }

I'm using this technique all over KinoSearch and it has proven quite successful. For instance, Scorer is spec'd out at the C level, but I've been able to build a pure-Perl MockScorer subclass, and a user has even released a pure-Perl WildCardQuery implementation to CPAN.


Abstract callbacks require a couple of tricks, and they aren't perfect.

First... the callback technique works fine when you are invoking the method from inside the Lucy C core, but... What if you want to invoke a method via the host language that *should* have been overridden, but might not have been? You can end up in an infinite loop with the callback invoking the binding invoking the callback and so on.

The solution is to insert an ABSTRACT_METHOD_CHECK in the binding code before the vtable-method invocation. The test assesses whether the function pointer in the vtable matches the address of the original implementing function.


  * If it matches, we'll get an infinite loop, so throw an error.
  * If it doesn't match, then the method has been overridden at
    the C level and it's safe to invoke.
  * If the method was overridden at the host-language level... well,
    this scenario never comes into play, because the original
    binding calling into C has been overridden.
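A rough sketch of that check in plain C follows. Everything in it is a simplified stand-in rather than Lucy's actual code: the one-slot VTable, the binding_get_doc_num() glue function, and the abstract_method_check() helper (a function instead of the ABSTRACT_METHOD_CHECK macro) are all hypothetical names chosen for illustration, and the abstract method's callback is replaced by a placeholder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, stripped-down stand-ins for Scorer and its VTable. */
typedef struct Scorer Scorer;
typedef uint32_t (*get_doc_num_t)(Scorer *self);

typedef struct VTable {
    get_doc_num_t get_doc_num;
} VTable;

struct Scorer {
    VTable  *vtable;
    uint32_t doc_num;
};

/* The original (abstract) implementing function.  In the real design it
 * would call back to the host language; here it is only a placeholder. */
uint32_t
Scorer_get_doc_num(Scorer *self)
{
    (void)self;
    return 0;
}

/* A C-level override, as a concrete subclass would supply. */
uint32_t
TermScorer_get_doc_num(Scorer *self)
{
    return self->doc_num;
}

/* Return 1 if the vtable slot has been overridden at the C level,
 * 0 if it still points at the original abstract function. */
int
abstract_method_check(const Scorer *self)
{
    assert(self != NULL && self->vtable != NULL);
    return self->vtable->get_doc_num != Scorer_get_doc_num;
}

/* What the binding glue might do when the host calls get_doc_num.
 * Returns 0 on success, -1 where a real binding would throw a
 * host-language error instead of recursing forever. */
int
binding_get_doc_num(Scorer *self, uint32_t *result)
{
    if (!abstract_method_check(self)) {
        return -1;
    }
    *result = self->vtable->get_doc_num(self);
    return 0;
}
```

With a vtable whose slot still holds Scorer_get_doc_num, binding_get_doc_num() refuses to dispatch; with a slot holding TermScorer_get_doc_num, it dispatches normally.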

Second... there are a lot of details about how you implement various Native_callback_xxxxx functions to handle different kinds of arguments... but we'll save that for another post.

Third... Say that you want to subclass not Scorer, but *TermScorer*, and you try to override TermScorer_Get_Doc_Num() via the host-language OO mechanism. The problem is that the function pointer in TermScorer's VTable for Get_Doc_Num doesn't call back to the host language -- so it never finds out that you've tried to override it. You'll get the native override when invoking from the host language, but not when invoking from within the library via the VTable.

Unfortunately, I haven't thought of a solution to this one. :( The best I can think of is some sort of override technique which stuffs a callback function into the subclass's VTable.


  package MyTermScorer;
  use base qw( Lucy::Search::TermScorer );
  __PACKAGE__->override(qw( get_doc_num ));
  ...

To me, that sounds both fiddly and like an implementation detail leaking out.
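For concreteness, here is one way the C side of such an override might look. This is only a sketch under heavy assumptions: the VTable layout, VTable_override_get_doc_num(), and Callback_get_doc_num() are invented for illustration, and the host-language method is simulated by an ordinary C function pointer stored in the VTable rather than by a real call into Perl.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct Scorer Scorer;
typedef uint32_t (*get_doc_num_t)(Scorer *self);

/* Hypothetical VTable.  host_get_doc_num stands in for "look the method
 * up in the host language and invoke it". */
typedef struct VTable {
    const char   *class_name;
    get_doc_num_t get_doc_num;
    get_doc_num_t host_get_doc_num;
} VTable;

struct Scorer {
    VTable  *vtable;
    uint32_t doc_num;
};

/* TermScorer's native C implementation: no callback to the host. */
uint32_t
TermScorer_get_doc_num(Scorer *self)
{
    return self->doc_num;
}

/* Stand-in for a host-language override defined in MyTermScorer. */
uint32_t
MyTermScorer_host_get_doc_num(Scorer *self)
{
    return self->doc_num + 1000;
}

/* Callback function to be stuffed into the subclass's VTable slot. */
uint32_t
Callback_get_doc_num(Scorer *self)
{
    return self->vtable->host_get_doc_num(self);
}

/* Build the subclass VTable as a copy of the parent's, then replace the
 * get_doc_num slot with the callback so that C-level dispatch reaches
 * the host-language method. */
void
VTable_override_get_doc_num(VTable *sub, const VTable *parent,
                            const char *class_name,
                            get_doc_num_t host_impl)
{
    assert(sub != NULL && parent != NULL && host_impl != NULL);
    *sub = *parent;
    sub->class_name       = class_name;
    sub->host_get_doc_num = host_impl;
    sub->get_doc_num      = Callback_get_doc_num;
}
```

The parent's VTable is left untouched, so plain TermScorer objects keep their fast native dispatch; only objects of the overriding subclass pay for the callback.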

Nevertheless, the technique of abstract methods calling back to the host is so useful that I think we should just live with the drawbacks if they can't be resolved.

To implement these abstract callbacks, we need to be able to write a header file defining a generic interface which is compatible with every target language: probably this header would live at trunk/c_src/Lucy/Util/Native.h.

Then we need to implement the Native.h interface with different C code for each target. We could potentially break things up with giant #ifdef LUCY_RUBY and such within trunk/c_src/Lucy/Util/Native.c, but I think that file would grow out of control, as would others like it. Instead, I think we should establish a second tree for C code within each binding folder. For the Perl binding, the file would probably live at trunk/perl/xs/Lucy/Util/Native.c.
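To illustrate the split, the sketch below puts a single declaration of the kind Native.h might carry next to a toy implementation of the kind one binding's Native.c might supply. The signature is a guess: the parameter types, the int32_t return type, and the reading of the third argument as an argument count are all assumptions. The "host" here is just a lookup table of C functions standing in for a real interpreter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* --- Shared side: what Native.h might declare (assumed signature). --- */
int32_t
Native_callback_i(void *self, const char *method, uint32_t num_args);

/* --- Binding side: a toy Native.c for an imaginary host. ------------- */

typedef int32_t (*toy_host_method_t)(void *self);

typedef struct ToyHostMethod {
    const char        *name;
    toy_host_method_t  func;
} ToyHostMethod;

/* Toy host method: treats the object as a bare int32_t doc number. */
static int32_t
toy_get_doc_num(void *self)
{
    return *(int32_t *)self;
}

static const ToyHostMethod toy_methods[] = {
    { "get_doc_num", toy_get_doc_num },
};

/* Find the named method in the toy host and invoke it on self.
 * Returns -1 where a real binding would raise a host-language error. */
int32_t
Native_callback_i(void *self, const char *method, uint32_t num_args)
{
    size_t i;
    assert(self != NULL && method != NULL);
    (void)num_args; /* this toy host only handles zero-argument methods */
    for (i = 0; i < sizeof(toy_methods) / sizeof(toy_methods[0]); i++) {
        if (strcmp(toy_methods[i].name, method) == 0) {
            return toy_methods[i].func(self);
        }
    }
    return -1;
}
```

Core code would see only the declaration; each binding would compile in its own definition, so no #ifdef per host language is needed in the shared tree.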

If we can pull this off, it allows us to move more code into Lucy's shared C core, reducing redundancy -- while simultaneously giving users maximum flexibility to innovate in their language of choice.


Marvin Humphrey
Rectangular Research
http://www.rectangular.com/
