Greets,

The VTable-based method dispatch system devised for Lucy is much, much faster than the hash-based dispatch systems typical of dynamic languages. However, since it is a parallel system, overriding an internal Lucy method using the host language's native OO mechanism presents problems.

  * How do we design a generic callback mechanism for invoking
    object methods which is portable across multiple host environments?
  * How does Lucy know that the user wants to override a method?
  * Will the native method prove unacceptably slow for inner loop code?

We can't do anything about the last problem, but it only affects performance-critical inner loops, and even then a system that doesn't scale well may still be useful for smaller collections or rapid prototyping. As for the first two problems, I believe I have at least partial solutions devised.

If we implement Lucy-level abstract methods as functions which call back to the host language using instance-method semantics, then Lucy will "see" the correct method even if it has been overridden multiple times at the host-language level.

  /* Create the appropriate wrapper around [self], call "get_doc_num"
   * on the wrapper, convert the callback's return value to an integer,
   * and return that integer to the C-level invocant.
   */
  u32_t
  Scorer_get_doc_num(Scorer *self)
  {
      return Native_callback_i(self, "get_doc_num", 0);
  }

I'm using this technique all over KinoSearch and it has proven quite successful. For instance, Scorer is spec'd out at the C level, but I've been able to build a pure-Perl MockScorer subclass, and a user has even released a pure-Perl WildCardQuery implementation to CPAN.

Abstract callbacks require a couple of tricks, and they aren't perfect.

First... the callback technique works fine when you are invoking the method from inside the Lucy C core. But what if you invoke, via the host language, a method that *should* have been overridden but wasn't? You can end up in an infinite loop: the callback invokes the binding, which invokes the callback, and so on.

The solution is to insert an ABSTRACT_METHOD_CHECK in the binding code before the vtable-method invocation. The test assesses whether the function pointer in the vtable matches the address of the original implementing function.

  * If it matches, we'll get an infinite loop, so throw an error.
  * If it doesn't match, then the method has been overridden at
    the C level and it's safe to invoke.
  * If the method was overridden at the host-language level... well,
    this scenario never comes into play, because the original
    binding calling into C has been overridden.
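
Here is a minimal sketch of what that check might look like. The pared-down VTable struct, the method_is_abstract() helper, and binding_get_doc_num() are names invented for illustration, and the body of Scorer_get_doc_num is a stand-in so that the sketch compiles on its own; none of this is existing Lucy code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

typedef struct Scorer Scorer;
typedef uint32_t (*get_doc_num_t)(Scorer *self);

/* Hypothetical, pared-down vtable with a single method slot. */
typedef struct VTable {
    get_doc_num_t get_doc_num;
} VTable;

struct Scorer {
    VTable *vtable;
};

/* Stand-in for the abstract implementation.  The real one would call
 * back to the host: Native_callback_i(self, "get_doc_num", 0). */
static uint32_t
Scorer_get_doc_num(Scorer *self)
{
    (void)self;
    return 0;
}

/* A C-level override, as a concrete subclass would supply. */
static uint32_t
TermScorer_get_doc_num(Scorer *self)
{
    (void)self;
    return 42;
}

/* The check itself: does the vtable slot still hold the address of the
 * original abstract function? */
static int
method_is_abstract(Scorer *self)
{
    assert(self != NULL && self->vtable != NULL);
    return self->vtable->get_doc_num == Scorer_get_doc_num;
}

/* What the binding code might do before dispatching through the vtable.
 * Returns 0 on success, -1 if the method was never overridden. */
static int
binding_get_doc_num(Scorer *self, uint32_t *result)
{
    if (method_is_abstract(self)) {
        fprintf(stderr, "Abstract method 'get_doc_num' not overridden\n");
        return -1;
    }
    *result = self->vtable->get_doc_num(self);
    return 0;
}
```

A real binding would throw a host-language exception rather than return an error code, but the pointer comparison would be the same.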

Second... there are a lot of details about how you implement various Native_callback_xxxxx functions to handle different kinds of arguments... but we'll save that for another post.

Third... Say that you want to subclass not Scorer, but *TermScorer*, and you try to override TermScorer_Get_Doc_Num() via the host-language OO mechanism. The problem is that the function pointer in TermScorer's VTable for Get_Doc_Num doesn't call back to the host language -- so it never finds out that you've tried to override it. You'll get the native override when invoking from the host language, but not when invoking from within the library via the VTable.

Unfortunately, I haven't thought of a solution to this one. :( The best I can think of is some sort of override technique which stuffs a callback function into the subclass's VTable.

  package MyTermScorer;
  use base qw( Lucy::Search::TermScorer );
  __PACKAGE__->override(qw( get_doc_num ));
  ...

To me, that sounds both fiddly and like an implementation detail leaking out.
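
For what it's worth, the C side of such an override might be sketched roughly as follows. Everything here is an assumption made for illustration: the one-slot VTable, VTable_override(), Callback_get_doc_num(), and the host_callback_i() stand-in, which fakes the host language with a plain function pointer stored on the object.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct Scorer Scorer;
typedef uint32_t (*get_doc_num_t)(Scorer *self);

/* Hypothetical, pared-down vtable with a single method slot. */
typedef struct VTable {
    get_doc_num_t get_doc_num;
} VTable;

struct Scorer {
    VTable       *vtable;
    uint32_t      doc_num;
    get_doc_num_t host_get_doc_num;  /* stand-in for a host-language method */
};

/* The C implementation that TermScorer's vtable points at by default. */
static uint32_t
TermScorer_get_doc_num(Scorer *self)
{
    return self->doc_num;
}

/* Pretend host-language override, as MyTermScorer might define it. */
static uint32_t
MyTermScorer_get_doc_num(Scorer *self)
{
    return self->doc_num + 1000;
}

/* Stand-in for a Native_callback_i-style function: looks up the method
 * on the "host object" and invokes it. */
static uint32_t
host_callback_i(Scorer *self, const char *meth)
{
    assert(strcmp(meth, "get_doc_num") == 0);
    assert(self->host_get_doc_num != NULL);
    return self->host_get_doc_num(self);
}

/* The callback function that gets stuffed into the subclass's vtable. */
static uint32_t
Callback_get_doc_num(Scorer *self)
{
    return host_callback_i(self, "get_doc_num");
}

/* Copy the parent's vtable and replace the named slot with a callback.
 * The caller owns the returned vtable; returns NULL on allocation failure. */
static VTable*
VTable_override(const VTable *parent, const char *meth)
{
    VTable *child = (VTable*)malloc(sizeof(VTable));
    if (child == NULL) { return NULL; }
    *child = *parent;
    if (strcmp(meth, "get_doc_num") == 0) {
        child->get_doc_num = Callback_get_doc_num;
    }
    return child;
}
```

With the slot replaced, a call through the subclass's vtable from inside the library reaches the host-level method, while objects using the parent's vtable still get the plain C implementation.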

Nevertheless, the technique of abstract methods calling back to the host is so useful that I think we should just live with the drawbacks if they can't be resolved.

To implement these abstract callbacks, we need to be able to write a header file defining a generic interface which is compatible with every target language. This header would probably live at trunk/c_src/Lucy/Util/Native.h.

Then we need to implement the Native.h interface with different C code for each target. We could potentially break things up with giant #ifdef LUCY_RUBY and such within trunk/c_src/Lucy/Util/Native.c, but I think that file would grow out of control, as would others like it. Instead, I think we should establish a second tree for C code within each binding folder. For the Perl binding, the file would probably live at trunk/perl/xs/Lucy/Util/Native.c.
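
To make the split concrete, here is a rough sketch of the two halves in a single file. The signature given for Native_callback_i is a guess (in particular, that the third argument is an argument count), and the "toy host" method table is invented purely so that the sketch runs by itself; a real binding's Native.c would call into Perl, Ruby, or whatever the host is.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* ---- What the shared Native.h might declare (assumed signature) ---- */

/* Invoke method `meth` on the host object wrapping `obj` and return the
 * result as an integer.  `num_args` is assumed to be an argument count. */
int32_t
Native_callback_i(void *obj, const char *meth, uint32_t num_args);

/* ---- What one binding's Native.c might supply ---- */

/* Toy "host": a table mapping method names to C functions. */
typedef int32_t (*toy_method_t)(void *obj);

typedef struct ToyHostMethod {
    const char   *name;
    toy_method_t  func;
} ToyHostMethod;

/* Toy host method: treats the object as a pointer to its doc number. */
static int32_t
toy_get_doc_num(void *obj)
{
    return *(int32_t*)obj;
}

static const ToyHostMethod toy_methods[] = {
    { "get_doc_num", toy_get_doc_num },
};

int32_t
Native_callback_i(void *obj, const char *meth, uint32_t num_args)
{
    size_t i;
    assert(num_args == 0);  /* this sketch handles no-argument methods only */
    for (i = 0; i < sizeof(toy_methods) / sizeof(toy_methods[0]); i++) {
        if (strcmp(toy_methods[i].name, meth) == 0) {
            return toy_methods[i].func(obj);
        }
    }
    return -1;  /* method not found; a real binding would throw instead */
}
```

The core would compile against the declaration only, and each binding would link in its own definition.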

If we can pull this off, it allows us to move more code into Lucy's shared C core, reducing redundancy -- while simultaneously giving users maximum flexibility to innovate in their language of choice.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/
