Greets,
The VTable-based method dispatch system devised for Lucy is much, much
faster than the hash-based dispatch systems typical of dynamic
languages. However, since it is a parallel system, overriding an
internal Lucy method using the host language's native OO mechanism
presents problems.
* How do we design a generic callback mechanism for invoking
object methods which is portable across multiple host environments?
* How does Lucy know that the user wants to override a method?
* Will the native method prove unacceptably slow for inner loop code?
We can't do anything about the last problem, but it only affects
performance-critical inner loops, and even then a system that doesn't
scale well may still be useful for smaller collections or rapid
prototyping. As for the first two problems, I believe I have at least
partial solutions devised.
If we implement Lucy-level abstract methods as functions which call
back to the host language using instance-method semantics, then Lucy
will "see" the correct method even if it has been overridden multiple
times at the host-language level.
    /* Create the appropriate wrapper around [self], call "get_doc_num"
     * on the wrapper, convert the callback's return value to an integer,
     * and return that integer to the C-level invocant.
     */
    u32_t
    Scorer_get_doc_num(Scorer *self)
    {
        return Native_callback_i(self, "get_doc_num", 0);
    }
I'm using this technique all over KinoSearch and it has proven quite
successful. For instance, Scorer is spec'd out at the C level, but
I've been able to build a pure-Perl MockScorer subclass, and a user
has even released a pure-Perl WildCardQuery implementation to CPAN.
Abstract callbacks require a couple of tricks, and they aren't perfect.
First... the callback technique works fine when you are invoking the
method from inside the Lucy C core, but... What if you want to invoke
a method via the host language that *should* have been overridden, but
might not have been? You can end up in an infinite loop with the
callback invoking the binding invoking the callback and so on.
The solution is to insert an ABSTRACT_METHOD_CHECK in the binding code
before the vtable-method invocation. The test assesses whether the
function pointer in the vtable matches the address of the original
implementing function.
* If it matches, we'll get an infinite loop, so throw an error.
* If it doesn't match, then the method has been overridden at
the C level and it's safe to invoke.
* If the method was overridden at the host-language level... well,
this scenario never comes into play, because the original
binding calling into C has been overridden.
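
Here is a minimal sketch of how that check might look in the binding layer. The macro arguments, the one-slot vtable struct, the binding_get_doc_num() helper and its -1 error code are all assumptions made for illustration (a real binding would throw a host-language exception), and uint32_t stands in for u32_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct Scorer Scorer;
typedef uint32_t (*Scorer_get_doc_num_t)(Scorer *self);

/* Hypothetical, pared-down vtable with a single method slot. */
typedef struct ScorerVTable {
    Scorer_get_doc_num_t get_doc_num;
} ScorerVTable;

struct Scorer {
    const ScorerVTable *vtable;
    uint32_t            doc_num;
};

/* Stand-in for the abstract implementation.  The real one would call
 * back to the host; this sketch only needs its address. */
uint32_t
Scorer_get_doc_num(Scorer *self)
{
    (void)self;
    return UINT32_MAX;
}

/* A C-level override, as a subclass like TermScorer might supply. */
uint32_t
TermScorer_get_doc_num(Scorer *self)
{
    return self->doc_num;
}

/* True if the vtable slot still points at the abstract function
 * (assumed macro signature). */
#define ABSTRACT_METHOD_CHECK(obj, slot, abstract_func) \
    ((obj)->vtable->slot == (abstract_func))

/* Simulated binding glue: returns 0 and stores the result on success,
 * or -1 where a real binding would throw an error. */
int
binding_get_doc_num(Scorer *self, uint32_t *result)
{
    assert(self != NULL && result != NULL);
    if (ABSTRACT_METHOD_CHECK(self, get_doc_num, Scorer_get_doc_num)) {
        return -1; /* invoking would loop forever: callback -> binding */
    }
    *result = self->vtable->get_doc_num(self);
    return 0;
}
```

An object whose slot still holds Scorer_get_doc_num is rejected before the vtable invocation, while one whose slot holds TermScorer_get_doc_num passes straight through.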
Second... there are a lot of details about how you implement various
Native_callback_xxxxx functions to handle different kinds of
arguments... but we'll save that for another post.
Third... Say that you want to subclass not Scorer, but *TermScorer*,
and you try to override TermScorer_Get_Doc_Num() via the host-language
OO mechanism. The problem is that the function pointer in
TermScorer's VTable for Get_Doc_Num doesn't call back to the host
language -- so it never finds out that you've tried to override it.
You'll get the native override when invoking from the host language,
but not when invoking from within the library via the VTable.
Unfortunately, I haven't thought of a solution to this one. :( The
best I can think of is some sort of override technique which stuffs a
callback function into the subclass's VTable.
    package MyTermScorer;
    use base qw( Lucy::Search::TermScorer );
    __PACKAGE__->override(qw( get_doc_num ));
    ...
To me, that sounds both fiddly and like an implementation detail
leaking out.
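
For what it's worth, the C half of such an override() call might look roughly like the sketch below: copy the parent's vtable, then swap the slot for a shim that calls back to the host. Everything here is hypothetical, including the struct layouts, VTable_override_get_doc_num(), the shim name, and the mock Native_callback_i(), which just forwards to a function pointer standing in for the Perl-level method.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct Scorer Scorer;
typedef uint32_t (*get_doc_num_t)(Scorer *self);

/* Hypothetical one-slot vtable. */
typedef struct VTable {
    const char   *class_name;
    get_doc_num_t get_doc_num;
} VTable;

struct Scorer {
    const VTable *vtable;
    uint32_t      doc_num;
    get_doc_num_t host_get_doc_num; /* stand-in for the host-level method */
};

/* Mock host callback.  A real one would find the host wrapper for
 * [self] and invoke the named method on it. */
static uint32_t
Native_callback_i(Scorer *self, const char *meth, uint32_t num_args)
{
    (void)num_args;
    assert(strcmp(meth, "get_doc_num") == 0);
    return self->host_get_doc_num(self);
}

/* TermScorer's ordinary C implementation: no callback to the host. */
uint32_t
TermScorer_get_doc_num(Scorer *self)
{
    return self->doc_num;
}

/* Shim that routes the vtable slot back to the host language. */
uint32_t
TermScorer_get_doc_num_OVERRIDE(Scorer *self)
{
    return Native_callback_i(self, "get_doc_num", 0);
}

/* Stand-in for the user's host-language override. */
uint32_t
MyTermScorer_host_get_doc_num(Scorer *self)
{
    return self->doc_num + 100;
}

/* Build the subclass vtable: inherit everything, then stuff the
 * callback shim into the overridden slot. */
void
VTable_override_get_doc_num(VTable *subclass, const VTable *parent,
                            const char *class_name)
{
    *subclass = *parent;
    subclass->class_name  = class_name;
    subclass->get_doc_num = TermScorer_get_doc_num_OVERRIDE;
}
```

With the slot swapped, invoking through the vtable from inside the library reaches the host-level method, while plain TermScorer objects keep the fast C path.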
Nevertheless, the technique of abstract methods calling back to the
host is so useful that I think we should just live with the drawbacks
if they can't be resolved.
To implement these abstract callbacks, we need to be able to write a
header file defining a generic interface which is compatible with
every target language: probably this header would live at
trunk/c_src/Lucy/Util/Native.h.
Then we need to implement the Native.h interface with different C code
for each target. We could potentially break things up with giant
#ifdef LUCY_RUBY blocks and such within trunk/c_src/Lucy/Util/Native.c,
but I think that file would grow out of control, as would others like it.
Instead, I think we should establish a second tree for C code within
each binding folder. For the Perl binding, the file would probably
live at trunk/perl/xs/Lucy/Util/Native.c.
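
To make the split concrete, here is a rough single-file sketch: the first part is what a shared Native.h might declare, the middle is core code that sees only that declaration, and the last part is what one binding's Native.c might supply. The Native_callback_i() signature and return type are guesses (the real function may well be variadic), and the "host" is just a mock method table rather than Perl or Ruby.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* ---- Shared interface, as Native.h might declare it (assumed) ---- */
int32_t
Native_callback_i(void *self, const char *method, uint32_t num_args);

/* ---- Core code: knows nothing about the host language ---- */
typedef struct Scorer {
    uint32_t doc_num;
} Scorer;

uint32_t
Scorer_get_doc_num(Scorer *self)
{
    return (uint32_t)Native_callback_i(self, "get_doc_num", 0);
}

/* ---- One binding's implementation (mock host for illustration) ---- */
typedef int32_t (*host_method_t)(void *self);

typedef struct HostMethod {
    const char   *name;
    host_method_t func;
} HostMethod;

/* Plays the role of a method defined in the host language. */
static int32_t
mock_get_doc_num(void *self)
{
    return (int32_t)((Scorer*)self)->doc_num;
}

static const HostMethod MOCK_HOST_METHODS[] = {
    { "get_doc_num", mock_get_doc_num },
};

int32_t
Native_callback_i(void *self, const char *method, uint32_t num_args)
{
    size_t i;
    size_t count = sizeof(MOCK_HOST_METHODS) / sizeof(MOCK_HOST_METHODS[0]);
    (void)num_args; /* no arguments are passed in this sketch */
    for (i = 0; i < count; i++) {
        if (strcmp(MOCK_HOST_METHODS[i].name, method) == 0) {
            return MOCK_HOST_METHODS[i].func(self);
        }
    }
    assert(0 && "no such host method");
    return -1;
}
```

Swapping in a different binding means replacing only the last part; Scorer_get_doc_num() and the declaration stay untouched.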
If we can pull this off, it allows us to move more code into Lucy's
shared C core, reducing redundancy -- while simultaneously giving
users maximum flexibility to innovate in their language of choice.
Marvin Humphrey
Rectangular Research
http://www.rectangular.com/