Lucifers,

We have discussed Clownfish interfaces already, but here's a quick recap. In order to support callbacks from Clownfish into the host language, we currently rely on class-based inheritance. In the host language, a Clownfish class is subclassed, adding method implementations written in the host language. The Clownfish compiler creates wrappers that call into the host language for each such method. This works for Perl, and should work for other dynamic languages like Python or Ruby as well.

This approach obviously doesn't work for host languages that don't support class-based inheritance, like Go or Rust. One possible solution is to add support for OOP interfaces which are the typically used by these languages to provide dynamic dispatch. Interfaces are also useful on the Clownfish side alone.

I researched how several languages implement interfaces to come up with a performant solution suitable for the Clownfish object system. Basic operations on interfaces include:

- Static casts from objects to interface types. (The cast is known
  to succeed at compile time.)
- Static casts from interface objects to the root object type.
- Dynamic casts from objects to interface types. (The object may
  not implement the target interface and the cast may fail.)
- Invocation of interface methods.

For some of my earlier thoughts, see CLOWNFISH-12:

    https://issues.apache.org/jira/browse/CLOWNFISH-12

Embedding multiple vtables (C++)
--------------------------------

C++ doesn't know interfaces as a distinct language concept, but they can be easily emulated with multiple inheritance. An interface is simply a class without member variables. How multiple inheritance is implemented in C depends on the compiler, but the typical approach is described in Stroustrup's paper "Multiple Inheritance for C++" (1999):

    http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.23.4735

The basic idea is to embed multiple vtables into each object and to implement up- and downcasting by adjusting the pointer to an object by fixed offsets. (The complications resulting from the diamond problem don't apply to interfaces since they don't have member variables.)

We could use the same idea for Clownfish. Static casts only require to apply an offset to the object pointer. Interface methods are invoked just like normal methods. Dynamic casts are more costly. They're rather slow in some C++ implementations where the whole inheritance tree is searched. Another problem is how to initialize the extra vtable pointers. Currently, we have a single method Class_Make_Obj to initialize objects. With multiple vtable pointers, we would need separate code paths for object initialization, or initialize interface vtables lazily.

Fat pointers (Go, Rust)
-----------------------

The approach chosen by Go and Rust is to implement interface objects as "fat pointers", that is a pair consisting of the pointer to the actual object and a pointer to the interface vtable (itable) for dynamic dispatch. When casting an object to interface type dynamically, Go requires a somewhat expensive hash table lookup to find the itable. Rust doesn't support dynamic casts at all. Static casts and method invocations are simple and fast.

The major downside is that fat pointers require double the storage space. This is especially inconvenient for containers like arrays of interface objects.

Direct lookup (Java)
--------------------

Like with C++, the implementation of interfaces is up to the JVM, but I found the paper "Efficient Implementation of Java Interfaces: Invokeinterface Considered Harmless" (2001) to be enlightening:

    http://dl.acm.org/citation.cfm?id=504291
    http://www.research.ibm.com/people/d/dgrove/papers/oopsla01.pdf

Under this approach, there's no special representation for interface objects. They're simply object pointers and casts are a no-op. When an interface method is invoked, the method is looked up dynamically by name in a hash table. The key to make this operation fast is to employ a hash table of function pointers, pointing to small generated code stubs that invoke the actual method, or, in the case of hash collisions, execute a short if-else-sequence.

I have no idea whether this is still the approach of choice in modern JVMs. I'd be curious if anyone has additional pointers.

My proposal
-----------

I really like the simplicity of the direct lookup approach from the IBM paper that makes no distinction between normal objects and interface objects and avoids the overhead of fat pointers. But stubs that directly invoke interface methods are impossible to implement in C without relying on implementation-defined behavior or even assembler. It's also impossible to add a default method to an interface, and invoke it on an object of a class from an existing binary without recompilation. One of the important reasons for default methods is to allow this kind of interface evolution.

But hash tables of function pointers with hard-coded collison resolution can still be used to lookup itables efficiently. Here's how it works:

- In every Class struct there's a small hash table with maybe 4-8
  slots. When invoking an interface method, an index into this table
  is computed from the full name of the interface like
  "Lucy::Analysis::Analyzer" and the table size. The important
  observation is that this index is known at compile time. This means
  that the hash function and table size must always be the same for a
  certain Clownfish version, of course.

- The function pointer in the hash table is called. It will point to
  generated code that looks like

    lucy_Analyzer_ITABLE*
    MYPARCEL_MyAnalyzer_ITABLE_GETTER_3(cfish_Obj *obj,
                                        cfish_Interface *iface) {
        if (iface == LUCY_ANALYZER) {
            return MYPARCEL_MyAnalyzer_Analyzer_ITABLE;
        }
        else if (...) {
            // Possibly other interfaces implemented by MyAnalyzer
            // that happen to hash to the same slot. But typically,
            // there will be only a single interface.
        }
        else {
            // Other ways to handle errors are possible.
            CFISH_THROW(CFISH_ERR,
                        "Class %o doesn't implement interface %o",
                        CFISH_Obj_Get_Class_Name(obj);
                        CFISH_Interface_Get_Name(iface));
        }
    }

- MYPARCEL_MyAnalyzer_Analyzer_ITABLE is populated dynamically during
  Clownfish bootstrap, handling default methods and allowing
  interface evolution.

- The interface method is looked up in the returned itable, using
  a global offset variable, similar to normal Clownfish method
  dispatch.

Invoking interface methods this way should result in about twice the overhead of normal method invocation. It's also possible to combine this with the fat pointer approach, avoiding the itable lookup for subsequent interface method calls.

Host language callbacks
-----------------------

I can see two approaches. The first one autogenerates a "host" class for every interface. These classes contain a handle pointing to the host language object (SV* for Perl, registry index for Go) and have autogenerated method implementations that always call into the host language. Converting a host language object to Clownfish requires an initial interface to be specified. It's not possible to dynamically cast the resulting Clownfish object to other interface types the host language object may implement.

In the second approach, there's only a single Clownfish class for host objects. A host object can also be directly converted to Obj without specifying an interface type. When casting the host object to an interface type dynamically, the itable lookup is modified to check whether the host object actually implements the interface, then returns the appropriate itable. The problem with this approach is that it doesn't map to languages like Rust (or C++ without RTTI) that don't support dynamic casts.

In both approaches, converting a host object to Clownfish always creates a new Clownfish object. This is more expensive than the current Perl implementation which caches the Clownfish object, but allows standard Perl objects containing a blessed hash. Converting back to the host language simply extracts the handle.

To make default methods work, there's generated code on the host language side of an interface that calls the Clownfish implementation directly. This means that calling a default method from Clownfish results in a Clownfish-Host-Clownfish roundtrip but this should be acceptable. This overhead could be avoided for host languages that allow introspection by creating a custom itable for every host-class/interface combination, similar to the way we currently register Perl classes in the Clownfish registry. But I don't think it's worth the effort.

Concluding remarks
------------------

If we decide to switch to interfaces for host language callbacks, I intend to completely remove the ability to subclass Clownfish classes from languages like Perl. This means to remove some really well-thought-out code, but if we want to expand our scope to languages without class-based inheritance, we shouldn't support features that only work for some host languages. It also means that the Perl callback mechanism will become slower, though it probably won't be noticable given how slow Perl method calls are. The fact that users can write normal Perl classes without the need for inside-out member variables weighs up for that.

I think I can implement the approach described above in a few months for a 0.7 release, including Perl and Go support. Unless there are major objections to my plan, I'll just start on a separate branch. So if there are unforeseen obstacles, we don't have to revert commits on the master branch.

We will have to redesign all Lucy classes that are meant to be overridable from the host language. For that, I'll need feedback from the rest of the community, especially Marvin.

I don't plan to support subinterfaces (interfaces implementing other interface) in the first iteration, but this should be straightforward to add. I also may omit features from the initial implementation that aren't necessary for Lucy callbacks.

Nick

Reply via email to