[lucy-dev] Clownfish interfaces

Nick Wellnhofer Fri, 16 Dec 2016 10:04:33 -0800

Lucifers,

We have discussed Clownfish interfaces already, but here's a quick recap. Inorder to support callbacks from Clownfish into the host language, we currentlyrely on class-based inheritance. In the host language, a Clownfish class issubclassed, adding method implementations written in the host language. TheClownfish compiler creates wrappers that call into the host language for eachsuch method. This works for Perl, and should work for other dynamic languageslike Python or Ruby as well.

This approach obviously doesn't work for host languages that don't supportclass-based inheritance, like Go or Rust. One possible solution is to addsupport for OOP interfaces which are the typically used by these languages toprovide dynamic dispatch. Interfaces are also useful on the Clownfish side alone.

I researched how several languages implement interfaces to come up with aperformant solution suitable for the Clownfish object system. Basic operationson interfaces include:


- Static casts from objects to interface types. (The cast is known
  to succeed at compile time.)
- Static casts from interface objects to the root object type.
- Dynamic casts from objects to interface types. (The object may
  not implement the target interface and the cast may fail.)
- Invocation of interface methods.

For some of my earlier thoughts, see CLOWNFISH-12:

    https://issues.apache.org/jira/browse/CLOWNFISH-12

Embedding multiple vtables (C++)
--------------------------------

C++ doesn't know interfaces as a distinct language concept, but they can beeasily emulated with multiple inheritance. An interface is simply a classwithout member variables. How multiple inheritance is implemented in C dependson the compiler, but the typical approach is described in Stroustrup's paper"Multiple Inheritance for C++" (1999):


    http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.23.4735

The basic idea is to embed multiple vtables into each object and to implementup- and downcasting by adjusting the pointer to an object by fixed offsets.(The complications resulting from the diamond problem don't apply tointerfaces since they don't have member variables.)

We could use the same idea for Clownfish. Static casts only require to applyan offset to the object pointer. Interface methods are invoked just likenormal methods. Dynamic casts are more costly. They're rather slow in some C++implementations where the whole inheritance tree is searched. Another problemis how to initialize the extra vtable pointers. Currently, we have a singlemethod Class_Make_Obj to initialize objects. With multiple vtable pointers, wewould need separate code paths for object initialization, or initializeinterface vtables lazily.


Fat pointers (Go, Rust)
-----------------------

The approach chosen by Go and Rust is to implement interface objects as "fatpointers", that is a pair consisting of the pointer to the actual object and apointer to the interface vtable (itable) for dynamic dispatch. When casting anobject to interface type dynamically, Go requires a somewhat expensive hashtable lookup to find the itable. Rust doesn't support dynamic casts at all.Static casts and method invocations are simple and fast.

The major downside is that fat pointers require double the storage space. Thisis especially inconvenient for containers like arrays of interface objects.


Direct lookup (Java)
--------------------

Like with C++, the implementation of interfaces is up to the JVM, but I foundthe paper "Efficient Implementation of Java Interfaces: InvokeinterfaceConsidered Harmless" (2001) to be enlightening:


    http://dl.acm.org/citation.cfm?id=504291
    http://www.research.ibm.com/people/d/dgrove/papers/oopsla01.pdf

Under this approach, there's no special representation for interface objects.They're simply object pointers and casts are a no-op. When an interface methodis invoked, the method is looked up dynamically by name in a hash table. Thekey to make this operation fast is to employ a hash table of functionpointers, pointing to small generated code stubs that invoke the actualmethod, or, in the case of hash collisions, execute a short if-else-sequence.

I have no idea whether this is still the approach of choice in modern JVMs.I'd be curious if anyone has additional pointers.


My proposal
-----------

I really like the simplicity of the direct lookup approach from the IBM paperthat makes no distinction between normal objects and interface objects andavoids the overhead of fat pointers. But stubs that directly invoke interfacemethods are impossible to implement in C without relying onimplementation-defined behavior or even assembler. It's also impossible to adda default method to an interface, and invoke it on an object of a class froman existing binary without recompilation. One of the important reasons fordefault methods is to allow this kind of interface evolution.

But hash tables of function pointers with hard-coded collison resolution canstill be used to lookup itables efficiently. Here's how it works:


- In every Class struct there's a small hash table with maybe 4-8
  slots. When invoking an interface method, an index into this table
  is computed from the full name of the interface like
  "Lucy::Analysis::Analyzer" and the table size. The important
  observation is that this index is known at compile time. This means
  that the hash function and table size must always be the same for a
  certain Clownfish version, of course.

- The function pointer in the hash table is called. It will point to
  generated code that looks like

    lucy_Analyzer_ITABLE*
    MYPARCEL_MyAnalyzer_ITABLE_GETTER_3(cfish_Obj *obj,
                                        cfish_Interface *iface) {
        if (iface == LUCY_ANALYZER) {
            return MYPARCEL_MyAnalyzer_Analyzer_ITABLE;
        }
        else if (...) {
            // Possibly other interfaces implemented by MyAnalyzer
            // that happen to hash to the same slot. But typically,
            // there will be only a single interface.
        }
        else {
            // Other ways to handle errors are possible.
            CFISH_THROW(CFISH_ERR,
                        "Class %o doesn't implement interface %o",
                        CFISH_Obj_Get_Class_Name(obj);
                        CFISH_Interface_Get_Name(iface));
        }
    }

- MYPARCEL_MyAnalyzer_Analyzer_ITABLE is populated dynamically during
  Clownfish bootstrap, handling default methods and allowing
  interface evolution.

- The interface method is looked up in the returned itable, using
  a global offset variable, similar to normal Clownfish method
  dispatch.

Invoking interface methods this way should result in about twice the overheadof normal method invocation. It's also possible to combine this with the fatpointer approach, avoiding the itable lookup for subsequent interface methodcalls.


Host language callbacks
-----------------------

I can see two approaches. The first one autogenerates a "host" class for everyinterface. These classes contain a handle pointing to the host language object(SV* for Perl, registry index for Go) and have autogenerated methodimplementations that always call into the host language. Converting a hostlanguage object to Clownfish requires an initial interface to be specified.It's not possible to dynamically cast the resulting Clownfish object to otherinterface types the host language object may implement.

In the second approach, there's only a single Clownfish class for hostobjects. A host object can also be directly converted to Obj withoutspecifying an interface type. When casting the host object to an interfacetype dynamically, the itable lookup is modified to check whether the hostobject actually implements the interface, then returns the appropriate itable.The problem with this approach is that it doesn't map to languages like Rust(or C++ without RTTI) that don't support dynamic casts.

In both approaches, converting a host object to Clownfish always creates a newClownfish object. This is more expensive than the current Perl implementationwhich caches the Clownfish object, but allows standard Perl objects containinga blessed hash. Converting back to the host language simply extracts the handle.

To make default methods work, there's generated code on the host language sideof an interface that calls the Clownfish implementation directly. This meansthat calling a default method from Clownfish results in aClownfish-Host-Clownfish roundtrip but this should be acceptable. Thisoverhead could be avoided for host languages that allow introspection bycreating a custom itable for every host-class/interface combination, similarto the way we currently register Perl classes in the Clownfish registry. But Idon't think it's worth the effort.


Concluding remarks
------------------

If we decide to switch to interfaces for host language callbacks, I intend tocompletely remove the ability to subclass Clownfish classes from languageslike Perl. This means to remove some really well-thought-out code, but if wewant to expand our scope to languages without class-based inheritance, weshouldn't support features that only work for some host languages. It alsomeans that the Perl callback mechanism will become slower, though it probablywon't be noticable given how slow Perl method calls are. The fact that userscan write normal Perl classes without the need for inside-out member variablesweighs up for that.

I think I can implement the approach described above in a few months for a 0.7release, including Perl and Go support. Unless there are major objections tomy plan, I'll just start on a separate branch. So if there are unforeseenobstacles, we don't have to revert commits on the master branch.

We will have to redesign all Lucy classes that are meant to be overridablefrom the host language. For that, I'll need feedback from the rest of thecommunity, especially Marvin.

I don't plan to support subinterfaces (interfaces implementing otherinterface) in the first iteration, but this should be straightforward to add.I also may omit features from the initial implementation that aren't necessaryfor Lucy callbacks.


Nick

[lucy-dev] Clownfish interfaces

Reply via email to