Nick, Do not mean to warnock you re: this great email. You're obviously thinking through these issues for a variety of emerging languages, and that's vital to Lucy and Clownfish. I've no technical remarks, just encouragement as you dig in.
On Fri, Dec 16, 2016 at 12:02 PM, Nick Wellnhofer <wellnho...@aevum.de> wrote: > Lucifers, > > We have discussed Clownfish interfaces already, but here's a quick recap. > In order to support callbacks from Clownfish into the host language, we > currently rely on class-based inheritance. In the host language, a > Clownfish class is subclassed, adding method implementations written in the > host language. The Clownfish compiler creates wrappers that call into the > host language for each such method. This works for Perl, and should work > for other dynamic languages like Python or Ruby as well. > > This approach obviously doesn't work for host languages that don't support > class-based inheritance, like Go or Rust. One possible solution is to add > support for OOP interfaces which are the typically used by these languages > to provide dynamic dispatch. Interfaces are also useful on the Clownfish > side alone. > > I researched how several languages implement interfaces to come up with a > performant solution suitable for the Clownfish object system. Basic > operations on interfaces include: > > - Static casts from objects to interface types. (The cast is known > to succeed at compile time.) > - Static casts from interface objects to the root object type. > - Dynamic casts from objects to interface types. (The object may > not implement the target interface and the cast may fail.) > - Invocation of interface methods. > > For some of my earlier thoughts, see CLOWNFISH-12: > > https://issues.apache.org/jira/browse/CLOWNFISH-12 > > Embedding multiple vtables (C++) > -------------------------------- > > C++ doesn't know interfaces as a distinct language concept, but they can > be easily emulated with multiple inheritance. An interface is simply a > class without member variables. How multiple inheritance is implemented in > C depends on the compiler, but the typical approach is described in > Stroustrup's paper "Multiple Inheritance for C++" (1999): > > http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.23.4735 > > The basic idea is to embed multiple vtables into each object and to > implement up- and downcasting by adjusting the pointer to an object by > fixed offsets. (The complications resulting from the diamond problem don't > apply to interfaces since they don't have member variables.) > > We could use the same idea for Clownfish. Static casts only require to > apply an offset to the object pointer. Interface methods are invoked just > like normal methods. Dynamic casts are more costly. They're rather slow in > some C++ implementations where the whole inheritance tree is searched. > Another problem is how to initialize the extra vtable pointers. Currently, > we have a single method Class_Make_Obj to initialize objects. With multiple > vtable pointers, we would need separate code paths for object > initialization, or initialize interface vtables lazily. > > Fat pointers (Go, Rust) > ----------------------- > > The approach chosen by Go and Rust is to implement interface objects as > "fat pointers", that is a pair consisting of the pointer to the actual > object and a pointer to the interface vtable (itable) for dynamic dispatch. > When casting an object to interface type dynamically, Go requires a > somewhat expensive hash table lookup to find the itable. Rust doesn't > support dynamic casts at all. Static casts and method invocations are > simple and fast. > > The major downside is that fat pointers require double the storage space. > This is especially inconvenient for containers like arrays of interface > objects. > > Direct lookup (Java) > -------------------- > > Like with C++, the implementation of interfaces is up to the JVM, but I > found the paper "Efficient Implementation of Java Interfaces: > Invokeinterface Considered Harmless" (2001) to be enlightening: > > http://dl.acm.org/citation.cfm?id=504291 > http://www.research.ibm.com/people/d/dgrove/papers/oopsla01.pdf > > Under this approach, there's no special representation for interface > objects. They're simply object pointers and casts are a no-op. When an > interface method is invoked, the method is looked up dynamically by name in > a hash table. The key to make this operation fast is to employ a hash table > of function pointers, pointing to small generated code stubs that invoke > the actual method, or, in the case of hash collisions, execute a short > if-else-sequence. > > I have no idea whether this is still the approach of choice in modern > JVMs. I'd be curious if anyone has additional pointers. > > My proposal > ----------- > > I really like the simplicity of the direct lookup approach from the IBM > paper that makes no distinction between normal objects and interface > objects and avoids the overhead of fat pointers. But stubs that directly > invoke interface methods are impossible to implement in C without relying > on implementation-defined behavior or even assembler. It's also impossible > to add a default method to an interface, and invoke it on an object of a > class from an existing binary without recompilation. One of the important > reasons for default methods is to allow this kind of interface evolution. > > But hash tables of function pointers with hard-coded collison resolution > can still be used to lookup itables efficiently. Here's how it works: > > - In every Class struct there's a small hash table with maybe 4-8 > slots. When invoking an interface method, an index into this table > is computed from the full name of the interface like > "Lucy::Analysis::Analyzer" and the table size. The important > observation is that this index is known at compile time. This means > that the hash function and table size must always be the same for a > certain Clownfish version, of course. > > - The function pointer in the hash table is called. It will point to > generated code that looks like > > lucy_Analyzer_ITABLE* > MYPARCEL_MyAnalyzer_ITABLE_GETTER_3(cfish_Obj *obj, > cfish_Interface *iface) { > if (iface == LUCY_ANALYZER) { > return MYPARCEL_MyAnalyzer_Analyzer_ITABLE; > } > else if (...) { > // Possibly other interfaces implemented by MyAnalyzer > // that happen to hash to the same slot. But typically, > // there will be only a single interface. > } > else { > // Other ways to handle errors are possible. > CFISH_THROW(CFISH_ERR, > "Class %o doesn't implement interface %o", > CFISH_Obj_Get_Class_Name(obj); > CFISH_Interface_Get_Name(iface)); > } > } > > - MYPARCEL_MyAnalyzer_Analyzer_ITABLE is populated dynamically during > Clownfish bootstrap, handling default methods and allowing > interface evolution. > > - The interface method is looked up in the returned itable, using > a global offset variable, similar to normal Clownfish method > dispatch. > > Invoking interface methods this way should result in about twice the > overhead of normal method invocation. It's also possible to combine this > with the fat pointer approach, avoiding the itable lookup for subsequent > interface method calls. > > Host language callbacks > ----------------------- > > I can see two approaches. The first one autogenerates a "host" class for > every interface. These classes contain a handle pointing to the host > language object (SV* for Perl, registry index for Go) and have > autogenerated method implementations that always call into the host > language. Converting a host language object to Clownfish requires an > initial interface to be specified. It's not possible to dynamically cast > the resulting Clownfish object to other interface types the host language > object may implement. > > In the second approach, there's only a single Clownfish class for host > objects. A host object can also be directly converted to Obj without > specifying an interface type. When casting the host object to an interface > type dynamically, the itable lookup is modified to check whether the host > object actually implements the interface, then returns the appropriate > itable. The problem with this approach is that it doesn't map to languages > like Rust (or C++ without RTTI) that don't support dynamic casts. > > In both approaches, converting a host object to Clownfish always creates a > new Clownfish object. This is more expensive than the current Perl > implementation which caches the Clownfish object, but allows standard Perl > objects containing a blessed hash. Converting back to the host language > simply extracts the handle. > > To make default methods work, there's generated code on the host language > side of an interface that calls the Clownfish implementation directly. This > means that calling a default method from Clownfish results in a > Clownfish-Host-Clownfish roundtrip but this should be acceptable. This > overhead could be avoided for host languages that allow introspection by > creating a custom itable for every host-class/interface combination, > similar to the way we currently register Perl classes in the Clownfish > registry. But I don't think it's worth the effort. > > Concluding remarks > ------------------ > > If we decide to switch to interfaces for host language callbacks, I intend > to completely remove the ability to subclass Clownfish classes from > languages like Perl. This means to remove some really well-thought-out > code, but if we want to expand our scope to languages without class-based > inheritance, we shouldn't support features that only work for some host > languages. It also means that the Perl callback mechanism will become > slower, though it probably won't be noticable given how slow Perl method > calls are. The fact that users can write normal Perl classes without the > need for inside-out member variables weighs up for that. > > I think I can implement the approach described above in a few months for a > 0.7 release, including Perl and Go support. Unless there are major > objections to my plan, I'll just start on a separate branch. So if there > are unforeseen obstacles, we don't have to revert commits on the master > branch. > > We will have to redesign all Lucy classes that are meant to be overridable > from the host language. For that, I'll need feedback from the rest of the > community, especially Marvin. > > I don't plan to support subinterfaces (interfaces implementing other > interface) in the first iteration, but this should be straightforward to > add. I also may omit features from the initial implementation that aren't > necessary for Lucy callbacks. > > Nick > -- Peter Karman . pe...@peknet.com . http://peknet.com/