Nick,

Do not mean to warnock you re: this great email. You're obviously thinking
through these issues for a variety of emerging languages, and that's vital
to Lucy and Clownfish. I've no technical remarks, just encouragement as you
dig in.

On Fri, Dec 16, 2016 at 12:02 PM, Nick Wellnhofer <wellnho...@aevum.de>
wrote:

> Lucifers,
>
> We have discussed Clownfish interfaces already, but here's a quick recap.
> In order to support callbacks from Clownfish into the host language, we
> currently rely on class-based inheritance. In the host language, a
> Clownfish class is subclassed, adding method implementations written in the
> host language. The Clownfish compiler creates wrappers that call into the
> host language for each such method. This works for Perl, and should work
> for other dynamic languages like Python or Ruby as well.
>
> This approach obviously doesn't work for host languages that don't support
> class-based inheritance, like Go or Rust. One possible solution is to add
> support for OOP interfaces which are the typically used by these languages
> to provide dynamic dispatch. Interfaces are also useful on the Clownfish
> side alone.
>
> I researched how several languages implement interfaces to come up with a
> performant solution suitable for the Clownfish object system. Basic
> operations on interfaces include:
>
> - Static casts from objects to interface types. (The cast is known
>   to succeed at compile time.)
> - Static casts from interface objects to the root object type.
> - Dynamic casts from objects to interface types. (The object may
>   not implement the target interface and the cast may fail.)
> - Invocation of interface methods.
>
> For some of my earlier thoughts, see CLOWNFISH-12:
>
>     https://issues.apache.org/jira/browse/CLOWNFISH-12
>
> Embedding multiple vtables (C++)
> --------------------------------
>
> C++ doesn't know interfaces as a distinct language concept, but they can
> be easily emulated with multiple inheritance. An interface is simply a
> class without member variables. How multiple inheritance is implemented in
> C depends on the compiler, but the typical approach is described in
> Stroustrup's paper "Multiple Inheritance for C++" (1999):
>
>     http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.23.4735
>
> The basic idea is to embed multiple vtables into each object and to
> implement up- and downcasting by adjusting the pointer to an object by
> fixed offsets. (The complications resulting from the diamond problem don't
> apply to interfaces since they don't have member variables.)
>
> We could use the same idea for Clownfish. Static casts only require to
> apply an offset to the object pointer. Interface methods are invoked just
> like normal methods. Dynamic casts are more costly. They're rather slow in
> some C++ implementations where the whole inheritance tree is searched.
> Another problem is how to initialize the extra vtable pointers. Currently,
> we have a single method Class_Make_Obj to initialize objects. With multiple
> vtable pointers, we would need separate code paths for object
> initialization, or initialize interface vtables lazily.
>
> Fat pointers (Go, Rust)
> -----------------------
>
> The approach chosen by Go and Rust is to implement interface objects as
> "fat pointers", that is a pair consisting of the pointer to the actual
> object and a pointer to the interface vtable (itable) for dynamic dispatch.
> When casting an object to interface type dynamically, Go requires a
> somewhat expensive hash table lookup to find the itable. Rust doesn't
> support dynamic casts at all. Static casts and method invocations are
> simple and fast.
>
> The major downside is that fat pointers require double the storage space.
> This is especially inconvenient for containers like arrays of interface
> objects.
>
> Direct lookup (Java)
> --------------------
>
> Like with C++, the implementation of interfaces is up to the JVM, but I
> found the paper "Efficient Implementation of Java Interfaces:
> Invokeinterface Considered Harmless" (2001) to be enlightening:
>
>     http://dl.acm.org/citation.cfm?id=504291
>     http://www.research.ibm.com/people/d/dgrove/papers/oopsla01.pdf
>
> Under this approach, there's no special representation for interface
> objects. They're simply object pointers and casts are a no-op. When an
> interface method is invoked, the method is looked up dynamically by name in
> a hash table. The key to make this operation fast is to employ a hash table
> of function pointers, pointing to small generated code stubs that invoke
> the actual method, or, in the case of hash collisions, execute a short
> if-else-sequence.
>
> I have no idea whether this is still the approach of choice in modern
> JVMs. I'd be curious if anyone has additional pointers.
>
> My proposal
> -----------
>
> I really like the simplicity of the direct lookup approach from the IBM
> paper that makes no distinction between normal objects and interface
> objects and avoids the overhead of fat pointers. But stubs that directly
> invoke interface methods are impossible to implement in C without relying
> on implementation-defined behavior or even assembler. It's also impossible
> to add a default method to an interface, and invoke it on an object of a
> class from an existing binary without recompilation. One of the important
> reasons for default methods is to allow this kind of interface evolution.
>
> But hash tables of function pointers with hard-coded collison resolution
> can still be used to lookup itables efficiently. Here's how it works:
>
> - In every Class struct there's a small hash table with maybe 4-8
>   slots. When invoking an interface method, an index into this table
>   is computed from the full name of the interface like
>   "Lucy::Analysis::Analyzer" and the table size. The important
>   observation is that this index is known at compile time. This means
>   that the hash function and table size must always be the same for a
>   certain Clownfish version, of course.
>
> - The function pointer in the hash table is called. It will point to
>   generated code that looks like
>
>     lucy_Analyzer_ITABLE*
>     MYPARCEL_MyAnalyzer_ITABLE_GETTER_3(cfish_Obj *obj,
>                                         cfish_Interface *iface) {
>         if (iface == LUCY_ANALYZER) {
>             return MYPARCEL_MyAnalyzer_Analyzer_ITABLE;
>         }
>         else if (...) {
>             // Possibly other interfaces implemented by MyAnalyzer
>             // that happen to hash to the same slot. But typically,
>             // there will be only a single interface.
>         }
>         else {
>             // Other ways to handle errors are possible.
>             CFISH_THROW(CFISH_ERR,
>                         "Class %o doesn't implement interface %o",
>                         CFISH_Obj_Get_Class_Name(obj);
>                         CFISH_Interface_Get_Name(iface));
>         }
>     }
>
> - MYPARCEL_MyAnalyzer_Analyzer_ITABLE is populated dynamically during
>   Clownfish bootstrap, handling default methods and allowing
>   interface evolution.
>
> - The interface method is looked up in the returned itable, using
>   a global offset variable, similar to normal Clownfish method
>   dispatch.
>
> Invoking interface methods this way should result in about twice the
> overhead of normal method invocation. It's also possible to combine this
> with the fat pointer approach, avoiding the itable lookup for subsequent
> interface method calls.
>
> Host language callbacks
> -----------------------
>
> I can see two approaches. The first one autogenerates a "host" class for
> every interface. These classes contain a handle pointing to the host
> language object (SV* for Perl, registry index for Go) and have
> autogenerated method implementations that always call into the host
> language. Converting a host language object to Clownfish requires an
> initial interface to be specified. It's not possible to dynamically cast
> the resulting Clownfish object to other interface types the host language
> object may implement.
>
> In the second approach, there's only a single Clownfish class for host
> objects. A host object can also be directly converted to Obj without
> specifying an interface type. When casting the host object to an interface
> type dynamically, the itable lookup is modified to check whether the host
> object actually implements the interface, then returns the appropriate
> itable. The problem with this approach is that it doesn't map to languages
> like Rust (or C++ without RTTI) that don't support dynamic casts.
>
> In both approaches, converting a host object to Clownfish always creates a
> new Clownfish object. This is more expensive than the current Perl
> implementation which caches the Clownfish object, but allows standard Perl
> objects containing a blessed hash. Converting back to the host language
> simply extracts the handle.
>
> To make default methods work, there's generated code on the host language
> side of an interface that calls the Clownfish implementation directly. This
> means that calling a default method from Clownfish results in a
> Clownfish-Host-Clownfish roundtrip but this should be acceptable. This
> overhead could be avoided for host languages that allow introspection by
> creating a custom itable for every host-class/interface combination,
> similar to the way we currently register Perl classes in the Clownfish
> registry. But I don't think it's worth the effort.
>
> Concluding remarks
> ------------------
>
> If we decide to switch to interfaces for host language callbacks, I intend
> to completely remove the ability to subclass Clownfish classes from
> languages like Perl. This means to remove some really well-thought-out
> code, but if we want to expand our scope to languages without class-based
> inheritance, we shouldn't support features that only work for some host
> languages. It also means that the Perl callback mechanism will become
> slower, though it probably won't be noticable given how slow Perl method
> calls are. The fact that users can write normal Perl classes without the
> need for inside-out member variables weighs up for that.
>
> I think I can implement the approach described above in a few months for a
> 0.7 release, including Perl and Go support. Unless there are major
> objections to my plan, I'll just start on a separate branch. So if there
> are unforeseen obstacles, we don't have to revert commits on the master
> branch.
>
> We will have to redesign all Lucy classes that are meant to be overridable
> from the host language. For that, I'll need feedback from the rest of the
> community, especially Marvin.
>
> I don't plan to support subinterfaces (interfaces implementing other
> interface) in the first iteration, but this should be straightforward to
> add. I also may omit features from the initial implementation that aren't
> necessary for Lucy callbacks.
>
> Nick
>



-- 
Peter Karman . pe...@peknet.com . http://peknet.com/

Reply via email to