[jira] Commented: (LUCY-5) Boilerplater compiler

Marvin Humphrey (JIRA) Mon, 16 Mar 2009 08:42:28 -0700

    [ 
https://issues.apache.org/jira/browse/LUCY-5?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12682334#action_12682334
 ]


Marvin Humphrey commented on LUCY-5:
------------------------------------

> are there many internal -> internal function calls (ie, "normal" C function
> calls)?

InStream and OutStream are "final" classes in KS, so all of their "method
calls" are actually straight up "normal" C function calls.  I specifically
chose to make those two "final" because they get hit more often than anybody
else.

However, there seems to be little performance difference one way or the other.
The old KS indexing benchmarker shows maybe a 1% difference at most when I
flip the "final" flags on InStream and OutStream, close to the noise floor of
the test on my Mac.

I think this is partly because minimizing OO overhead has been a KS design
goal for a very long time; whenever possible, we avoid creating objects or
making method calls.  For instance, since InStream is final, InStream_Read_C32
performs all of its operations within a single function call (provided the
buffer doesn't need refilling), instead of needing to invoke
InStream_Read_Byte 1-5 times.  (C32 = Compressed 32-bit integer, analogous to
a Lucene VInt).  Because there are fewer method invocations overall, messing
with the method-invocation apparatus has less of an effect than it might have
on other libraries.
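
To make the single-call point concrete, here is a minimal sketch of decoding
a compressed 32-bit integer straight out of a buffer.  It is not the KS
source: DemoInStream and demo_read_c32 are hypothetical names, and the byte
layout (7 payload bits per byte, low-order group first, high bit set on every
byte except the last) is assumed from the Lucene VInt analogy; the real C32
byte order may differ.  Buffer refilling is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an input stream; NOT the KS InStream struct. */
typedef struct {
    const uint8_t *buf;  /* buffered bytes */
    size_t         pos;  /* index of the next byte to read */
    size_t         len;  /* number of valid bytes in buf */
} DemoInStream;

/* Decode one compressed 32-bit integer in a single function call, reading
 * bytes directly from the buffer instead of calling a read-byte routine
 * once per byte.  Assumed layout: low-order 7-bit group first, high bit
 * set on all bytes but the last.
 * Returns 0 on success, -1 if the buffer runs out or the encoding is
 * longer than 5 bytes.  On failure the stream position is left untouched. */
static int
demo_read_c32(DemoInStream *in, uint32_t *out)
{
    assert(in != NULL && out != NULL);
    uint32_t value = 0;
    size_t   pos   = in->pos;
    for (int shift = 0; shift < 35; shift += 7) {
        if (pos >= in->len) {
            return -1;                     /* a real stream would refill */
        }
        uint8_t byte = in->buf[pos++];
        value |= (uint32_t)(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) {          /* last byte of this integer */
            in->pos = pos;
            *out    = value;
            return 0;
        }
    }
    return -1;                             /* more than 5 bytes: malformed */
}
```

Under that assumed layout, the two bytes 0xAC 0x02 decode to 300 with one
call and no per-byte dispatch.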

Another thing to bear in mind is that the "indirect dispatch" technique used
by inside-out vtables just isn't all that expensive.  Take a look at the GCJ
performance evaluations at
[http://www.usenix.org/events/javavm02/yu/yu_html/node29.html] -- they reveal
a 1-2% maximum difference on some tests which are far more method-call
intensive than what Lucy would be doing.
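
For anyone who wants to see what is being compared, here is a rough sketch of
the two call styles.  All names are hypothetical and this is not Boilerplater
output.  The assumption is that "indirect dispatch" means the method's
position in the vtable is read from a variable at call time rather than baked
into the caller as a constant, which costs one extra load per call.

```c
#include <assert.h>
#include <stddef.h>

typedef struct DemoObj DemoObj;
typedef int (*demo_method_t)(DemoObj *self);

/* Hypothetical vtable: just an array of method pointers. */
typedef struct {
    demo_method_t methods[2];
} DemoVTable;

struct DemoObj {
    const DemoVTable *vtable;
    int               value;
};

static int demo_base_get(DemoObj *self) { return self->value; }
static int demo_sub_get(DemoObj *self)  { return self->value * 2; }

static const DemoVTable DEMO_BASE_VTABLE = { { demo_base_get, NULL } };
static const DemoVTable DEMO_SUB_VTABLE  = { { demo_sub_get,  NULL } };

/* The slot index lives in a variable (assumption: this is what lets the
 * core be recompiled with a different layout without recompiling callers). */
static size_t Demo_Get_SLOT = 0;

/* Overridable method: load the slot index, load the function pointer from
 * the object's vtable, then call through it. */
static int
demo_get(DemoObj *self)
{
    assert(self != NULL && self->vtable != NULL);
    return self->vtable->methods[Demo_Get_SLOT](self);
}

/* "Final" method: an ordinary C function call, resolved at compile time. */
static int
demo_get_final(DemoObj *self)
{
    return demo_base_get(self);
}
```

An object built with DEMO_SUB_VTABLE goes through demo_sub_get when called via
demo_get, while demo_get_final always runs demo_base_get regardless of the
object's vtable.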

> most APIs are in theory overridable in the host language?

Yes -- anything which hasn't been declared "final", which means most APIs.

There are a handful of non-public methods and classes which could in theory be
declared final, but I haven't bothered because it probably wouldn't make much
difference.

The Perl test files in KS take advantage of the subclassing API all the time.
All the combining Scorers (ORScorer, ANDScorer, etc) use pure-Perl MockScorer
instances as their subscorers.  Come to think of it, the current
implementation of Schema *requires* you to subclass it, though that's about
to change.

Of course calling back to a dynamic language host causes a big performance
degradation on tight loops, but it's still good enough for small data sets and
rapid prototyping.

> Boilerplater compiler
> ---------------------
>
>                 Key: LUCY-5
>                 URL: https://issues.apache.org/jira/browse/LUCY-5
>             Project: Lucy
>          Issue Type: New Feature
>          Components: Boilerplater
>            Reporter: Marvin Humphrey
>            Assignee: Marvin Humphrey
>
> Boilerplater is a small compiler which supports a vtable-based object model.
> The output is C code which adheres to the design that Dave Balmain and I
> hammered out a while back; the input is a collection of ".bp" header files.
>
> Our original intent was to pepper traditional C ".h" header files with no-op
> macros to define each class's interface; the code generator would understand
> these macros but the C compiler would ignore them.  C source code files would
> then pound-include both the ".h" header and the auxiliary, generated ".bp"
> file.
>
> The problem with this approach is that C syntax is too constraining.  Because
> C does not support namespacing, every symbol has to be prepended with a prefix
> to avoid conflicts.  Furthermore, adding metadata to declarations (such as
> default values for arguments, or whether NULL is an acceptable value) is
> awkward.  The result is ".h" header files that are excessively verbose,
> cumbersome to edit, and challenging to parse visually and to grok.
>
> The solution is to make the ".bp" file the master header file, and write it in
> a small, purpose-built, declaration-only language.  The
> code-generator/compiler chews this ".bp" file and spits out a single ".h"
> header file for pound-inclusion in ".c" source code files.
>
> This isn't really that great a divergence from the original plan.  There's no
> fixed point at which a "code generator" becomes a "compiler", and while the
> declaration-only header language has a few conventions that core developers
> will have to familiarize themselves with, the same was true for the no-op
> macro scheme.  Furthermore, the Boilerplater compiler itself is merely an
> implementation detail; it is not publicly exposed and thus can be modified at
> will.  Users who access Lucy via Perl, Ruby, Java, etc will never see it.
> Even Lucy's C users will never see it, because the public C API itself will be
> defined by a lightweight binding and generated documentation.
>
> The important thing for us to focus on is the *output* code generated by
> Boilerplater.  We must nail the object model.  It has to be fast.  It has to
> live happily as a symbiote within each host.  It has to support callbacks into
> the host language, so that users may define custom subclasses and override
> methods easily.  It has to present a robust ABI that makes it possible to
> recompile an updated core without breaking compiled extensions (like Java,
> unlike C++).
>
> The present implementation of the Boilerplater compiler is a collection of
> Perl modules: Boilerplater::Type, Boilerplater::Variable,
> Boilerplater::Method, Boilerplater::Class, and so on.  One CPAN module is
> required, Parse::RecDescent; however, only core developers will need either
> Perl or Parse::RecDescent, since public distributions of Lucy will contain
> pre-generated code.  Some of Boilerplater's modules have kludgy internals,
> but on the whole they seem to do a good job of throwing errors rather than
> failing subtly.
>
> I expect to submit individual Boilerplater modules using JIRA sub-issues which
> reference this one, to allow room for adequate commentary.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
