Re: [lucy-dev] autogen dir

Marvin Humphrey Mon, 27 May 2013 21:11:32 -0700

On Sat, May 25, 2013 at 6:17 AM, Nick Wellnhofer <[email protected]> wrote:
> What's the rationale for fully qualified parcel namespaces exactly? Is it to
> work around possible name clashes of parcel prefixes?


Yes, it's to make our namespacing mechanism more robust.  However, it's not
that Clownfish should *force* users into lengthy parcel names -- it's that we
should support namespaces properly and give users a choice.  Even after we
enable nested parcel names, people can still select simple names, or avoid
explicit parcels altogether.  Using reversed domain names is only a
convention.

Prefixes are flawed, because anything more than a few characters results in
symbols which are unacceptably cumbersome to type, but the limited length
makes clashes more likely.  For example, when we were considering what
Clownfish ought to use as a prefix, we had to take into account that "CF" is
used as a prefix by Apple's "Core Foundation" classes.

We're already committed to providing short name aliases for the sake of
programmer convenience.  We may as well go one step further and build out
namespacing which remains just as user-friendly yet is more resistant to
symbol clashes.

*   Individual symbol aliases, which are typed multiple times within source
    files, should be short.
*   Imports, which happen only once per file, may be long.
*   Real names for symbols may be long, since they can remain hidden behind
    aliases most of the time.

To support namespaces properly in the Clownfish internals, we need to ensure
that we don't lock systems into place that depend on prefixes within global
contexts -- which is why the commit in question drew my attention.  We were
already doing something similar elsewhere -- the "boot" files e.g.
"lucy_boot.c" and "lucy_boot.h" -- and I was the one who wrote that code.  But
those were either a bug or a TODO (take your pick) -- and rather than compound
the mistake, we should fix it... by differentiating autogenerated files using
directory structures rather than file name prefixes.

However, I would like to suggest a tweak.  A common complaint in Java-land is
that the reverse-domain package naming convention results in too deep a
directory hierarchy.  The extra depth is not a huge deal for installed files,
but it's a pain when interacting with source trees.  I think we can solve this
by having CFC allow .cfp parcel files to establish the namespace for files in
lower directories:

    // This...
    $CORE/
          foo.cfp          // com::example::foo
    $CORE/foo/
              MyClass.cfh  // com::example::foo::MyClass

    // Not this...
    $CORE/com/example/
                      foo.cfp
    $CORE/com/example/foo/
                          MyClass.cfh
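
To make the first layout concrete, here is a minimal sketch in C.  It assumes
the parcel name comes from the .cfp file and the class name is simply the
.cfh file name minus its extension; `qualify_class_name` is a hypothetical
helper for illustration, not an existing CFC function.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of CFC: build a fully qualified class name
 * from the parcel name declared in a .cfp file and the name of a .cfh file
 * found beneath it.  Returns 0 on success, -1 if the file name lacks the
 * ".cfh" extension or the result does not fit in the output buffer. */
static int
qualify_class_name(const char *parcel, const char *cfh_file,
                   char *out, size_t out_size) {
    const char *ext = strrchr(cfh_file, '.');
    if (ext == NULL || strcmp(ext, ".cfh") != 0) {
        return -1;
    }
    int stem_len = (int)(ext - cfh_file);
    /* Join parcel and file stem with the "::" separator. */
    int n = snprintf(out, out_size, "%s::%.*s", parcel, stem_len, cfh_file);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

Called with "com::example::foo" and "MyClass.cfh", this yields
"com::example::foo::MyClass" no matter how shallow the directory holding
foo.cfp is.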

Inside CFC, we should simplify things by using a single symbol table for both
parcels and classes, so that class names are prefixed by the names of the
parcels they live under.  One consequence is that Clownfish class names will
no longer map one-to-one onto Perl package names, so we'll have to perform
per-host mapping.  But we were going to have to do that anyway for other hosts
like Python, where module names are lowercase by convention and '.' is used as
a package separator instead of the double colon.

    Clownfish:  org::apache::lucy::search::IndexSearcher
    Perl:       Lucy::Search::IndexSearcher
    Python:     lucy.search.IndexSearcher
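
A rough sketch of what such per-host mapping might look like, in C.  The
function name `map_to_host`, its parameters, and the capitalization rule used
for Perl are all assumptions made for illustration; none of this is existing
CFC code.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical mapping: replace the parcel's fully qualified name with a
 * host-specific root, rewrite "::" as the host separator, and optionally
 * capitalize the first letter of each remaining component.
 * Returns 0 on success, -1 on a parcel mismatch or a too-small buffer. */
static int
map_to_host(const char *cf_name, const char *parcel, const char *host_root,
            const char *host_sep, int capitalize,
            char *out, size_t out_size) {
    size_t parcel_len = strlen(parcel);
    if (strncmp(cf_name, parcel, parcel_len) != 0) {
        return -1;
    }
    const char *rest = cf_name + parcel_len;  /* e.g. "::search::IndexSearcher" */
    if (*rest != '\0' && strncmp(rest, "::", 2) != 0) {
        return -1;  /* parcel matched only part of a component */
    }

    size_t root_len = strlen(host_root);
    size_t sep_len  = strlen(host_sep);
    if (root_len >= out_size) {
        return -1;
    }
    memcpy(out, host_root, root_len);
    size_t used = root_len;
    int at_start = 0;  /* true right after a separator */

    while (*rest != '\0') {
        if (strncmp(rest, "::", 2) == 0) {
            if (used + sep_len >= out_size) { return -1; }
            memcpy(out + used, host_sep, sep_len);
            used += sep_len;
            rest += 2;
            at_start = 1;
        }
        else {
            if (used + 1 >= out_size) { return -1; }
            char c = *rest++;
            out[used++] = (at_start && capitalize)
                          ? (char)toupper((unsigned char)c)
                          : c;
            at_start = 0;
        }
    }
    out[used] = '\0';
    return 0;
}
```

With parcel "org::apache::lucy", passing root "Lucy", separator "::" and
capitalize=1 gives the Perl name above; root "lucy", separator "." and
capitalize=0 gives the Python name.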

(Aside: We may want to go with '.' instead of '::' ourselves.)

In terms of alias generation, here's what I think we should be doing:

    #define lucy_Indexer_new org_apache_lucy_Indexer_new
    #ifdef LUCY_USE_SHORT_NAMES
        #define Indexer_new org_apache_lucy_Indexer_new
    #endif

(Another aside: perhaps we should enable short names by default and replace
`LUCY_USE_SHORT_NAMES` with `LUCY_NO_SHORT_ALIASES`.)

From the perspective of a programmer working with Clownfish, everything in a
parcel should be available via a single pound-include:

    #include "org/apache/lucy.h"

The programmer then uses the parcel prefix if there may be clashes (as we have
to when programming in files which pound-include most host language C
headers), or uses the short names when there are no conflicts (as we do when
programming in a standard C environment).  There's no difference from today as
far as programming; in our case, we won't have to change any of our search
engine code.  However, instead of being a real symbol, `lucy_Indexer_new`
would be an alias -- just like the much more commonly used `Indexer_new` is
already.
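
A self-contained sketch of how the three layers of names could resolve to one
real symbol.  The struct, its `placeholder` field, and the function body are
stand-ins invented for this example, and `LUCY_USE_SHORT_NAMES` is defined
inline only so that the snippet compiles on its own; in practice the user
would define it before including the parcel header.

```c
#include <stddef.h>

/* Stand-in for the real, fully qualified symbol.  The struct contents and
 * function body are placeholders, not Lucy's actual Indexer. */
typedef struct org_apache_lucy_Indexer {
    int placeholder;
} org_apache_lucy_Indexer;

static org_apache_lucy_Indexer*
org_apache_lucy_Indexer_new(void) {
    static org_apache_lucy_Indexer instance = { 42 };
    return &instance;
}

/* Prefixed aliases: always available, for use where short names might
 * clash with host language headers. */
#define lucy_Indexer      org_apache_lucy_Indexer
#define lucy_Indexer_new  org_apache_lucy_Indexer_new

/* Short aliases: opt-in.  Defined here only to keep the sketch
 * self-contained. */
#define LUCY_USE_SHORT_NAMES
#ifdef LUCY_USE_SHORT_NAMES
  #define Indexer      org_apache_lucy_Indexer
  #define Indexer_new  org_apache_lucy_Indexer_new
#endif
```

After preprocessing, `lucy_Indexer_new()` and `Indexer_new()` both become
calls to `org_apache_lucy_Indexer_new()`, so only the fully qualified name
ever reaches the linker.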

> I also don't want to put more internal stuff in the installed headers. They
> already take up considerable space. Here's an example of the footprint of a
> C library installation on OS X:
>
>     $ du -sch lucy/*
>     6.3M        lucy/include
>     2.4M        lucy/lib
>     372K        lucy/man
>     9.1M        total
>
> It's not really a problem but I find it interesting that the headers take up
> more than two times the space of the binary. They're even more than three
> times the size of the stripped binary.

FWIW, there's definitely some bloat in those headers.

Also, we could address your concern about embedded C code taking up too much
space in the headers by generating a file called e.g. "parcel.impl" which gets
pulled in conditionally:

    #ifdef P_ORG_APACHE_LUCY
      #include "org/apache/lucy/parcel.impl"
    #endif

But these are implementation details.

Marvin Humphrey
