Re: [lucy-dev] tutorial feedback

Marvin Humphrey Thu, 04 Jun 2015 19:43:23 -0700

On Wed, May 13, 2015 at 5:20 AM, Rafael Schloming <[email protected]> wrote:
> My name is Rafael Schloming, I'm one of the proton developers over at
> qpid.apache.org. I've recently been looking into the possibility of using
> clownfish to deal with some of the maintenance issues we have with the
> proton bindings, and Marvin was kind enough to send me an early version of
> a tutorial in exchange for some feedback. He asked me to share it with this
> list, so I've included it below.


Hi Rafael -- thanks so much for providing us with this feedback!  I'm going to
take this in two passes.

1.  In this email, answer your questions.
2.  In another email, consider how the tutorial could be improved.

You've already gotten some of the answers below via direct conversation --
hopefully everything in this email is consistent with what you've heard
before.  :)

> - Poking around briefly after the git clone, I noticed there seems to be a
> very regular directory structure. It left me curious about the model behind
> it.

At the top level of a Clownfish project, there will be a directory called
"core" which holds the C and Clownfish files shared across all host languages.
Next to "core", you will find one directory per supported host language, each
of which follows the conventions of the given host language ecosystem.

> - The multiple build steps piqued my curiosity. Is it actually necessary to
> have multiple build steps for some reason?

The CFC compiler and the Clownfish runtime are separate entities -- just like
cc and libc are separate.

For some hosts, running the build system for the runtime triggers a build of
CFC as well.  We could probably streamline the tutorial by taking advantage of
that and providing only the command sequence to build the runtime.

> - I notice that accessing the string contents as utf8 requires memory
> allocation in this example. This could be an issue for proton given that it
> needs to efficiently round trip utf8 strings on/off of the wire.

It is potentially an issue.  Ideally, Clownfish's Unicode string-handling
classes would use the same encoding as the host language's string type --
which will be UTF-8 on many platforms, but UTF-16 on some.  On a UTF-16
host, UTF-8 would only be accessible via export, which means a copy.

Even on UTF-8 platforms, the raw UTF-8 data in Clownfish Strings is not
guaranteed to be NUL-terminated.  I don't know if Proton requires
NUL-termination or not.
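
As a rough illustration of what that means for a consumer that does need
NUL-termination, here is a minimal sketch in plain C.  It uses no Clownfish
API; `utf8_to_cstring` is a made-up helper name, and the input is assumed to
be a pointer plus a byte count.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of Clownfish.
 * Copy `size` bytes of UTF-8 that may lack a trailing NUL into a freshly
 * allocated, NUL-terminated buffer.  Returns NULL on allocation failure.
 * The caller owns the result and must free() it. */
static char*
utf8_to_cstring(const char *utf8, size_t size) {
    if (size == SIZE_MAX) { return NULL; }   /* size + 1 would overflow */
    char *copy = (char*)malloc(size + 1);
    if (copy == NULL) { return NULL; }
    if (size > 0) { memcpy(copy, utf8, size); }
    copy[size] = '\0';
    return copy;
}
```

A consumer that works with pointer-plus-length pairs can skip this copy
entirely, which is why it matters whether Proton needs the terminator.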

> - I notice that the type name and method prefix don't match for String.
> Proton has a pretty strong convention of using pn_<class>_t and
> pn_<class>_method.

Those "nicknames" -- like `Vec` for Vector, `Str` for String -- are optional.
However, names in Clownfish-flavored C must follow certain capitalization
conventions.

Supporting other naming conventions can probably be achieved via static inline
wrappers, etc.
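
A minimal sketch of the wrapper idea, in plain C.  Everything here is a
stand-in: `demo_String` and `demo_Str_Get_Size` are hypothetical substitutes
for whatever CFC would actually generate, and `pn_string_t` and
`pn_string_size` are only examples of the pn_<class>_t / pn_<class>_method
pattern.

```c
#include <stddef.h>

/* Stand-in for a generated type and method.  Both are hypothetical and
 * exist only so that this sketch compiles on its own. */
typedef struct demo_String {
    size_t size;
} demo_String;

static size_t
demo_Str_Get_Size(demo_String *self) {
    return self->size;
}

/* Proton-style names layered on top.  The typedef is a pure alias and the
 * wrapper forwards directly, so a compiler can inline the call away. */
typedef demo_String pn_string_t;

static inline size_t
pn_string_size(pn_string_t *string) {
    return demo_Str_Get_Size(string);
}
```

The wrappers would have to be kept in sync with the underlying declarations,
either by hand or by generating them.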

> - Why do I need to know/care about the difference between dynamic and inert
> calls when I'm making them? What if I change from one to the other? is
> there something that prevents me from doing this or do I need to decide
> what will be dynamic vs inert from the beginning?

Dynamic methods are instance methods and can be overridden in subclasses.
Inert functions are static globals, do not require an invocant, and cannot be
overridden.
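
The distinction can be sketched in plain C.  This is not what CFC generates;
the `Demo...` names and the one-slot method table are assumptions made purely
for illustration.

```c
#include <stddef.h>

typedef struct DemoGreeter DemoGreeter;
typedef const char* (*demo_greet_t)(DemoGreeter *self);

/* Per-class table of method pointers; a subclass installs its own entry. */
typedef struct DemoClass {
    demo_greet_t greet;
} DemoClass;

struct DemoGreeter {
    const DemoClass *klass;
};

static const char* base_greet(DemoGreeter *self) { (void)self; return "hello"; }
static const char* loud_greet(DemoGreeter *self) { (void)self; return "HELLO"; }

static const DemoClass DEMO_BASE = { base_greet };
static const DemoClass DEMO_LOUD = { loud_greet };

/* Dynamic: resolved through the invocant's class at run time, so a subclass
 * can override it. */
static const char*
DemoGreeter_Greet(DemoGreeter *self) {
    return self->klass->greet(self);
}

/* Inert: an ordinary function with no invocant and one fixed
 * implementation. */
static const char*
DemoGreeter_default_greeting(void) {
    return "hello";
}
```

In this sketch the two call sites differ because the dynamic call needs an
object to dispatch on and the inert call does not.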

> - What does hello_bootstrap_parcel() do?

Clownfish-powered libraries need to be initialized at runtime before they can
be used.

> - Where does GREETER come from?

That's a class object.  It's defined in code generated by CFC upon consuming
Greeter.cfh.

> - The utf8 access pattern seems to incur extra cognitive overhead for
> memory management.

I hear you.  One option would be for Clownfish Strings to cache the
NUL-terminated UTF-8 representation rather than hand it off to the user -- a
feature offered by Python's C API.

    https://docs.python.org/3.4/c-api/unicode.html#c.PyUnicode_AsUTF8AndSize

    This caches the UTF-8 representation of the string in the Unicode object,
    and subsequent calls will return a pointer to the same buffer. The caller
    is not responsible for deallocating the buffer.

When using UTF-8 as the internal encoding for String, we could often reuse the
main buffer, incurring no additional cost.
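
A minimal sketch of the caching approach, in plain C.  `DemoString` and its
functions are hypothetical and do not reflect Clownfish's actual String
layout; the sketch also always copies, rather than reusing the main buffer.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical string object: length-delimited UTF-8 content plus a lazily
 * built NUL-terminated copy owned by the object itself. */
typedef struct DemoString {
    const char *utf8;        /* content, not necessarily NUL-terminated */
    size_t      size;        /* number of bytes in `utf8` */
    char       *cstr_cache;  /* NULL until first requested */
} DemoString;

/* Return a NUL-terminated view of the content.  The first call allocates
 * and caches the copy; later calls return the same pointer.  The caller
 * must NOT free the result.  Returns NULL on allocation failure. */
static const char*
DemoString_cstr(DemoString *self) {
    if (self->cstr_cache == NULL) {
        char *copy = (char*)malloc(self->size + 1);
        if (copy == NULL) { return NULL; }
        if (self->size > 0) { memcpy(copy, self->utf8, self->size); }
        copy[self->size] = '\0';
        self->cstr_cache = copy;
    }
    return self->cstr_cache;
}

/* Drop the cached copy; would be called from the object's destructor. */
static void
DemoString_release_cache(DemoString *self) {
    free(self->cstr_cache);
    self->cstr_cache = NULL;
}
```

The trade-off is that the memory management moves from the caller into the
object: the returned pointer stays valid only as long as the string does.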

> - Can I use a single mutable string object and pass it in as an out
> parameter?

Clownfish does provide a mutable string type -- CharBuf.  Like String, though,
it's conceived as using the same encoding as the host string type -- so it may
not address your concern about avoiding duplication.

There's also ByteBuf, a mutable type for manipulating raw binary data.  It
won't help you with enforcing Unicode consistency, but it might provide the
necessary flexibility.
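
The out-parameter pattern being asked about can be sketched in plain C.
`DemoByteBuf` and `DemoByteBuf_set` are hypothetical and are not the
Clownfish ByteBuf API; the point is only that one buffer is reused across
calls and grows when it has to.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical reusable byte buffer. */
typedef struct DemoByteBuf {
    char  *bytes;  /* heap storage, or NULL when empty */
    size_t size;   /* bytes currently in use */
    size_t cap;    /* bytes allocated */
} DemoByteBuf;

/* Overwrite the buffer's contents with `size` bytes from `src`, growing the
 * allocation only when the new content does not fit.
 * Returns 0 on success, -1 on allocation failure (buffer left unchanged). */
static int
DemoByteBuf_set(DemoByteBuf *buf, const char *src, size_t size) {
    if (size > buf->cap) {
        char *grown = (char*)realloc(buf->bytes, size);
        if (grown == NULL) { return -1; }
        buf->bytes = grown;
        buf->cap   = size;
    }
    if (size > 0) { memcpy(buf->bytes, src, size); }
    buf->size = size;
    return 0;
}
```

A decoder could take a `DemoByteBuf*` as its out parameter and call it in a
loop, paying for an allocation only when a value is larger than any seen so
far.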

> - Do I actually have to clean up the C to build the perl?

Yes -- this is necessary because when we compile the code in "core", we land
each object file next to its source file -- yet the files get compiled
differently depending on the host language environment.

I suppose we could fix this by taking inspiration from Proton and landing
those object files in a dedicated build dir -- one for each host.

Marvin Humphrey
