Re: [nupic-dev] Serialization options

Tim Boudreau Fri, 30 Aug 2013 16:17:05 -0700

On Fri, Aug 30, 2013 at 6:35 PM, Scott Purdy <[email protected]> wrote:


> My intuition is that something like Protocol Buffers/Thrift/Cap'n Proto
> makes backwards-compatibility much easier (relative to something like
> MessagePack or manually writing out checkpoints).  But those methods make
> it a little more difficult to tie the logic to the different parts in an
> object-oriented way.  They are more suited to functional programming, which
> isn't bad but different from what we currently have.
>

IMO, for something like this, where you're going to have millions or
billions of small objects, some representing a bit or two, a somewhat
functional style is unavoidable. The alternative is adding a minimum of 8
bytes of overhead to every object allocation, which, if you're representing
a couple of bits, is insane.  Also, for maximum performance, you're going to
want to minimize cache misses, which means laying the data out contiguously
in memory, and no language that abstracts memory management is guaranteed
to do that for you.

You can do that and still offer users a nice OOP API. The difference is
just that your objects are flyweights: each consists of an offset into an
array of the actual data, and reads and writes all of its state there.
Most of the time only one such object instance exists at a time: you make
one, pass it to the caller so they get an object-oriented view of the
data, dispose of it once the caller is done with it, and move on to the
next.
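
A minimal sketch of that flyweight idea, in Python purely for illustration.
The names CellStore and CellView and the two fields (permanence, active) are
hypothetical and are not taken from NuPIC; the only point is that the state
lives in contiguous arrays and the object handed to callers holds nothing
but a reference to the store and an offset.

```python
from array import array


class CellStore:
    """Hypothetical backing store: one contiguous array per field."""

    def __init__(self, count):
        self.permanence = array("B", bytes(count))  # one unsigned byte per cell
        self.active = array("B", bytes(count))      # 0 or 1 per cell


class CellView:
    """Flyweight: no state of its own beyond the store and an offset."""

    __slots__ = ("_store", "_index")

    def __init__(self, store, index=0):
        self._store = store
        self._index = index

    def move_to(self, index):
        # Re-point the same object at another cell instead of allocating.
        self._index = index
        return self

    @property
    def permanence(self):
        return self._store.permanence[self._index]

    @permanence.setter
    def permanence(self, value):
        self._store.permanence[self._index] = value

    @property
    def active(self):
        return bool(self._store.active[self._index])

    @active.setter
    def active(self, value):
        self._store.active[self._index] = 1 if value else 0


store = CellStore(1000)
view = CellView(store)
view.move_to(42).permanence = 200  # writes straight into the array
view.active = True
```

Every read and write goes through to the arrays, so serializing the model
could come down to writing those arrays out, whatever format is chosen.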

Once you've got that, the simple pattern to use is for clients to write
"visitors" - functions (or one-off classes, depending on the language)
which get passed the objects one by one.  The result is similar to an API
for a compiler's ASTs.
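
As a rough illustration of the visitor approach (again in Python, with
hypothetical names such as SynapseView and visit_synapses that are not part
of any existing API), the traversal function owns a single flyweight and
re-points it at each element before handing it to the client's function:

```python
from array import array


class SynapseView:
    """Flyweight over one slot of a contiguous array of permanences."""

    __slots__ = ("_data", "_index")

    def __init__(self, data):
        self._data = data
        self._index = 0

    @property
    def index(self):
        return self._index

    @property
    def permanence(self):
        return self._data[self._index]


def visit_synapses(data, visitor):
    """Call visitor(view) once per element, reusing a single view object."""
    view = SynapseView(data)
    for i in range(len(data)):
        view._index = i
        visitor(view)  # the view is only valid for the duration of this call


permanences = array("f", [0.1, 0.5, 0.9, 0.3])
connected = []


def collect_connected(synapse):
    # Copy out the plain values you need; do not keep the view itself.
    if synapse.permanence >= 0.5:
        connected.append(synapse.index)


visit_synapses(permanences, collect_connected)
```

Because the view is reused, the visitor has to copy out anything it wants
to keep, which is exactly what scopes the flyweight's lifecycle to the call.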

If your target audience is really used to array-like collection objects,
you can write something collection-like that creates objects on the fly
(assuming a garbage-collected language or a known lifecycle for the
objects; the visitor pattern makes it easier to scope the lifecycle of
flyweight objects).  That's something you probably don't do unless your
target audience is really struggling with visitors, because it will be
harder to get right, harder to maintain, and harder to keep client code
from leaking memory by leaking objects.
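
For comparison, a sketch of the collection-like variant, assuming a
garbage-collected language (Python here). CellList and CellView are
hypothetical names. Each index operation allocates a fresh view, which is
convenient for callers but means the views can outlive the loop that
created them:

```python
from array import array
from collections.abc import Sequence


class CellView:
    """Flyweight over one slot of the backing array."""

    __slots__ = ("_data", "_index")

    def __init__(self, data, index):
        self._data = data
        self._index = index

    @property
    def value(self):
        return self._data[self._index]

    @value.setter
    def value(self, new_value):
        self._data[self._index] = new_value


class CellList(Sequence):
    """Array-like facade that creates a view object on each access."""

    def __init__(self, data):
        self._data = data

    def __len__(self):
        return len(self._data)

    def __getitem__(self, index):
        if isinstance(index, slice):
            raise TypeError("slices are not supported in this sketch")
        if index < 0:
            index += len(self._data)
        if not 0 <= index < len(self._data):
            raise IndexError(index)
        return CellView(self._data, index)  # new short-lived object per access


cells = CellList(array("B", [0, 3, 7]))
cells[1].value = 9                       # writes through to the array
total = sum(c.value for c in cells)      # iteration comes from Sequence
```

Nothing stops a caller from stashing those views in a list of their own,
which is the leak risk described above.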

This is one of those cases where "premature optimization is the root of all
evil" needs to go out the window - thinking carefully about the memory
model is a necessity.

-Tim

-- 
http://timboudreau.com

_______________________________________________
nupic mailing list
[email protected]
http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
