Re: [kragen@canonical.org: Re: PEGs]

Kragen Sitaker Thu, 01 Jan 2009 18:43:16 -0800

Dave Long <dave.l...@bluewin.ch> writes:
> A design that I'd like to try sometime would be to try an "Elephant"- 
> like language, in which all values would, in principle, be  
> recoverable from a logfile.  This could eliminate GC, as frequently  
> used values would tend to be instantiated in a cache, and values that  
> are never referenced again would cost just their portion of storage,  
> presumably amortized by bulk sequential i/o patterns.


This is a fascinating connection!  

(For the readers who don’t know what Dave’s talking about, Elephant
was John McCarthy’s recent proposal for a really different kind of
programming system; as I recall, the basic idea is that you specify
the kinds of promises your program has to keep, and the system figures
out how to keep those promises based on a complete log of the past I/O
behavior of your program.)

I’ve been thinking a bit over the years about logfile-based
programming as well; the “rumor-oriented programming” post was about
it.  (The basic theory is that if you write your application to not be
sensitive to the order of items in the logfile and if there’s no
mutual-exclusion validity criterion for logfiles, you automatically
get an application-agnostic synchronization protocol, and you forever
avoid the problem of database schema migration.)  My theory was that
you could define the objects of interest to your application in terms
of queries on the history, in a FRP-like way.  (Although I was
thinking SQL, not Haskell.)

But the connection between GC and paging sounds really fruitful.
Suppose you chose to eliminate GC by telling the OS that the pages you
allocated for results derived from history didn’t need to be preserved
at all; if the OS wanted to evict them, it could do so merely by
deleting them from your mapping and zeroing them for another process
to use.  Now if you try to access them, you get a segfault, and the
segfault handler can put the requesting thread to sleep and enqueue a
query to recompute the data that it had wanted.  So you rely on the
OS’s paging mechanism to manage the caching of your computational
results.  Is that more or less what you have in mind?

This way, the usual overhead of checking weak-reference validity, or
sending weak-reference notifications when finalizing a
weakly-referenced object, gets shoved off on the MMU hardware.

It seems like it might have poor space efficiency: you might end up
with 100 pages that stay alive because they contain 150
frequently-used objects, each of which is only, say, 32 bytes, while
the rest of those pages is taken up by dead objects.

In the context of parsing, I was thinking that references from the
stack (from function arguments or partly evaluated expressions) would
be strong references that would promote objects to older generations;
does that generalize to the paging scheme?  You could perhaps allocate
space for the values of subexpressions on the stack so they don’t get
deallocated before you use them, then move them to the heap once they
were no longer part of a live subexpression or function argument, but
it seems like somewhere you need to store away the context you need to
recreate those results if you need them again after they get recycled.
In the PEG-parsing case, that information is just a position in the
source text and a reference to a nonterminal.  (The source text and
nonterminals don’t need to be garbage-collectable.)  But in an
Elephant-like system it seems that your query could be some
arbitrarily complex expression.  Can you guarantee not needing GC for
the query too?

Alternatively you could realize your Elephant idea using a
generational GC in the same way I was suggesting for caching of PEG
results, with a more traditional implementation of strong and weak
references.

> Any idea how mutation rates compare for various workloads with main  
> memory and/or disk bandwidths?

I don’t even know where to begin.  Maybe some of Ben Zorn’s
quantitative research on garbage collection and allocator performance
might come close?

Kragen
-- 
To unsubscribe: http://lists.canonical.org/mailman/listinfo/kragen-discuss

Re: [kragen@canonical.org: Re: PEGs]

Reply via email to