Dave Long <dave.l...@bluewin.ch> writes: > A design that I'd like to try sometime would be to try an "Elephant"- > like language, in which all values would, in principle, be > recoverable from a logfile. This could eliminate GC, as frequently > used values would tend to be instantiated in a cache, and values that > are never referenced again would cost just their portion of storage, > presumably amortized by bulk sequential i/o patterns.
This is a fascinating connection! (For the readers who don’t know what Dave’s talking about, Elephant was John McCarthy’s recent proposal for a really different kind of programming system; as I recall, the basic idea is that you specify the kinds of promises your program has to keep, and the system figures out how to keep those promises based on a complete log of the past I/O behavior of your program.) I’ve been thinking a bit over the years about logfile-based programming as well; the “rumor-oriented programming” post was about it. (The basic theory is that if you write your application to not be sensitive to the order of items in the logfile and if there’s no mutual-exclusion validity criterion for logfiles, you automatically get an application-agnostic synchronization protocol, and you forever avoid the problem of database schema migration.) My theory was that you could define the objects of interest to your application in terms of queries on the history, in a FRP-like way. (Although I was thinking SQL, not Haskell.) But the connection between GC and paging sounds really fruitful. Suppose you chose to eliminate GC by telling the OS that the pages you allocated for results derived from history didn’t need to be preserved at all; if the OS wanted to evict them, it could do so merely by deleting them from your mapping and zeroing them for another process to use. Now if you try to access them, you get a segfault, and the segfault handler can put the requesting thread to sleep and enqueue a query to recompute the data that it had wanted. So you rely on the OS’s paging mechanism to manage the caching of your computational results. Is that more or less what you have in mind? This way, the usual overhead of checking weak-reference validity, or sending weak-reference notifications when finalizing a weakly-referenced object, gets shoved off on the MMU hardware. It seems like it might have poor space efficiency: you might end up with 100 pages that stay alive because they contain 150 frequently-used objects, each of which is only, say, 32 bytes, while the rest of those pages is taken up by dead objects. In the context of parsing, I was thinking that references from the stack (from function arguments or partly evaluated expressions) would be strong references that would promote objects to older generations; does that generalize to the paging scheme? You could perhaps allocate space for the values of subexpressions on the stack so they don’t get deallocated before you use them, then move them to the heap once they were no longer part of a live subexpression or function argument, but it seems like somewhere you need to store away the context you need to recreate those results if you need them again after they get recycled. In the PEG-parsing case, that information is just a position in the source text and a reference to a nonterminal. (The source text and nonterminals don’t need to be garbage-collectable.) But in an Elephant-like system it seems that your query could be some arbitrarily complex expression. Can you guarantee not needing GC for the query too? Alternatively you could realize your Elephant idea using a generational GC in the same way I was suggesting for caching of PEG results, with a more traditional implementation of strong and weak references. > Any idea how mutation rates compare for various workloads with main > memory and/or disk bandwidths? I don’t even know where to begin. Maybe some of Ben Zorn’s quantitative research on garbage collection and allocator performance might come close? Kragen -- To unsubscribe: http://lists.canonical.org/mailman/listinfo/kragen-discuss