Re: Question regarding structural sharing, records, and maps

Ken Wesson Fri, 29 Jul 2011 04:38:31 -0700

On Thu, Jul 28, 2011 at 11:59 PM, Trenton Strong
<trenton.str...@gmail.com> wrote:
> 1.) Does structural sharing play well with nested structures? With a
> tree whose nodes are represented by nested maps, if a leaf node is
> updated with new data, will structural sharing efficiently represent
> the new version of the tree as "all the rest of the tree" + modified
> node?


Yes. In fact this should work with records, too -- all the "native"
fields of the record need to be copied for each version, but most will
typically be pointers and most of those will typically be to existing
objects. Only the ones that were changed will point to new nested
objects. So unless your records have huge numbers of predefined fields
there shouldn't be much time or memory cost in copying them.

> 3.) Are there alternatives to a single ref around the root node of the
> tree that are worth exploring?

Probably. Any two changes to the tree will cause one to retry in the
single-ref case, unless you're sure all the changes will commute.
(Changes to distinct fields where one change doesn't depend on what
the other changes will commute, and changes to the same field that
don't matter as to sequencing, such as inc and dec, will commute, but
otherwise no.)

You will probably want something structured more like a database, with
"tables" that are maps and "rows" that are records or more complex,
nested structures, such that most concurrent changes will not affect a
common row. The "tables" can then be refs of maps of refs, with most
changes operating on the row-refs. The table refs only need to change
if rows are inserted or deleted; those could be bottlenecks, though if
all objects have GUIDs as keys in these tables then row insertions and
deletions will commute, leaving only the GUID generators for each
table.

Unfortunately, (let [new-id (commute table-x-guid-counter inc)])
sounds nice but might result in new-id taking on the same value in two
concurrent transactions, so you'll need to use alter and GUID
generation may be a bottleneck. But it may be much faster than your
other, more complex transactions. Be sure to do it separately, e.g.:

(let [new-id (dosync (alter table-x-guid-counter inc))]
  (dosync
    (commute table-x assoc new-id some-thingy)))

You can even use atoms for the guid counters instead of refs, and then
the ! in swap! will remind you to get the ID before entering a
transaction. Do as much of what's needed to generate some-thingy
before the transaction, too -- though if it depends on other refs'
values, and those values should be current rather than it being
acceptable for them to be values that were there recently but might
have changed, then some of the work will need to be done in the
transaction.

> I've considered wrapping each node in
> a ref and performing updates that way, or possibly taking the async
> route and dispatching cals using agents, but my experience there is
> nil.  Any insight there?

I'd avoid having refs nesting more than two deep. Complex arrangements
of cross-references among nodes can create a giant headache in
combination with refs and transactions, so using a "database" approach
with "table" maps, ID token keys, and cross-references to ID tokens
rather than objects is often sensible. In particular, complex objects
with nested refs effectively have mutable state inside them, which is
usually not desired in Clojure. (One can argue, though, that the ref
object is not different in principle from the ID token and removes a
layer of indirection, and possibly obviates the need to generate IDs
from counters that act as bottlenecks -- though, really, the Java new
operator's allocation of memory now takes the role of ID generator.
I'm given to understand modern VMs do some clever things to make new
able to be fairly concurrent, though, such as giving each thread its
own subset of the eden generation to allocate in.)

-- 
Protege: What is this seething mass of parentheses?!
Master: Your father's Lisp REPL. This is the language of a true
hacker. Not as clumsy or random as C++; a language for a more
civilized age.

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en

Re: Question regarding structural sharing, records, and maps

Reply via email to