Hi all, After several days immersion in the garbage collector code, I fixed the weak symbol GC. Originally, because rewriting code is easier than understanding it, I wrote a patch that refactors and considerably simplifies the GC by removing the whole weak symbol table kerfuffle. I still think this is the way forward, but since it subtly changes how symbols need to be handled, I think it's better to delay this for CHICKEN 5.
While preparing said patch I actually did build up an understanding about what was going wrong in the first place, with the weak reference table. The attached patch explains and fixes the situation, and I also explain it a bit in the ticket, but let's try again in other words: First, during minor GC and reallocing GC, the weak symbol table is ignored, so symbols and buckets (the "pairs" that form the linked list chain in the symbol hash table buckets) are simply treated as regular Scheme objects, no problem. So we're dealing only with major GC. There's special casing going on here. The basic idea is to perform a reference count of the symbol, and if only one reference exists (i.e., the global symbol table), we can drop the symbol. In slightly more detail, really_mark() has these special-cased actions for symbols and buckets: - If we are marking a symbol, something referred to it, so we look it up in the weak item table and, if found, increment its counter by one (to a maximum of 2). If the symbol has no entry, we don't do anything. - If we are marking a bucket containing a symbol, we create an entry in the weak item table for the symbol with a counter of 0, unless the symbol inside was already forwarded: then the counter is "saturated" at 2. That's because we obviously already marked the symbol, so we missed the previous bullet an unknown number of times. - After the major GC, walk the entire symbol table and check if it has a corresponding entry in the weak item table with a counter that's less than 2. If it does, and the symbol has no global binding or plist, we can drop it: only one reference exists (the symbol itself). Unfortunately, this doesn't work in all cases. The reason is that when we store the item in the forwarding table, we store the pointer that's inside the bucket's first slot. Because we are inside a major GC, this pointer is typically a heap (fromspace) pointer, even if the symbol was previously moved from the nursery during minor GC. Let's say the bucket's slot was also updated itself during the minor GC, so the bucket itself refers to fromspace as well. But, there may be other unmarked references which still refer to the symbol's original location in the nursery. Because the major GC step 1 above checks the item's pointer before chasing forwarding pointers, it will not even see that we're marking a symbol because the header bits are for a forwarding pointer, not a symbol. Alternatively, the reference may be to the symbol's location in the fromspace, but the symbol has already moved to the tospace. The patch "simply" adds another check of the header type after chasing the forwarding pointer. If it's a symbol, we look up the *original* location in the weak table and increment the counter. This also happens if there are two forwarding pointers, then we need to look up the original pointer *or* the intermediate pointer in the weak table. This is all extremely fiddly, hard to understand and even if you understand it (somewhat), it's hard to explain too (it is *very* clever and impressive code, though!). That's why I think we're better off refactoring the whole thing, for which a patch will follow. Besides, after *that* patch, I think we can get rid of the special BUCKET_TYPE, which means we can free up a reserved tag number, re: the discussion about my extended numbers refactoring patch. But first, this patch should be applied to master. It also applies cleanly to the chicken-5 branch, so I think to preserve sanity in the meantime we should apply it to both branches: having a change only in master is too confusing. PS: Testing this seems tricky, but a simple way is to simply compile with "SYMBOLGC=1" on make, which will unconditionally enable weak symbol GC. Go ahead, try it with an unpatched CHICKEN and watch the test suite crash and burn :) PPS: The "knucleotide" benchmark is a great one to test performance of weak symbol GC, it creates boatloads of symbols. Cheers, Peter
From fd56a734ea4249db9f41f9c8296e4434d12a325f Mon Sep 17 00:00:00 2001 From: Peter Bex <[email protected]> Date: Wed, 24 Aug 2016 21:04:37 +0200 Subject: [PATCH] Fix symbol GC: add wep lookup after fptr chasing Sometimes, with symbol GC enabled, a major GC might "drop" symbols which were still being referenced, resulting in weird errors like (eq? x 'foo) returning #f even if x holds the symbol 'foo. If, during marking in major GC, we encounter the bucket before we encounter the symbol, the bucket still refers to the symbol in its original location (the fromspace). This pointer is added to the weak table with a counter of 0. Then, the symbol itself is scanned, and the item is found in the weak table, the counter is updated and the symbol is moved to the heap. The header at the symbol's original location in the fromspace becomes a forwarding pointer. Then, when we encounter a _second_ reference to the symbol, it still refers to the symbol's pointer in the fromspace, but the header will be a forwarding pointer, so it won't match the symbol type (which we look for right at the start of the mark function). This means the code code to update the weak entry's count won't be triggered. Instead, we should chase the forwarded pointer and *then* check if it's a symbol. If it is, look up the *original* location's pointer in the weak table. Note: We don't need to look up the new location, because that can only be the case if the symbol was marked before we encountered the bucket, in which case it will already saturate the pointer immediately upon insertion of the weak table entry. Note 2: Before a reallocing GC, we reset the weak table and we never consult it during the reallocing GC, so all symbols will be copied. A minor GC also doesn't handle symbols specially, so they'll be copied there too. This fixes #1173 --- NEWS | 1 + runtime.c | 18 ++++++++++++++++++ 2 files changed, 19 insertions(+) diff --git a/NEWS b/NEWS index d64c818..0481245 100644 --- a/NEWS +++ b/NEWS @@ -23,6 +23,7 @@ which is faster because it is inlined (#1260, thanks to Kooda). - The default error handler now truncates very long condition messages (thanks to Lemonboy). + - Weak symbol GC (-:w) no longer drops random symbols (#1173). - Syntax expander - DSSSL lambda lists have improved hygiene, so they don't need diff --git a/runtime.c b/runtime.c index cdaaa0e..56379ff 100644 --- a/runtime.c +++ b/runtime.c @@ -3267,6 +3267,15 @@ C_regparm void C_fcall really_mark(C_word *x) if(is_fptr(h)) { val = fptr_to_ptr(h); + /* When we marked the bucket, it may have already referred to + * the moved symbol instead of its original location. Re-check: + */ + if(C_enable_gcweak && + (C_block_header(val) & C_HEADER_TYPE_BITS) == C_SYMBOL_TYPE && + (wep = lookup_weak_table_entry(*x, 0)) != NULL) { + if((wep->container & WEAK_COUNTER_MAX) == 0) ++wep->container; + } + if((C_uword)val >= (C_uword)tospace_start && (C_uword)val < (C_uword)tospace_top) { *x = val; return; @@ -3280,6 +3289,15 @@ C_regparm void C_fcall really_mark(C_word *x) /* Link points into fromspace and into a link which points into from- or tospace: */ val = fptr_to_ptr(h); + /* See above: might happen twice */ + if(C_enable_gcweak && + (C_block_header(val) & C_HEADER_TYPE_BITS) == C_SYMBOL_TYPE && + /* Check both the original and intermediate location: */ + ((wep = lookup_weak_table_entry((C_word)p, 0)) != NULL || + (wep = lookup_weak_table_entry(*x, 0)) != NULL)) { + if((wep->container & WEAK_COUNTER_MAX) == 0) ++wep->container; + } + if((C_uword)val >= (C_uword)tospace_start && (C_uword)val < (C_uword)tospace_top) { *x = val; return; -- 2.1.4
signature.asc
Description: Digital signature
_______________________________________________ Chicken-hackers mailing list [email protected] https://lists.nongnu.org/mailman/listinfo/chicken-hackers
