found 692053 10.0.11esr-1
notfound 692053 10.0~b2-1
# that version is over; nobody wants to patch it anymore, I guess
severity 692053 serious
tags 692053 + patch
thanks


I could reproduce the bug on Wheezy.

Iceweasel also needs a patch for the address range problem on ia64 again since the upstream development has removed it in the meantime. I filed the separate bug#696041 for that.
After applying the patch of bug#696041, Iceweasel still used to hang.

It turned out that the garbage collector has two bugs that can cause an endless loop.

The problem is in js/src/jsgc.cpp:2385 in the function DecommitFreePages():

static void
DecommitFreePages(JSContext *cx)
{
    JSRuntime *rt = cx->runtime;

for (GCChunkSet::Range r(rt->gcChunkSet.all()); !r.empty(); r.popFront()) {
        Chunk *chunk = r.front();
        while (chunk) {
ArenaHeader *aheader = static_cast<ArenaHeader*>(chunk->info.freeArenasHead);

            /*
             * In the non-failure case, the list will be gone at the end of
             * the loop. In the case where we fail, we relink all failed
             * decommits into a new list on freeArenasHead.
             */
            chunk->info.freeArenasHead = NULL;

            while (aheader) {
                /* Store aside everything we will need after decommit. */
                ArenaHeader *next = aheader->next;

                bool success = DecommitMemory(aheader, ArenaSize);
                if (!success) {
                    aheader->next = chunk->info.freeArenasHead;
                    chunk->info.freeArenasHead = aheader;
                    continue;
                }

size_t arenaOffset = Chunk::arenaIndex(reinterpret_cast<uintptr_t>(aheader));
                chunk->decommittedArenas.set(arenaOffset);
                --chunk->info.numArenasFreeCommitted;
                --rt->gcNumFreeArenas;

                aheader = next;
            }

            chunk = chunk->info.next;
        }
    }
}


This code calls a function DecommitMemory(); ArenaSize is constant 4096, defined in js/src/jsgc.h:

const size_t ArenaShift = 12;
const size_t ArenaSize = size_t(1) << ArenaShift;
const size_t ArenaMask = ArenaSize - 1;

The DecommitMemory() is in js/src/jsgcchunk.cpp:

bool
DecommitMemory(void *addr, size_t size)
{
    JS_ASSERT(uintptr_t(addr) % 4096UL == 0);
    int result = madvise(addr, size, MADV_DONTNEED);
    return result != -1;
}




When I debugged the hanging Iceweasel, a thread was in the most inner while loop in DecommitFreePages()

ArenaHeader *aheader = static_cast<ArenaHeader*>(chunk->info.freeArenasHead);

            /*
             * In the non-failure case, the list will be gone at the end of
             * the loop. In the case where we fail, we relink all failed
             * decommits into a new list on freeArenasHead.
             */
            chunk->info.freeArenasHead = NULL;

            while (aheader) {
                /* Store aside everything we will need after decommit. */
                ArenaHeader *next = aheader->next;

                bool success = DecommitMemory(aheader, ArenaSize);
                if (!success) {
                    aheader->next = chunk->info.freeArenasHead;
                    chunk->info.freeArenasHead = aheader;
                    continue;
                }

size_t arenaOffset = Chunk::arenaIndex(reinterpret_cast<uintptr_t>(aheader));
                chunk->decommittedArenas.set(arenaOffset);
                --chunk->info.numArenasFreeCommitted;
                --rt->gcNumFreeArenas;

                aheader = next;
            }

aheader was always 0x70011e0100 on each loop iteration since aheader->next was 0x70011e0100, too.
The question was how this state could occur.

Assume that we are in the first iteration of the loop. aheader is initialized with the ptr of chunk->info.freeArenasHead. DecommitMemory is called, for example, with the address 0x70011e0100 and the size 4096 (which is ArenaSize). DecommitMemory() will fail since madvise() requires a page aligned addr and length

int madvise(void *addr, size_t length, int advice);

(madvise() returns the EINVAL error code.)

Neither the seen address 0x70011e0100 nor the length 4096 is page aligned here. Remember that Linux on ia64 can have 4K, 8K, 16K, or 64K page dependant on the configuration the Kernel was compiled with. Debian uses 16K on ia64.
So the subsequent if statement executes its body:

                if (!success) {
                    aheader->next = chunk->info.freeArenasHead;
                    chunk->info.freeArenasHead = aheader;
                    continue;
                }

Here happens what the above comment explains: "...we relink all failed decommits into a new list on freeArenasHead.". aheader is inserted as the new head item in the singly linked list which starts with chunk->info.freeArenasHead. Please note that chunk->info.freeArenasHead has been cleared above (initialized with NULL). aheader->next is initialized with chunk->info.freeArenasHead, which is NULL this time; chunk->info.freeArenasHead is aheader subsequently. Not any problem has occurred so far. The continue statement enters the next iteration of the while loop - without running the final
                aheader = next;
DecommitMemory() is called with the same aheader again and of course fails again. The body of the if statement runs for the aheader again. This time aheader->next is initialized with chunk->info.freeArenasHead which isn't NULL this time; it is the former aheader.
aheader->next is equal to aheader after that. The hang up is complete.


So the considered code has two bugs:
- The garbage collector code gives madvise() non-page-aligned addresses and lengths since the code assumes 4K page size in hard-coded manner. Thus, the problem doesn' occur, for example, on x86 or amd64. - The while loop doesn't correctly iterate forward in the case of a fail of the memory decommit.

Initially I wanted the garbage collector code use the actual memory page size, for example, the constants ...
const size_t ArenaShift = 12;
const size_t ArenaSize = size_t(1) << ArenaShift;
const size_t ArenaMask = ArenaSize - 1;

/*
 * This is the maximum number of arenas we allow in the FreeCommitted state
 * before we trigger a GC_SHRINK to release free arenas to the OS.
 */
const static uint32 MaxFreeCommittedArenas = (32 << 20) / ArenaSize;

/*
 * The mark bitmap has one bit per each GC cell. For multi-cell GC things this
 * wastes space but allows to avoid expensive devisions by thing's size when
 * accessing the bitmap. In addition this allows to use some bits for colored
 * marking during the cycle GC.
 */
const size_t ArenaCellCount = size_t(1) << (ArenaShift - Cell::CellShift);
const size_t ArenaBitmapBits = ArenaCellCount;
const size_t ArenaBitmapBytes = ArenaBitmapBits / 8;
const size_t ArenaBitmapWords = ArenaBitmapBits / JS_BITS_PER_WORD;

... should no longer be constants. I think it would be much work.

Furthermore, I found the following code in js/src/jsgc.h:152

    /*
     * To minimize the size of the arena header the first span is encoded
     * there as offsets from the arena start.
     */
    static size_t encodeOffsets(size_t firstOffset, size_t lastOffset) {
        /* Check that we can pack the offsets into uint16. */
        JS_STATIC_ASSERT(ArenaShift < 16);
        JS_ASSERT(firstOffset <= ArenaSize);
        JS_ASSERT(lastOffset < ArenaSize);
        JS_ASSERT(firstOffset <= ((lastOffset + 1) & ~size_t(1)));
        return firstOffset | (lastOffset << 16);
    }

The
        JS_STATIC_ASSERT(ArenaShift < 16);
means that it wouldn't work with 64K memory page size. I guess my idea would mean a complete rewrite of the code.

Thus, a simplier fix:
- The while loop is fixed in order to iterate correctly in the case of a fail of the memory decommit. - The code continues using 4K page size in hard-coded manner. Since all memory decommits would fail, the chunk->info.freeArenasHead singly linked list would contain all the items it had before. Thus, we insert at the begin of the DecommitFreePages() function (the one with the while loop) the following code which skips the loops when the memory page size is larger than 4K:
    /*
     * If the ArenaSize is smaller than a memory page, we don't attempt to
     * decommit anything because DecommitMemory() would always fail.
     */
    if (ArenaSize < pageSize())
        return;


The drawback is that Iceweasel would keep more memory (just as earlier versions of Iceweasel) on ia64. I think it isn't that bad.


You also need the patches of bug#696041.


Stephan

Attachment: fix-gc-hang-on-large-pages.patch
Description: fix-gc-hang-on-large-pages.patch

Reply via email to