--- Comment #127 from Sean Cavanaugh <worksonmymach...@gmail.com> 2011-04-15
01:40:35 PDT ---
(In reply to comment #120)
> (In reply to comment #118)
> > True, and it works tolerably well. To do a moving gc, however, you need more
> > precise information.
> I don't want a moving GC. I want a fast GC.
> ("I" in this context means D users with the same requirements, mainly video
> game developers.)
> I understand the advantages of a moving GC - heap compaction allowing for an
> overall smaller managed heap etc., but I hope you understand that sacrificing
> speed for these goals is not an unilateral improvement for everyone.
I am a game developer, and this thread is fairly fascinating to me, as memory
management and good support for Intel SSE2 (and AVX) or PowerPC VMX are two of
the biggest issues to me when considering alternative languages or the question
of 'will this language be suitable in the future'. The SSE problem seems
workable by putting the heavy math in C++ DLLs exposed through extern "C",
which leaves the GC as a big 'what does this mean' when evaluating the
landscape.
The reality is that a lot of game engines allocate a surprising amount of
memory at run time. The speed of malloc itself is rarely an issue, as most
searches take a reasonably similar amount of time. The real problems with heavy
use of malloc are thread lock contention in the allocator, and fragmentation.
Fragmentation causes two problems: large allocations fail when memory is low
(say a 1 MB allocation when 30 MB is 'free'), and virtual pages cannot be
reclaimed because of a stray allocation or two within the page.
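
To make the '1 MB fails with 30 MB free' case concrete, here is a minimal sketch in C++. It is a toy model, not a real allocator: the heap is a row of equal-sized chunks, and the chunk size, the 60 MB heap and all function names are assumptions made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: the heap is a row of equal-sized chunks, true = in use.
constexpr std::size_t kChunkBytes = 512 * 1024;  // 512 KB per chunk (assumed)

// Total free bytes, regardless of where the free chunks sit.
std::size_t total_free_bytes(const std::vector<bool>& used) {
    std::size_t free_chunks = 0;
    for (bool u : used) {
        if (!u) ++free_chunks;
    }
    return free_chunks * kChunkBytes;
}

// Largest contiguous free run in bytes: the biggest single allocation
// this toy heap could still satisfy without moving anything.
std::size_t largest_free_run_bytes(const std::vector<bool>& used) {
    std::size_t best = 0;
    std::size_t run = 0;
    for (bool u : used) {
        run = u ? 0 : run + 1;
        best = std::max(best, run);
    }
    return best * kChunkBytes;
}

// Builds a 60 MB heap in which every other chunk is held by a stray allocation.
std::vector<bool> make_checkerboard_heap() {
    std::vector<bool> used(120);
    for (std::size_t i = 0; i < used.size(); ++i) {
        used[i] = (i % 2 == 0);
    }
    return used;
}
```

In this model half of the heap is free in total, yet no free run is longer than one chunk, so a request larger than one chunk cannot be served unless the live allocations are moved.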
Lock contention is solved by making separate heaps. Fragmentation is also
fought by separating the heaps, and by organizing the allocations coherently,
either time-wise or by allocation type, where like-sized objects are pooled
into a special pool for objects of that size. As a bonus, fixed-size object
pools have constant-time allocation, except when the pool has to grow, but we
try really hard to pre-size these to the worst-case values. On my last project
we had about 8 dlmalloc-based heaps and 15 fixed-size allocator pools to solve
these problems.
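
A minimal sketch of the kind of fixed-size pool described above, in C++. This is not the allocator from that project; the class name `FixedPool` and its interface are hypothetical. It assumes a single thread, a pool pre-sized up front that never grows, and slots handed back only to the pool they came from.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical fixed-size pool. Free slots are threaded onto an intrusive
// singly linked list, so allocate and deallocate are both O(1).
class FixedPool {
public:
    FixedPool(std::size_t slot_size, std::size_t slot_count)
        : slot_size_(round_up(std::max(slot_size, sizeof(Node)))),
          storage_(slot_size_ * slot_count) {
        // Pre-size to the worst case: every slot starts on the free list.
        for (std::size_t i = 0; i < slot_count; ++i) {
            push(storage_.data() + i * slot_size_);
        }
    }

    // Pops the head of the free list; returns nullptr when the pool is
    // exhausted (this sketch does not grow).
    void* allocate() {
        if (free_head_ == nullptr) return nullptr;
        Node* node = free_head_;
        free_head_ = node->next;
        ++in_use_;
        return node;
    }

    // Pushes the slot back onto the free list. p must come from allocate().
    void deallocate(void* p) {
        assert(p != nullptr && in_use_ > 0);
        push(p);
        --in_use_;
    }

    std::size_t in_use() const { return in_use_; }

private:
    struct Node { Node* next; };

    // Rounds the slot size up so every slot stays suitably aligned.
    static std::size_t round_up(std::size_t n) {
        const std::size_t a = alignof(std::max_align_t);
        return (n + a - 1) / a * a;
    }

    void push(void* p) { free_head_ = new (p) Node{free_head_}; }

    std::size_t slot_size_;
    std::vector<unsigned char> storage_;
    Node* free_head_ = nullptr;
    std::size_t in_use_ = 0;
};
```

Giving each subsystem or thread its own pool of this kind is one way to get both properties mentioned above: no shared lock to contend on, and no mixing of differently sized objects within the same region.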
I would greatly prefer a GC to compact the heap to keep the peak memory down,
because in embedded (console) environments memory is a constant but time is
fungible. Virtual memory (VM) might be available in these environments, but it
isn't going to be backed by disk. Instead, the idea of the VM is that it is a
tool to fight fragmentation of the underlying physical pages, and to help you
get contiguous space to work with. There is also pressure to use larger pages
(64k, 1MB, 4MB) to keep the TLB lookups fast, which hurts even more with
fragmentation. Tiny allocations holding onto these big pages prevent them from
being reclaimed, which makes getting those allocations moved somewhere better
pretty important.
Now the good news is that a huge amount of the resources in a game do not need
to be allocated into a garbage collected space. For the most part, any data you
send to the GPU is far better off being written into its memory system and left
alone. Physics data and audio data behave similarly for the most part and can
be allocated through malloc or aligned forms of malloc (for SSE).
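
As one example of an 'aligned form of malloc', here is a minimal C++ sketch that uses the aligned `operator new` from C++17. The helper names and the 16-byte constant are assumptions for illustration; platform calls such as `_mm_malloc` or `posix_memalign` are common alternatives.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// SSE loads and stores of 128-bit values want 16-byte alignment (assumed
// target; AVX would want 32).
constexpr std::size_t kSimdAlign = 16;

// Allocates room for `count` floats on a 16-byte boundary (requires C++17).
float* alloc_simd_floats(std::size_t count) {
    void* p = ::operator new(count * sizeof(float), std::align_val_t{kSimdAlign});
    return static_cast<float*>(p);
}

// Must be paired with alloc_simd_floats: the same alignment is passed back.
void free_simd_floats(float* p) {
    ::operator delete(p, std::align_val_t{kSimdAlign});
}

// True when the address is a multiple of the SIMD alignment.
bool is_simd_aligned(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % kSimdAlign == 0;
}
```

Memory obtained this way sits outside any garbage-collected heap, so it has to be freed explicitly.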
So from a game developer's point of view, I need to know when the GC will run,
either by configuration or by manually driving it. Both allow me to run a frame
with most of the AI and physics disabled to give more of the time to the
collector. A panic GC pass that I wasn't expecting is acceptable, provided I
get notified of it, as I would expect this to be an indicator that memory is
getting tight to the point that an Out of Memory is imminent. A panic GC is a
QA problem, as we can tell them where and how often they are occurring and they
can in turn tell the designers making the art/levels that they need to start
trimming the memory usage a bit.
Ideally the GC would be able to run in less time than a single frame (say
10-15ms for a 30fps game). Taking away some amount of time every frame is also
acceptable. For example spending 1ms of every frame to do 1ms worth of data
movement or analysis for compacting would be a reasonable thing to allow, even
if it was in addition to the multi-millisecond spikes at some time interval (30
frames, 30 seconds, whatever). Making the whole thing friendly to having lots
of CPU cores wouldn't hurt either.
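
The 'spend 1ms of every frame' idea can be sketched as a queue of small work steps drained against a time budget. This C++ sketch is not a collector and not any runtime's API: `IncrementalWork` and its methods are hypothetical, and it assumes that compaction or analysis work can be split into steps that are each much shorter than the budget.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical per-frame work queue: each step stands in for a small slice
// of collector work (moving one object, scanning one block, and so on).
class IncrementalWork {
public:
    using Clock = std::chrono::steady_clock;

    void add(std::function<void()> step) { steps_.push_back(std::move(step)); }

    // Runs queued steps until the budget is spent or the queue is empty and
    // returns how many steps ran. The clock is checked between steps only,
    // so a single long step can overrun the budget.
    std::size_t run_for(std::chrono::microseconds budget) {
        const Clock::time_point deadline = Clock::now() + budget;
        std::size_t ran = 0;
        while (!steps_.empty() && Clock::now() < deadline) {
            std::function<void()> step = std::move(steps_.front());
            steps_.pop_front();
            step();
            ++ran;
        }
        return ran;
    }

    std::size_t pending() const { return steps_.size(); }

private:
    std::deque<std::function<void()>> steps_;
};
```

A game loop would call something like `run_for(std::chrono::microseconds(1000))` once per frame, and could pass a much larger budget on a frame where AI and physics are switched off.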