On 11/15/2010 08:45 PM, Daniel Vetter wrote:
> Hi Thomas,
>
> On Mon, Nov 15, 2010 at 08:58:13AM +0100, Thomas Hellstrom wrote:
>
>> Nice work, although I have some comments about general applicability
>> that we perhaps need to think about.
>>
>> 1) The space representation and space allocation algorithm is
>> something that is private to the aperture management system. For a
>> specialized implementation like i915 that is all fine, but Ben has
>> recently abstracted that part out of the core TTM bo implementation.
>> As an example, vmwgfx is now using kernel idas to manage aperture
>> space, and drm_mm objects for traditional VRAM space. Hence,
>> embedding drm_mm objects into ttm bos will not really be worthwhile.
>> At least not for aperture space management, and TTM will need to
>> continue to "dance", both in the ida case and in the drm_mm case.
>>
> Yep, I've looked into this and noticed the recent addition of the ida
> support. This is why I've added the "decent surgery" comment. Embedding
> the drm_mm_node still looks possible, albeit perhaps not feasible (at
> least I won't tackle this in the immediate future).
>
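For concreteness, the difference under discussion looks roughly like this in a toy user-space sketch (the structs are simplified stand-ins, not the real kernel definitions):

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in; not the real kernel definition. */
struct drm_mm_node {
	unsigned long start;
	unsigned long size;
};

/* Today: each bo points at a separately kmalloc'ed node, so setting up
 * a bo costs an extra allocation and later a pointer chase. */
struct bo_with_ptr {
	struct drm_mm_node *mm_node;
};

/* Proposed: the node lives inside the bo itself, so there is no extra
 * allocation and the node shares the bo's cache lines.  The cost: the
 * member sits unused in drivers that manage their aperture some other
 * way (e.g. with an ida). */
struct bo_embedded {
	struct drm_mm_node mm_node;
};
```

That trade-off is exactly the point of contention: one kmalloc and a pointer chase saved per bo, versus dead weight in drivers that don't use drm_mm for aperture space.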
Indeed, it's possible, but for drivers that don't use it, it will sit
unused.

>> 2) The algorithm used by drm_mm has been around for a while and has
>> seen a fair amount of changes, but nobody has yet attacked the
>> algorithm used to search for free space, which was just quickly put
>> together as an improvement on what was the old mesa range manager.
>> In moderate fragmentation situations, the performance will degrade,
>> particularly with "best match" searches. In the near future we'd
>> probably want to add something like a "hole rb tree" rather than a
>> "hole list", and a choice of algorithm for the user. With embeddable
>> objects, unless you want to waste space for unused members, you'd
>> need a separate drm_mm node subclass for each algorithm, whereas if
>> you don't embed, you only need to allocate what you need.
>>
> First a small rant about "best match" (to get it out of the way ;-)
> - "best match" is simply a misleading name: with alignment > size
> (at least on older hw) and mixes of unrestricted and range-restricted
> allocations (ironlake has 2G of gtt, just 256M of it mappable), which is
> all possible with the latest experimental i915 patches, "best match" can
> do worse than the simpler approach.
> - doing a full linear scan for every tiny state buffer/pixmap cache is
> slow. On top of that, it serves as an excuse not to implement proper
> eviction support.
> </rant>
> [If you agree, I'll happily write the patch to rip it out. It just doesn't
> bother me 'cause it's only a few lines in drm_mm.c and I can ignore the
> actual users.]
>
> Now to the more useful discussion: IMHO drm_mm.c should be an allocator
> for vram/(g)tt, i.e. it needs to support:
> - a mix of large/small sizes.
> - fancy alignment constraints (new patches for drm/i915 are pushing
> things there).
> - range-restricted allocations. I think current users only ever have one
> (start, end) set for restricted allocations, so this might actually be
> simplified.
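To make the trade-off concrete, here is a minimal user-space sketch of the first-match scan being discussed (a plain array stands in for drm_mm's hole list, and the function name is made up; the real code also deals with range restrictions and list plumbing):

```c
#include <assert.h>
#include <stddef.h>

struct hole {
	unsigned long start;
	unsigned long size;
};

#define NO_FIT (~0UL)

/* First-match: walk the hole list and take the first hole that still
 * has room once the start is rounded up to the requested alignment.
 * Note that alignment can waste space at the front of a hole, which is
 * one reason "best match" (smallest fitting hole) can actually lose:
 * the tightest hole may waste the most to alignment. */
static unsigned long first_fit(const struct hole *holes, size_t n,
			       unsigned long size, unsigned long align)
{
	for (size_t i = 0; i < n; i++) {
		unsigned long start = holes[i].start;
		if (align) /* round start up to the alignment */
			start += (align - start % align) % align;
		unsigned long wasted = start - holes[i].start;
		if (holes[i].size >= wasted + size)
			return start;
	}
	return NO_FIT;
}
```

With holes at (10, 20) and (100, 100), a 30-byte allocation aligned to 64 skips the first hole (only 20 bytes, and 54 of them lost to alignment) and lands at offset 128 in the second. Replacing the linear walk with a "hole rb tree" keyed on size would speed up exactly this loop.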
> If other users don't fit into this anymore, mea culpa, they need their
> own allocator. You've already taken this path for vmwgfx by using the ida
> allocator. And if the linear scan for the gem mmap offset allocator ever
> shows up in profiles, I think it's better served with a buddy-style
> pot-sized, pot-aligned allocator. After all, fragmentation of virtual
> address space isn't that severe a problem.
>

I must admit I haven't done detailed benchmarking, particularly not for
cases with a _huge_ number of bos, but I'm prepared to accept that "first
match" gives good enough results. For the mmap offset fragmentation it's
less of a problem, since it should be straightforward to move bos in the
address space with an additional translation of the user-space offset (if
ever needed).

> Hence I think that drivers with extremely specific needs should roll their
> own allocator. So I don't think we should anticipate different allocator
> algorithms. I see driver-specific stuff more in the area of clever
> eviction algorithms - i915 is currently at 5 lru's for gtt mapped bos, and
> we're still adding.
>

Yes, I agree. My point was merely that one should think twice before
embedding drm_mm objects in generic buffer objects intended also for
drivers with special needs.

> To make a long story short, I've opted to make the current code faster by
> avoiding kmalloc and spoiling fewer cache-lines with useless data. And if
> the linear scan ever shows up in profiles, we could always add some stats
> to bail out early for large allocations. Or add a tree to heuristically
> find a suitable hole (assuming worst-case waste due to alignment).
>
> Thanks a lot for your input on this.
>
> Yours, Daniel

Thanks,
Thomas
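On the "buddy-style pot-sized, pot-aligned allocator" idea for mmap offsets, the shape of it can be sketched as follows (a toy bump cursor stands in for a real buddy allocator, which would also need to free blocks by merging buddies; all names here are made up):

```c
#include <assert.h>

/* Toy allocator handing out power-of-two-sized, power-of-two-aligned
 * offsets.  It never frees, which is fine for a sketch: the point is
 * only that rounding up bounds internal waste to less than 2x, and
 * since this is virtual address space for mmap offsets, not memory,
 * that waste is cheap compared to a linear scan on every allocation. */
static unsigned long pot_cursor;

static unsigned long pot_roundup(unsigned long x)
{
	unsigned long p = 1;

	while (p < x)
		p <<= 1;
	return p;
}

static unsigned long pot_alloc(unsigned long size)
{
	unsigned long p = pot_roundup(size);
	/* round the cursor up to the block's natural alignment */
	unsigned long start = (pot_cursor + p - 1) & ~(p - 1);

	pot_cursor = start + p;
	return start;
}
```

Because every block is naturally aligned to its own power-of-two size, freed blocks could be kept on per-order free lists and coalesced with their buddies, which is what makes this scheme O(log n) instead of a linear scan.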