* Keith Packard <[EMAIL PROTECTED]> wrote:

> On Sat, 2008-10-18 at 22:37 +0200, Ingo Molnar wrote:
>
> > But i think the direction of the new GEM code is subtly wrong here,
> > because it tries to manage memory even on 64-bit systems. IMO it
> > should just map the _whole_ graphics aperture (non-cached) and be
> > done with it. There's no faster method at managing pages than the
> > CPU doing a TLB fill from pagetables.
>
> Yeah, we're stuck thinking that we "can't" map the aperture because
> it's too large, but with a 64-bit kernel, we should be able to keep it
> mapped permanently.
>
> Of course, the io_reserve_pci_resource and io_map_atomic functions
> could do precisely that, as kmap_atomic does on non-HIGHMEM systems
> today.

okay, so basically what we need is a shared API that does per page
kmap_atomic on 32-bit, and just an ioremap() on 64-bit. I had the
impression that you were suggesting to extend kmap_atomic() to 64-bit -
which would be wrong.

So, in terms of the 4 APIs you suggest:

  struct io_mapping *io_reserve_pci_resource(struct pci_dev *dev,
                                             int bar,
                                             int prot);
  void io_mapping_free(struct io_mapping *mapping);

  void *io_map_atomic(struct io_mapping *mapping, unsigned long pfn);
  void io_unmap_atomic(struct io_mapping *mapping, unsigned long pfn);

here is what we'd do on 64-bit:

 - io_reserve_pci_resource() would just do an ioremap(), and would save
   the ioremap-ed memory into struct io_mapping.

 - io_mapping_free() does the iounmap()

 - io_map_atomic(): just arithmetics, returns mapping->base + pfn - no
   TLB activities at all.

 - io_unmap_atomic(): NOP.

it's as fast as it gets: zero overhead in essence. Note that it's also
shared between all CPUs and there's no aliasing trouble.

And we could make it even faster: if you think we could even use 2MB
TLBs for the _linear_ ioremap()s here, hm? There's plenty of address
space on 64-bit so we can align to 2MB just fine - and aperture sizes
are 2MB sized anyway.

Or we could go one step further and install these aperture mappings into
the _kernel linear_ address space. That would be even faster, because
we'd have a constant offset. We have the (2MB mappings aware) mechanism
for that already. (Yinghai Cc:-ed - he did a lot of great work to
generalize this area.)

(In fact if we installed it into the linear kernel address space, and if
the aperture is 1GB aligned, we will automatically use gbpages for it.
Were Intel to support gbpages in the future ;-)

the _real_ remapping in a graphics aperture happens on the GPU level
anyway, you manage an in-RAM GPU pagetable that just works like an
IOMMU, correct?

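To make the 64-bit variant above concrete, here is a minimal userspace
sketch, not kernel code. It assumes 4 KiB pages, treats pfn as a page
index relative to the start of the mapping, and reads "base + pfn" as
shorthand for a page-granular offset. calloc()/free() stand in for
ioremap()/iounmap(); io_reserve_sketch(), the npages field and the
SKETCH_* macros are hypothetical names that do not appear in the
proposal.

```c
#include <assert.h>
#include <stdlib.h>

#define SKETCH_PAGE_SHIFT 12                       /* assumption: 4 KiB pages */
#define SKETCH_PAGE_SIZE  (1UL << SKETCH_PAGE_SHIFT)

/* Hypothetical layout: the thread only says the ioremap()ed base is saved. */
struct io_mapping {
	char *base;             /* start of the one linear mapping */
	unsigned long npages;   /* size in pages, added here for a bounds check */
};

/* Stand-in for io_reserve_pci_resource(): map the whole range once. */
struct io_mapping *io_reserve_sketch(unsigned long npages)
{
	struct io_mapping *mapping = malloc(sizeof(*mapping));

	if (!mapping)
		return NULL;
	mapping->base = calloc(npages, SKETCH_PAGE_SIZE); /* plays ioremap() */
	if (!mapping->base) {
		free(mapping);
		return NULL;
	}
	mapping->npages = npages;
	return mapping;
}

/* Tear the mapping down; free() plays the role of iounmap(). */
void io_mapping_free(struct io_mapping *mapping)
{
	if (!mapping)
		return;
	free(mapping->base);
	free(mapping);
}

/* Pure arithmetic: no per-page mapping work, no TLB activity. */
void *io_map_atomic(struct io_mapping *mapping, unsigned long pfn)
{
	assert(pfn < mapping->npages);
	return mapping->base + (pfn << SKETCH_PAGE_SHIFT);
}

/* Nothing to undo on 64-bit: the mapping is permanent. */
void io_unmap_atomic(struct io_mapping *mapping, unsigned long pfn)
{
	(void)mapping;
	(void)pfn;
}
```

A caller would reserve once at init time, then bracket each page access
with io_map_atomic()/io_unmap_atomic(); in this variant the pair costs
one shift and one add, and the returned pointer is valid on every CPU.
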
on 32-bit we'd have what you use in the GEM code today:

 - io_reserve_pci_resource(): a NOP in essence

 - io_mapping_free(): a NOP

 - io_map_atomic(): does a kmap_atomic(pfn)

 - io_unmap_atomic(): does a kunmap_atomic(pfn)

so on 32-bit we have the INVLPG TLB overhead and preemption restrictions
- but we knew that. We'd have to allow kmap_atomic() on non-highmem as
well but that's fair.

Mind sending patches for this? :-)

	Ingo