Jesse Barnes wrote:
> On Friday, April 04, 2008 11:14 am Thomas Hellström wrote:
>> Dave Airlie wrote:
>>> I'm just wondering if rather than specify all the CACHED and MAPPABLE
>>> and SHAREABLE flags we make the BO interface in terms of CPU and GPU
>>> operations..
>>>
>>> So we define
>>> CPU_READ - cpu needs to read from this buffer
>>> CPU_WRITE - cpu needs to write to the buffer
>>> CPU_POOL - cpu wants to use the buffer for suballocs
>>>
>>> GPU_READ - gpu reads
>>> GPU_WRITE - gpu writes
>>> (GPU_EXEC??) - batchbuffers? (maybe buffers that need relocs.. not sure)
>>>
>>> We can then let the drivers internally decide what types of buffer to
>>> use and not expose the flags mess to userspace.
>>>
>>> Dave.
>>
>> This might be a good idea for most situations. However, there are
>> situations where the user-space drivers need to provide more info as to
>> what the buffers are used for.
>>
>> Cache-coherent buffers are an excellent way to transfer data from GPU to
>> CPU, but they are usually very slow to render from. How would you tell
>> DRM that you want a cache-coherent buffer for download-from-screen type
>> operations?
>
> They also can't be used in many cases, right? Which would mean something
> like a batchbuffer allocation would need CPU_READ|CPU_WRITE|GPU_READ|GPU_EXEC,
> which would have to be a WC mapping, but the driver wouldn't know just from
> the flags what type of mapping to create. So yeah, I think we need some
> notion of usage or at least a bit more granularity in the type passed down.

I think cache-coherent memory has gotten a bad reputation because of its
limitations with the Intel chipsets. The general rule is probably that it
works for most things, but GPU access is substantially slower (say, 60% of
the speed of WC memory).
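For comparison, the usage-only interface Dave sketches above would boil down
to something like the following. This is purely illustrative: the flag names
and the placement choices in the comment are made up here and are not
existing drm/libdrm definitions.

/* Illustrative only, not existing drm flags. */
#define BO_USE_CPU_READ   (1 << 0)  /* CPU needs to read from this buffer    */
#define BO_USE_CPU_WRITE  (1 << 1)  /* CPU needs to write to the buffer      */
#define BO_USE_CPU_POOL   (1 << 2)  /* CPU suballocates out of the buffer    */
#define BO_USE_GPU_READ   (1 << 3)  /* GPU reads                             */
#define BO_USE_GPU_WRITE  (1 << 4)  /* GPU writes                            */
#define BO_USE_GPU_EXEC   (1 << 5)  /* batchbuffers / buffers needing relocs */

/* The kernel driver would then pick placement and caching internally, e.g.
 * a batchbuffer (CPU_WRITE | GPU_READ | GPU_EXEC) might end up in WC system
 * pages, while a readback buffer (CPU_READ | GPU_WRITE) might get
 * cache-coherent pages. */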
I'll try to list some of the considerations that led to the current
interface; they tend to be forgotten because people are mostly dealing with
the Intel i915-type chipsets, which are quite straightforward and simple in
this area. If we can come up with a simpler interface for these, that'd be
really good.

/* GPU access mode. Can be used for protection and dirty considerations. */
GPU_READ
GPU_WRITE

/* Early release of vertex buffers and batch buffers in a scene that needs
   a final flush, or buffers with non-standard signaling of GPU completion
   (driver-dependent). */
INTEL_EXE
PSB_BINNER
PSB_RASTERIZER
PSB_QUERY
VIA_MPEG
VIA_VIDEO

/* Memory types. Due to different base registers and engine requirements,
   the user-space driver generally needs to be able to specify different
   memory types. This might not be needed with Intel chipsets, but other
   UMA chipsets have a number of restrictions on buffer placement for
   different parts of the GPU: textures, depth buffers, mpeg buffers,
   shader buffers etc., but it might be that these can be replaced with
   the above driver-dependent flags. */
TT
VRAM
LOCAL
PSB_SHADER
driver-dependent...

/* CPU access to the buffer. */
CPU_READ
CPU_WRITE
CPU_COHERENT
CPU_POOL
(Other GL usage hints?)

> Maybe it's instructive to take a look at the way Linux does DMA mapping
> for drivers? The basic concepts are coherent buffers, one time buffers,
> and device<->CPU ownership transfer. In the graphics case though, coherent
> mappings aren't *generally* possible (at least not yet), so we're reduced
> to doing non-coherent mappings and transferring ownership back & forth, or
> just keeping the mappings uncached on the CPU side in order to keep things
> consistent.
>
> Even that's not expressive enough for what we want though. For small
> objects, mapping into CPU space cached, then flushing out to the CPU may
> be much more expensive than just copying the data from a cacheable CPU
> buffer to a WC GTT page. But with large objects, taking an existing CPU
> mapping, switching it to uncached and mapping its pages directly into the
> GTT is probably a big win (better yet, never map it into the CPU address
> space as cached at all to avoid all the flushing overhead).

I agree completely.

>> Please take a look at i915tex (mesa i915tex_branch)
>> intel_buffer_objects.c, the function intel_bufferobj_select() that
>> translates the GL target + usage hints to a subset of the flags
>> available. My opinion is that we need to be able to keep this
>> functionality.
>
> It looks like that code is #if 0'd, but I like the idea that the various
> types are broken down into what type of memory will work best, and it
> definitely clarifies my understanding of the flags a bit. Of course, some
> man pages for the libdrm drmBO* calls would be even better. :)

I haven't really done thorough testing of it all, so it's ifdef'd out for
now, but it should show the general idea. And yes, I'd love to do some
documentation when I get some spare time.
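The shape of that translation is roughly as follows. This is only a sketch:
the flag names and the particular hint-to-flag mappings are guesses for
illustration, not the actual intel_bufferobj_select() code from the branch.

/* Sketch only: GL buffer target + usage hint select a subset of the
 * placement/caching flags discussed above. */
#include <GL/gl.h>

#define BO_MEM_LOCAL       (1 << 0)  /* cached system memory                 */
#define BO_MEM_TT          (1 << 1)  /* GART-bound, typically write-combined */
#define BO_CACHE_COHERENT  (1 << 2)  /* keep CPU-cache-coherent              */

static unsigned
select_bo_flags(GLenum target, GLenum usage)
{
    /* Index buffers may be walked by the CPU for software fallbacks,
     * so they could stay in cached local memory. */
    if (target == GL_ELEMENT_ARRAY_BUFFER)
        return BO_MEM_LOCAL;

    switch (usage) {
    case GL_STREAM_READ:
    case GL_DYNAMIC_READ:
        /* GPU writes, CPU reads back: coherent pages, slower GPU access. */
        return BO_MEM_TT | BO_CACHE_COHERENT;
    case GL_STATIC_DRAW:
    case GL_STREAM_DRAW:
        /* Written once, rendered from many times: WC GTT pages. */
        return BO_MEM_TT;
    default:
        return BO_MEM_LOCAL | BO_MEM_TT;  /* let the kernel pick later */
    }
}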
> I think part of what we're running into here is platform specific. There's
> already a big divide between what might be necessary for pure UMA
> architectures vs. ones with lots of fast VRAM, and there are also the
> highly platform-specific cacheability concerns for integrated devices on
> Intel. I just wonder if a general purpose memory manager is ever going to
> be "optimal" for a given platform... At SGI at least there tended to be
> new memory managers for each new architecture, without much sharing that
> I'm aware of...
>
> Anyway hopefully we can get this sorted out soon so we can push it all
> upstream along with the kernel mode setting work which depends on it. I
> think everyone's agreed that we want an API & architecture that's easy to
> understand for both users and developers; we must be getting close to
> that by now. :)

Yes. And if we're doing a final pass at this, we should probably try to
list most of the upcoming use-cases for the chipsets that we know something
about today, sort most of them into the driver-dependent area, and nail the
rest down as the "permanent" interface. But let's have an interface that,
perhaps by means of driver-dependent flags, allows dirty performance
optimizations when needed and motivated.

> Thanks,
> Jesse

Thomas