Dave,

The big problem with the (second) radeon approach of state objects was
that we defined those objects statically & encoded them into the kernel
interface.  That meant that when new hardware functionality was needed
(or discovered) we had to rev the kernel interface, usually in a fairly
ugly way.

I think Jerome's approach could be a good improvement if the state
objects it creates are defined by software at runtime, more like little
display lists than pre-defined state atoms.  The danger again is that
you run into cases where you need to expand the set of objects the
verifier will allow userspace to create, but at least in doing so you
won't be breaking existing users of the interface.

I think the key is that there should be no pre-defined format for these
state objects, simply that they should be a sequence of legal
commands/register writes that the kernel validates once and userspace
can execute multiple times.
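
A minimal sketch of that "validate once, execute many times" idea, in C.
The register ranges, struct layouts and function names below are all
made up for illustration; none of them come from the real radeon
interface:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One register write. Hypothetical encoding, not a real radeon packet. */
struct reg_write {
    uint32_t reg;   /* register offset */
    uint32_t value; /* value to write */
};

/* Hypothetical whitelist of register ranges userspace may program. */
struct reg_range {
    uint32_t first, last;
};

static const struct reg_range safe_ranges[] = {
    { 0x2000, 0x20ff }, /* invented range */
    { 0x4000, 0x40ff }, /* invented range */
};

/* A state object has no fixed format: it is just a list of writes. */
struct state_object {
    const struct reg_write *writes;
    size_t count;
    bool validated;
};

static bool reg_is_safe(uint32_t reg)
{
    for (size_t i = 0; i < sizeof(safe_ranges) / sizeof(safe_ranges[0]); i++)
        if (reg >= safe_ranges[i].first && reg <= safe_ranges[i].last)
            return true;
    return false;
}

/* Creation: every write is checked here, exactly once. */
static bool state_object_create(struct state_object *obj,
                                const struct reg_write *writes, size_t count)
{
    obj->writes = writes;
    obj->count = count;
    obj->validated = false;
    for (size_t i = 0; i < count; i++)
        if (!reg_is_safe(writes[i].reg))
            return false;
    obj->validated = true;
    return true;
}

/*
 * Execution: nothing is re-parsed. Returns the number of writes that
 * would be emitted, or -1 if the object never passed validation. A real
 * kernel would copy the already-checked words to the ring here.
 */
static int state_object_execute(const struct state_object *obj)
{
    if (!obj->validated)
        return -1;
    return (int)obj->count;
}
```

Supporting new hardware functionality in this scheme would mean growing
the whitelist rather than defining a new object layout, so objects that
were accepted before stay valid.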

Keith


On Sat, 2009-08-08 at 05:43 -0700, Dave Airlie wrote:
> On Sat, Aug 8, 2009 at 7:51 AM, Jerome Glisse<gli...@freedesktop.org> wrote:
> > Investigating where time is spent in the radeon/kms world when doing
> > rendering led me to question the design of the CS ioctl. As I am among
> > the people behind it, I think I should give some historical background
> > on the choices that were made.
> 
> I think this sounds quite like the original radeon interface or maybe
> even a bit like the second one. The original one stored the registers
> in the sarea, and updated the context under the lock, and had the
> kernel emit it. The second one had a bunch of state objects, containing
> ranges of registers that were safe to emit.
> 
> Maybe Keith Whitwell can point out why these were a good/bad idea,
> not sure if anyone else remembers that far back.
> 
> Dave.
> 
> >
> > The first motivation behind the cs ioctl was to use a common language
> > between userspace and kernel and between kernel and device. Of course
> > in an ideal world commands submitted through the cs ioctl could be
> > forwarded directly to the GPU without much overhead. Thing is, the
> > world we live in isn't that good. There are 2 things the cs ioctl
> > does before forwarding commands:
> >
> > 1- First it must rewrite any packet which supplies an offset to the
> > GPU with the address at which the memory manager validated the
> > buffer object associated with this packet. We can't get rid of this
> > with the cs ioctl (we might do something very clever like writing
> > new microcode for the cp so that the cp can rewrite packets using
> > some table of validated buffer offsets, but I am not even sure the
> > cp would be powerful enough to do that).
> > 2- In order to provide more advanced security than what we had
> > in the past, I added a cs checker facility which is responsible
> > for analyzing the command stream and making sure that the GPU
> > won't read or write outside the supplied buffer object list.
> > DRI1 didn't offer such advanced checking. This feature was
> > added with GPU sharing in mind, where sensitive applications
> > might run on the GPU and we might want to protect their
> > memory.
> >
> > We can obviously drop the second item and things would work,
> > but userspace would be able to abuse the GPU to access memory
> > outside the GPU objects it owns (this doesn't mean it would be
> > able to access any system ram, but rather any ram that is mapped
> > to the GPU, which should for the time being only be pixmaps,
> > textures, vbos or things like that).
> >
> > Bottom line is that with the cs ioctl we do the same work twice
> > in different ways. In userspace we build a command stream
> > understandable by the GPU and in kernel space we decode this
> > command stream to check it. Obviously this sounds wrong.
> >
> > That being said, the CS ioctl isn't that bad; it doesn't consume
> > much in the benchmarks I have done, but I expect it might consume
> > more on older cpus or when many complex 3D apps run at the same
> > time. So I am not proposing to throw it away but rather to discuss
> > a better interface we could add at a later point to slowly replace
> > cs. CS brings today the features we needed yesterday, so we should
> > focus our effort on getting the cs ioctl as smooth and good as
> > possible.
> >
> >
> > So as a pet project I have been thinking these last few days about
> > what would be a better interface between userspace and kernel, and
> > I came up with something in between gallium state objects and nvidia
> > gpu objects (well, at least as far as I know each of these, my
> > design sounds close to them).
> >
> > The idea behind the design is that whenever userspace allocates a
> > bo, userspace knows the properties of the bo. If it's a texture,
> > userspace knows the size, the number of mipmap levels, the
> > border, ... of the texture. If it's a vbo, it knows the layout,
> > the size, number of elements, ... Same for a rendering viewport:
> > it knows the size and associated properties.
> >
> > The design has 2 ioctls:
> >        create_object :
> >                supply :
> >                        - object type id specific to the asic
> >                        - object structure associated with the
> >                        type id, fully describing the object
> >                return :
> >                        - object id
> >                processing :
> >                        - check that the state provided is
> >                        correct and check that the bo is big
> >                        enough for the state
> >                        - translate the state into a packet stream
> >                        - store the object and packet stream
> >                        & associated object id
> >        batches :
> >                supply :
> >                        - table of batches
> >                processing :
> >                        - check each batch and schedule them
> > Each batch is a set of object ids, and userspace needs to provide
> > all the object ids for the batch to be valid. For instance, if a
> > shader object needs 5 textures, the batch needs to have 5 texture
> > object ids supplied.
> >
> > Checking that a batch is valid is quick as it's a set of
> > already checked objects. You create an object just after creating
> > the bo (if it's a pixmap you can create a texture and a viewport
> > just after, and whenever you want to use this pixmap just use
> > the proper object id). This means that for objects which are
> > used multiple times you check the object properties once and
> > then take advantage of quick reuse.
> >
> > An example of what objects look like is at:
> > http://people.freedesktop.org/~glisse/rv515obj.h
> >
> > So what we win is fast checking and better knowledge in the kernel
> > of how a bo is used; all this allows many optimizations:
> >        - simple state re-emission optimization (don't re-emit the
> >        state of an object if that state is already set in the
> >        GPU)
> >        - clever flushing: if a bo is only associated with texture
> >        objects then the kernel knows that it's not necessary to ask
> >        for a GPU flush and can make clever flushing decisions
> >        - gives more information to the kernel for object placement
> >        - kernel can override object placement even for things
> >        like vbos, where endian swapping might need different settings
> >        depending on the layout of the vbo and where it is in memory
> >        - hw optimizations like rotating textures between available
> >        slots to avoid flushing the texture cache
> >        - faster relocation: relocations can be hardcoded so no need
> >        to parse anything
> >        - kernel can break down a batches ioctl: each batch is a full
> >        description of the state necessary to perform an operation,
> >        so the only requirement is that each batch of a batches ioctl
> >        fits into the available memory
> >        - likely easier to report a memory limit for the maximum batch
> >        size userspace can supply
> >        - share state between processes (the object id could be a hash
> >        of the object state, so unique to a given set of states)
> >        - easier to work around some gpu limitations
> >        - allows fine tuning some of the gpu fifos safely in the
> >        kernel
> >
> > Drawbacks i see (often with a new design you don't see all the
> > drawbacks so please add any) :
> >        - kernel needs to know how to build the command stream
> >        (mostly byte shifting & masking)
> >        - might consume more memory (especially if userspace keeps
> >        a copy of the state)
> >        - you lose some of the benefit if you cohabit with the cs
> >        ioctl (need to assume all state is lost after cs and
> >        perform all necessary flushes like texture state flush)
> >        - adding a feature means going through the kernel (not sure
> >        it's a drawback: it gives us a short window for merging new
> >        features, which likely leads to more time spent sitting on
> >        top of new features and testing them).
> >
> > Well, this mail is already big enough, so what I would like is feedback
> > on this: does it sound like a good direction? (So far to me it sounds
> > better, but I can be wrong.)
> >
> > Cheers,
> > Jerome Glisse
> >
> >
> > ------------------------------------------------------------------------------
> > Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day
> > trial. Simplify your report design, integration and deployment - and focus 
> > on
> > what you do best, core application coding. Discover what's new with
> > Crystal Reports now.  http://p.sf.net/sfu/bobj-july
> > --
> > _______________________________________________
> > Dri-devel mailing list
> > Dri-devel@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/dri-devel
> >
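
For reference, the two-ioctl design quoted above (create_object, then
batches made of object ids) could be sketched roughly as follows. This
is only an illustration under simplifying assumptions: the type ids,
structs, fixed-size table and function names are hypothetical, and the
size check ignores mipmaps, borders and arithmetic overflow:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum obj_type { OBJ_TEXTURE, OBJ_SHADER }; /* hypothetical type ids */

/* Description userspace supplies when creating a texture object. */
struct texture_desc {
    uint32_t width, height, bytes_per_pixel;
    uint64_t bo_size; /* size of the backing buffer object */
};

struct object {
    enum obj_type type;
    uint32_t textures_needed; /* only meaningful for shaders */
};

#define MAX_OBJECTS 16
static struct object table[MAX_OBJECTS];
static uint32_t n_objects;

/* Object ids start at 1; 0 means creation failed. */
static uint32_t store_object(enum obj_type type, uint32_t textures_needed)
{
    if (n_objects == MAX_OBJECTS)
        return 0;
    table[n_objects].type = type;
    table[n_objects].textures_needed = textures_needed;
    return ++n_objects;
}

/* create_object for a texture: the expensive check happens here, once. */
static uint32_t create_texture(const struct texture_desc *d)
{
    uint64_t needed = (uint64_t)d->width * d->height * d->bytes_per_pixel;

    if (needed == 0 || needed > d->bo_size)
        return 0; /* bo is too small for the described state */
    return store_object(OBJ_TEXTURE, 0);
}

static uint32_t create_shader(uint32_t textures_needed)
{
    return store_object(OBJ_SHADER, textures_needed);
}

/*
 * Batch check: only ids are looked at, never object contents. The batch
 * is valid if every id exists and it supplies as many texture objects
 * as its shader objects require.
 */
static bool batch_is_valid(const uint32_t *ids, size_t count)
{
    uint32_t needed = 0, supplied = 0;

    for (size_t i = 0; i < count; i++) {
        if (ids[i] == 0 || ids[i] > n_objects)
            return false;
        const struct object *o = &table[ids[i] - 1];
        if (o->type == OBJ_SHADER)
            needed += o->textures_needed;
        else
            supplied++;
    }
    return supplied == needed;
}
```

The point of the split is that the per-batch work reduces to id lookups,
while the property checking is paid only when an object is created.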