Re: [Mesa3d-dev] [mesa] svga: Fix error: cannot take address of bit-field 'texture_target' in svga_tgsi.h
This looks like my fault. It would be nice to have the r300 and nvidia drivers building by default (eg on linux-debug builds), even if they don't create full drivers, so that a single build can get greater coverage.

Keith

On Wed, 2010-01-06 at 09:09 -0800, Sedat Dilek wrote:

OK. That's the next one :-) ...

In file included from r300_emit.c:36:
r300_state_inlines.h: In function ‘r300_translate_tex_filters’:
r300_state_inlines.h:263: error: ‘is_anisotropic’ undeclared (first use in this function)
r300_state_inlines.h:263: error: (Each undeclared identifier is reported only once
r300_state_inlines.h:263: error: for each function it appears in.)
make: *** [r300_emit.o] Error 1
...

I am having dinner, now

- Sedat -

On Wed, Jan 6, 2010 at 6:07 PM, Brian Paul bri...@vmware.com wrote:

Sedat Dilek wrote:

Hi,

this patch fixes a build error in mesa GIT master after...

commit 251363e8f1287b54dc7734e690daf2ae96728faf
configs: set INTEL_LIBS, INTEL_CFLAGS, etc.

From my build log:
...
In file included from svga_pipe_fs.c:37:
svga_tgsi.h: In function 'svga_fs_key_size':
svga_tgsi.h:122: error: cannot take address of bit-field 'texture_target'
make[4]: *** [svga_pipe_fs.o] Error 1

Might be introduced in...

commit 955f51270bb60ad77dba049799587dc7c0fb4dda
Make sure we use only signed/unsigned ints with bitfields.

Kind Regards,
- Sedat -

I just fixed that.
-Brian

___
Mesa3d-dev mailing list
mesa3d-...@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mesa3d-dev

___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel
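For reference, the error Sedat hit is ordinary C semantics: a bit-field has no address, so `&key->texture_target` cannot compile once the member becomes a bit-field. A minimal sketch of the problem and the usual fix (the struct and function names here are invented, not the actual svga code):

```c
/* Hypothetical reduction of the svga_fs_key situation: after the
 * "signed/unsigned ints with bitfields" commit, 'texture_target'
 * is a bit-field, so '&key->texture_target' is a compile error. */
struct fs_key {
    unsigned texture_target : 4;
    unsigned other_state : 28;
};

/* The usual fix: copy the bit-field into an addressable temporary
 * before taking a pointer to it. */
static unsigned read_texture_target(const struct fs_key *key)
{
    unsigned tmp = key->texture_target; /* tmp has an address */
    const unsigned *p = &tmp;           /* legal; &key->texture_target is not */
    return *p;
}
```

The actual fix in Mesa may differ; this only illustrates why the diagnostic appears and the standard workaround.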
RE: [Patch VIA UniChrome DRM][2/5 Ver1] Add support for video command flush and interface for V4L kernel module
On Thu, 2009-10-08 at 02:35 -0700, brucech...@via.com.tw wrote:

Hello Thomas:

If I understand the code correctly, the user-space application prepares command buffers directly in AGP, and asks the drm module to submit them. We can't allow this for security reasons. The user-space application could for example fill the buffer with commands to texture from arbitrary system memory, getting hold of other users' private data. The whole ring-buffer stuff and the command verifier was once implemented to fix that security problem.

Thank you very much for your comment. What if we do a security check on these buffers before submit? Let me check if there is any way to work around this security issue.

Who would do that security check? Userspace? That doesn't work, as userspace is not trusted. The kernel? OK, but now it's reading commands out of a presumably write-combined AGP buffer, which is slow. You'd have been better off passing the commands to the kernel in regular memory, which is presumably exactly what the existing mechanism does.

Keith
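To make Keith's point concrete: a kernel-side check is only meaningful if the commands are read from memory userspace cannot rewrite after validation, i.e. copied into regular kernel memory first. A hedged sketch of such a verifier loop (the opcode, struct, and aperture names are all invented for illustration):

```c
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

/* Invented opcode: a command whose address operand the GPU will read. */
#define CMD_TEXTURE_SRC 0x2

struct cmd { uint32_t opcode; uint32_t addr; };

/* Scan a command buffer (already copied out of userspace reach) and
 * reject any DMA source address outside the allowed aperture, so the
 * GPU cannot be made to texture from arbitrary system memory. */
static bool verify_cmds(const struct cmd *buf, size_t n,
                        uint32_t apert_lo, uint32_t apert_hi)
{
    for (size_t i = 0; i < n; i++) {
        if (buf[i].opcode == CMD_TEXTURE_SRC &&
            (buf[i].addr < apert_lo || buf[i].addr >= apert_hi))
            return false;
    }
    return true;
}
```

Note the copy itself is the security boundary: verifying commands in place in AGP would let userspace modify them between check and execution.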
Re: preventing GPU reset DoS
On Tue, 2009-09-22 at 12:13 -0700, Pauli Nieminen wrote:

Hi!

I have been thinking about GPU reset as a possible DoS attack from user-space. The problem here is that the display doesn't work anymore at all if an attacker chooses to run an application that constantly causes GPU hangs.

It would of course be ideal to have the CS checker not let in any problematic combinations of commands. But in practice we can't assume that everything is safe with all hardware, so we need to take some actions to prevent possible problems.

So the first defense would be terminating the application that sent the command stream that caused the GPU hang. But an attacker could easily bypass this protection by forking new processes all the time. So we need a stronger defense if the same user account is causing multiple hangs in a short time frame. I would think temporarily denying new DRI access would let the user gain back control of the system and take actions to stop the problematic program from running.

OK, but you'd want to be able to turn it off for developers -- you've just described my normal workflow...

Keith
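Pauli's per-user rate limit, together with Keith's developer escape hatch, can be sketched as a simple windowed counter (thresholds, names, and the `enforcement` toggle are all invented for illustration; in a real driver the toggle would be a module parameter):

```c
#include <stdbool.h>

/* Deny new DRI opens for a user who caused more than MAX_HANGS GPU
 * hangs within WINDOW seconds; 'enforcement' lets developers turn the
 * whole mechanism off, since driver work triggers hangs constantly. */
#define MAX_HANGS 3
#define WINDOW    60

struct hang_state { long first_hang; int count; };
static bool enforcement = true;

static bool allow_open(struct hang_state *s, long now)
{
    if (!enforcement)
        return true;
    if (now - s->first_hang > WINDOW) { /* window expired: reset */
        s->first_hang = now;
        s->count = 0;
    }
    return s->count < MAX_HANGS;
}

static void record_hang(struct hang_state *s, long now)
{
    if (now - s->first_hang > WINDOW) {
        s->first_hang = now;
        s->count = 0;
    }
    s->count++;
}
```

Keying the state on uid rather than process is what defeats the fork-bomb bypass Pauli describes.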
Re: [PATCH 6/6] [drm/i915] implement drmmode overlay support v2
On Tue, 2009-09-01 at 02:20 -0700, Thomas Hellström wrote:

Stephane Marchesin wrote:

2009/8/31 Thomas Hellström tho...@shipmail.org:

The problem I see with Xv-API-in-kernel is that of the various hw constraints on the buffer layout. IMHO this has two solutions: a) complicated to communicate the constraints to userspace. This is either too generic or not suitable for everything. IIRC Xv exposes this all the way down to the user app, as format and then offset into buffer + stride for each plane?

Well, for example if your overlay can only do YUY16 in hardware, you still might want to expose YV12/I420 through Xv and do internal conversion. So you'd have to add format conversion somewhere in the stack (probably in user space though). The same happens for swapped components and planar/interlaced; does your hw do YV12, I420, NV12 or something else?

The hw does YV12, YUY2 and UYVY. Since the user of this interface (the Xorg state tracker) is generic, there's really no point (for us) in having driver-specific interfaces that expose every format the hardware can do. The situation might be different, I guess, for device-specific Xorg drivers. If we're doing this I think we should expose perhaps a reasonably small number of common formats, and if the hardware doesn't support any of them, the hardware is not going to be supported. That might unfortunately lead to having driver-specific interfaces for the device-specific Xorg driver and a generic interface for the Xorg state tracker, and I'm not sure people like that idea?

I'm coming to this late, but if the only difference between hw-specific and hw-independent interfaces is which formats are supported, that surely shouldn't be too hard to abstract? Just have an enum which gets expanded with new format names and query for supported formats in the API.

Keith
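Keith's "extensible enum plus a format query" suggestion in miniature (a sketch with invented names, standing in for what would really be an ioctl-style capability query): userspace asks which overlay formats the driver supports and picks from the intersection, so neither side hardcodes the other's format list:

```c
#include <stdbool.h>
#include <stddef.h>

/* The enum grows as new formats appear; old entries keep their values. */
enum overlay_format { FMT_YV12, FMT_I420, FMT_NV12, FMT_YUY2, FMT_UYVY };

struct overlay_caps {
    const enum overlay_format *formats; /* what this hw/driver supports */
    size_t count;
};

/* Query side: the generic state tracker probes for a format instead of
 * assuming a fixed set or needing a driver-specific interface. */
static bool overlay_supports(const struct overlay_caps *caps,
                             enum overlay_format f)
{
    for (size_t i = 0; i < caps->count; i++)
        if (caps->formats[i] == f)
            return true;
    return false;
}
```

A caller that finds no common format falls back to user-space conversion, which matches Stephane's point about exposing YV12/I420 over a narrower hardware format set.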
RE: [PATCH] Add modesetting pageflip ioctl and corresponding drm event
I think the bug in question was because somebody (Jon Smirl??) removed the empty, apparently unused poll implementation from the drm fd, only to discover that the X server was actually polling the fd. If this code adds to, extends or at least doesn't remove the ability to poll the drm fd, it should be fine.

Keith

From: Kristian Høgsberg [...@bitplanet.net]
Sent: Tuesday, August 18, 2009 8:31 AM
To: Thomas Hellström
Cc: Kristian Høgsberg; Jesse Barnes; dri-de...@lists.sf.net
Subject: Re: [PATCH] Add modesetting pageflip ioctl and corresponding drm event

That can't be the real problem. The X server polls on a ton of file descriptors already: sockets from clients, dbus, input devices. They all have poll implementations that don't return 0... I mean, otherwise they wouldn't work. Look at evdev_poll() in drivers/input/evdev.c for the evdev poll implementation, for example.

You're probably right, but we should probably find out what went wrong and make sure it doesn't happen again with non-modesetting drivers + dri1 before pushing this.

I really don't think that's necessary. As I wrote in my reply to Dave, there's nothing in this patch that can cause select(2) to return EINVAL that isn't already present in other poll fops implementations. Like the evdev one, which we already select on -- please compare that function with the poll implementation in my patch and tell me why the drm poll is cause for concern. I need a better, more specific reason why this is such a risk and why I should spend more time tracking this stuff down. And if select(2), for whatever reason, returns EINVAL because of the drm_poll() fops implementation, that's a bug in the kernel that needs to be fixed.

cheers,
Kristian

___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel
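Seen from the X server's side, the concern above is just whether the drm fd behaves like any other pollable descriptor. A small userspace illustration using poll(2), with a pipe standing in for the drm fd (the helper name is invented):

```c
#include <poll.h>

/* Wait for an event on a drm-like fd. Returns 1 if readable, 0 on
 * timeout, -1 on error -- a fd whose kernel poll hook misbehaves shows
 * up here as an error or a descriptor that never reports readable. */
static int wait_for_drm_event(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int n = poll(&pfd, 1, timeout_ms);
    if (n <= 0)
        return n;
    return (pfd.revents & POLLIN) ? 1 : 0;
}
```

This is exactly the pattern the X server main loop uses across client sockets, evdev fds, and (with the pageflip patch) drm events.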
RE: [PATCH] Add modesetting pageflip ioctl and corresponding drm event
This seems wrong to me -- the client doesn't need to sleep: all it's going to do is build a command buffer targeting the new backbuffer. There's no problem with that; it should be the preserve of the GPU scheduler (TTM or GEM) to ensure those commands (once submitted) don't get executed until the buffer is available -- otherwise you're potentially pausing your application for no good reason. The app should be throttled if it gets more than a couple of frames ahead, but there should be 100% overlap with hardware otherwise.

If you need a solution that doesn't rely on the buffer manager, perhaps resort to triple-buffering, or else create a new buffer and return that in DRI2GetBuffers (and let the scanout one be freed once the flip is done). It seems like arbitrating command execution against on-hardware buffers should be the preserve of the kernel memory manager; other actors shouldn't be second-guessing that.

Keith

From: Kristian Høgsberg [...@bitplanet.net]
Sent: Tuesday, August 18, 2009 11:46 AM
To: Thomas Hellström
Cc: Kristian Høgsberg; Jesse Barnes; dri-de...@lists.sf.net
Subject: Re: [PATCH] Add modesetting pageflip ioctl and corresponding drm event

We don't put clients to sleep until they try to render to the new backbuffer. For direct rendering this happens when the client calls DRI2GetBuffers() after having called DRI2SwapBuffers(). If the flip is not yet finished at that time, we restart the X request and suspend the client. When the drm event fires it is read by the ddx driver, which then calls DRI2SwapComplete(), which will wake the client up again. For AIGLX, we suspend the client in __glXForceCurrent(), but the wakeup happens the same way.
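Keith's "throttle only when more than a couple of frames ahead" policy reduces to a counter of outstanding swaps (struct and threshold invented for illustration):

```c
/* Allow the client to keep building command buffers until it gets
 * more than MAX_PENDING frames ahead of the hardware; only then sleep.
 * Anything stricter pauses the app for no good reason. */
#define MAX_PENDING 2

struct swap_queue {
    int submitted;  /* swaps the client has issued */
    int completed;  /* swaps the hardware has finished */
};

static int must_throttle(const struct swap_queue *q)
{
    return (q->submitted - q->completed) > MAX_PENDING;
}
```

The ordering of commands against the not-yet-available backbuffer is left to the kernel scheduler, per the argument above; this check only bounds latency.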
RE: [PATCH] Add modesetting pageflip ioctl and corresponding drm event
No, I'm fine. I don't have the patch in front of me, but it doesn't sound like it precludes these types of changes in the future.

Keith

From: Jesse Barnes [jbar...@virtuousgeek.org]
Sent: Tuesday, August 18, 2009 1:23 PM
To: Keith Whitwell
Cc: Kristian Høgsberg; Thomas Hellström; Kristian Høgsberg; dri-de...@lists.sf.net
Subject: Re: [PATCH] Add modesetting pageflip ioctl and corresponding drm event

Anyway, to me this discussion is more of a future directions one than a blocker for this particular patchset. AFAICT the only thing that needs fixing with this patch is my lock confusion (struct_mutex vs mode_config). Or would you like something substantial changed with these bits before they land?

--
Jesse Barnes, Intel Open Source Technology Center
Re: Doing better than CS ioctl ?
Dave,

The big problem with the (second) radeon approach of state objects was that we defined those objects statically and encoded them into the kernel interface. That meant that when new hardware functionality was needed (or discovered) we had to rev the kernel interface, usually in a fairly ugly way.

I think Jerome's approach could be a good improvement if the state objects it creates are defined by software at runtime, more like little display lists than pre-defined state atoms. The danger again is that you run into cases where you need to expand the objects the verifier will allow userspace to create, but at least in doing so you won't be breaking existing users of the interface. I think the key is that there should be no pre-defined format for these state objects, simply that they should be a sequence of legal commands/register writes that the kernel validates once and userspace can execute multiple times.

Keith

On Sat, 2009-08-08 at 05:43 -0700, Dave Airlie wrote:

On Sat, Aug 8, 2009 at 7:51 AM, Jerome Glisse gli...@freedesktop.org wrote:

Investigating where time is spent in the radeon/KMS world when doing rendering led me to question the design of the CS ioctl. As I am among the people behind it, I think I should give some historical background on the choices that were made.

I think this sounds quite like the original radeon interface, or maybe even a bit like the second one. The original one stored the registers in the sarea, updated the context under the lock, and had the kernel emit it. The second one had a bunch of state objects, containing ranges of registers that were safe to emit. Maybe Keith Whitwell can point out why these were a good/bad idea; not sure if anyone else remembers that far back.

Dave.

The first motivation behind the CS ioctl was to have a common language between userspace and kernel and between kernel and device. Of course, in an ideal world commands submitted through the CS ioctl could be forwarded directly to the GPU without much overhead.
Thing is, the world we live in isn't that good. There are two things the CS ioctl does before forwarding commands:

1- First it must rewrite any packet which supplies an offset to the GPU with the address at which the memory manager validated the buffer object associated with that packet. We can't get rid of this with the CS ioctl (we might do something very clever like a new microcode for the CP so that the CP can rewrite packets using some table of validated buffer offsets, but I am not even sure the CP would be powerful enough to do that).

2- In order to provide more advanced security than what we had in the past, I added a CS checker facility which is responsible for analyzing the command stream and making sure that the GPU won't read or write outside the supplied buffer object list. DRI1 didn't offer such advanced checking. This feature was added with GPU sharing in mind, where sensitive applications might run on the GPU and we might like to protect their memory.

We can obviously avoid the second item and things would work, but userspace would be able to abuse the GPU to access memory outside the GPU objects it owns (this doesn't mean it would be able to access any system RAM, but rather any RAM that is mapped to the GPU, which should for the time being only be pixmaps, textures, VBOs or things like that).

Bottom line is that with the CS ioctl we do the same work twice in different forms. In userspace we build a command stream understandable by the GPU, and in kernel space we decode this command stream to check it. Obviously this sounds wrong. That being said, the CS ioctl isn't that bad: it doesn't consume much on the benchmarks I have done, but I expect it might consume more on older CPUs or when many complex 3D apps run at the same time. So I am not proposing to trash it, but rather to discuss a better interface we could add at a later point to slowly replace CS. CS brings today features we needed yesterday, so we should focus our effort on getting the CS ioctl as smooth and good as possible.
So as a pet project I have been thinking these last few days about what would be a better interface between userspace and kernel, and I came up with something in between Gallium state objects and nvidia GPU objects (well, at least as far as I know each of these, my design sounds close to that).

The idea behind the design is that whenever userspace allocates a bo, userspace knows the properties of the bo. If it's a texture, userspace knows the size, the number of mipmap levels, the border, ... of the texture. If it's a vbo, it knows the layout, the size, the number of elements, ... The same for a rendering viewport: it knows the size and associated properties.

The design has two ioctls:

create_object, supplying:
- an object type id specific to the asic
- an object structure associated to the type id, fully describing the object
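Keith's "little display lists" framing of this design can be put in miniature (everything here is an illustrative sketch with invented names and an invented legality rule, not Jerome's actual proposal): validate a command sequence once at object creation, hand back a handle, and let later submissions reference the handle without re-decoding the stream each time.

```c
#include <stdint.h>
#include <stddef.h>

#define MAX_OBJECTS 16

/* A state object: a run of (reg, value) writes validated once. */
struct state_object { const uint32_t *cmds; size_t len; int valid; };

static struct state_object objects[MAX_OBJECTS];
static int next_handle;

/* Invented legality rule: only registers below 0x4000 may be written. */
static int cmds_legal(const uint32_t *cmds, size_t len)
{
    for (size_t i = 0; i + 1 < len; i += 2)
        if (cmds[i] >= 0x4000)
            return 0;
    return 1;
}

/* create_object analogue: validate once, return a handle or -1. */
static int create_object(const uint32_t *cmds, size_t len)
{
    if (next_handle >= MAX_OBJECTS || !cmds_legal(cmds, len))
        return -1;
    objects[next_handle] = (struct state_object){ cmds, len, 1 };
    return next_handle++;
}

/* Submission references the pre-validated handle: no re-checking. */
static int submit(int handle)
{
    if (handle < 0 || handle >= next_handle || !objects[handle].valid)
        return -1;
    return 0; /* would be queued to the GPU here */
}
```

The point of the split is exactly the double-work complaint above: the expensive decode/check happens once per object instead of once per submission.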
Re: DRM development process wiki page..
Major bumps once stuff went into the kernel weren't allowed at all. You'd need to fork the driver in any case. So we did this once or twice on drivers in devel trees like mach64. However, an upstream-first policy should avoid this need. I'd also prefer to see getparam for new features instead of version checks. The linear version check sucks.

This is an interesting concept that opens up some ideas for dealing with feature deprecation, etc. Think about OpenGL's extension mechanism -- features can be exposed through that mechanism without ever providing a guarantee of future availability -- in fact there is no guarantee of any availability outside the current session. Future versions of a GL driver might add or remove extensions as desired, within the constraints of the GL version number advertised.

What we could see is something similar for the DRM interface -- a base level of functionality specified by the major/minor numbers, but additional extensions that may be advertised according to the whim of the kernel module, which the driver can take advantage of if present but must otherwise function correctly without... Extensions that don't work out can be dropped; those that do can be incorporated into the next increment of the minor number, a la GL 1.5.

Keith
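The getparam-style probing preferred above over linear version checks looks roughly like this (a sketch: the parameter names and struct are invented, not the real drm getparam interface):

```c
#include <stdbool.h>

/* Each capability is queried individually, so the kernel can advertise
 * or drop features independently of a single version number. */
enum drm_param { PARAM_HAS_PAGEFLIP, PARAM_HAS_OVERLAY, PARAM_HAS_GEM };

struct dev_caps { bool pageflip, overlay, gem; };

/* getparam analogue: 0 on success with *value set, -1 for an unknown
 * parameter -- which callers treat as "feature absent", giving the
 * graceful-degradation property version checks lack. */
static int get_param(const struct dev_caps *caps, enum drm_param p,
                     int *value)
{
    switch (p) {
    case PARAM_HAS_PAGEFLIP: *value = caps->pageflip; return 0;
    case PARAM_HAS_OVERLAY:  *value = caps->overlay;  return 0;
    case PARAM_HAS_GEM:      *value = caps->gem;      return 0;
    }
    return -1;
}
```

A userspace driver probes each feature it wants and falls back when the probe fails, mirroring how GL clients check the extension string.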
Re: [RFC: 2.6 patch] remove the i830 driver
On Tue, Jul 15, 2008 at 3:35 PM, Simon Farnsworth [EMAIL PROTECTED] wrote:

Keith Whitwell wrote:

You can still buy new i865 boards: http://www.ebuyer.com/product/119412 So I think this isn't a great idea.

This won't remove all support for i865. It only removes support for the combination of i865 and old X servers. New X servers use the i915 driver to support the i865 chipset.

You're right -- removing the old module is fine by me.

Keith
Re: Combining Mesa3D and DRI mailing lists and/or sites? (was: Re: Wrapping up 7.4 (finally))
On Mon, Jun 16, 2008 at 8:31 AM, Timo Jyrinki [EMAIL PROTECTED] wrote:

2008/6/12 Keith Whitwell [EMAIL PROTECTED]:

In reality, what has happened is that most of this has already occurred -- whatever 3d driver-related traffic that hasn't been sucked into IRC is now occurring on the Mesa lists.

Right. I have now rearranged the DRI wiki's mailing list page http://dri.freedesktop.org/wiki/MailingLists by stating that fact. I also commented out the dri-announce mailing list, which hadn't been used for 5+ years.

I actually think the current structure makes a lot of sense - if we wanted a change, we could rename dri-devel to drm-devel, but it hardly seems worthwhile.

It'd be nice, but only if somehow automagic enough. Just documentation is mostly enough, too. What about the dri-users mailing list? From the users' point of view DRI/Mesa/DRM are mostly all the same (users want them all), and any users of DRM are likely to be halfway developers anyway. While DRI discussion has successfully migrated to the mesa3d-dev list, users are currently randomly posting to either mesa3d-users or dri-users and the discussion is not coherent. Could those two mailing lists be merged into mesa3d-users, or do you think that mentioning that dri-users is (nowadays) for DRM discussion is enough to fix the problem from now on?

I think dri-users is certainly redundant now, likewise -announce. If those could somehow get funneled into mesa-users or an appropriate Xorg list, that would be fine with me...

Regarding wikis, I also started reorganizing the front page http://dri.freedesktop.org/ a bit, including changing the title to include Mesa, too. I still think that it could be the wiki for both Mesa and DRI, and that mesa3d.org could include a link to the wiki (or the DRI wiki, given the current status) under e.g. the Resources title, instead of having the link to the DRI website only at the bottom of the navigation. What do you think?

I'm also ok with this general concept.

Keith
Re: GEM merging to master
If this was a test of just two memory manager implementations, the benchmarks would speak for themselves. However, there are at least two driver changes I caught on first review of gallium-i915-current's i915simple (which I assume is what you were testing, given that the last tests I've heard from you guys were using that) that would have an impact on performance:

As far as I know the gallium driver isn't involved in these tests -- this is a comparison between the original i915tex and newer versions of the driver. i915simple is missing support for tiling at this stage (private backbuffers), so any performance results on that driver are unlikely to be meaningful yet.

Keith
Re: Combining Mesa3D and DRI mailing lists and/or sites? (was: Re: Wrapping up 7.4 (finally))
On Thu, Jun 12, 2008 at 5:28 PM, Timo Jyrinki [EMAIL PROTECTED] wrote:

2008/6/12 Daniel Stone [EMAIL PROTECTED]:

On Thu, Jun 12, 2008 at 10:49:57AM +0300, Timo Jyrinki wrote:

Speaking of which, if you have any ideas how to better interlink and combine:
- http://dri.freedesktop.org/
- http://xorg.freedesktop.org/
- http://mesa3d.org/
...

I don't understand why DRI and Mesa have separate lists and websites, tbh, especially given the level of crosstalk. For the wikis, it should be possible to link between them, and I'll try to sort out how to make that happen.

Hi. Would it be beneficial to (either, both or neither):

1. Combine mailing lists as follows:
- mesa3d-dev / dri-devel
- mesa3d-users / dri-users
- mesa3d-announce / dri-announce
- mesa-commit / dri-patches

There are probably historical reasons for the separation, but are there any current ones that would be more important than the benefits of a single point of discussion about Mesa/DRI, which overlap so much anyway (especially from the users' perspective, but also development-wise)? At the same time, they might be moved to freedesktop.org from sourceforge.net.

2. Make DRI's wiki into a combined Mesa3D and DRI wiki. Mesa3D does not currently have a wiki of its own, but DRI has. Mesa3D certainly doesn't need yet another wiki in addition to the X.org wiki and the DRI wiki, so why not make it a common one officially? I think Mesa3D's current web site is quite nicely organized, and it could be evolved by integrating a bit more DRI stuff and the new (currently DRI) wiki into it. Certainly not throwing it away and replacing it with a wiki; a wiki would take a very big effort to make as navigable and organized as the Mesa3D homepage currently is.

If either sounds reasonable, is it acceptable for DRI as a project to be generally known (as it already mostly is known, I think) as a sub-project of Mesa3D, so the combined name would be simply Mesa3D? Or is there a need for clearer separation between the two? Mainly important from the perspective of naming the mailing lists, i.e. can they be mesa3d-devel or something else.

In reality, what has happened is that most of this has already occurred -- whatever 3d driver-related traffic that hasn't been sucked into IRC is now occurring on the Mesa lists. The DRI list has in effect become the list for development of the drm kernel module, libdrm, and the various memory manager implementations. While Mesa is an important client of these, it is far from being the only client. I actually think the current structure makes a lot of sense - if we wanted a change, we could rename dri-devel to drm-devel, but it hardly seems worthwhile. Another proposal would be to merge the DRI lists into LKML... I don't really want to do that either...

Keith
Re: i915 performance, master, i915tex gem
So possibilities are:

- batchbuffer starvation -- I was going to say 'has this changed significantly' -- and the answer is that it has, of course, with the bufmgr_fake changes... I can't tell by quick inspection if these are a likely culprit, but it's certainly a significant set of changes relative to the classic version of classic...

- over-throttling in swapbuffers -- I think we used to let it get two frames ahead - has this changed?

- something else...

Keith
Re: i915 performance, master, i915tex gem
* Classic is apparently doing suboptimal syncs that limit its performance in some cases (gears, teapot and perhaps openarena); one should not benchmark framerates against classic in those cases.

As I said elsewhere, I'd like to get to the bottom of this -- it wasn't always this way. Otherwise we should abandon 'classic' off the trunk and use one of the ye olde 7.0 versions.

Keith
Re: GEM discussion questions
On Tue, May 20, 2008 at 1:29 PM, Thomas Hellström [EMAIL PROTECTED] wrote:

Keith Packard wrote:

On Mon, 2008-05-19 at 12:13 -0700, Ian Romanick wrote:

The obvious overhead I was referring to is the extra malloc / free. That's why I went on to say "So, now I have to go back and spend time caching the buffer allocations and doing other things to make it fast." In that context, "I" is idr as an app developer. :)

You'd be wrong then -- the cost of the malloc/write/copy/free is cheaper than the cost of map/write/unmap.

One problem that we have here is that none of the benchmarks currently being used hit any of these paths. OpenArena, Enemy Territory (I assume this is the older Quake 3 engine game), and gears don't use MapBuffer at all. Unfortunately, any apps that would hit these paths are so fill-rate bound on i965 that they're useless for measuring CPU overhead.

The only place we see significant map/write/unmap vs malloc/write/copy/free is with batch buffers, and so far the measurements that I've taken which appear to show a benefit haven't been reproduced by others... We could certainly use texdown to test this out, if the GEM i915 driver implemented a pwrite-enabled struct dd_function_table::TextureMemCpy().

Double-copy texture uploads have been 'tested' in the past -- and their poor performance was one of the motivating factors for creating a single-copy scheme. The double-copy upload path isn't *that* bad, as long as the entire texture fits into cache... As soon as it exceeds the cache dimensions, it falls off a cliff. FWIW, Intel are making some CPUs with pretty small caches these days, and teaming them up with i945 GPUs, so this isn't completely theoretical.

Keith
Re: TTM vs GEM discussion questions
- Original Message -
From: Ian Romanick [EMAIL PROTECTED]
To: DRI dri-devel@lists.sourceforge.net
Sent: Monday, May 19, 2008 10:04:09 AM
Subject: Re: TTM vs GEM discussion questions

Ian Romanick wrote:

I've read the GEM documentation several times, and I think I have a good grasp of it. I don't have any non-trivial complaints about GEM, but I do have a couple comments / observations:

- I'm pretty sure that the read_domain = GPU, write_domain = CPU case needs to be handled. I know of at least one piece of hardware with a kooky command buffer that wants to be used that way.

- I suspect that in the (near) future we may want multiple read_domains. I can envision cases where applications using, for example, vertex feedback mode would want to read from a buffer while the GPU is also reading from the buffer.

- I think drm_i915_gem_relocation_entry should have a size field. There are a lot of cases in the current GL API (and more to come) where the entire object will trivially not be used. Clamped LOD on textures is a trivial example, but others exist as well.

Another question occurred to me. What happens on over-commit? Meaning, in order to draw 1 polygon, more memory must be accessible to the GPU than exists. This was a problem that I never solved in my 2004 proposal. At the time on R200 it was possible to have 6 maximum-size textures active, which would require more than the possible on-card + AGP memory.

I don't actually think the problem is solvable for buffer-based memory managers -- the best we can do is spot the failure and recover, either early as the commands are submitted by the API, or at some point later, and for some meaning of 'recover' (eg - fail cleanly, fallback, use smaller mipmaps, disable texturing, etc). The only real way to solve it is to move to a page-based virtualization of GPU memory, which requires hardware support and isn't possible on most cards.
Note that this is different from per-process GPU address spaces, and is a significantly tougher problem even on supporting hardware. Note there are two concepts with similar common names:

- virtual GPU memory -- ie per-context page tables, but still a buffer-based memory manager; textures pre-loaded into GPU memory prior to command execution.
- virtualized GPU memory -- as above, but with page faulting, typically IRQ-driven with kernel assistance. Parts of textures may be paged in/out as required, according to the memory access patterns of active shaders.

It's not clear to me which of the above the r300/nv people are aiming at, but in my opinion the latter is such a significant departure from what we have been thinking about that I have always believed it should be addressed by a new set of interfaces.

Keith

- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/ -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
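[Editor's note] Keith's over-commit point -- that a buffer-based manager can only detect the failure at validation time and recover by failing cleanly or falling back -- can be illustrated with a toy sketch. This is not DRM code; the aperture size and helper name are invented for illustration.

```python
# Toy sketch (not real DRM code): a buffer-based manager cannot make an
# oversized working set fit; the best it can do is detect the over-commit
# at validation time and fail cleanly so the driver can fall back.

APERTURE_BYTES = 256 * 1024 * 1024  # hypothetical on-card + AGP total

def validate_working_set(buffer_sizes, aperture=APERTURE_BYTES):
    """Return True if every buffer referenced by one draw can be
    resident simultaneously; False signals over-commit."""
    return sum(buffer_sizes) <= aperture

# Six maximum-size textures, as in the R200 example: 6 x 64 MB = 384 MB.
textures = [64 * 1024 * 1024] * 6
if not validate_working_set(textures):
    # 'Recover' here means fail cleanly, fall back to software,
    # or retry with smaller mipmaps -- the draw cannot simply proceed.
    print("over-commit: working set exceeds aperture")
```

The point of the sketch is that nothing inside `validate_working_set` can make the draw succeed; only a page-based scheme with hardware faulting avoids the problem entirely.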
Re: TTM vs GEM discussion questions
- Original Message From: Thomas Hellström [EMAIL PROTECTED] To: Stephane Marchesin [EMAIL PROTECTED] Cc: DRI dri-devel@lists.sourceforge.net Sent: Monday, May 19, 2008 9:49:21 AM Subject: Re: TTM vs GEM discussion questions

Stephane Marchesin wrote: On 5/18/08, Thomas Hellström wrote:

Yes, that was really my point. If the memory manager we use (whatever it is) does not allow this kind of behaviour, that'll force all cards to use a kernel-validated command submission model, which might not be too fast, and more difficult to implement on such hardware. I'm not in favor of having multiple memory managers, but if the chosen one is both slower and more complex to support in the future, that'll be a loss for everyone. Unless we want to have another memory manager implementation in 2 years from now... Stephane

First, TTM does not enforce kernel command submission, but it forces you to tell the kernel about command completion status in order for the kernel to be able to move and delete buffers.

Yes, emitting the moves from the kernel is not a necessity. If your card can do memory protection, you can set up the protection bits in the kernel and ask user space to do the moves. Doing so means in-order execution in the current context, which means that in the normal case rendering does not need to synchronize with fences at all.

I'm not sure how you could avoid that with ANY kernel-based memory manager, but I would be interested to know how you expect to solve that problem.

See above, if the kernel controls the memory protection bits, it can pretty much enforce things on user space anyway.

Well, the primary reason for the kernel to sync and move a buffer object would be to evict it from VRAM, in which case I don't think the user-space approach would be a valid solution, unless, of course, you plan to use VRAM as a cache and back it all with system memory.
Just out of interest (I think this is a valid thing to know, and I'm not being TTM / GEM specific here):

1) I've never seen a kernel round-trip per batchbuffer as a huge performance problem, and it surely simplifies things for an in-kernel memory manager. Do you have any data to back this?

2) What do the Nvidia proprietary drivers do w.r.t. this? What I understand is that each hardware context (and there are lots of hardware contexts) has a ringbuffer which is mapped into the address space of the driver assigned that context. The driver just inserts commands into that ringbuffer and the hardware itself schedules context-switches between rings.

Then the question is how does this interact with a memory manager. There still has to be some entity managing the global view of memory -- just as the kernel does for the regular vm system on the CPU. A context/driver shouldn't be able to rewrite its own page tables, for instance.

Keith
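[Editor's note] The split Keith describes -- userspace writing commands directly into a per-context ring while only the kernel may touch the page tables -- can be sketched as follows. All class and method names here are invented for illustration; this models the ownership boundary, not any real driver.

```python
# Illustrative sketch: each context owns a ring it writes commands into
# with no kernel round-trip, but only the kernel-side manager may touch
# the per-context page tables (the global view of memory).

class Kernel:
    def __init__(self):
        self.page_tables = {}          # context id -> {gpu_page: phys_page}

    def map_page(self, ctx_id, gpu_page, phys_page):
        # Only this entity may edit the mappings.
        self.page_tables.setdefault(ctx_id, {})[gpu_page] = phys_page

class Context:
    def __init__(self, ctx_id):
        self.ctx_id = ctx_id
        self.ring = []                 # user-mapped ringbuffer

    def emit(self, cmd):
        # Userspace inserts commands directly; nothing here can
        # rewrite the page tables.
        self.ring.append(cmd)

kernel = Kernel()
ctx = Context(1)
kernel.map_page(ctx.ctx_id, gpu_page=0, phys_page=0x1000)
ctx.emit("DRAW")
```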
Re: TTM vs GEM discussion questions
- Original Message From: Dave Airlie [EMAIL PROTECTED] To: Ian Romanick [EMAIL PROTECTED] Cc: DRI dri-devel@lists.sourceforge.net Sent: Monday, May 19, 2008 4:38:02 AM Subject: Re: TTM vs GEM discussion questions

All the good that's done us and our users. After more than *5 years* of various memory manager efforts we can't support basic OpenGL 1.0 (yes, 1.0) functionality in a performant manner (i.e., glCopyTexImage and friends). We have to get over this 'it has to be perfect or it will never get in' crap. Our 3D drivers are entirely irrelevant at this point. Except on Intel hardware, whose relevance may or may not be relevant. These can't do copyteximage with the in-kernel drm.

To say that userspace APIs cannot die once released is not a relevant counterpoint. We're not talking about a userspace API for general application use. This isn't futexes, sysfs, or anything that applications will directly depend upon. This is an interface between a kernel portion of a driver and a usermode portion of a driver. If we can't be allowed to change or deprecate those interfaces, we have no hope. Note that the closed source guys don't have this artificial handicap.

Ian, fine, you can take this up with Linus and Andrew Morton; I'm not making this up just to stop you from putting 50 unsupportable memory managers in the kernel. If you define any interface to userspace from the kernel (ioctls, syscalls), you cannot just make it go away. The rule is simple: if you install a distro with a kernel 2.6.x.distro, and it has Mesa 7.0 drivers on it, upgrading the kernel to kernel 2.6.x+n without touching userspace shouldn't break userspace, ever. If we can't follow this rule we can't put our code into Linus's kernel. So don't argue about it, deal with it; this isn't going to change. And yes, I've heard this crap about closed source guys, but we can't follow their route and be distributed by vendors. How many vendors ship the closed drivers?
This is also a completely orthogonal issue to maintaining any particular driver. Drivers are removed from the kernel just the same as they are removed from X.org. Assume we upstreamed either TTM or GEM today. Clearly that memory manager would continue to exist as long as some other driver continued to depend on it. I don't see how this is different from cfb or any of the other interfaces within the X server that we've gutted recently.

Drivers and pieces of the kernel aren't removed like you think. I think we nuked gamma (didn't have a working userspace anymore) and ffb (it sucked and couldn't be fixed). Someone is bound to bring up OSS-ALSA, but that doesn't count, as ALSA had an OSS emulation layer so userspace apps didn't just stop working. Removing chunks of X is vastly different to removing an exposed kernel userspace interface. Please talk to any IBM kernel person and clarify how this stuff works. (Maybe benh could chime in...??)

If you want to remove a piece of infrastructure, you have three choices. If nothing uses it, you gut it. If something uses it, you either fix that something to use different infrastructure (which puts you in the nothing uses it state) or you leave things as they are. In spite of all the fussing the kernel guys do in this respect, the kernel isn't different in this respect from any other large, complex piece of infrastructure.

So you are going to go around and fix the userspaces on machines that are already deployed? How? e.g. Andrew Morton has a Fedora Core 1 install on a laptop booting 2.6.x-mm kernels; when 3D stops working on that laptop we get to hear about it. So yes, you can redesign and move around the kernel internals as much as you like, but you damn well better expose the old interface and keep it working.

managers or that we may want to have N memory managers now that will be gutted later.
It seems that the real problem is that the memory managers have been exposed as a generic, directly usable, device-independent piece of infrastructure. Maybe the right answer is to punt on the entire concept of a general memory manager. At best we'll have some shared, optional-use infrastructure, and all of the interfaces that anything in userspace can ever see are driver dependent. That limits the exposure of the interfaces and lets us solve today's problems today.

As is trivially apparent, we don't know what the best (for whatever definition of best we choose) answer is for a memory manager interface. We're probably not going to know that answer in the near future. To not let our users have anything until we can give them the best thing is an incredible disservice to them, and it makes us look silly (at best).

Well the thing is I can't believe we don't know enough to do this in some way generically, but maybe the TTM vs GEM thing proves it's not possible. I don't
Re: TTM vs GEM discussion questions
- Original Message From: Dave Airlie [EMAIL PROTECTED] To: Jerome Glisse [EMAIL PROTECTED] Cc: Keith Whitwell [EMAIL PROTECTED]; Ian Romanick [EMAIL PROTECTED]; DRI dri-devel@lists.sourceforge.net Sent: Monday, May 19, 2008 12:16:57 PM Subject: Re: TTM vs GEM discussion questions

For radeon the plan was to return error from superioctl as during superioctl and validation i do know if there is enough gart/vram to do the things. Then i think it's up to upper level to properly handle such failure from superioctl.

You really want to work this out in advance; at superioctl stage it is too late. Have a look at the changes I made to the dri_bufmgr.c classic memory manager case to deal with this for Intel hw. If you got to superioctl and failed, unwrapping would be a real pain in the ass; you might have a number of pieces of app state you can't reconstruct. I think DirectX handled this with cut-points, where with the buffer you passed the kernel a set of places it could break the batch without too much effort. I think we are better just giving the mesa driver a limit, and when it hits that limit it submits the buffer. The kernel can give it a new optimal limit at any point and it should use that as soon as possible. Nothing can solve Ian's problem where the app gives you a single working set that is too large, at least with current GL. However you have to deal with the fact that a batchbuffer has many operations and the total working set needs to fit in RAM to be relocated. I've added all the hooks in dri_bufmgr.c for the non-TTM case; TTM shouldn't be a major effort to add.

My understanding of future hw is that we are heading to virtualized GPU memory (IRQ assistance for page fault). I think we'll have this for r700; not sure i965 does this; r500 has, I think, per-process GART. I don't think you can restart i9xx after a pagefault, may be wrong...

Note per-process GART != support for virtualized memory, though it gets you one step of the way.
You also need the support so the kernel can figure out what page needs to be swapped in, be able to restart the GPU after the pagefault, etc, and probably some way to have the hardware go off and do something useful on another context in the meantime.

I'd like to just try and get buffer-based memory management working well first, then draw a line under that and work on these more advanced concepts...

Keith
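[Editor's note] Dave's scheme -- give the mesa driver a limit, have it submit the batch when the working set hits that limit, and let the kernel publish a new limit at any time -- can be sketched as a toy batch builder. Class and method names are invented; the `flush` here stands in for the superioctl submission.

```python
# Sketch of the limit-driven submission scheme: userspace accumulates
# commands, tracks the relocation working set, and flushes *before*
# exceeding the kernel-supplied limit -- rather than failing at
# superioctl time when unwrapping app state would be painful.

class BatchBuilder:
    def __init__(self, limit):
        self.limit = limit             # kernel-supplied safe working set
        self.working_set = 0
        self.cmds = []
        self.flushes = 0

    def set_limit(self, new_limit):
        # The kernel may hand out a new optimal limit at any point.
        self.limit = new_limit

    def emit(self, cmd, buf_size):
        if self.working_set + buf_size > self.limit:
            self.flush()               # submit early instead of failing
        self.cmds.append(cmd)
        self.working_set += buf_size

    def flush(self):
        # Stand-in for the superioctl submission.
        self.flushes += 1
        self.cmds = []
        self.working_set = 0

b = BatchBuilder(limit=100)
for i in range(5):
    b.emit(f"draw{i}", buf_size=40)    # 5 x 40 against a limit of 100
```

Note that, as the thread says, this cannot help when a *single* operation's working set exceeds the limit -- that case still has to fail.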
Re: TTM vs GEM discussion questions
It's not clear to me which of the above the r300/nv people are aiming at, but in my opinion the latter is such a significant departure from what we have been thinking about that I have always believed it should be addressed by a new set of interfaces.

My understanding of future hw is that we are heading to virtualized GPU memory (IRQ assistance for page fault).

Yes, of course. This is the vista advanced scheduler and I guess it will be enforced by whql or some other mandatory scheme. Here's a post from 2006 that lays out the concepts: http://blogs.msdn.com/greg_schechter/archive/2006/04/02/566767.aspx The graphics rumour sites suggest that one or more of the IHVs failed to achieve this for the vista deadlines, so it might be a bit of a tough technical problem...

My belief is that there are two different problems -- buffer-based memory management and page-based virtualized GPU memory -- and they should be solved with different implementations and probably different interfaces. Moreover, we should try and get a workable buffer-based scheme for current hardware and then commence navel-gazing to support future cards... delaying an adequate buffer-based memory manager (ttm+cleaner-interface or gem+performance-fixes) to wait for a page-based one doesn't make any sense, as the page-based one won't ever work on current cards. The opposite is true, however -- a decent set of buffer-based interfaces will keep working for a long time, giving breathing room to create a page-based manager later.

Keith
i915 performance, master, i915tex gem
Just reposting this with a new subject line and less preamble.

- Original Message

Well the thing is I can't believe we don't know enough to do this in some way generically, but maybe the TTM vs GEM thing proves it's not possible.

I don't think there's anything particularly wrong with the GEM interface -- I just need to know that the implementation can be fixed so that performance doesn't suck as hard as it does in the current one, and that people's political views on basic operations like mapping buffers don't get in the way of writing a decent driver.

We've run a few benchmarks against i915 drivers in all their permutations, and to summarize, the results look like:

- for GPU-bound apps, there are small differences, perhaps up to 10%. I'm really not concerned about these (yet).
- for CPU-bound apps, the overheads introduced by Intel's approach to buffer handling impose a significant penalty in the region of 50-100%.

I think the latter is the significant result -- none of these experiments in memory management significantly change the command stream the hardware has to operate on, so what we're varying essentially is the CPU behaviour to achieve that command stream. And it is in CPU usage where GEM (and Keith/Eric's now-abandoned TTM driver) significantly disappoint. Or to put it another way, GEM and master/TTM seem to burn huge amounts of CPU just running the memory manager. This isn't true for master/no-ttm or for i915tex using userspace sub-allocation, where the CPU penalty for getting decent memory management seems to be minimal relative to the non-ttm baseline.

If there's a political desire to not use userspace sub-allocation, then whatever kernel-based approach you want to investigate should nonetheless make some effort to hit reasonable performance goals -- and neither of the current two kernel-allocation-based approaches is at all impressive.
Keith

==

And on an i945G, dual core Pentium D 3GHz, 2MB cache, FSB 800MHz, single-channel ram:

Openarena timedemo at 640x480:
- master w/o TTM: 840 frames, 17.1 seconds: 49.0 fps, 12.24s user 1.02s system 63% cpu 20.880 total
- master with TTM: 840 frames, 15.8 seconds: 53.1 fps, 13.51s user 5.15s system 95% cpu 19.571 total
- i915tex_branch: 840 frames, 13.8 seconds: 61.0 fps, 12.54s user 2.34s system 85% cpu 17.506 total
- gem: 840 frames, 15.9 seconds: 52.8 fps, 11.96s user 4.44s system 83% cpu 19.695 total

KW: It's less obvious here than in some of the tests below, but the pattern is still clear -- compared to master/no-ttm, i915tex is getting about the same ratio of fps to CPU usage, whereas both master/ttm and gem are significantly worse, burning much more CPU per fps, with a large chunk of the extra CPU being spent in the kernel. The particularly worrying thing about GEM is that it isn't hitting *either* 100% cpu *or* maximum framerates from the hardware -- that's really not very good, as it implies hardware is being left idle unnecessarily.

glxgears:
- A: ~1029 fps, 20.63user 2.88system 1:00.00elapsed 39%CPU (master, no ttm)
- B: ~1072 fps, 23.97user 18.06system 1:00.00elapsed 70%CPU (master, ttm)
- C: ~1128 fps, 22.38user 5.21system 1:00.00elapsed 45%CPU (i915tex, new)
- D: ~1167 fps, 23.14user 9.07system 1:00.00elapsed 53%CPU (i915tex, old)
- F: ~1112 fps, 24.70user 21.95system 1:00.00elapsed 77%CPU (gem)

KW: The high CPU overhead imposed by GEM and (non-suballocating) master/TTM should be pretty clear here. master/TTM burns 30% of CPU just running the memory manager!! GEM gets slightly higher framerates but uses even more CPU than master/TTM.
fgl_glxgears -fbo:
- A: n/a
- B: ~244 fps, 7.03user 5.30system 1:00.01elapsed 20%CPU (master, ttm)
- C: ~255 fps, 6.24user 1.71system 1:00.00elapsed 13%CPU (i915tex, new)
- D: ~260 fps, 6.60user 2.44system 1:00.00elapsed 15%CPU (i915tex, old)
- F: ~258 fps, 7.56user 6.44system 1:00.00elapsed 23%CPU (gem)

KW: GEM and master/ttm burn more cpu to build/submit the same command streams.

openarena 1280x1024:
- A: 840 frames, 44.5 seconds: 18.9 fps (master, no ttm)
- B: 840 frames, 40.8 seconds: 20.6 fps (master, ttm)
- C: 840 frames, 40.4 seconds: 20.8 fps (i915tex, new)
- D: 840 frames, 37.9 seconds: 22.2 fps (i915tex, old)
- F: 840 frames, 40.3 seconds: 20.8 fps (gem)

KW: no cpu measurements taken here, but almost certainly GPU bound. A lot of similar numbers; I don't believe the deltas have anything in particular to do with memory management interface choices...

ipers:
- A: ~285000 Poly/sec (master, no ttm)
- B: ~217000 Poly/sec (master, ttm)
- C: ~298000 Poly/sec (i915tex, new)
- D: ~227000 Poly/sec (i915tex, old)
- F: ~125000 Poly/sec (gem, GPU lockup on first attempt)

KW: no cpu measurements in this run, but all are almost certainly 100% pinned on CPU.

- i915tex (in particular i915tex, new) shows similar performance to classic - ie low cpu
Re: TTM vs GEM discussion questions
On Mon, May 19, 2008 at 2:06 PM, Jerome Glisse [EMAIL PROTECTED] wrote: On Mon, 19 May 2008 12:16:57 +0100 (IST) Dave Airlie [EMAIL PROTECTED] wrote:

For radeon the plan was to return error from superioctl as during superioctl and validation i do know if there is enough gart/vram to do the things. Then i think it's up to upper level to properly handle such failure from superioctl.

You really want to work this out in advance; at superioctl stage it is too late. Have a look at the changes I made to the dri_bufmgr.c classic memory manager case to deal with this for Intel hw. If you got to superioctl and failed, unwrapping would be a real pain in the ass; you might have a number of pieces of app state you can't reconstruct. I think DirectX handled this with cut-points, where with the buffer you passed the kernel a set of places it could break the batch without too much effort. I think we are better just giving the mesa driver a limit, and when it hits that limit it submits the buffer. The kernel can give it a new optimal limit at any point and it should use that as soon as possible. Nothing can solve Ian's problem where the app gives you a single working set that is too large, at least with current GL. However you have to deal with the fact that a batchbuffer has many operations and the total working set needs to fit in RAM to be relocated. I've added all the hooks in dri_bufmgr.c for the non-TTM case; TTM shouldn't be a major effort to add.

Splitting the cmds before they get submitted is the way to go; likely we can ask the kernel for an estimate of available memory so userspace can stop building the cmd stream, but this isn't easy. Well, anyway, this would be a userspace problem. Anyway we still will have to fail in superioctl if, for instance, memory fragmentation gets in the way. It's as good as we can do...

We need more than an estimate, of course -- if the estimate is optimistic, then you're back in the same situation -- trying to deal with it after the fact...
For userspace splitting to work, there needs to be a memory number given by the kernel which is a *guarantee* that this amount of vram (or equivalent) is available, and that as long as userspace sticks within that, the kernel must guarantee that commands submitted will run...

Unfortunately, this doesn't interact well with the pinning of buffers, eg for scanout, which may be happening asynchronously in other processes/threads. There are some possibilities to have pinning co-exist with these guarantees, eg by partitioning VRAM into a pinnable pool and a 'normal' pool, and using the size of the 'normal' pool as the guaranteed vram number.

Keith
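[Editor's note] Keith's partitioning idea reduces to simple arithmetic: set aside a fixed pool for pinned buffers so the remaining pool size can be handed out as a guarantee. A minimal sketch with invented sizes:

```python
# Sketch of the pinnable/normal partitioning: reserve a fixed pool for
# pinned buffers (scanout etc.) so that the size of the remaining
# 'normal' pool can be handed to userspace as a *guarantee*, unaffected
# by other processes pinning asynchronously.

def guaranteed_vram(total_vram, pinnable_pool):
    assert pinnable_pool < total_vram
    return total_vram - pinnable_pool

# Hypothetical 256 MB card with 32 MB set aside for scanout/cursor pinning:
limit = guaranteed_vram(256 * 1024 * 1024, 32 * 1024 * 1024)
# Userspace that keeps each submission within 'limit' can then rely on
# the kernel never failing the submission for lack of VRAM.
```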
Re: i915 performance, master, i915tex gem
glxgears uses 40% of the CPU in both classic and gem. Note that the gem version takes about 20 seconds to reach a steady state -- the gem driver isn't clearing the gtt actively and so glxgears gets far ahead of the gpu. My theory is that this shows that using cache-aware copies from a single static batch buffer (as gem does now) improves cache performance and write bandwidth.

I'm still confused by your test setup... Stepping back from cache metaphysics, why doesn't classic pin the hardware, if it's still got 60% cpu to burn? I think getting reproducible results makes a lot of sense. What hardware are you actually using -- ie. what is this laptop?

Keith
Re: TTM merging?
I do worry that TTM is not Linux enough; it seems you have decided that we can never do in-kernel allocations at any useable speed and punted the work into userspace, which makes life easier for Gallium as it's more like what Windows does, but I'm not sure this is a good solution for Linux.

I have no idea where this set of ideas comes from, and it's a little disturbing to me. On a couple of levels, it's clearly bogus. Firstly, TTM and its libdrm interfaces predate gallium by years. Secondly, the windows work we've done with gallium to date has been on XP and _entirely_ in kernel space, so the whole issue of user/kernel allocation strategies never came up. Thirdly, Gallium's backend interfaces are all about abstracting away from the OS, so that drivers can be picked up and dumped down in multiple places. It's ludicrous to suggest that the act of abstracting away from TTM has in itself skewed TTM -- the point is that the driver has been made independent of TTM. The point of Gallium is that it should work on top of *anything* -- if we had had to skew TTM in some way to achieve that, then we would have already failed right at the starting point...

Lastly, and most importantly, I believe that using TTM kernel allocations to back a user-space sub-allocator *is the right strategy*. This has nothing to do with Gallium. No matter how fast you make a kernel allocator (and I applaud efforts to make it fast), it is always going to be quicker to do allocations locally. This is the reason we have malloc() and not just mmap() or brk/sbrk. Also, sub-allocation doesn't imply massive preallocation. That bug is well fixed by Thomas' user-space slab allocator code.

Keith
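[Editor's note] The malloc-vs-mmap analogy above can be made concrete with a toy bump sub-allocator: one kernel allocation backs many userspace allocations, so most allocations involve no kernel round-trip at all. Everything here is invented for illustration; a real sub-allocator (like Thomas' slab code) also handles freeing and fragmentation.

```python
# Toy bump sub-allocator over one large kernel allocation, illustrating
# why userspace sub-allocation wins: after the initial slab is obtained,
# allocations are pure userspace bookkeeping.

class KernelAllocs:
    calls = 0
    @classmethod
    def alloc_slab(cls, size):
        cls.calls += 1                 # each call models an ioctl round-trip
        return bytearray(size)

class SubAllocator:
    def __init__(self, slab_size=4096):
        self.slab = KernelAllocs.alloc_slab(slab_size)
        self.offset = 0

    def alloc(self, size):
        # No freeing in this toy version: just bump the offset.
        off = self.offset
        self.offset += size
        return off

sub = SubAllocator()
offsets = [sub.alloc(64) for _ in range(16)]
# 16 allocations, a single kernel call.
```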
Re: TTM merging?
- Original Message From: Jerome Glisse [EMAIL PROTECTED] To: Thomas Hellström [EMAIL PROTECTED] Cc: Dave Airlie [EMAIL PROTECTED]; Keith Packard [EMAIL PROTECTED]; DRI dri-devel@lists.sourceforge.net; Dave Airlie [EMAIL PROTECTED] Sent: Wednesday, May 14, 2008 6:08:55 PM Subject: Re: TTM merging? On Wed, 14 May 2008 16:36:54 +0200 Thomas Hellström wrote: Jerome Glisse wrote:

I don't agree with you here. EXA is much faster for small composite operations and even small fill blits if fallbacks are used. Even to write-combined memory, but that of course depends on the hardware. This is going to be even more pronounced with acceleration architectures like Glucose and similar, that don't have an optimized path for small hardware composite operations. My personal feeling is that pwrites are a workaround for a workaround for a very bad decision: to avoid user-space allocators on device-mapped memory. This led to a hack to avoid caching-policy changes, which led to cache trashing problems, which put us in the current situation. How far are we going to follow this path before people wake up? What's wrong with the performance of good old i915tex, which even beats classic i915 in many cases? Having to go through potentially (and even probably) paged-out memory to access buffers that are present in VRAM sounds like a very odd approach (to say the least) to me. Even if it's a single page, and implementing per-page dirty checks for domain flushing isn't very appealing either.

I don't have numbers or benchmarks to check how fast the pread/pwrite path might be in this use, so i am just expressing my feeling, which happens to be to avoid vma tlb flushes as much as we can. I got the feeling that the kernel goes through numerous tricks to avoid tlb flushing for a good reason, and also i am pretty sure that with the number of cores keeping growing, anything that needs broad cpu synchronization is to be avoided.
Hopefully once i get a decent amount of time to do benchmarks with gem i will check out my theory. I think a simple benchmark can be done on intel hw: just return false in EXA prepare access to force use of download-from-screen, and in download-from-screen use pread; then comparing benchmarks of this hacked intel ddx with a normal one should already give some numbers.

Why should we have to, when we can do it right?

Well my point was that mapping vram is not right; i am not saying that i know the truth. It's just a feeling based on my experiments with ttm and on the bar restriction stuff and other considerations of the same kind.

No. Gem can't cope with it. Let's say you have a 512M system with two 1G video cards, 4G swap space, and you want to fill both cards' videoram with render-and-forget textures for whatever purpose. What happens? After you've generated the first say 300M, the system mysteriously starts to page, and when, after a couple of minutes of crawling texture upload speeds, you're done, the system is using and has written almost 2G of swap. Now, you want to update the textures and expect fast texsubimage... So having a backing object that you have to access to get things into VRAM is not the way to go. The correct way to do this is to reserve, but not use, swap space. Then you can start using it on suspend, provided that the swapping system is still up (which it has to be with the current GEM approach anyway). If pwrite is used in this case, it must not dirty any backing-object pages.

For a normal desktop i don't expect VRAM amount > RAM amount; people with 1G VRAM are usually hard gamers with 4G of ram :). Also most objects in the 3d world are stored in memory; if programs are not stupid and trust gl to keep their textures, then you just have the usual ram copy and possibly a vram copy, so i don't see any waste in the normal use case.
Of course we can always come up with crazy weird setups, but i am more interested in dealing well with average Joe than dealing mostly well with every use case.

It's always been a big win to go to single-copy texturing. Textures tend to be large and nobody has so much memory that doubling up on textures has ever been appealing... And there are obvious use-cases like textured video where only having a single copy is a big performance win. It certainly makes things easier for the driver to duplicate textures -- which is why all the old DRI drivers did it -- but it doesn't make it right... And the old DRI drivers also copped out on things like render-to-texture, etc, so whatever gains you make in simplicity by treating VRAM as a cache, some of those will be lost because you'll have to keep track of which one of the two copies of a texture is up-to-date, and you'll still have to preserve (modified) texture contents on eviction, which old DRI never had to.

Ultimately it boils down to a choice between making your life easier as a
Re: fake bufmgr and aperture sizing.
The problem remains how to avoid this situation completely. I guess the drm driver can reserve a global safe aperture size and communicate that to the 3D client, but the current TTM drivers don't deal with this situation. My first idea would probably be your first alternative: flush and re-do the state-emit if the combined buffer size is larger than the safe aperture size.

I think a dynamically sized safe aperture size that can be used per batch submission is probably the best plan; this might also allow throttling in multi-app situations to help avoid thrashing, by reducing the per-app limits. For cards with per-process apertures we could make it the size of the per-process aperture. The case where an app manages to submit a working set for a single operation that is larger than the GPU can deal with should be considered a bug in the driver, I suppose.

The trouble with the safe limit is that it can change in a timeframe that is inconvenient for the driver -- ie, if it changes when a driver has already constructed most of a scene, what happens? This is a lot like the old cliprect problem, where driver choices can be invalidated later on, leaving it in a difficult position. Trying to chop an already-constructed command stream up after the fact is unappealing, even on simple architectures like the i915 in classic mode. Add zone rendering or some other wrinkle and it loses appeal fast.

What about two limits -- hard and soft? If the hard limit can avoid changing, that makes things a lot nicer for the driver. When the soft one changes, the driver can respect that next frame, but submit the current command stream as is.

Keith
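[Editor's note] The two-limit idea can be sketched as a small state machine: the soft limit may move at any time, but the in-flight frame keeps the limit it started with and only picks up the change at the frame boundary. Names and numbers are invented for illustration.

```python
# Sketch of the hard/soft limit scheme: a mid-frame soft-limit change
# never invalidates driver choices already made for the current frame.

class ApertureLimits:
    def __init__(self, hard, soft):
        self.hard = hard
        self.soft = soft
        self.frame_limit = soft        # limit the in-flight frame uses

    def kernel_updates_soft(self, new_soft):
        assert new_soft <= self.hard   # soft never exceeds the stable hard limit
        self.soft = new_soft           # takes effect at the next frame

    def end_frame(self):
        self.frame_limit = self.soft   # pick up any change here

lim = ApertureLimits(hard=128, soft=96)
lim.kernel_updates_soft(64)            # mid-frame change...
assert lim.frame_limit == 96           # ...does not disturb this frame
lim.end_frame()
assert lim.frame_limit == 64           # respected from the next frame on
```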
Re: Writing a state tracker for Gallium3D/SoftPipe
There are three components that you'll need:

- state tracker -- which is API dependent.
- hw driver -- HW dependent (softpipe is an example), which implements the p_context.h interface.
- winsys -- which is dependent on API, HW, OS, etc.

The winsys is basically the glue that holds it all together. The intention is for it to be as small as possible, and over time we'll improve the concept to help make it smaller still. In Mesa/Gallium/DRI drivers, the winsys is the only component with an overall view of the structure of the driver; all the other components see only one aspect of it, but the winsys is what puts all the pieces together, and provides the glue/services code to make it all work.

At minimum, the winsys will implement the interface in p_winsys.h, which provides surface/buffer management functions to both the state tracker and hardware driver. In addition, the HW drivers each propose a command submission interface which is specific to the particular piece of hardware. As the winsys currently implements both these interfaces, it by definition becomes hardware specific -- though internally there is usually a separation between these pieces.

Regarding the AUB stuff in the Xlib winsys, yes, you can ignore that. It's a hack to get a simulator running without hardware -- at some point I'll try and restructure things to make that clearer.

What I'm guessing you want to know is how to break things down in your proposed state tracker. The overriding principle is to put as little in the winsys as possible. At first, it's clear that anything in p_winsys.h must be done in the winsys, similarly for whatever functionality the hw driver requests in its backend interface -- eg softpipe/sp_winsys.h. Beyond that, the winsys needs to implement some sort of 'create_screen' and 'create_context' functionality, but would ideally hand off as much as possible after that to shared code in the state tracker or HW driver.
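[Editor's note] The three-way split above can be sketched structurally. This is not the real p_winsys.h/p_context.h API -- all names are invented -- it only illustrates which component sees which: the winsys is the one piece with the whole picture, while the state tracker and hw driver each see one side.

```python
# Structural sketch of state tracker / hw driver / winsys (invented names).

class Winsys:
    """Glue layer: buffer/surface management for both other components,
    plus the screen/context creation entry points."""
    def buffer_create(self, size):
        return bytearray(size)

class HwDriver:                        # e.g. a softpipe-like driver
    def __init__(self, winsys):
        self.winsys = winsys           # relies on the winsys for buffers

class StateTracker:                    # API-dependent (GL, XvMC, ...)
    def __init__(self, hw):
        self.hw = hw                   # talks to the hw driver interface

def create_context(winsys):
    # The winsys-owned function that ties the pieces together;
    # ideally it hands off everything else to shared code.
    return StateTracker(HwDriver(winsys))

ctx = create_context(Winsys())
```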
What's missing to some extent from the gallium interfaces is a fully-developed device/screen/global entity. Much of the work so far has been around the per-context entity (pipe_context), but the per-screen component that does surface management, etc, has been less well developed. That will change over time, but for the moment there's more of it in the winsys than I'd like. Keith - Original Message From: Younes M [EMAIL PROTECTED] To: dri-devel@lists.sourceforge.net Sent: Wednesday, April 16, 2008 6:47:48 PM Subject: Writing a state tracker for Gallium3D/SoftPipe I'm trying to get up and running writing a state tracker for libXvMC using SoftPipe, but looking at the Mesa src a few things are unclear to me. Looking at root/src/gallium/winsys/xlib I can't quite figure out where Mesa ends and where Gallium starts. It's my understanding that the winsys creates the pipe_context that I need, but is there a generic X winsys driver? The xm_* sources where softpipe_create() is eventually called (in xm_winsys.c) look very tied to Mesa, even though they appear to be in a generic directory, so do all state trackers have to implement something similar to use SoftPipe? Is this also the case for using hw drivers? Looking at the aub src files and the dri directory it looks like that stuff is tied to the hw and does not have to be provided for each state tracker.
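To picture the winsys described in this thread: it is essentially a small struct of function pointers that both the state tracker and the hw driver call into. The sketch below is hypothetical -- the real interface lives in p_winsys.h (and per-driver headers like sp_winsys.h); `sw_winsys_create` and the field names are made up for illustration, with a trivial malloc-backed implementation standing in for real buffer management.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* The glue interface: buffer management offered to both the state tracker
 * and the hw driver, plus hw-specific command submission. */
struct winsys {
    void *(*buffer_create)(struct winsys *ws, size_t size, unsigned align);
    void  (*buffer_destroy)(struct winsys *ws, void *buf);
    void  (*batch_flush)(struct winsys *ws);
};

/* A software-only implementation, roughly what an xlib/softpipe winsys
 * reduces to when there is no hardware to submit commands to. */
static void *sw_buffer_create(struct winsys *ws, size_t size, unsigned align)
{
    (void)ws; (void)align;          /* no alignment constraints in sw */
    return malloc(size);
}

static void sw_buffer_destroy(struct winsys *ws, void *buf)
{
    (void)ws;
    free(buf);
}

static void sw_batch_flush(struct winsys *ws)
{
    (void)ws;                       /* nothing to submit in software */
}

struct winsys *sw_winsys_create(void)
{
    struct winsys *ws = malloc(sizeof *ws);
    ws->buffer_create  = sw_buffer_create;
    ws->buffer_destroy = sw_buffer_destroy;
    ws->batch_flush    = sw_batch_flush;
    return ws;
}
```

The point of the shape is the one Keith makes: only this struct knows about the OS and hardware, so the state tracker and driver stay portable.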
Re: Gallium: Fix for tgsi_emit_sse2()
Sorry, this slipped through the net a little... Given how much is hardcoded with rtasm, I'd prefer to use a single calling convention everywhere; whether that's STDCALL or CDECL or something else I don't mind. Probably STDCALL, because some compilers are too dumb to use anything else? In which case, a little comment documenting what stdcall really means would help a lot - I've hit similar issues with the differences between calling conventions, and it basically boiled down to disassembling gcc's output to figure out what we're supposed to be doing... If we switch to stdcall, there are a couple of other platform-specific variants in the generated code that can be removed. It's probably going to be the cleanest solution from the point of view of actually working on this code. Keith - Original Message From: Stephane Marchesin [EMAIL PROTECTED] To: Victor Stinner [EMAIL PROTECTED] Cc: dri-devel@lists.sourceforge.net Sent: Wednesday, April 2, 2008 12:18:33 PM Subject: Re: Gallium: Fix for tgsi_emit_sse2() So, we should really fix this. The two options are: - Keep a different calling convention under linux (cdecl by default, which requires saving esi by hand in the shader) and apply Victor's patch which saves/restores this register - Use the same calling convention on all platforms, that is, change include/pipe/p_compiler.h to define XSTDCALL to stdcall on linux, because for now it's empty, which is _not_ stdcall but cdecl. In any case, this is a serious issue, as under linux esi gets corrupted on return from the SSE call. Which, of course, causes crashes. Stephane - Check out the new SourceForge.net Marketplace. It's the best place to buy or sell services for just about anything Open Source. http://ad.doubleclick.net/clk;164216239;13503038;w?http://sf.net/marketplace -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
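Stephane's second option amounts to making XSTDCALL expand to a real stdcall attribute on every platform, so the generated SSE code and the C code that calls it agree on who pops the arguments and which registers (e.g. %esi) are caller-saved. A hedged sketch of what such a p_compiler.h definition could look like -- the macro name is from the thread, but the exact conditionals and the demo function are illustrative, not the actual Mesa code:

```c
#include <assert.h>

/* Calling conventions only differ on 32-bit x86; elsewhere the macro can
 * safely expand to nothing. */
#if defined(_WIN32)
#define XSTDCALL __stdcall
#elif defined(__GNUC__) && defined(__i386__)
#define XSTDCALL __attribute__((stdcall))
#else
#define XSTDCALL
#endif

/* A generated shader entry point would be typed like this, so the compiler
 * knows the callee cleans the stack and which registers it may clobber. */
typedef void (XSTDCALL *tgsi_sse2_fs_func)(const float *input, float *output);

/* Stand-in for rtasm-generated code, just to show the convention in use. */
static void XSTDCALL demo_fs(const float *input, float *output)
{
    output[0] = input[0] * 2.0f;
}
```

Calling `demo_fs` through a plain (cdecl) function pointer type instead would be exactly the mismatch the thread describes: it compiles, then corrupts registers at runtime.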
Re: Gallium code reorganization
OK, I found I had to merge rather than rebase in order to get my changes into the new organization -- apologies for the bubble in the history. Keith José Fonseca wrote: Just to let you know that the first step, file shuffling, is finished. The rest will take more time but the changes are less pervasive. Once you update any private branches to the new directory layout, you should be able to keep working as usual. Here's a quick summary of the changes you might need to do: - move your source files to the directory layout described below; - update the TOP dirs in your Makefiles; - update the include paths, replacing -I src/mesa/pipe with -I src/gallium/include -I src/gallium/drivers -I src/gallium/aux; - remove the pipe/ prefix from all includes *except* pipe/p_*.h includes. Jose On Thu, 2008-02-14 at 15:38 +0900, José Fonseca wrote: I'll dedicate some time now to reorganizing gallium's code build process. This is stuff which has been discussed internally at TG several times, but this time I want to get it done. My objectives are: - a leaner and easier to understand/navigate source tree - reduce (or even eliminate) merges between private branches of the common gallium parts - help keep the gallium tree portable, by keeping things separate. My plan is: 1. Physically separate gallium source code from mesa code. This will be the final layout:
- src/mesa
- src/gallium
  - state_tracker
    - ogl
    - ...
  - drivers
    - i915simple
    - i965simple
    - cell
    - ...
  - winsys
    - dri
      - intel
      - ...
    - xlib
    - ...
  - aux
    - tgsi
    - draw
    - pipebuffer
    - llvm
    - cso_cache
    - ...
i.e., give a subdir in src/gallium to each gallium architectural layer. 2. Eliminate mesa includes from the gallium source code in everything but mesa's state_tracker (and eventually some winsys). 3. Using scons, enhance the build system to support all platforms we are interested in (i.e., linux and win32, atm). 4. Teach the build system how to pick up and build pipe/winsys drivers outside of the tree. 
Jose - This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/ -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: redesigning the DRM internal logic..
Alex Deucher wrote: On Feb 13, 2008 9:09 PM, Keith Packard [EMAIL PROTECTED] wrote: On Wed, 2008-02-13 at 19:22 -0500, Alex Deucher wrote: How about a compat node for old clients and a new render node that handles both new clients and GPGPU? Then the backwards compat stuff could just be a shim layer and everything else could use the same code instead of dealing with separate render and gpgpu nodes. Recall that one of the goals is to support multiple user sessions running at the same time, so we really do want to have per-session 'devices' which relate the collection of applications running within that session and reflect the access permissions of the various objects and methods within that session. Any 'compat' node would eventually have to deal with this new environment, and I'm not sure it's entirely practical, nor do I think it entirely necessary. As for GPGPU usage, that would presumably look an awful lot like a separate session, although I can imagine there being further limits on precisely which operations a GPGPU application could perform. I guess I just don't see a difference between X/DRI rendering and GPGPU; it's just command submission. It seems like the only reason for the render/gpgpu split is for backwards compatibility. I think we need to differentiate between display and rendering rather than visual rendering and compute applications. Yes, though maybe GPGPU is just a convenient phrase for 'rendering facility divorced from display'. I'm not sure. There are real cases where you want to render images yet never have an interest in display - for example scientific visualization and gpu-accelerated offline rendering. From the point of view of the DRM, these should fall into the same bucket as GPGPU. Keith
Re: [Mesa3d-dev] Gallium code reorganization
Michel Dänzer wrote: On Thu, 2008-02-14 at 20:05 +0900, José Fonseca wrote: On 2/14/08, Keith Whitwell [EMAIL PROTECTED] wrote: José Fonseca wrote: 1. Physically separate gallium source code from mesa code. This will be the final layout: - src/mesa - src/gallium - state_tracker - ogl - ... I think the one thing I'd say is that the GL state tracker is really a part of Mesa -- it's effectively a Mesa driver which targets the Gallium interfaces rather than some piece of hardware. Given that the gallium interface is fairly water-tight (ie. you can't reach round it to some driver internals), compared to the Mesa driver interface which is basically just 'include all the mesa internal headers', I think it will become clear if you try to do this that the state_tracker will sit pretty uncomfortably anywhere other than inside mesa... So src/mesa/driver/state_tracker then? src/mesa/driver/gallium? The trouble with this is you now have two things in the stack which can be called 'the gallium driver', ie: GL API -- Core Mesa -- Gallium Driver (formerly State Tracker) -- Gallium API -- Gallium Driver -- Winsys, DRM, HW. I'd be happy to either leave this piece out of the proposed changes for now, or to move it to mesa/drivers/state_tracker. Basically I think we have a clear idea what to do with the rest of the stack; probably we should just move ahead on that and either leave the Mesa state tracker alone or only make minimal changes to it. Leaving it out of this round of changes doesn't mean that we can't move/rename it later -- because it's a part of mesa, changing it later won't break or affect any other Gallium clients. It's really an internal matter for Mesa where that code lives and what it's called. Keith
Re: Intel releases 965 documentation under CC license
Philipp Klaus Krause wrote: It seems Intel has released complete documentation for the 965: http://intellinuxgraphics.com/documentation.html Hmm, some of the tables seem to be a bit messed up (presumably after the conversion to pdf). Still readable though, and definitely good to see... Keith
Re: RFC: render buffer
Jerome Glisse wrote: Hi all, There have been discussions on irc and elsewhere about the need (or not) for an object describing a render buffer (could be a scan-out buffer or other specialized card render buffer: color buffer, zbuffer, ...). This would mostly act as a wrapper around a BO. Here is what I can imagine as an interface for this: - RenderCreate(properties) - RenderMapLinear - RenderMap - RenderUnmap - RenderUnreference Properties would be a set of common properties like: - width - height - bit per pixel - pitch - scan out buffer And also driver private properties like: - compressed in this weird specific format - ... At creation you give a set of properties and you can't change them afterward (well, I think that's a good enough rule). What could this be useful for? Well, first it would be very useful for mode setting, as we would then have a place where constraints on scan-out buffer memory layout are properly checked. Right now we just blindly trust user space to provide a big enough buffer. This could also be useful for creating render buffers, making cmd checking a lot easier (as we would have access to width, height, pitch or whatever information is needed to make sure that we have a proper target for our rendering commands). Also we could offer a common interface for scan-out buffers, where the driver should allocate a proper buffer using only default properties (width, height, bit per pixel) and it would be up to the driver to fill in other properties with good safe default values (alignment, pitch, compressed, ...). I believe having a render buffer object concept in the kernel makes sense for the above mentioned reasons, and because in the graphics world render buffers are a key concept and everything in the end has to deal with them. So to sum up: - easy checking of proper render buffers in the kernel - easy checking of proper scan-out buffers in the kernel - make user space easier and safer (things other than a dri driver can allocate proper buffers without having to duplicate code). 
A few more words on map: I think it could be useful to provide two different map methods. One, linear, specifically means that you want a linear mapping of the buffer (according to width, height, pitch and bpp); the other just asks for a mapping, and the driver could pass back some information on the layout of the mapped memory (tiled, compressed, ...). For implementation I see two possible ways: - wrap the render buffer around a BO - make the render buffer a specialized BO, by adding a flag into the BO and a ptr to a render buffer structure The second solution is likely the easier one. Anyway, what are people's thoughts on all that? Pretty much every buffer is potentially a render target, for instance all texture buffers when generating mipmaps, etc. In the example above, different parts of individual buffers may be rendered to with different pitches, etc, ie when targeting different mipmaps. Intel hardware uses the same pitch for all mipmaps, but this is not universal. Furthermore things like GL's pixel buffers may be used with different pitches etc according to the user's whim. In general one of the nicest things about the current memory manager is that it does *not* impose this type of thing on regular buffer management. I've worked with systems that do, and it can be very burdensome. It's not like this presents a security issue to the system at large, so the question then is why make it a kernel function? You just end up running into the limitations you've encoded into the kernel in generation n when you're trying to do the work for generation n+1. One motivation for this sort of thing might be making allocation of fenced memory regions easier (fenced in the sense used in Intel HW, referring to tiled memory). I think that might be better handled specially, without encumbering the system as a whole with a fixed interpretation of buffer layout. Is there a specific issue that this proposal is trying to address? Keith
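To make the discussion concrete, Jerome's property list could be sketched as a C struct, with the kind of bounds check he wants the kernel to perform as a helper. All names here are hypothetical illustrations, not a real kernel interface -- only the property list itself comes from the mail:

```c
#include <assert.h>
#include <stdint.h>

/* The common (non driver-private) properties from the proposal,
 * fixed at creation time. */
struct render_buffer_props {
    uint32_t width;
    uint32_t height;
    uint32_t bpp;        /* bits per pixel */
    uint32_t pitch;      /* bytes per row, already covering width/bpp/alignment */
    int      scanout;    /* intended as a scan-out buffer */
};

/* The validation the kernel could do instead of blindly trusting user
 * space: the backing bo must hold at least pitch * height bytes. */
int render_buffer_fits_bo(const struct render_buffer_props *p,
                          uint64_t bo_size)
{
    return (uint64_t)p->pitch * p->height <= bo_size;
}
```

Keith's objection in the reply is essentially that baking a single `props` interpretation into the kernel forbids the driver from reusing the same bo with a different pitch later, e.g. for another mipmap level.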
Re: RFC: render buffer
Jerome Glisse wrote: On Wed, 2008-01-16 at 17:35 +, Keith Whitwell wrote: Pretty much every buffer is potentially a render target, for instance all texture buffers when generating mipmaps, etc. In the example above, different parts of individual buffers may be rendered to with different pitches, etc, ie when targeting different mipmaps. Intel hardware uses the same pitch for all mipmaps, but this is not universal. Furthermore things like GL's pixel buffers may be used with different pitches etc according to the user's whim. In general one of the nicest things about the current memory manager is that it does *not* impose this type of thing on regular buffer management. I've worked with systems that do, and it can be very burdensome. It's not like this presents a security issue to the system at large, so the question then is why make it a kernel function? You just end up running into the limitations you've encoded into the kernel in generation n when you're trying to do the work for generation n+1. One motivation for this sort of thing might be making allocation of fenced memory regions easier (fenced in the sense used in Intel HW, referring to tiled memory). I think that might be better handled specially, without encumbering the system as a whole with a fixed interpretation of buffer layout. Is there a specific issue that this proposal is trying to address? Keith Well, the main motivation was mode setting and command checking; for radeon, proper command checking will need to do a lot of (width|pitch)*height*bpp + alignment checking against bo size. I see render buffer objects as a way of greatly simplifying this. But I won't fight for it; I am well aware that the current bo interface is really nice because it doesn't enforce a policy. I guess my main concern is more about how to ask mode setting to program the card to use one kind of layout or another for the scan-out buffer. 
Modesetting and scanout buffers are a different kettle of fish - it may be reasonable to have more policy there than we currently do, and I don't think that the negatives I'm worried about apply so much to this area. It's quite reasonable to expect that *somebody* in the display stack may have more information than the 3d client driver about the necessary format, layout, etc of a scanout buffer, and that information would be necessary in order to get eg. page flipping to work correctly. It *may* be that the memory manager/kernel module has a role to play in this -- I don't really know one way or another. I guess the argument is stronger when you're talking about cases where the drm module does modesetting itself. It should be possible to put together a proposal in this area that doesn't negatively affect the 3d driver's ability to use buffers as render targets in new innovative ways. I'm not sure what it would look like exactly, but I'd be happy to evaluate it in the above terms. Keith
Re: [PATCH] Clean up and document drm_ttm.c APIs. drm_bind_ttm - drm_ttm_bind.
Keith Packard wrote: Here are some proposed cleanups and documentation for the drm_ttm.c APIs. One thing I didn't change was the name of drm_ttm_fixup_caching, which is clearly a badly named function. Can anyone explain why you wouldn't just always use drm_ttm_unbind instead? The only difference is that drm_ttm_unbind makes sure the object is evicted before flushing caches and marking it as unbound. Looks good, Keith. There are a couple of places where you need s/flat/flag, but otherwise it's looking great. I can't help with the question above, unfortunately... Keith
Re: [PATCH] Change drm_bo_type_dc to drm_bo_type_device and comment usage of this value.
Keith, This looks good to me too. Keith

Keith Packard wrote:

commit 9856a00ee5e6de30ba3040749583b2eafdf2dfc1
Author: Keith Packard [EMAIL PROTECTED]
Date: Sun Dec 16 22:00:45 2007 -0800

Change drm_bo_type_dc to drm_bo_type_device and comment usage of this value.

I couldn't figure out what drm_bo_type_dc was for; Dave Airlie finally clued me in that it was the 'normal' buffer objects with kernel allocated pages that could be mmapped from the drm device file. I thought that 'drm_bo_type_device' was a more descriptive name. I also added a bunch of comments describing the use of the type enum values and the functions that use them.

diff --git a/linux-core/drm_bo.c b/linux-core/drm_bo.c
index 171c074..df10e12 100644
--- a/linux-core/drm_bo.c
+++ b/linux-core/drm_bo.c
@@ -146,7 +146,7 @@ static int drm_bo_add_ttm(struct drm_buffer_object *bo)
 		page_flags |= DRM_TTM_PAGE_WRITE;
 
 	switch (bo->type) {
-	case drm_bo_type_dc:
+	case drm_bo_type_device:
 	case drm_bo_type_kernel:
 		bo->ttm = drm_ttm_create(dev, bo->num_pages << PAGE_SHIFT,
 					 page_flags, dev->bm.dummy_read_page);
@@ -1155,7 +1155,12 @@ static void drm_bo_fill_rep_arg(struct drm_buffer_object *bo,
 	rep->size = bo->num_pages * PAGE_SIZE;
 	rep->offset = bo->offset;
 
-	if (bo->type == drm_bo_type_dc)
+	/*
+	 * drm_bo_type_device buffers have user-visible
+	 * handles which can be used to share across
+	 * processes. Hand that back to the application
+	 */
+	if (bo->type == drm_bo_type_device)
 		rep->arg_handle = bo->map_list.user_token;
 	else
 		rep->arg_handle = 0;
@@ -1786,7 +1791,12 @@ int drm_buffer_object_create(struct drm_device *dev,
 	if (ret)
 		goto out_err;
 
-	if (bo->type == drm_bo_type_dc) {
+	/*
+	 * For drm_bo_type_device buffers, allocate
+	 * address space from the device so that applications
+	 * can mmap the buffer from there
+	 */
+	if (bo->type == drm_bo_type_device) {
 		mutex_lock(&dev->struct_mutex);
 		ret = drm_bo_setup_vm_locked(bo);
 		mutex_unlock(&dev->struct_mutex);
@@ -1849,7 +1859,12 @@ int drm_bo_create_ioctl(struct drm_device *dev, void *data, struct drm_file *fil
 		return -EINVAL;
 	}
 
-	bo_type = (req->buffer_start) ? drm_bo_type_user : drm_bo_type_dc;
+	/*
+	 * If the buffer creation request comes in with a starting address,
+	 * that points at the desired user pages to map. Otherwise, create
+	 * a drm_bo_type_device buffer, which uses pages allocated from the kernel
+	 */
+	bo_type = (req->buffer_start) ? drm_bo_type_user : drm_bo_type_device;
 
 	/*
 	 * User buffers cannot be shared
@@ -2607,6 +2622,14 @@ void drm_bo_unmap_virtual(struct drm_buffer_object *bo)
 	unmap_mapping_range(dev->dev_mapping, offset, holelen, 1);
 }
 
+/**
+ * drm_bo_takedown_vm_locked:
+ *
+ * @bo: the buffer object to remove any drm device mapping
+ *
+ * Remove any associated vm mapping on the drm device node that
+ * would have been created for a drm_bo_type_device buffer
+ */
 static void drm_bo_takedown_vm_locked(struct drm_buffer_object *bo)
 {
 	struct drm_map_list *list;
@@ -2614,7 +2637,7 @@ static void drm_bo_takedown_vm_locked(struct drm_buffer_object *bo)
 	struct drm_device *dev = bo->dev;
 	DRM_ASSERT_LOCKED(&dev->struct_mutex);
 
-	if (bo->type != drm_bo_type_dc)
+	if (bo->type != drm_bo_type_device)
 		return;
 
 	list = &bo->map_list;
@@ -2637,6 +2660,16 @@ static void drm_bo_takedown_vm_locked(struct drm_buffer_object *bo)
 	drm_bo_usage_deref_locked(bo);
 }
 
+/**
+ * drm_bo_setup_vm_locked:
+ *
+ * @bo: the buffer to allocate address space for
+ *
+ * Allocate address space in the drm device so that applications
+ * can mmap the buffer and access the contents. This only
+ * applies to drm_bo_type_device objects as others are not
+ * placed in the drm device address space.
+ */
 static int drm_bo_setup_vm_locked(struct drm_buffer_object *bo)
 {
 	struct drm_map_list *list = &bo->map_list;
diff --git a/linux-core/drm_objects.h b/linux-core/drm_objects.h
index 98421e4..a2d10b5 100644
--- a/linux-core/drm_objects.h
+++ b/linux-core/drm_objects.h
@@ -404,9 +404,31 @@ struct drm_bo_mem_reg {
 };
 
 enum drm_bo_type {
-	drm_bo_type_dc,
+	/*
+	 * drm_bo_type_device are 'normal' drm allocations,
+	 * pages are allocated from within the kernel automatically
+	 * and the objects can be mmap'd from the drm device. Each
+	 * drm_bo_type_device object has a unique name which can be
+	 * used by other processes to share access to the underlying
+	 * buffer.
+	 */
+	drm_bo_type_device,
+	/*
+	 *
Re: [PATCH] Rename inappropriately named 'mask' fields to 'proposed_flags' instead.
Keith Packard wrote:

commit 32acf53eefa64cd41cc9bf45705b0825fc8a0eef
Author: Keith Packard [EMAIL PROTECTED]
Date: Sun Dec 16 20:16:50 2007 -0800

Rename inappropriately named 'mask' fields to 'proposed_flags' instead.

Flags pending validation were stored in a misleadingly named field, 'mask'. As 'mask' is already used to indicate pieces of a flags field which are changing, it seems better to use a name reflecting the actual purpose of this field. I chose 'proposed_flags' as they may not actually end up in 'flags', and in any case will be modified when they are moved over. This affects the API, but not ABI, of the user-mode interface.

Keith, I think this makes sense too. I'm hopeful Thomas would agree.

+/*
+ * drm_bo_propose_flags:
+ *
+ * @bo: the buffer object getting new flags
+ *
+ * @new_flags: the new set of proposed flag bits
+ *
+ * @new_mask: the mask of bits changed in new_flags
+ *
+ * Modify the proposed_flag bits in @bo
+ */

Looks like this comment has already started to drift from the function it is documenting??

+static int drm_bo_modify_proposed_flags (struct drm_buffer_object *bo,
+					 uint64_t new_flags, uint64_t new_mask)

Keith
Re: Proposal for a few minor internal API changes.
Keith Packard wrote:

I'm writing up some documentation for internal DRM interfaces and came across a couple of interface inconsistencies that seem like they should get fixed before they start getting used a lot more. If these look like good changes, I'll continue to search out other similar issues. I'll just include the header changes in this message.

Make ttm create/destroy APIs consistent. Pass page_flags in create.

Creating a ttm was done with drm_ttm_init while destruction was done with drm_destroy_ttm. Renaming these to drm_ttm_create and drm_ttm_destroy makes their use clearer. Passing page_flags to the create function will allow that to know whether user or kernel pages are needed, with the goal of allowing kernel ttms to be saved for later reuse.

--- linux-core/drm_objects.h ---
index 1dc61fd..66611f6 100644
@@ -297,7 +297,7 @@ struct drm_ttm {
 };
 
-extern struct drm_ttm *drm_ttm_init(struct drm_device *dev, unsigned long size);
+extern struct drm_ttm *drm_ttm_create(struct drm_device *dev, unsigned long size, uint32_t page_flags);
 extern int drm_bind_ttm(struct drm_ttm *ttm, struct drm_bo_mem_reg *bo_mem);
 extern void drm_ttm_unbind(struct drm_ttm *ttm);
 extern void drm_ttm_evict(struct drm_ttm *ttm);
@@ -318,7 +318,7 @@ extern int drm_ttm_set_user(struct drm_ttm *ttm,
  * Otherwise it is called when the last vma exits.
  */
 
-extern int drm_destroy_ttm(struct drm_ttm *ttm);
+extern int drm_ttm_destroy(struct drm_ttm *ttm);
 
 #define DRM_FLAG_MASKED(_old, _new, _mask) {\
 (_old) ^= (((_old) ^ (_new)) & (_mask)); \

Document drm_bo_do_validate. Remove spurious 'do_wait' parameter.

Add comments about the parameters to drm_bo_do_validate, along with comments for the DRM_BO_HINT options. Remove the 'do_wait' parameter as it is duplicated by DRM_BO_HINT_DONT_BLOCK. 
--- linux-core/drm_objects.h ---
index 66611f6..1c6ca79 100644
@@ -546,7 +546,6 @@ extern struct drm_buffer_object *drm_lookup_buffer_object(struct drm_file *file_
 extern int drm_bo_do_validate(struct drm_buffer_object *bo,
 			      uint64_t flags, uint64_t mask, uint32_t hint,
 			      uint32_t fence_class,
-			      int no_wait,
 			      struct drm_bo_info_rep *rep);
 
 /*

Document drm_bo_handle_validate. Match drm_bo_do_validate parameter order.

Document parameters and usage for drm_bo_handle_validate. Change parameter order to match drm_bo_do_validate (fence_class has been moved to after flags, hint and mask values). Existing users of this function have been changed, but out-of-tree users must be modified separately.

--- linux-core/drm_objects.h ---
index 1c6ca79..0926b47 100644
@@ -535,9 +535,8 @@ extern int drm_bo_clean_mm(struct drm_device *dev, unsigned mem_type);
 extern int drm_bo_init_mm(struct drm_device *dev, unsigned type,
 			  unsigned long p_offset, unsigned long p_size);
 extern int drm_bo_handle_validate(struct drm_file *file_priv, uint32_t handle,
-				  uint32_t fence_class, uint64_t flags,
-				  uint64_t mask, uint32_t hint,
-				  int use_old_fence_class,
+				  uint64_t flags, uint64_t mask, uint32_t hint,
+				  uint32_t fence_class, int use_old_fence_class,
 				  struct drm_bo_info_rep *rep,
 				  struct drm_buffer_object **bo_rep);

These all look sensible. It's a pity that the change above looks like it will allow users of the old argument order to continue to compile without error despite the change. It's a bit hard to know how to achieve that, though. When you say 'document xyz', and the documentation doesn't appear in the patch to the header, where *will* the documentation live??

Keith
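As background for the flags/mask terminology running through these patches: the DRM_FLAG_MASKED macro from drm_objects.h updates only the bits selected by _mask, leaving the rest of _old untouched. A minimal self-contained illustration follows; apply_proposed_flags is a made-up wrapper for demonstration, not a DRM function.

```c
#include <assert.h>
#include <stdint.h>

/* Copy into _old only those bits of _new that are selected by _mask:
 * xor-ing in the masked difference flips exactly the differing bits. */
#define DRM_FLAG_MASKED(_old, _new, _mask) {                 \
        (_old) ^= (((_old) ^ (_new)) & (_mask));             \
}

/* Hypothetical helper mirroring the 'proposed_flags' idea: merge the
 * proposed bits covered by mask into the current flags. */
uint64_t apply_proposed_flags(uint64_t flags, uint64_t proposed, uint64_t mask)
{
    DRM_FLAG_MASKED(flags, proposed, mask);
    return flags;
}
```

This is why reusing the name 'mask' for pending flags was misleading: the mask argument here selects *which* bits change, while the proposed flags are *what* they change to.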
Re: Proposal for a few minor internal API changes.
Keith Packard wrote: On Sat, 2007-12-15 at 10:59 -0700, Brian Paul wrote: Could a temporary/dummy parameter be added for a while? Callers that weren't updated would get an error/warning about too few arguments. Then remove the dummy at some point in the future. We could change the use_old_fence_class into a HINT bit; that would reduce the number of parameters by one and cause compile errors for existing code. I'd rather not intentionally damage the API temporarily though; that seems fairly harsh. Ultimately it's not that big of a deal - if this change makes sense on its own, then sure, go ahead. Otherwise it's only Poulsbo that I can think of being out-of-tree, and we should be able to figure out what's going on there fairly quickly (though obviously we'll forget all about this conversation until after the next merge starts making it behave weirdly). Keith
Re: Memory manager, sub allocator partial validation
Jerome Glisse wrote: Hi, I am wondering if allowing the user to ask for partial validation (ie only validate a part of a bo) might be useful in the case of a userspace sub-allocator, and likely in other cases where you know, for instance, that you only need to access a small part of a bo. Such a partial approach could also be useful for mapping, asking only to map a part of a bo. I am throwing out the idea; I don't yet have enough code to test whether this kind of optimization might be worthwhile for radeon hw (and possibly others). I think if you want this, the way to get it is to abandon the userspace suballocator and use kernel buffers directly. To have the kernel manage partial buffers doesn't make sense - if this is what you want, tell the kernel about the little buffers and let it manage them directly. Keith
Re: i915: wait for buffer idle before writing relocations
Keith Packard wrote: On Fri, 2007-12-07 at 11:15 +, Keith Whitwell wrote: Keith, Thomas has just left for two weeks of (well deserved!) holiday, so he may be slow to respond. Thanks for taking the time to have a look while he's away; we're finishing up the 965 TTM work, and it is posing some challenges with the existing kernel interface. In the meantime, have you considered how this will interact with userspace buffer pools? No, I hadn't considered that as we're not considering a two-level allocation strategy at this point. However, if you consider the blocking patch in conjunction with the presumed_offset optimization, I think you'll find that userspace buffer pools will not actually be affected negatively by this change. The presumed_offset optimization allows the application to compute all relocations itself for target buffers which have been mapped to the hardware. The kernel relocations are purely a back-up, for cases where buffers move between EXECBUFFER invocations. I know you guys aren't using them at this point, but I'm of the opinion that they are an important facility which needs to be preserved. At worst it may be that some additional flag is needed to control this behaviour. We could do this, but I believe this would actually require more blocking by the client -- it doesn't know when objects are moving in the kernel, so it doesn't know when relocation data will need to be rewritten. Secondly I wonder whether this isn't already caught by other aspects of the buffer manager behaviour? ie, if the buffer to which the relocation points to is being moved, doesn't that imply all hardware activity related to that buffer must have concluded? IE, if the buffer itself is free to move, surely all commands containing relocations (or chains of relocations) which point to the buffer must themselves have completed?? Yes, if the target buffer is moving, then the operation related to the relocatee will have been completed and waited for. 
But, re-writing relocations doesn't require that the buffers have moved. Consider the case of the binding table on 965 which points at surface state structures. Executing a command that uses the binding table will require that relocations be evaluated for the entries in the table; even if nothing moves (ignoring my presumed_offset optimization), those relocations will need to be evaluated and the surface state pointers stored to the binding table. For the application to guarantee that the binding table relocations can be written without the kernel needing to wait for the binding table buffer to be idle, the application would have to wait every time, not just when the buffer actually moves. OK, it sounds like you're talking about situations where the driver is modifying state in buffers *only* through changes to the relocations? It's probably not surprising the fence is not implemented as I'd normally think that those relocation changes would be associated with some changes to the other data, and that would imply mapping the buffer (and hence the wait). I do understand the examples though and can see where you're trying to take this. Anyway, I'm hopeful that this won't break other usages... Keith
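For readers following along, the presumed_offset optimization mentioned above can be sketched roughly as follows: userspace records the offset it assumed when it wrote a pointer into a buffer, and the kernel only needs to rewrite (and therefore wait on) the buffer if the target actually moved. The struct and function names here are illustrative, not the actual DRM interface of the day:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative model of a relocation entry with a presumed offset. */
struct reloc {
    uint32_t batch_dword;     /* index into batch where the pointer lives */
    uint64_t presumed_offset; /* target offset userspace assumed */
    uint64_t delta;           /* offset within the target bo */
};

/* Returns 1 if the kernel had to rewrite (and thus wait on) the batch. */
static int apply_reloc(uint32_t *batch, const struct reloc *r,
                       uint64_t actual_offset)
{
    if (actual_offset == r->presumed_offset)
        return 0;                       /* no move: relocation is a no-op */
    batch[r->batch_dword] = (uint32_t)(actual_offset + r->delta);
    return 1;
}
```

This also shows why the binding-table case is different: there the pointers must be written into the table regardless, so the wait cannot be skipped merely because nothing moved.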
Re: i915: wait for buffer idle before writing relocations
Yeah, I'm pretty interested to come up with an 'append' type of semantic for buffer usage, particularly for things like the state pools you guys are probably playing with at the moment. It's not something that's ever going to be a *requirement* for a driver, and may not necessarily be a big win or even particularly difficult, but at this point nobody's really dug into it enough to know one way or another. Ignoring relocation issues, an 'append' mapping semantic, as opposed to the existing read/write maps, is probably an interesting concept also as it could allow mapping a state pool buffer to add new states as required by the application, but not require a fence as the old ones won't be interfered with. Keith - Original Message From: Keith Packard [EMAIL PROTECTED] To: Keith Whitwell [EMAIL PROTECTED] Cc: [EMAIL PROTECTED]; dri-devel dri-devel@lists.sourceforge.net Sent: Monday, December 10, 2007 4:44:44 PM Subject: Re: i915: wait for buffer idle before writing relocations [...] I think the interesting usage that you point out is where the application knows that a wait isn't necessary as the previously referenced data will not be re-used, and only new portions of the buffer need relocations. Given the choice between avoiding waits for cases we have today vs avoiding waits for cases we may try in the future, it seems reasonable to solve what we're using now. -- [EMAIL PROTECTED]
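The append semantic Keith sketches can be illustrated with a toy model: writes strictly past the region already referenced by queued commands never need an idle wait, because no in-flight command reads them. All names here are hypothetical, a sketch of the idea rather than any real interface:

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of an 'append' mapping on a state pool buffer. */
struct state_pool {
    size_t submitted;  /* bytes already referenced by queued commands */
    size_t used;       /* bytes written so far */
};

/* Returns 1 if a mapping for this write may skip the idle wait. */
static int append_needs_no_wait(const struct state_pool *p,
                                size_t write_off)
{
    return write_off >= p->submitted;
}

/* Appends always land past everything written so far. */
static size_t append_alloc(struct state_pool *p, size_t n)
{
    size_t off = p->used;
    p->used += n;
    return off;
}
```

A read/write map, by contrast, may touch already-submitted state and therefore has to fence first.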
Re: i915: wait for buffer idle before writing relocations
Keith, Thomas has just left for two weeks of (well deserved!) holiday, so he may be slow to respond. In the meantime, have you considered how this will interact with userspace buffer pools? I know you guys aren't using them at this point, but I'm of the opinion that they are an important facility which needs to be preserved. At worst it may be that some additional flag is needed to control this behaviour. Secondly I wonder whether this isn't already caught by other aspects of the buffer manager behaviour? ie, if the buffer to which the relocation points to is being moved, doesn't that imply all hardware activity related to that buffer must have concluded? IE, if the buffer itself is free to move, surely all commands containing relocations (or chains of relocations) which point to the buffer must themselves have completed?? Keith Keith Packard wrote: Here's a patch I believe is necessary for the i915 DRM kernel driver; right now, the i915 mesa driver never re-uses batch buffers, so there can never be an outstanding fence for a buffer with relocations. On 965, buffers other than the batch buffer will contain relocations, and may be reused (we'll avoid this because of the performance costs). In any case, this is a correctness fix, as the kernel must not presume that user space isn't reusing buffers with relocations. commit 6f5816b45d62c5b29eb6997885f103c21c92bed1 Author: Keith Packard [EMAIL PROTECTED] Date: Thu Dec 6 15:12:21 2007 -0800 i915: wait for buffer idle before writing relocations When writing a relocation entry, make sure the target buffer is idle, otherwise the GPU may see inconsistent data. 
diff --git a/shared-core/i915_dma.c b/shared-core/i915_dma.c
index 8791af6..42a2216 100644
--- a/shared-core/i915_dma.c
+++ b/shared-core/i915_dma.c
@@ -756,6 +756,13 @@ int i915_apply_reloc(struct drm_file *file_priv, int num_buffers,
 	    !drm_bo_same_page(relocatee->offset, new_cmd_offset)) {
 		drm_bo_kunmap(&relocatee->kmap);
 		relocatee->offset = new_cmd_offset;
+		mutex_lock(&relocatee->buf->mutex);
+		ret = drm_bo_wait(relocatee->buf, 0, 0, FALSE);
+		mutex_unlock(&relocatee->buf->mutex);
+		if (ret) {
+			DRM_ERROR("Could not wait for buffer to apply relocs\n %08lx", new_cmd_offset);
+			return ret;
+		}
 		ret = drm_bo_kmap(relocatee->buf, new_cmd_offset >> PAGE_SHIFT,
 				  1, &relocatee->kmap);
 		if (ret) {
Re: DRI2 and lock-less operation
Kristian Høgsberg wrote: On Nov 27, 2007 2:30 PM, Keith Packard [EMAIL PROTECTED] wrote: ... In both cases, the kernel will need to know how to activate a given context and the context handle should be part of the super ioctl arguments. We needn't expose the contexts to user-space, just provide a virtual consistent device and manage contexts in the kernel. We could add the ability to manage contexts from user space for cases where that makes sense (like, perhaps, in the X server where a context per client may be useful). Oh, right, we don't need one per GLContext, just per DRI client; mesa handles switching between GL contexts. What about multithreaded rendering sharing the same drm fd? I imagine one optimization you could do with a fixed number of contexts is to assume that losing the context will be very rare, and just fail the super-ioctl when it happens, and then expect user space to resubmit with the state emission preamble. In fact it may work well for single context hardware... I recall having the same discussion in the past; having the superioctl fail so that the client needn't constantly compute the full state restore on every submission may be a performance win for some hardware. All that this requires is a flag to the kernel that says 'this submission reinitializes the hardware', and an error return that says 'lost context'. Exactly. But the super-ioctl is chipset specific and we can decide on the details there on a chipset to chipset basis. If you have input on how the super-ioctl for intel should look to support lockless operation for current and future intel chipsets, I'd love to hear it. And of course we can version our way out of trouble as a last resort. We should do the lockless and context stuff at the same time; doing re-submit would be a bunch of work in the driver that would be wasted. Is it that bad? We will still need the resubmit code for older chipsets that don't have the hardware context support. 
The drivers already have the code to emit state in case of context loss, that code just needs to be repurposed to generate a batch buffer to prepend to the rendering code. And the rendering code doesn't need to change when resubmitting. Will that work? Right now, we're just trying to get 965 running with ttm; once that's limping along, figuring out how to make it reasonable will be the next task, and hardware context support is clearly a big part of that. Yeah - I'm trying to limit the scope of DRI2 so that we can have redirected direct rendering and GLX1.4 in the tree sooner rather than later (before the end of the year). Maybe the best way to do that is to keep the lock around for now and punt on the lock-less super-ioctl if that really blocks on hardware context support. How far back is hardware contexts supported? There are three ways to support lockless operation - hardware contexts - a full preamble emit per batchbuffer - passing a pair of preamble, payload batchbuffers per ioctl I think all hardware is capable of supporting at least one of these. That said, if the super-ioctl is per-device, then you can make a choice per-device in terms of whether the lock is required or not, which makes things easier. The reality is that most ttm based drivers will do very little differently on a regular lock compared to a contended one -- at most they could decide whether or not to emit a preamble they computed earlier. Keith - SF.Net email is sponsored by: The Future of Linux Business White Paper from Novell. From the desktop to the data center, Linux is going mainstream. Let it simplify your IT future. http://altfarm.mediaplex.com/ad/ck/8857-50307-18918-4 -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
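The fail-and-resubmit scheme discussed in this thread can be sketched as follows. The error code, ioctl shape, and names are all invented for illustration; the point is only the control flow: the kernel rejects a submission from a client whose context was lost, and the client resubmits with the state-emission preamble prepended.

```c
#include <assert.h>

/* Hypothetical error code for "your hardware context was lost". */
#define ELOSTCTX 1

struct fake_hw { int current_ctx; };

/* Toy stand-in for the super-ioctl: refuses a preamble-less batch
 * when the submitting context is no longer current. */
static int super_ioctl(struct fake_hw *hw, int ctx, int has_preamble)
{
    if (hw->current_ctx != ctx && !has_preamble)
        return -ELOSTCTX;     /* context lost: client must resubmit */
    hw->current_ctx = ctx;    /* preamble (or matching ctx) restores state */
    return 0;
}

/* Client side: try the cheap path, fall back to prepending the preamble. */
static int submit(struct fake_hw *hw, int ctx)
{
    int ret = super_ioctl(hw, ctx, 0);
    if (ret == -ELOSTCTX)
        ret = super_ioctl(hw, ctx, 1);
    return ret;
}
```

As Kristian notes, the resubmit path reuses the state-emission code drivers already have; only the decision about when to play it moves.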
Re: DRI2 and lock-less operation
Kristian Høgsberg wrote: Another problem is that it's not just swapbuffer - anything that touches the front buffer have to respect the cliprects - glCopyPixels and glXCopySubBufferMESA - and thus have to happen in the kernel. These don't touch the real swapbuffer, just the fake one. Hence they don't care about cliprects and certainly don't have to happen in the kernel... Keith
Re: DRI2 and lock-less operation
Michel Dänzer wrote: On Wed, 2007-11-28 at 09:30 +, Keith Whitwell wrote: Kristian Høgsberg wrote: Another problem is that it's not just swapbuffer - anything that touches the front buffer have to respect the cliprects - glCopyPixels and glXCopySubBufferMESA - and thus have to happen in the kernel. These don't touch the real swapbuffer, just the fake one. Hence they don't care about cliprects and certainly don't have to happen in the kernel... I'm not sure about glCopyPixels, but glXCopySubBufferMESA would most definitely be useless if it didn't copy to the real frontbuffer. Yes, wasn't paying attention... glxCopySubBufferMESA would do both - copy to the fake front buffer and then trigger a damage-induced update of the real frontbuffer. Neither operation requires the 3d driver know about cliprects, and the damage operation is basically a generalization of the swapbuffer stuff we've been talking about. Keith
Swapbuffers [was: Re: DRI2 and lock-less operation]
Kristian Høgsberg wrote: On Nov 27, 2007 11:48 AM, Stephane Marchesin [EMAIL PROTECTED] wrote: On 11/22/07, Kristian Høgsberg [EMAIL PROTECTED] wrote: ... It's all delightfully simple, but I'm starting to reconsider whether the lockless bullet point is realistic. Note, the drawable lock is gone, we always render to private back buffers and do swap buffers in the kernel, so I'm only concerned with the DRI lock here. The idea is that since we have the memory manager and the super-ioctl and the X server now can push cliprects into the kernel in one atomic operation, we would be able to get rid of the DRI lock. My overall question, here is, is that feasible? How do you plan to ensure that X didn't change the cliprects after you emitted them to the DRM ? The idea was that the buffer swap happens in the kernel, triggered by an ioctl. The kernel generates the command stream to execute the swap against the current set of cliprects. The back buffers are always private so the cliprects only come into play when copying from the back buffer to the front buffer. Single buffered visuals are secretly double buffered and implemented the same way. I'm trying to figure now whether it makes more sense to keep cliprects and swapbuffer out of the kernel, which wouldn't change the above much, except the swapbuffer case. I described the idea for swapbuffer in this case in my reply to Thomas: the X server publishes cliprects to the clients through a shared ring buffer, and clients parse the clip rect changes out of this buffer as they need it. When posting a swap buffer request, the buffer head should be included in the super-ioctl so that the kernel can reject stale requests. When that happens, the client must parse the new cliprect info and resubmit an updated swap buffer request. In my ideal world, the entity which knows and cares about cliprects should be the one that does the swapbuffers, or at least is in control of the process. That entity is the X server. 
Instead of tying ourselves into knots trying to figure out how to get some other entity a sufficiently up-to-date set of cliprects to make this work (which is what was wrong with DRI 1.0), maybe we should try and figure out how to get the X server to efficiently orchestrate swapbuffers. In particular it seems like we have: 1) The X server knows about cliprects. 2) The kernel knows about IRQ reception. 3) The kernel knows how to submit rendering commands to hardware. 4) Userspace is where we want to craft rendering commands. Given the above, what do we think about swapbuffers: a) Swapbuffers is a rendering command b) which depends on cliprect information c) that needs to be fired as soon as possible after an IRQ receipt. So: swapbuffers should be crafted from userspace (a, 4) ... by the X server (b, 1) ... and should be actually fired by the kernel (c, 2, 3) I propose something like: 0) 3D client submits rendering to the kernel and receives back a fence. 1) 3D client wants to do swapbuffers. It sends a message to the X server asking it please do me a swapbuffers after this fence has completed. 2) X server crafts (somehow) commands implementing swapbuffers for this drawable under the current set of cliprects and passes it to the kernel along with the fence. 3) The kernel keeps that batchbuffer to the side until a) the commands associated with the fence have been submitted to hardware. b) the next vblank IRQ arrives. when both of these are true, the kernel simply submits the prepared swapbuffer commands through the lowest latency path to hardware. But what happens if the cliprects change? The 100% perfect solution looks like: The X server knows all about cliprect changes, and can use fences or other mechanisms to keep track of which swapbuffers are outstanding. At the time of a cliprect change, it must create new swapbuffer commandsets for all pending swapbuffers and re-submit those to the kernel. 
These new sets of commands must be tied to the progress of the X server's own rendering command stream so that the kernel fires the appropriate one to land the swapbuffers to the correct destination as the X server's own rendering flies by. In the simplest case, where the kernel puts commands onto the one true ring as it receives them, the kernel can simply discard the old swapbuffer command. Indeed this is true also if the kernel has a ring-per-context and uses one of those rings to serialize the X server rendering and swapbuffers commands. Note that condition 3a) above is always true in the current i915.o one-true-ring/single-fifo approach to hardware serialization. I think the above can work and seems more straight-forward than many of the proposed alternatives. Keith
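Step 3 of the proposal (hold the prepared swapbuffer commands until both the client's fence has completed and a vblank IRQ has arrived, then submit) amounts to a small piece of kernel bookkeeping. A sketch with invented names:

```c
#include <assert.h>

/* Toy model of a parked swapbuffer: the kernel holds the pre-built
 * commands aside until both conditions from step 3 are true. */
struct pending_swap {
    int fence_done;    /* 3a: client's rendering has been submitted */
    int vblank_seen;   /* 3b: a vblank IRQ has arrived */
    int fired;
};

static void on_fence(struct pending_swap *s)  { s->fence_done = 1; }
static void on_vblank(struct pending_swap *s) { s->vblank_seen = 1; }

/* Called after either event; fires exactly once when both are true. */
static int maybe_fire(struct pending_swap *s)
{
    if (s->fence_done && s->vblank_seen && !s->fired) {
        s->fired = 1;   /* submit prepared commands to hardware here */
        return 1;
    }
    return 0;
}
```

The order of the two events doesn't matter, which is what lets the kernel take the lowest-latency path in either case.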
Re: Clip Lists
Stephane Marchesin wrote: On 28 Nov 2007 06:19:39 +0100, Soeren Sandmann [EMAIL PROTECTED] wrote: Stephane Marchesin [EMAIL PROTECTED] writes: I fail to see how this works with a lockless design. How do you ensure the X server doesn't change cliprects between the time it has written those in the shared ring buffer and the time the DRI client picks them up and has the command fired and actually executed ? Do you lock out the server during that time ? The scheme I have been advocating is this: - A new extension is added to the X server, with a PixmapFromBufferObject request. - Clients render into a private back buffer object, for which they used the new extension to generate a pixmap. - When a client wishes to copy something to the frontbuffer (for whatever reason - glXSwapBuffers(), glCopyPixels(), etc), it uses plain old XCopyArea() with the generated pixmap. The X server is then responsible for any clipping necessary. This scheme puts all clip list management in the X server. No cliprects in shared memory or in the kernel would be required. And no locking is required since the X server is already processing requests in sequence. Yes, that is the idea I want to do for nvidia hardware. Although I'm not sure if we can/want to implement it in terms of X primitives or a new X extension. To synchronize with vblank, a new SYNC counter is introduced that records the number of vblanks since some time in the past. The clients can then issue SyncAwait requests before any copy they want synchronized with vblank. This allows the client to do useful processing while it waits, which I don't believe is the case now. Since we can put a wait until vblank on crtc #X command to a fifo on nvidia hardware, the vblank issue is non-existent for us. We get precise vblank without CPU intervention. You still have some issues... The choice is: do you put the wait-until-vblank command in the same fifo as the X server rendering or not? 
If yes -- you end up with nasty latency for X as its rendering is blocked by swapbuffers. If no -- you face the question of what to do when cliprects change. The only way to make 'no' work is to effectively block the X server from changing cliprects while such a command is outstanding -- which leads you back to latency issues - probably juddery window moves when 3d is active. I don't think hardware gives you a way out of jail for swapbuffers in the presence of changing cliprects. Keith
Re: Clip Lists
Stephane Marchesin wrote: On 11/28/07, Keith Whitwell [EMAIL PROTECTED] wrote: Stephane Marchesin wrote: On 28 Nov 2007 06:19:39 +0100, Soeren Sandmann [EMAIL PROTECTED] wrote: Stephane Marchesin [EMAIL PROTECTED] writes: I fail to see how this works with a lockless design. How do you ensure the X server doesn't change cliprects between the time it has written those in the shared ring buffer and the time the DRI client picks them up and has the command fired and actually executed ? Do you lock out the server during that time ? The scheme I have been advocating is this: - A new extension is added to the X server, with a PixmapFromBufferObject request. - Clients render into a private back buffer object, for which they used the new extension to generate a pixmap. - When a client wishes to copy something to the frontbuffer (for whatever reason - glXSwapBuffers(), glCopyPixels(), etc), it uses plain old XCopyArea() with the generated pixmap. The X server is then responsible for any clipping necessary. This scheme puts all clip list management in the X server. No cliprects in shared memory or in the kernel would be required. And no locking is required since the X server is already processing requests in sequence. Yes, that is the idea I want to do for nvidia hardware. Although I'm not sure if we can/want to implement it in term of X primitives or a new X extension. To synchronize with vblank, a new SYNC counter is introduced that records the number of vblanks since some time in the past. The clients can then issue SyncAwait requests before any copy they want synchronized with vblank. This allows the client to do useful processing while it waits, which I don't believe is the case now. 
Since we can put a wait until vblank on crtc #X command to a fifo on nvidia hardware, the vblank issue is non-existent for us. We get precise vblank without CPU intervention. You still have some issues... The choice is: do you put the wait-until-vblank command in the same fifo as the X server rendering or not? If yes -- you end up with nasty latency for X as its rendering is blocked by swapbuffers. Yes, I want to go for that simpler approach first and see if the blocking gets bad (I can't really say until I've tried). I'm all for experiments such as this. Although I have a strong belief how it will turn out, nothing is better at changing these sorts of beliefs than actual results. Keith
Re: Swapbuffers [was: Re: DRI2 and lock-less operation]
Stephane Marchesin wrote: On 11/28/07, Keith Whitwell [EMAIL PROTECTED] wrote: In my ideal world, the entity which knows and cares about cliprects should be the one that does the swapbuffers, or at least is in control of the process. That entity is the X server. Instead of tying ourselves into knots trying to figure out how to get some other entity a sufficiently up-to-date set of cliprects to make this work (which is what was wrong with DRI 1.0), maybe we should try and figure out how to get the X server to efficiently orchestrate swapbuffers. In particular it seems like we have: 1) The X server knows about cliprects. 2) The kernel knows about IRQ reception. 3) The kernel knows how to submit rendering commands to hardware. 4) Userspace is where we want to craft rendering commands. Given the above, what do we think about swapbuffers: a) Swapbuffers is a rendering command b) which depends on cliprect information c) that needs to be fired as soon as possible after an IRQ receipt. So: swapbuffers should be crafted from userspace (a, 4) ... by the X server (b, 1) ... and should be actually fired by the kernel (c, 2, 3) Well, on nvidia hw, you don't even need to fire from the kernel (thanks to a special fifo command that waits for vsync). So I'd love it if going through the kernel for swapbuffers was abstracted by the interface. As I suggested elsewhere, I think that you're probably going to need this even on nvidia hardware. I propose something like: 0) 3D client submits rendering to the kernel and receives back a fence. 1) 3D client wants to do swapbuffers. It sends a message to the X server asking it please do me a swapbuffers after this fence has completed. 2) X server crafts (somehow) commands implementing swapbuffers for this drawable under the current set of cliprects and passes it to the kernel along with the fence. 
3) The kernel keeps that batchbuffer to the side until a) the commands associated with the fence have been submitted to hardware. b) the next vblank IRQ arrives. when both of these are true, the kernel simply submits the prepared swapbuffer commands through the lowest latency path to hardware. But what happens if the cliprects change? The 100% perfect solution looks like: The X server knows all about cliprect changes, and can use fences or other mechanisms to keep track of which swapbuffers are outstanding. At the time of a cliprect change, it must create new swapbuffer commandsets for all pending swapbuffers and re-submit those to the kernel. These new sets of commands must be tied to the progress of the X server's own rendering command stream so that the kernel fires the appropriate one to land the swapbuffers to the correct destination as the X server's own rendering flies by. Yes that was the basis for my thinking as well. By inserting the swapbuffers into the normal flow of X commands, we remove the need for syncing with the X server at swapbuffer time. The very simplest way would be just to have the X server queue it up like normal blits and not even involve the kernel. I'm not proposing this. I believe such an approach will fail for the sync-to-vblank case due to latency issues - even (I suspect) with hardware-wait-for-vblank. Rather, I'm describing a mechanism that allows a pre-prepared swapbuffer command to be injected into the X command stream (one way or another) with the guarantee that it will encode the correct cliprects, but which will avoid stalling the command queue in the meantime. In the simplest case, where the kernel puts commands onto the one true ring as it receives them, the kernel can simply discard the old swapbuffer command. Indeed this is true also if the kernel has a ring-per-context and uses one of those rings to serialize the X server rendering and swapbuffers commands. 
Come on, admit that's a hack to get 100'000 fps in glxgears :) I'm not talking about discarding the whole swap operation, just the version of the swap command buffer that pertained to the old cliprects. Every swap is still being performed. You do raise a good point though -- we currently throttle the 3d driver based on swapbuffer fences. There would need to be some equivalent mechanism to achieve this. Note that condition 3a) above is always true in the current i915.o one-true-ring/single-fifo approach to hardware serialization. Yes, I think those details of how to wait should be left driver-dependent and abstracted in user space. So that we have the choice of calling the kernel, doing it from user space only, relying on a single fifo
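The "discard the old swapbuffer command" idea above, where a swap rebuilt against new cliprects replaces the stale version rather than queueing a second copy, might look like this in kernel bookkeeping terms. All names are invented; every swap is still performed exactly once, only the command blob encoding it is replaced:

```c
#include <assert.h>

/* Toy model: at most one prepared swap blob per drawable. */
#define MAX_DRAWABLES 8

struct swap_slot {
    int valid;         /* a prepared swap is parked for this drawable */
    int cliprect_gen;  /* cliprect generation the blob was built against */
};

static struct swap_slot pending[MAX_DRAWABLES];

/* Returns 1 if a stale version was replaced rather than newly queued. */
static int queue_swap(int drawable, int cliprect_gen)
{
    int replaced = pending[drawable].valid;
    pending[drawable].valid = 1;
    pending[drawable].cliprect_gen = cliprect_gen;
    return replaced;
}
```

The throttling concern Keith raises still applies: if the 3d client paces itself on swap fences, replacing a blob must not lose the fence the client is waiting on.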
Re: DRI2 and lock-less operation
In general the problem with the superioctl returning 'fail' is that the client has to then go back in time and figure out what the state preamble would have been at the start of the batchbuffer. Of course the easiest way to do this is to actually precompute the preamble at batchbuffer start time and store it in case the superioctl fails -- in which case, why not pass it to the kernel along with the rest of the batchbuffer and have the kernel decide whether or not to play it? Keith - Original Message From: Kristian Høgsberg [EMAIL PROTECTED] To: Keith Packard [EMAIL PROTECTED] Cc: Jerome Glisse [EMAIL PROTECTED]; Dave Airlie [EMAIL PROTECTED]; dri-devel@lists.sourceforge.net; Keith Whitwell [EMAIL PROTECTED] Sent: Tuesday, November 27, 2007 8:44:48 PM Subject: Re: DRI2 and lock-less operation On Nov 27, 2007 2:30 PM, Keith Packard [EMAIL PROTECTED] wrote: ... In both cases, the kernel will need to know how to activate a given context and the context handle should be part of the super ioctl arguments. We needn't expose the contexts to user-space, just provide a virtual consistent device and manage contexts in the kernel. We could add the ability to manage contexts from user space for cases where that makes sense (like, perhaps, in the X server where a context per client may be useful). Oh, right we don't need one per GLContext, just per DRI client, mesa handles switching between GL contexts. What about multithreaded rendering sharing the same drm fd? I imagine one optimization you could do with a fixed number of contexts is to assume that losing the context will be very rare, and just fail the super-ioctl when it happens, and then expect user space to resubmit with state emission preamble. In fact it may work well for single context hardware... I recall having the same discussion in the past; having the superioctl fail so that the client needn't constantly compute the full state restore on every submission may be a performance win for some hardware. 
All that this requires is a flag to the kernel that says 'this submission reinitializes the hardware', and an error return that says 'lost context'. Exactly. But the super-ioctl is chipset specific and we can decide on the details there on a chipset to chipset basis. If you have input on how the super-ioctl for intel should look to support lockless operation for current and future intel chipsets, I'd love to hear it. And of course we can version our way out of trouble as a last resort. We should do the lockless and context stuff at the same time; doing re-submit would be a bunch of work in the driver that would be wasted. Is it that bad? We will still need the resubmit code for older chipsets that don't have the hardware context support. The drivers already have the code to emit state in case of context loss, that code just needs to be repurposed to generate a batch buffer to prepend to the rendering code. And the rendering code doesn't need to change when resubmitting. Will that work? Right now, we're just trying to get 965 running with ttm; once that's limping along, figuring out how to make it reasonable will be the next task, and hardware context support is clearly a big part of that. Yeah - I'm trying to limit the scope of DRI2 so that we can have redirected direct rendering and GLX1.4 in the tree sooner rather than later (before the end of the year). Maybe the best way to do that is to keep the lock around for now and punt on the lock-less super-ioctl if that really blocks on hardware context support. How far back is hardware contexts supported? Kristian
Re: DRI2 and lock-less operation
Kristian Høgsberg wrote: On Nov 22, 2007 4:23 AM, Keith Whitwell [EMAIL PROTECTED] wrote: ... My guess for one way is to store a buffer object with the current state emission in it, and submit it with the superioctl maybe, and if we have lost context emit it before the batchbuffer..

The way drivers actually work at the moment is to emit a full state as a preamble to each batchbuffer. Depending on the hardware, this can be pretty low overhead, and it seems that the trend in hardware is to make this operation cheaper and cheaper. This works fine without the lock. There is another complementary trend to support, one way or another, multiple hardware contexts (obviously nvidia have done this for years), meaning that the hardware effectively does the context switches. This is probably how most cards will end up working in the future, if not already. Neither of these needs a lock for detecting context switches.

Sure enough, but the problem is that without the lock userspace can't say "oops, I lost the context, let me prepend this state emission preamble to the batchbuffer" in a race-free way. If we want conditional state emission, we need to make that decision in the kernel.

The cases I describe above don't try to do this, but if you really wanted to, the way to do it would be to have userspace always emit the preamble but pass two offsets to the kernel, one at the start of the preamble, the other after it. Then the kernel can choose. I don't think there's a great deal to be gained from this optimization, though.

For example, the super ioctl could send the state emission code as a separate buffer and also include the expected context handle. This lets the kernel compare the context handle supplied in the super ioctl with the most recently active context handle, and if they differ, the kernel queues the state emission buffer first and then the rendering buffer. If the context handles match, the kernel just queues the rendering batch buffer.
However, this means that user space must prepare the state emission code for each submission, whether or not it will actually be used. I'm not sure if this is too much overhead or if it's negligible.

I think both preparing it on the CPU and executing it on the GPU are likely to be pretty negligible, but some experimentation on a system with just a single app running should show this quickly one way or another.

Keith
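[Editor's sketch] The two-offset variant Keith describes -- userspace always builds [preamble | rendering] in one buffer and passes both start offsets, and the kernel picks the entry point by comparing context handles -- reduces to a tiny decision function. Names are illustrative, not a real kernel API.

```c
#include <assert.h>

/* Toy model: the kernel chooses where execution starts in the batch.
 * preamble_off points at the state preamble, render_off just past it. */
static int last_active_ctx = -1;

static unsigned pick_start_offset(int ctx, unsigned preamble_off,
                                  unsigned render_off)
{
    unsigned start = (ctx == last_active_ctx) ? render_off : preamble_off;
    last_active_ctx = ctx;
    return start;
}
```

The cost userspace pays is building the preamble every time; the kernel only decides whether the GPU executes it.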
Re: DRI2 and lock-less operation
Dave Airlie wrote: I'm trying to figure out how context switches actually work... the DRI lock is overloaded as context switcher, and there is code in the kernel to call out to a chipset-specific context switch routine when the DRI lock is taken... but only ffb uses it... So I'm guessing the way context switches work today is that the DRI driver grabs the lock and, after potentially updating the cliprects through X protocol, it emits all the state it depends on to the card. Is the state emission done by just writing out a bunch of registers? Is this how the X server works too, except XAA/EXA acceleration doesn't depend on a lot of state and thus the DDX driver can emit everything for each operation?

So yes, userspace notices the context has changed and just re-emits everything into the batchbuffer it is going to send. For XAA/EXA stuff on Intel, at least, there is an invariant state emission function that notices what the context was and what the last server-side 3D user was (EXA or Xv texturing) and just dumps the state into the batchbuffer (or currently into the ring).

How would this work if we didn't have a lock? You can't emit the state and then start rendering without a lock to keep the state in place... If the kernel doesn't restore any state, what's the point of the drm_context_t we pass to the kernel in drmLock? Should the kernel know how to restore state (this ties in to the email from jglisse on state tracking in drm and all the gallium jazz, I guess)? How do we identify state to the kernel, and how do we pass it in the super-ioctl? Can we add a list of registers to be written and the values? I talked to Dave about it and we agreed that adding a drm_context_t to the super-ioctl would work, but now I'm thinking if the kernel doesn't track any state it won't really work.
Maybe cross-client state sharing isn't useful for performance, as Keith and Roland argue, but if the kernel doesn't restore state when it sees a super-ioctl coming from a different context, who does?

My guess for one way is to store a buffer object with the current state emission in it, and submit it with the superioctl maybe, and if we have lost context emit it before the batchbuffer..

The way drivers actually work at the moment is to emit a full state as a preamble to each batchbuffer. Depending on the hardware, this can be pretty low overhead, and it seems that the trend in hardware is to make this operation cheaper and cheaper. This works fine without the lock. There is another complementary trend to support, one way or another, multiple hardware contexts (obviously nvidia have done this for years), meaning that the hardware effectively does the context switches. This is probably how most cards will end up working in the future, if not already. Neither of these needs a lock for detecting context switches.

Keith
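[Editor's sketch] The "full state preamble per batchbuffer" approach Keith describes needs no context-loss detection at all: every batch simply begins with the complete state emission. A toy model, with an invented buffer layout and made-up register stand-ins:

```c
#include <assert.h>

#define BATCH_MAX 64

/* Toy batchbuffer: the full state preamble is emitted unconditionally
 * at the start of every batch, then rendering commands are appended. */
struct batch {
    unsigned dwords[BATCH_MAX];
    int len;
};

static void emit(struct batch *b, unsigned dw) { b->dwords[b->len++] = dw; }

static void emit_full_state(struct batch *b)
{
    /* Stand-ins for "write out a bunch of registers". */
    emit(b, 0x1000); /* e.g. viewport state */
    emit(b, 0x2000); /* e.g. blend state */
    emit(b, 0x3000); /* e.g. texture state */
}

static void begin_batch(struct batch *b)
{
    b->len = 0;
    emit_full_state(b);  /* preamble: safe regardless of who ran last */
}
```

The trade-off is exactly the one in the thread: a few redundant dwords per batch, in exchange for working without the lock.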
Re: mapped cached memory and pre-fetching.
Thomas Hellström wrote: Dave Airlie wrote: On 10/31/07, Thomas Hellström [EMAIL PROTECTED] wrote: Dave Airlie wrote: On 10/31/07, Thomas Hellström [EMAIL PROTECTED] wrote:

Dave, when starting out with TTM I did look a little at AGP caching issues, and there was an issue with cached memory and speculative pre-fetching that may affect mapped-cached memory, and that we need to know about but perhaps ignore. Suppose you bind a page to the AGP aperture, but don't change the kernel linear map caching policy. Then a speculatively prefetching processor may read the memory into its cache, decide it doesn't want to use it, and actually write it back. Meanwhile the GPU may have changed the contents of the page, and that change will be overwritten. Apparently there were big problems with AMD Athlons actually doing this, with Linux people claiming it was an Athlon bug and AMD people claiming it was within spec.

http://www.linuxjournal.com/files/linuxjournal.com/linuxjournal/articles/061/6148/6148s1.html is what I believe you are talking about; I'll add something to the comment mentioning this.

Yup. In the end I believe the change_page_attr(), global_flush_tlb() sequence was the final outcome of this, but as I understand it, with your new code we never write through the GTT, which makes the only possible problem overwrites of GPU-written data.

Well, so far we've only dealt with Intel CPU/GPU combinations, which hopefully don't suffer from this issue. I'll put a comment in, but tbh there are lots of ways to mess things up with the current APIs. Try allocating a snooped batchbuffer, or a snooped private back buffer, or anything involved in a blit. I'm going to add checks for some of the more stupid things in the Intel superioctl code...

Well, i915 (and friends') snooped memory is, as you say, not very useful, but it works fine AFAICT for things like the flip move from VRAM (which has been disabled ATM due to HW lock issues).
Should also be fine for readpixels and zero-copy texturing, although I doubt that there is any performance gain in the latter.

FWIW, zero-copy texturing is a good win when the texture is used only once, eg for video. The streamingrect application (when it was working) showed a very good improvement with the cow-pbo hack (also when it was working). To make it work transparently is quite difficult and fragile, though.

Keith
Re: intel hw and caching interface to TTM..
Dave Airlie wrote: Dave, I'd like to see the flag DRM_BO_FLAG_CACHED really mean cache-coherent memory, that is, cache-coherent also while visible to the GPU. There are HW implementations out there (Poulsbo at least) where this option actually seems to work, although it's considerably slower for things like texturing. It's also a requirement for user BOs, since they will have VMAs that we can't kill and remap.

Most PCIE cards will be cache-coherent; however, AGP cards not so much, so we need to think about whether a generic _CACHED makes sense. Especially for something like radeon, will I have to pass different flags depending on the GART type? This seems like uggh.. so maybe a separate flag makes more sense..

Could we perhaps change the flag DRM_BO_FLAG_READ_CACHED to mean DRM_BO_FLAG_MAPPED_CACHED to implement the behaviour you describe? This will also indicate that the buffer cannot be used for user-space sub-allocators, as we in that case must be able to guarantee that the CPU can access parts of the buffer while other parts are validated for the GPU.

Yes; to be honest, sub-allocators for most use-cases should be avoided if possible. We should be able to make the kernel interface fast enough for most things if we don't have to switch caching flags on the fly at map/destroy etc..

Hmm - if that was true, why do we have malloc() and friends - aren't they just sub-allocators for brk() and mmap()? There is more to this than performance - applications out there can allocate extraordinarily large numbers of small textures, which can only sanely be dealt with as light-weight userspace suballocations of a sensibly-sized buffer. (We don't do this yet, but will need to at some point!) The reasons for this are granularity (ie wasted space in the allocation), the memory overhead of managing all these allocations, and only third, performance.
If you think about what goes on in a 3d driver, you are always doing sub-allocations of some sort or another, though that's more obvious when you start doing state objects that have an independent lifecycle, as opposed to just emitting state linearly into a command buffer. For managing objects of a few dozen bytes, obviously you are going to want to do that in userspace.

So there is a continuum where successively larger buffers increasingly justify whatever additional cost there is to go directly to the kernel to allocate them. But for sufficiently small or frequently allocated buffers, there will always be a crossover point where it is faster to do it in userspace. It certainly makes sense to speed up the kernel paths, but that won't make the crossover point go away - it'll just shift it more or less depending on how successful you are.

Keith
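[Editor's sketch] The userspace suballocation Keith argues for can be as simple as a bump allocator carving small objects out of one kernel-allocated buffer. Names are invented for illustration; a real driver would also track object lifetimes and hardware alignment rules.

```c
#include <stddef.h>

/* Toy suballocator: hand out aligned slices of one large buffer that
 * was allocated from the kernel once, avoiding a kernel call per object. */
struct suballoc {
    size_t size;   /* size of the backing buffer */
    size_t head;   /* bump pointer */
};

static size_t sub_alloc(struct suballoc *s, size_t n, size_t align)
{
    size_t off = (s->head + align - 1) & ~(align - 1);
    if (off + n > s->size)
        return (size_t)-1;  /* full: caller must get a new backing buffer */
    s->head = off + n;
    return off;             /* offset of the new object within the buffer */
}
```

For objects of a few dozen bytes this costs a handful of instructions, which is the crossover-point argument in concrete form.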
Re: [patch] post superioctl interface removal.
Thomas Hellström wrote: Hi, Eric. Eric Anholt wrote: ... Can you clarify the operation being done where you move scanout buffers before unpinning them? That seems contradictory to me -- how are you scanning out while the object is being moved, and how are you considering it pinned during that time?

Actually it's very similar to Dave's issue, and the buffers aren't pinned when they are thrown out. What we'd want to do is the following:

1) Turn off CRTCs.
2) Unpin the buffer.
3) Destroy the buffer, leaving its memory area free.
4) Create and pin a new buffer (skipping the copy).
5) Turn on CRTCs.

However, with DRI clients operating, 3) will fail. As they may maintain a reference on the front buffer, the old buffer won't immediately get destroyed, and its aperture / VRAM memory area isn't freed up unless it gets evicted by a new allocation.

Is there really a long-term need for DRI clients to maintain a reference to the front buffer? We're moving away from this in lots of ways, and if it is a benefit to the TTM work, we could look at severing that tie sooner rather than later...

This will in many cases lead to fragmentation where it is really possible to avoid it. The best thing we can do at 3) is to move it out, and then unreference it. When the DRI client recognizes through the SAREA that there's a new front buffer, it will immediately release its reference on the old one, but basically, the old front buffer may be hanging around for quite some time (paused DRI clients...) and we don't want it to be present in the aperture, even if it's evictable. This won't stop fragmentation in all cases, but will certainly reduce it.

At the very least, current DRI/ttm clients could be modified to only use the frontbuffer reference in locked regions, and to have some way of getting the correct handle for the current frontbuffer at that point. Longer term, it's easy to imagine DRI clients not touching the front buffer independently and not requiring a reference to that buffer...
Keith
Re: Merging DRI interface changes
Michel Dänzer wrote: On Fri, 2007-10-12 at 10:19 +0100, Keith Whitwell wrote: Michel Dänzer wrote: On Thu, 2007-10-11 at 18:44 -0400, Kristian Høgsberg wrote: On 10/11/07, Keith Whitwell [EMAIL PROTECTED] wrote:

3) Share buffers with a reference counting scheme. When a client notices the buffer needs a resize, do the resize and adjust refcounts - other clients continue with the older buffer. What happens when a client on the older buffer calls swapbuffers -- I'm sure we can figure out what the correct behaviour should be.

3) sounds like the best solution, and it's basically what I'm proposing.

I agree; it looks like this can provide the benefits of shared drawable-private renderbuffers (support for cooperative rendering schemes, no waste of renderbuffer resources) without compromising the general benefits of private renderbuffers.

Yes, I'm just interested to understand what happens when one of the clients on the old, orphaned buffer calls swapbuffers... All the buffers should be swapped, right? Large and small? How does that work? If the answer is that we just do the swap on the largest buffer, then you have to wonder what the point of keeping the smaller ones around is?

To make 3D drivers nice and simple by not having to deal with fun stuff like cliprects? :)

Understood. I'm thinking about a further simplification - rather than keep the old buffers around after the first client requests a resize, just free them. If/when other clients submit commands targeting the old-sized buffers, throw those commands away.

Seriously though, as I understand Kristian's planned scheme, all buffer swaps will be done by the DRM, and I presume it'll only take the currently registered back renderbuffer into account, so the contents of any previous back renderbuffers will be lost. I think that's fine, and should address your concerns?

See above -- if the contents of the previous back renderbuffers are going to be lost, what is the point in keeping those buffers around?
Or doing any further rendering into them?

Keith
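[Editor's sketch] Option 3 above -- refcounted buffers with resize -- can be modelled roughly as follows. Structure and function names are invented: a resize installs a new current buffer, while clients still holding the old one keep it alive until they drop their reference.

```c
#include <stdlib.h>

/* Toy model of option 3: the drawable points at the current backbuffer;
 * clients hold references, so an orphaned buffer survives a resize until
 * the last client using it lets go. */
struct buffer {
    int width, height;
    int refcount;
};

static struct buffer *buffer_get(struct buffer *b) { b->refcount++; return b; }

static int buffer_put(struct buffer *b)   /* returns 1 if freed */
{
    if (--b->refcount == 0) { free(b); return 1; }
    return 0;
}

/* Resize: create a new current buffer; the old one lives on via client refs. */
static struct buffer *drawable_resize(struct buffer **current, int w, int h)
{
    struct buffer *nb = malloc(sizeof(*nb));
    nb->width = w; nb->height = h; nb->refcount = 1;  /* drawable's ref */
    buffer_put(*current);   /* drawable drops its ref to the old buffer */
    *current = nb;
    return nb;
}
```

The open question from the thread -- what swapbuffers on the orphaned buffer should do -- is policy layered on top of this; the refcounting only guarantees the memory stays valid.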
Re: Merging DRI interface changes
Brian Paul wrote: Kristian Høgsberg wrote: Hi, I have this branch with DRI interface changes that I've been threatening to merge on several occasions: http://cgit.freedesktop.org/~krh/mesa/log/?h=dri2 I've just rebased to today's mesa and it's ready to merge. Ian reviewed the changes a while back and gave his OK, and from what we discussed at XDS 2007, I believe the changes there are compatible with the Gallium plans.

What's been keeping me from merging this is that it breaks the DRI interface. I wanted to make sure that the new interface will work for redirected direct rendering and GLXPixmaps and GLXPbuffers, which I now know that it does. The branch above doesn't include these changes yet; it still uses the SAREA and the old shared, static back buffer setup. This is all isolated to the createNewScreen entry point, though, and my plan is to introduce a new createNewScreen entry point that enables all the TTM features. This new entry point can co-exist with the old entry point, and a driver should be able to support one or the other, and probably also both at the same time. The AIGLX and libGL loaders will look for the new entry point when initializing the driver, if they have a new enough DRI/DRM available. If the loader has an old-style DRI/DRM available, it will look for the old entry point. I'll wait a day or so to let people chime in, but if I don't hear any stop-the-press type of comments, I'll merge it tomorrow.

This is basically what's described in the DRI2 wiki at http://wiki.x.org/wiki/DRI2, right? The first thing that grabs my attention is the fact that front color buffers are allocated by the X server but back/depth/stencil/etc buffers are allocated by the app/DRI client. If two GLX clients render to the same double-buffered GLX window, each is going to have a different/private back color buffer, right? That doesn't really obey the GLX spec.
The renderbuffers which compose a GLX drawable should be accessible/shared by any number of separate GLX clients (like an X window is shared by multiple X clients).

I guess I want to know what this really means in practice. Suppose 2 clients render to the same backbuffer in a race starting at time=0, doing something straightforward like (clear, draw, swapbuffers). There's nothing in the spec that says to me that they actually have to have been rendering to the same surface in memory, because the serialization could just be (clear-a, draw-a, swap-a, clear-b, draw-b, swap-b), so that potentially only one client's rendering ends up visible. So I would say that, at least between a fullscreen clear and either swap-buffers or some appropriate flush (glXWaitGL??), we can treat the rendering operations as atomic and have a lot of flexibility in terms of how we schedule actual rendering and whether we actually share a buffer or not. Note that swapbuffers is as good as a clear from this perspective, as it can leave the backbuffer in an undefined state.

I'm not just splitting hairs for no good reason - the ability for the 3d driver to know the size of the window it is rendering to while it is emitting commands, and to know that it won't change size until it is ready for it to, is really crucial to building a solid driver. The trouble with sharing a backbuffer is what to do about the situation where two clients end up with different ideas about what size the buffer should be. So, if it is necessary to share backbuffers, then what I'm saying is that it's also necessary to dig into the real details of the spec and figure out how to avoid having the drivers being forced to change the size of their backbuffer halfway through rendering a frame. I see a few options:

0) The old DRI semantics - buffers change shape whenever they feel like it, drivers are buggy, window resizes cause mis-rendered frames.
1) The current truly private backbuffer semantics - clean drivers but some deviation from GLX specs - maybe less deviation than we actually think.

2) Alternate semantics where the X server allocates the buffers but drivers just throw away frames when they find the buffer has changed shape at the end of rendering. I'm sure this would be nonconformant; at any rate it seems nasty. (The i915 swz driver is forced to do this.)

3) Share buffers with a reference counting scheme. When a client notices the buffer needs a resize, do the resize and adjust refcounts - other clients continue with the older buffer. What happens when a client on the older buffer calls swapbuffers -- I'm sure we can figure out what the correct behaviour should be.

etc. All of these are superficial approaches. My belief is that if we really make an attempt to understand the sharing semantics encoded in the GLX spec, and interpret that in terms of allowable ordering of rendering operations of separate clients, a favorable implementation is possible.

Kristian - I apologize that I
Re: Merging DRI interface changes
Allen Akin wrote: On Thu, Oct 11, 2007 at 10:35:28PM +0100, Keith Whitwell wrote: | Suppose 2 clients render to the same backbuffer...

The (rare) cases in which I've seen this used, the clients are aware of one another, and restrict their rendering to non-overlapping portions of the drawable. A master client is responsible for swap and clear. I believe the intent of the spec was to allow CPU-bound apps to make use of multiple processors. Rendering to a single drawable, rather than multiple drawables, allowed swap to be synchronized. I recall discussions about ways to coordinate multiple command streams so that rendering to overlapping areas of the drawable could be handled effectively, but I don't remember any apps that used such methods.

Allen, just to clarify, would things look a bit like this:

Master: clear, glFlush, signal slaves somehow
Slave0..n: wait for signal, don't clear, just draw triangles, glFlush, signal master
Master: wait for all slaves, glXSwapBuffers

This is fairly sensible and clearly requires a shared buffer. It's also quite a controlled situation that sidesteps some of the questions about what happens when two clients are issuing swapbuffers willy-nilly on the same drawable at the same time as the user is frantically resizing it...

Keith
Re: Merging DRI interface changes
Kristian Høgsberg wrote: On 10/11/07, Keith Whitwell [EMAIL PROTECTED] wrote: Brian Paul wrote: ... If two GLX clients render to the same double-buffered GLX window, each is going to have a different/private back color buffer, right? That doesn't really obey the GLX spec. The renderbuffers which compose a GLX drawable should be accessible/shared by any number of separate GLX clients (like an X window is shared by multiple X clients).

I guess I want to know what this really means in practice. Suppose 2 clients render to the same backbuffer in a race starting at time=0, doing something straightforward like (clear, draw, swapbuffers). There's nothing in the spec that says to me that they actually have to have been rendering to the same surface in memory, because the serialization could just be (clear-a, draw-a, swap-a, clear-b, draw-b, swap-b), so that potentially only one client's rendering ends up visible.

I've read the GLX specification a number of times to try to figure this out. It is very vague, but the only way I can make sense of multiple clients rendering to the same drawable is if they coordinate between them somehow. Maybe the scene graph is split between several processes: one client draws the backdrop, then passes a token to another process which then draws the player characters, and then a third draws a heads-up display, calls glXSwapBuffers() and passes the token back to the first process. Or maybe they render in parallel, but to different areas of the drawable, synchronize when they're all done, and then one does glXSwapBuffers() and they start over on the next frame.

... So, if it is necessary to share backbuffers, then what I'm saying is that it's also necessary to dig into the real details of the spec and figure out how to avoid having the drivers being forced to change the size of their backbuffer halfway through rendering a frame.

This is a bigger issue to figure out than the shared buffer one.
I know you're looking to reduce the number of changing factors during rendering (clip rects, buffer sizes and locations), but the driver needs to be able to pick up new buffers in a few more places than just swap buffers. But I think we agree that we can add that polling in a couple of places in the driver (before starting a new batch buffer, on flush, and maybe other places) and it should work.

Yes, there are a few places, but they are very few. Basically I think it is possible to cut a rendering stream up into chunks which are effectively atomic. Drivers do this all the time anyway - just by building a dma buffer that is then submitted atomically to hardware for processing. It isn't too hard to figure out where the boundaries of these regions are - if we think about a driver with effectively infinite dma space, then such a driver only flushes when required to satisfy requirements placed on it by the spec.

I also believe that the only sane time to check the size of the destination drawable is when the driver is *entering* such an atomic region (let's call it a scene). Swapbuffers terminates a scene, it doesn't really start the next one - that doesn't happen until actual rendering starts. I would even say that fullscreen clears don't start a scene, but that's another story... The things that terminate a scene are:

- swapbuffers
- readpixels and similar
- maybe glFlush() - though I'm sometimes naughty and no-op it for backbuffer rendering

Basically any API-generated event that implies a flush. Internally generated events, like running out of some resource and having to fire buffers to recover, generally don't count.

I see a few options:

0) The old DRI semantics - buffers change shape whenever they feel like it, drivers are buggy, window resizes cause mis-rendered frames.

1) The current truly private backbuffer semantics - clean drivers but some deviation from GLX specs - maybe less deviation than we actually think.
2) Alternate semantics where the X server allocates the buffers but drivers just throw away frames when they find the buffer has changed shape at the end of rendering. I'm sure this would be nonconformant; at any rate it seems nasty. (The i915 swz driver is forced to do this.)

3) Share buffers with a reference counting scheme. When a client notices the buffer needs a resize, do the resize and adjust refcounts - other clients continue with the older buffer. What happens when a client on the older buffer calls swapbuffers -- I'm sure we can figure out what the correct behaviour should be.

3) sounds like the best solution, and it's basically what I'm proposing. For the first implementation (pre-gallium), I'm looking to just reuse the existing getDrawableInfo polling for detecting whether new buffers are available. It won't be more or less broken than the current SAREA scheme. When gallium starts to land, we can fine-tune the polling
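[Editor's sketch] Keith's "check drawable size only at scene entry" rule can be modelled simply. Names here are illustrative, not driver code: the size is latched when rendering starts and stays fixed until an API-level flush ends the scene.

```c
/* Toy model: the drawable may be resized at any time, but the driver
 * latches the size once, on entering a scene, and renders the whole
 * scene against that latched size. */
struct drawable { int w, h; };            /* may change underneath us */
struct scene    { int w, h; int active; };

static void scene_enter(struct scene *s, const struct drawable *d)
{
    if (!s->active) {                     /* only latch at scene entry */
        s->w = d->w; s->h = d->h;
        s->active = 1;
    }
}

static void scene_end(struct scene *s)    /* swapbuffers, readpixels, ... */
{
    s->active = 0;
}
```

A mid-scene resize is thus invisible to the driver until the next scene begins, which is exactly the stability property the discussion is after.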
Re: [patch] post superioctl interface removal.
Dave Airlie wrote: Dave, As mentioned previously to Eric, I think we should keep the single buffer validate interface, with the exception that the hint DRM_BO_HINT_DONT_FENCE is implied, and use that instead of the set-pin interface. We can perhaps rename it to drmBOSetStatus or something more suitable. This will get rid of the user-space unfenced list access (which I believe was the main motivation behind the set-pin interface?) while keeping the currently heavily used (at least in Poulsbo) functionality to move out NO_EVICT scanout buffers to local memory before unpinning them, to avoid VRAM and TT fragmentation (as DRI clients may still reference those buffers, so they won't get destroyed before a new one is allocated). It would also allow us to specify where we want to pin buffers. If we remove the memory flag specification from drmBOCreate there's no other way to do that, except running the buffer through a superioctl, which isn't very nice. Also it would make it much easier to unbreak i915 zone rendering and derived work. If we can agree on this, I'll come up with a patch.

I'm quite happy to have this type of interface; I can definitely see its uses. We also need to investigate some sort of temporary NO_MOVE-like interface (NO_MOVE until next fencing...) in order to avoid relocations, but it might be possible to make this driver-specific. Keith P also had an idea for relocation avoidance in the simple case which I've allowed for in my interface: we could use the 4th uint32_t in the relocation to pass in the value we've already written, and only relocate it if the buffer's location changes. So after doing one superioctl, the validated offsets would be passed back to userspace and used by it, and we only have to relocate future buffers if the buffers move..
Theoretically the kernel could keep the relocation lists for each buffer hanging around after use and do this automatically if a buffer is reused and the buffers that its relocations point to have been moved. That would be a good approach for one sort of buffer reuse, ie persistent state object buffers that are reused over multiple frames but contain references to other buffers.

Note that it only makes sense to reuse relocations in situations where those relocations target a small number of buffers - probably other command buffers or buffers containing state objects which themselves make no further reference to other buffers. Trying to go beyond that, eg to reuse buffers of state objects that contain references to texture images, can lead to a major waste of resources. If you think about a situation with a buffer of 50 texture state objects, each referencing 4 texture images, and you just want to reuse one of those state objects -- you will emit a relocation to that state buffer, which will need to be validated and then should recursively require all 200 texture images to be validated, even if you only needed access to 4 of them...

The pre-validate/no-move/whatever thing is a useful optimization, but it only makes sense up to a certain level - a handful of command, indirect state and/or vertex buffers is pretty much it.

Keith
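[Editor's sketch] Keith P's "pass in the value we've already written" idea is essentially a presumed-offset scheme (the later i915 GEM interface adopted a similar presumed_offset field). A toy model with invented names: userspace records the offset it already wrote into the batch, and the kernel patches the dword only if the target buffer actually moved.

```c
/* Toy relocation: userspace records the offset it already wrote
 * (presumed_offset); the kernel patches the batch only if the target
 * buffer actually moved since then. */
struct reloc {
    unsigned batch_index;      /* dword in the batch to patch */
    unsigned presumed_offset;  /* value userspace already wrote there */
};

/* Returns 1 if the dword had to be rewritten, 0 if it was left alone. */
static int apply_reloc(unsigned *batch, struct reloc *r, unsigned actual_offset)
{
    if (r->presumed_offset == actual_offset)
        return 0;                       /* buffer didn't move: no write */
    batch[r->batch_index] = actual_offset;
    r->presumed_offset = actual_offset; /* remember for next submission */
    return 1;
}
```

In the steady state, where buffers stay put, every relocation takes the early-out and the kernel never touches the batch.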
Re: Initial 915 superioctl patch.
Neither 42 nor 256 is very good - the number needs to be dynamic. Think about situations where an app has eg one glyph per texture and is doing font rendering... or any reasonably complex game might use 256 textures in a frame. Sorry for top-posting -- webmail.

Keith

- Original Message From: Jesse Barnes [EMAIL PROTECTED] To: Dave Airlie [EMAIL PROTECTED] Cc: dri-devel@lists.sourceforge.net Sent: Monday, October 8, 2007 6:04:42 PM Subject: Re: Initial 915 superioctl patch.

I don't know if 42 is better than 256... do we have any measurements that would help us pick a good number or that would tell us we need to make it a runtime option? Or maybe just part of the argument structure that's passed in?

Jesse
Re: Initial 915 superioctl patch.
Dave Airlie wrote: On Monday, October 8, 2007 10:13 am Keith Whitwell wrote: Neither 42 nor 256 are very good - the number needs to be dynamic. Think about situations where an app has eg. one glyph per texture and is doing font rendering... Or any reasonably complex game might use 256 textures in a frame. So maybe the buffer count should be part of the execbuffer request object? Or does it have to be a separate settable parameter? I would think the kernel needs to limit this in some way... as otherwise we are trusting a userspace number and allocating memory according to it... So I'll make it dynamic but I'll have to add a kernel limit.. keithw: btws poulsbo uses 256 I think also.. Yes but I suspect we'll need to increase or make it dynamic before we're done. As with most hard limits, you can work around it by flushing prematurely, but there is a cost to that, one way or another. Keith
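A minimal sketch of the direction agreed here (field names invented, not the real i915 ABI): the count travels in the request itself, and the kernel clamps it against its own limit before trusting it for an allocation:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical execbuffer request carrying a dynamic buffer count. */
struct exec_request {
    uint64_t buffers_ptr;    /* userspace array of buffer handles */
    uint32_t buffer_count;   /* supplied by userspace, not trusted */
    uint32_t batch_len;
};

/* Illustrative kernel-side cap so userspace can't force huge allocations. */
#define KERNEL_MAX_BUFFERS 4096u

static uint32_t clamp_buffer_count(const struct exec_request *req)
{
    return req->buffer_count > KERNEL_MAX_BUFFERS
               ? KERNEL_MAX_BUFFERS
               : req->buffer_count;
}
```

A real implementation would likely reject an over-limit request with an error rather than silently clamping, but the point is the same: the limit lives in the kernel while the count is per-submission.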
Re: DRI2 Design Wiki Page
Kristian Høgsberg wrote: Hi, I wrote up the DRI2 design on a wiki page here: http://wiki.x.org/wiki/DRI2 It's the result of the discussions we had during my redirected rendering talk and several follow up discussions with Keith Whitwell and Michel Daenzer. Relative to the design I presented, the significant change is that we now track clip rects and buffer attachments in the drm as part of the drm_drawable_t. We always have private back buffers and swap buffers is implemented in the drm. All this taken together (also with the super-ioctl) means that we don't need an SAREA or a drm or drawable lock. There is an issue with the design, though, related to how and when the DRI driver discovers that the front buffer has changed (typically resizing). The problem is described in detail on the page, but in short, the problem is that we were hoping to only check for this at glXSwapBuffers() time, but we need to discover this earlier in many cases. Keith W alluded to a beginning of frame synchronization point in a previous mail, which may work, but I'm not sure of the details there. I added a couple of comments, but I'm not sure about the issues around contexts sharing a drawable/backbuffer and the effects of glXSwapBuffers in that case. Brian may be able to help with this a little. Keith
Re: DRI2 Design Wiki Page
Keith Whitwell wrote: Kristian Høgsberg wrote: On 10/4/07, Keith Packard [EMAIL PROTECTED] wrote: On Thu, 2007-10-04 at 01:27 -0400, Kristian Høgsberg wrote: There is an issue with the design, though, related to how and when the DRI driver discovers that the front buffer has changed (typically resizing). Why would the rendering application even need to know the size of the front buffer? The swap should effectively be under the control of the front buffer owner, not the rendering application. Ok, I phrased that wrong: what the DRI driver needs to look out for is when size of the rendering buffers change. For a redirected window, this does involve resizing the front buffer, but that's not the case for a non-redirected window. The important part, though, is that the drawable size changes and before submitting rendering, the DRI driver has to allocate new private backbuffers that are big enough to hold the contents. As far as figuring out how big to make the rendering buffers, that's outside the scope of DRM in my book. The GLX interface can watch for ConfigureNotify events on the associated window and resize the back buffers as appropriate. I guess you're proposing libGL should transparently listen for ConfigureNotify events? I don't see how that can work, there is no guarantee that an OpenGL application handles events. For example, glxgears without an event loop, just rendering. If the rendering extends outside the window bounds and you increase the window size, the next frame should include those parts that were clipped by the window in previous frames. X events aren't reliable for this kind of notification. And regardless, the issue isn't so much how to get the resize notification from the X server to the direct rendering client, but rather that the Gallium design doesn't expect these kinds of interruptions while rendering a frame. 
So while libGL (or AIGLX) may be able to notice that the window size changed, what I'm missing is a mechanism to ask the DRI driver to reallocate its back buffers. I think basically we just need a tweak to what we're already doing for private backbuffers to cope with the periodic rendering case you've highlighted. So basically checking before the first draw and again before swapbuffers, rather than just before swapbuffers. This doesn't address the question about contexts in potentially different processes sharing a backbuffer, but I'm not 100% convinced it's possible, and if it is possible under glx, I'm not 100% convinced that it's a sensible thing to support anyway... Basically what I'm saying above is that 1) I haven't had a chance to dig into the shared-context issue, 2) in my experience GL and GLX specs provide a good amount of wiggle room to allow for a variety of implementation strategies, and 3) we should be careful not to jump to an unfavourable interpretation of the spec that ties us into a non-optimal architecture. I don't think we're looking at a particularly unique or unusual strategy - quite a few GL stacks end up with private backbuffers it seems, so these are problems that have all been faced and solved before. Keith
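The two-checkpoint idea above (check before the first draw of a frame and again before swapbuffers) might look roughly like this; the types and names are invented for illustration, not taken from the DRI code:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical private backbuffer state. */
struct backbuffer { int w, h; };

/* Called before the first draw of a frame and again before swapbuffers;
 * returns true if the buffer had to be (re)allocated for a resize. */
static bool ensure_backbuffer(struct backbuffer *bb, int draw_w, int draw_h)
{
    if (bb->w == draw_w && bb->h == draw_h)
        return false;      /* size unchanged, keep rendering */
    bb->w = draw_w;        /* stands in for a real reallocation */
    bb->h = draw_h;
    return true;
}
```

Checking at both points covers the glxgears-style client that never processes X events: even if it misses the resize mid-frame, the next frame starts with a correctly sized buffer.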
Re: DRI2 Design Wiki Page
Kristian Høgsberg wrote: On 10/4/07, Keith Packard [EMAIL PROTECTED] wrote: On Thu, 2007-10-04 at 01:27 -0400, Kristian Høgsberg wrote: There is an issue with the design, though, related to how and when the DRI driver discovers that the front buffer has changed (typically resizing). Why would the rendering application even need to know the size of the front buffer? The swap should effectively be under the control of the front buffer owner, not the rendering application. Ok, I phrased that wrong: what the DRI driver needs to look out for is when size of the rendering buffers change. For a redirected window, this does involve resizing the front buffer, but that's not the case for a non-redirected window. The important part, though, is that the drawable size changes and before submitting rendering, the DRI driver has to allocate new private backbuffers that are big enough to hold the contents. As far as figuring out how big to make the rendering buffers, that's outside the scope of DRM in my book. The GLX interface can watch for ConfigureNotify events on the associated window and resize the back buffers as appropriate. I guess you're proposing libGL should transparently listen for ConfigureNotify events? I don't see how that can work, there is no guarantee that an OpenGL application handles events. For example, glxgears without an event loop, just rendering. If the rendering extends outside the window bounds and you increase the window size, the next frame should include those parts that were clipped by the window in previous frames. X events aren't reliable for this kind of notification. And regardless, the issue isn't so much how to get the resize notification from the X server to the direct rendering client, but rather that the Gallium design doesn't expect these kinds of interruptions while rendering a frame. 
So while libGL (or AIGLX) may be able to notice that the window size changed, what I'm missing is a mechanism to ask the DRI driver to reallocate its back buffers. I think basically we just need a tweak to what we're already doing for private backbuffers to cope with the periodic rendering case you've highlighted. So basically checking before the first draw and again before swapbuffers, rather than just before swapbuffers. This doesn't address the question about contexts in potentially different processes sharing a backbuffer, but I'm not 100% convinced it's possible, and if it is possible under glx, I'm not 100% convinced that it's a sensible thing to support anyway... Keith
Re: drm: Branch 'master'
Alan Hourihane wrote: linux-core/drm_drv.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) New commits: diff-tree 6671ad1917698b6174a1af314b63b3800d75248c (from 03c47f1420bf17a1e0f2b86be500656ae5a4c95b) Author: Alan Hourihane [EMAIL PROTECTED] Date: Wed Sep 26 15:38:54 2007 +0100 don't copy back if an error was returned. diff --git a/linux-core/drm_drv.c b/linux-core/drm_drv.c index cedb6d5..8513a28 100644 --- a/linux-core/drm_drv.c +++ b/linux-core/drm_drv.c @@ -645,7 +645,7 @@ long drm_unlocked_ioctl(struct file *fil retcode = func(dev, kdata, file_priv); } - if (cmd & IOC_OUT) { + if ((retcode == 0) && cmd & IOC_OUT) { Hmm, brackets around the == but not around the &? Could you humour me and change that to something like: if (retcode == 0 && (cmd & IOC_OUT)) { Keith
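The reason for insisting on the extra brackets: in C, `==` and `!=` bind tighter than `&`, so an unbracketed mix of the two silently tests the wrong thing. A quick standalone demonstration (not the DRM code itself; `IOC_OUT`'s value here is just illustrative):

```c
#include <assert.h>

#define IOC_OUT 0x40000000u

/* != binds tighter than &, so this parses as cmd & (IOC_OUT != 0),
 * i.e. cmd & 1 - almost certainly not what was meant. */
static int out_flag_wrong(unsigned cmd) { return cmd & IOC_OUT != 0; }

/* Bracketing the & gives the intended flag test. */
static int out_flag_right(unsigned cmd) { return (cmd & IOC_OUT) != 0; }
```

Alan's patch happens to be safe because `&&` binds looser than `&`, but writing the brackets out makes the intent obvious to the next reader.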
XDS: Intel i965 docs
Just FYI, one of the things that came up at xds is that Intel is now making a scrubbed version of the i965 docs available under NDA. Dave Airlie has been working with them at redhat, for instance. Keith
Re: Vblanks, CRTCs and GLX, oh my!
Jesse Barnes wrote: Both the generic DRM vblank-rework and Intel specific pipe/plane swapping have uncovered some vblank related problems which we discussed at XDS last week. Unfortunately, no matter what we do (including the do nothing option), some applications will break some of the time in the new world order. Basically we have a few vblank related bits of code: 1) DRM_IOCTL_WAIT_VBLANK - core DRM vblank wait ioctl 2) driver interrupt code - increments appropriate vblank counter 3) DRM_I915_VBLANK_SWAP - Intel specific scheduled swap ioctl 4) SAREA private data - used for tracking which gfx plane to swap 5) glX*VideoSyncSGI - GL interfaces for sync'ing to vblank events As it stands, DRM_IOCTL_WAIT_VBLANK is downright broken in the new world of dynamically controlled outputs and CRTCs (at least for i915 and radeon): a client trying to sync against the second CRTC that doesn't pass _DRM_VBLANK_SECONDARY will only work if one CRTC is enabled, due to the way current interrupt handlers increment the respective vblank counters (i.e. they increment the correct counter if both CRTCs are generating events, but only the primary counter if only one CRTC vblank interrupt is enabled). The Intel specific DRM_I915_VBLANK_SWAP is a really nice interface, and is the only reliable way to get tear free vblank swap on a loaded system. However, what it really cares about is display planes (in the Intel sense), so it uses the _DRM_VBLANK_SECONDARY flag to indicate whether it wants to flip plane A or B. Whether or not to pass the _DRM_VBLANK_SECONDARY flag is determined by DRI code based on the SAREA private data that describes how much of a given client's window is visible on either pipe. This should work fine as of last week's mods and only the DDX and DRM code have to be aware of potential pipe-plane swapping due to hardware limitations. 
The vblank-rework branch of the DRM tree tries to address (1) and (2) by splitting the logic for handling CRTCs and their associated vblank interrupts into discrete paths, but this defeats the original purpose of the driver interrupt code that tries to fall back to a single counter, which is due to limitations in (5), namely that the glX*VideoSyncSGI APIs can only handle a single pipe. So, what to do? One way of making the glX*VideoSyncSGI interfaces behave more or less as expected would be to make them more like DRM_I915_VBLANK_SWAP internally, i.e. using SAREA values to determine which pipe needs to be sync'd against by passing in the display plane the client is most tied to (this would imply making the Intel specific SAREA plane info more generic), letting the DRM take care of the rest. If the SGI glx extensions aren't matching the hardware capabilities, I think it's appropriate to start talking about new extensions exposing behaviour we can support... It might be worth taking a look over the fence at the wgl world and see if there's anything useful there that might be adapted. Keith
ttm --- dealing with (more) limited memory pools
I've got a couple of things that are bothering me if we're looking at finalizing the TTM interface in the near term. Specifically I'm concerned that we don't have a recoverable way to deal with out-of-memory situations. Consider a driver that tries to submit whole frames of q3, which is running on say a 32mb card. There's nothing to stop the driver specifying textures in excess of what is available to the memory manager. If the driver does do this, there's no feedback from the kernel that this is a bad idea until after it's done. Also, all the geometry is now gone, so it's too late to restructure the command stream or even fall back to software. Note this is a different situation to using 8 huge textures to draw a single triangle, where nothing can be done to help. In the case above, the scene could be split and rendered on hardware, although at the expense of texture thrashing. We've benefited from the flexibility with IGPs to avoid this so far, but we do want to cope with VRAM and it seems like we are currently missing some of the necessary mechanisms... So the issues are: - how does the driver know ahead of time it is running out of texture space? - if the answer to the above is it doesn't, then how do we rescue submitted command streams that exceed texture space? - relatedly, on cards with texture-from-vram and texture-from-agp, how does the driver know which pool to use for a particular texture? At worst, I can imagine something like the kernel pushing out to userspace a size for each pool which is guaranteed to be available for validated buffers. Eg, on a 32 mb card, we could say that there is a maximum no-evict space of 8mb, meaning that at all times there is at least 24 mb available for validated buffers. There may be more than that. The userspace driver would be responsible for ensuring that all the buffers it wants to validate to that pool do not exceed 24mb (given some alignment constraints???). 
When it approaches that limit, it can either switch to other pools or flush the command stream. If some of the 8mb is free, it can be used by the memory manager to avoid evicts. In the worst case, it just means that the userspace driver flushes more often than it strictly has to. It can even try and exceed the 24mb if it wants to, but has to live with the possibility of the memory manager saying 'no'. It still has access to the full amount of memory on the card not taken by no-evict buffers. So summarizing: - Enforce a limit on no-evict buffers. Keep these to a contiguous region of the address space (XXX: note this makes pageflipping private backbuffers more complex). - Advertise the size of the remaining space. - Drivers monitor the total size of buffers referenced by relocations, and flush before it reaches the available space in any pool. - Drivers may try to reference more as long as they are prepared for failure. - The memory manager uses any extra space to avoid evicts. This seems like it can be implemented in the time available with minimal kernel changes. It also seems like it will probably work, and pushes most of the responsibility into the userspace driver, and allows it to make decisions as the stream is being built rather than trying to fix it up afterwards... Keith
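The driver-side bookkeeping in the summary above might be sketched like this (names invented; in a real driver the guaranteed size would come from the kernel advertisement described in the proposal):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-pool tracker for the scheme sketched above. */
struct pool_tracker {
    size_t guaranteed;    /* size advertised by the kernel, e.g. 24mb */
    size_t referenced;    /* bytes referenced by relocations so far */
};

/* Account for a buffer about to be referenced by a relocation.
 * Returns true if the driver must flush the command stream first;
 * in that case 'referenced' is left untouched so the caller can
 * flush, reset the tracker, and retry. */
static bool pool_needs_flush(struct pool_tracker *p, size_t buf_size)
{
    if (p->referenced + buf_size > p->guaranteed)
        return true;
    p->referenced += buf_size;
    return false;
}
```

A driver that wants to gamble on extra space being free could keep going past the guarantee, as the text allows, but then has to be prepared for validation to fail.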
TTM BOF topics
Looks like we've got a slot at XDS to talk about all our adventures with buffer management and plans for the future. It might make the session more productive if we do a little groundwork first... If you've been working with TTM and have things you'd like to talk about, please reply to this email and let's try and knock out the easy stuff ahead of time... Keith
Re: DRM enhancements document
Michel Dänzer wrote: On Thu, 2007-08-02 at 17:44 +0200, Jerome Glisse wrote: Btw i think that some GPU can wait on vblank using cmd ie you don't need to ask the card to emit irq you just insert a cmd in stream which stall further cmd execution until vblank happen, this might be good for power consumption. It's generally a bad idea because it prevents the GPU from doing anything else that could be done before vertical blank. This is true on cards with a single command stream - if you had per-context ringbuffers and a hardware scheduler, it might be better. Unfortunately, you still end up forcing the cliprects not to change, unless you also have some hardware mechanism for that, which I think is pretty rare nowadays. Hmm. Maybe you could use the frontbuffer alpha bits as a window id. You'd still need to either lock the window position, or find some way of telling the hardware about window position changes, or... something else... Anyway, it gets complicated... Keith
Re: Merging DRI changes
Kristian Høgsberg wrote: Hi, I've finished the changes to the DRI interface that I've been talking about for a while (see #5714). Ian had a look at the DRI driver side of things, and ACK'ed those changes. I've done the X server changes now plus a couple of GLX module cleanups, and I think it's all ready to push: http://gitweb.freedesktop.org/?p=users/krh/mesa.git;a=shortlog;h=dri2 and http://gitweb.freedesktop.org/?p=users/krh/xserver.git;a=shortlog;h=dri2 One thing that's still missing is Alan H's changes to how DRI/DDX maps the front buffer. While the changes above break the DRI interface, they only require an X server and a Mesa update. Alan's patches change the device private shared between the DDX and DRI driver and thus require updating every DRI capable DDX driver in a non-compatible way. Kristian, Just letting you know Alan's on holidays this week, back on Monday. Keith
Re: drm-ttm-cleanup-branch
Dave Airlie wrote: On 5/9/07, Thomas Hellström [EMAIL PROTECTED] wrote: Dave Airlie wrote: I'll try it out as soon as there is time. I've just tested glxgears and a few mesa tests on it and it seems to be working fine We should probably think about pulling this over into the DRM sooner rather than later, there are also some changes to the DDX i830_driver.c compat code to deal with... Yup. I've attached a patch (against the cleanup branch) with things I think may be needed. 1) 64-bit reordering. 64-bit scalars, structs and unions should probably be 64-bit aligned in parent structs. I had to insert padding in two cases, but this probably needs to be double-checked. 2) A magic member in the init ioctl. Checking this allows for verbose and friendly failure of code that uses the old interface. 3) Init major / minor versioning of the memory manager interface in case we need changes in the future. 4) expand_pads are 64-bit following Jesse Barnes' recommendations. 5) The info_req carries a fence class for validation for a particular command submission mechanism. 6) The info_rep arg carries tile_strides and tile_info. The argument tile_strides is ((actual_tile_stride << 16) | (desired_tile_stride)) Any reason you don't just separate actual_tile_stride and desired_tile_stride to 2 x unsigned int? not sure why merging them really gives us anything... The argument tile_info is driver-specific. (Could be tile width, x-major, y-major etc.) Finally, should we perhaps allow for 64-bit buffer object flags / mask at this point? Possibly, the rest all seems like good ideas, I know we hit the 64-bit alignment on nouveau so good to get it fixed early I haven't done any user-space or kernel coding for this yet. Just want to know what you think. Well I'll be offline for a few weeks so I'll get the odd chance to read mail but no development... 
If kernel-space relocations are on the agenda, I should flag up one issue that we are currently pretending doesn't really exist, namely that not all relocations apply to the main dma/batch buffer. More precisely, state structures stored for use and reuse outside of the command stream can contain pointers to things like draw surfaces and textures. Having pointers in those structs limits their ability to be reused as intended, but there's not much to do about that. The important thing to note is that they *do* need to be fixed up with relocation information at the same time as the batchbuffer. Or in other words, that while the batch buffer may be special, it is not unique - there are a (small) number of buffers that will require the same treatment. Keith
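For reference, the packed tile_strides encoding from point 6 above, reconstructed on the assumption that the operator lost in the archive was a 16-bit shift (the helper names here are invented):

```c
#include <assert.h>
#include <stdint.h>

/* ((actual_tile_stride << 16) | desired_tile_stride), as described
 * in the patch notes above. */
static uint32_t pack_tile_strides(uint16_t actual, uint16_t desired)
{
    return ((uint32_t)actual << 16) | desired;
}

static uint16_t actual_tile_stride(uint32_t packed)
{
    return (uint16_t)(packed >> 16);
}

static uint16_t desired_tile_stride(uint32_t packed)
{
    return (uint16_t)(packed & 0xffffu);
}
```

Dave's counter-suggestion of two plain unsigned ints would make helpers like these unnecessary, at the cost of a slightly larger ioctl struct.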
Re: R200 minor cleanups
Oliver McFadden wrote: My thoughts are, we should unify the really common stuff... but I don't think it's possible to unify r200_tex.c and r300_tex.c. The hardware is different, and the file would end up with an #ifdef on every 3rd line; it doesn't make sense here. Just for really common code it does. I don't know what is going to happen with TTM. Maybe we should hack the r300 driver for TTM (and someone else can do R200 and R128 (radeon) if they like) or maybe we should start a new DRI driver completely from scratch, with TTM and good state handling in mind from the beginning. Then we just take the code we need from R300 and merge it into the new DRI driver. Regarding indenting, I indented the driver with the Linux kernel style because that is what matched most (but not all) of the code. The indenting was a little inconsistent. If you like, feel free to indent the R200 or R128 (radeon) code, too. I guess for TTM we'll have to wait and see what happens... Just letting you know I've been doing a bit of thinking about the Mesa driver model lately, and I think there's room to pull a lot of stuff up and out of the drivers. Regarding textures - almost all of the texture handling for *all* of the drivers could effectively be handled by the miptree mechanism in the i915tex driver - basically everything could be made device-independent, but with the driver specifying how to lay mipmaps out within rectangular regions that are then managed in a device-independent way. The driver also provides some utilities like CopyBlit, etc, and helpers for choosing formats. I think there are a bunch of similar pieces of functionality that have built up in the drivers, simply because we aren't providing the right level of support in core mesa. Keith
Re: R300 cleanup questions
Keith Whitwell wrote: Oliver McFadden wrote: I'd like some input on the VBO stuff in r300. In r300_context.h we have the following. /* KW: Disable this code. Driver should hook into vbo module * directly, see i965 driver for example. */ /* #define RADEON_VTXFMT_A */ #ifdef RADEON_VTXFMT_A #define HW_VBOS #endif So the VTXFMT (radeon_vtxfmt_a.c) code is disabled anyway. This also disables hardware VBOs. I guess this has been done since the new VBO branch was merged. So, the question is, should this dead code be removed? I think all drivers are (or should be) moving to the new VBO code anyway. I've already made a patch for this, but I'm not committing until I get the okay from a few people. Yes, the old code should go. I guess there might be some starting points in there for beginning the vbo work, that's about the only reason to keep it. Hmm, I just took a look through the r300 code, and was surprised to see myself listed as the author of several of the files?? I'm pretty sure I haven't done any work on that driver... I think I'd prefer to have a line that says based on xxx by Keith Whitwell, or even just remove my name from the r300_* files and give credit instead to the people who've really been working on that code... Keith
Re: R300 cleanup questions
Oliver McFadden wrote: I'd like some input on the VBO stuff in r300. In r300_context.h we have the following. /* KW: Disable this code. Driver should hook into vbo module * directly, see i965 driver for example. */ /* #define RADEON_VTXFMT_A */ #ifdef RADEON_VTXFMT_A #define HW_VBOS #endif So the VTXFMT (radeon_vtxfmt_a.c) code is disabled anyway. This also disables hardware VBOs. I guess this has been done since the new VBO branch was merged. So, the question is, should this dead code be removed? I think all drivers are (or should be) moving to the new VBO code anyway. I've already made a patch for this, but I'm not committing until I get the okay from a few people. Yes, the old code should go. I guess there might be some starting points in there for beginning the vbo work, that's about the only reason to keep it. Keith
Re: [RFC] [PATCH] DRM TTM Memory Manager patch
Keith Packard wrote: OTOH, letting DRM resolve the deadlock by unmapping and remapping shared buffers in the correct order might not be the best one either. It will certainly mean some CPU overhead and what if we have to do the same with buffer validation? (Yes for some operations with thousands and thousands of relocations, the user space validation might need to stay). I do not want to do relocations in user space. I don't see why doing thousands of these requires moving this operation out of the kernel. Agreed. The original conception for this was to have validation plus relocations be a single operation, and by implication in the kernel. Although the code as it stands doesn't do this, I think that should still be the approach. The issue with thousands of relocations from my point of view isn't a problem - that's just a matter of getting appropriate data structures in place. Where things get a bit more interesting is with hardware where you are required to submit a whole scene's worth of rendering before the hardware will kick off, and with the expectation that the texture placement will remain unchanged throughout the scene. This is a very easy way to hit any upper limit on texture memory - the agp aperture size in the case of integrated chipsets. That's a special case of the general problem of what to do when a client submits any validation list that can't be satisfied. Failing to render isn't really an option; either the client or the memory manager has to prevent it happening in the first place, or have some mechanism for chopping up the dma buffer into segments which are satisfiable... Neither of which I can see an absolutely reliable way to do. I think that any memory manager we can propose will have flaws of some sort - either it is prone to failures that aren't really allowed by the API, is excessively complex or somewhat pessimistic. We've chosen a design that is simple, optimistic, but can potentially say no unexpectedly. 
It would then be up to the client to somehow pick up the pieces and potentially submit a smaller list. So far we just haven't touched on how that might work. The way to get around this is to mandate that hardware supports paged virtual memory... But that seems to be a difficult trick. Keith
Re: Lockups with Googleearth
Adam K Kirchhoff wrote: Michel Dänzer wrote: On Tue, 2007-03-06 at 04:32 -0500, Adam K Kirchhoff wrote: commit 09e4df2c65c1bca0d04c6ffd076ea7808e61c4ae causes the lockup.. If I'm reading the git log properly this is right after the merge from vbo-0.2. However, commit 47d463e954efcd15d20ab2c96a455aa16ddffdcc also causes the lockup, and this is right before the merge from vbo-0.2. No, that's still on vbo-0.2. The last commit on master before the merge is 325196f548f8e46aa8fcc7b030e81ba939e7f6b7. I really recommend gitk. :) Sorry about that. I turned back to the log after browsing through gitk last night well past after I should have been asleep :-) Anyway, your suspicion was correct, this problem did not exist prior to the merge of the vbo-0.2 branch, but did start immediately after the merge. Does this need to be narrowed down further on the vbo-0.2 branch? You can try, but that branch was a wholesale replacement of some existing functionality, so you may just end up at the commit where things switched over. It may be better than that, so possibly worth trying. It may make sense to try and narrow things down in the driver to a certain operation or set of operations, ie the commit history may be too coarse and you just have to attack the bug from first principles. Keith
Re: [PATCH] R300 early Z cleanup
Jerome Glisse wrote: On 2/26/07, Roland Scheidegger [EMAIL PROTECTED] wrote: Christoph Brill wrote: Attached is a mini-patch to add the address of early Z to r300_reg.h and use it. Jerome Glisse helped me with this patch. Thanks. :-) Not really related directly to the patch itself, but it seems to me that the conditions for when to enable early-z are a bit wrong. First, I'd think you need to disable early-z when writing to the depth output (or using texkill) in fragment programs. Second, why disable early-z specifically if the depth function is gl_never? Is that some workaround because stencil updates don't work otherwise (in that case it should certainly rather check for that) or something similar? Roland Yes, we need to disable early z when the fragment program writes to the z buffer; so far there isn't real support for depth writing in the driver (it's on the todo list). I fail to see why we should disable early z when there is a texkill instruction (and stencil is disabled): if we disable early z, then a fragment that doesn't pass the test in the early pass won't pass it after texkill either, so it will be killed anyway (and better to kill early than at the end). If you don't disable early z, you can end up writing values to the depth buffer for fragments that are later killed. Keith
Re: [PATCH] R300 early Z cleanup
Jerome Glisse wrote: On 2/26/07, Keith Whitwell [EMAIL PROTECTED] wrote: Jerome Glisse wrote: On 2/26/07, Roland Scheidegger [EMAIL PROTECTED] wrote: Christoph Brill wrote: Attached is a mini-patch to add the address of early Z to r300_reg.h and use it. Jerome Glisse helped me with this patch. Thanks. :-) Not really related directly to the patch itself, but it seems to me that the conditions for when to enable early-z are a bit wrong. First, I'd think you need to disable early-z when writing to the depth output (or using texkill) in fragment programs. Second, why disable early-z specifically if the depth function is gl_never? Is that some workaround because stencil updates don't work otherwise (in that case it should certainly rather check for that) or something similar? Roland Yes, we need to disable early z when the fragment program writes to the z buffer; so far there isn't real support for depth writing in the driver (it's on the todo list). I fail to see why we should disable early z when there is a texkill instruction (and stencil is disabled): if we disable early z, then a fragment that doesn't pass the test in the early pass won't pass it after texkill either, so it will be killed anyway (and better to kill early than at the end). If you don't disable early z, you can end up writing values to the depth buffer for fragments that are later killed. Keith Doesn't early z only discard fragments that fail the z test without writing the z value, deferring the write until after the fragment operations? I guess it depends on the hardware - at least some do both the test and the write early. You'd have to test somehow. If it does do the writeback early, you need to also look at disabling it when alpha test is enabled. Keith
Re: mesa: Branch 'master'
Stephane Marchesin wrote: Keith Whitwell wrote: configs/linux-dri-debug | 16 1 files changed, 16 insertions(+) New commits: diff-tree 3bfbe63806cee1c44da2625daf069b719f2a6097 (from 747c9129c0b592941b14c290ff3d8ab22ad66acb) Author: Keith Whitwell [EMAIL PROTECTED] Date: Wed Jan 17 08:44:13 2007 + New debug config for linux-dri Also, isn't it time that we use -O2 by default for linux-dri ? It brings quite a bit of performance increase. Failing that, can we add -O2 somewhere in the nouveau makefile (so that only our code gets built with it) ? I don't have a problem with doing this in configs/linux-dri, providing the '-fno-strict-aliasing' flag is set in there somewhere too. Keith
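Concretely, Keith's condition amounts to a one-line change in the build config. This is a sketch; the exact variable name (OPT_FLAGS here) is an assumption about Mesa's config conventions of the time and should be checked against configs/default:

```make
# configs/linux-dri (sketch): enable -O2, but only together with
# -fno-strict-aliasing, since Mesa is not strict-aliasing clean and
# gcc's type-based aliasing optimizations can miscompile it at -O2.
OPT_FLAGS = -O2 -fno-strict-aliasing
```

The pairing matters: -O2 alone turns on strict-aliasing assumptions, which is precisely the class of miscompilation discussed in the gcc-4.1 i965 thread below.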
Re: [r300] VBO broken by changes in mesa
Rune Petersen wrote: Keith Whitwell wrote: I've fixed some typos in this code - with luck this should be solved now? sorry no... If you can't find it in the next day, would you mind disabling it for the 6.5.2 release? OK, I've made some progress on the i915 at least, can you retry over there? Keith
Re: i915: Xserver crash and restart fails
Tino Keitel wrote: On Fri, Nov 17, 2006 at 22:12:09 +0100, Tino Keitel wrote: Hi folks, I use the TV application MythTV, which uses OpenGL to draw its GUI. For a while now I have been able to crash my Xserver very easily just by switching to the workspace that shows the MythTV GUI. A restart of the Xserver fails. [...] If this helps: I can stop the display manager, suspend to disk using suspend2, resume, and restart X. It seems to work again after this. Which version of the i915 driver are you using? Is it i915tex? If not, please upgrade to i915tex and retry. Keith
Re: [r300] VBO broken by changes in mesa
Rune Petersen wrote: Hi, A patch for making sure VBO's are mapped breaks r300: http://marc.theaimsgroup.com/?l=mesa3d-cvsm=116364446305536w=2 It would appear we just need to add _ae_(un)map_vbos() in the right places in radeon_vtxfmt_a.c. Rune, my expectation was that the change wouldn't break drivers, but that doing the _ae_map/unmap externally would reduce the performance impact of the change. I can't debug r300 unfortunately, so if adding the explicit map/unmap helps, go ahead and do so, but could you also post me stacktraces of the crash (I assume it's a crash?) so I can figure out what the underlying problem might be? Keith
Re: [r300] partly working fragment.position patch
Rune Petersen wrote: Keith Whitwell wrote: Rune Petersen wrote: Keith Whitwell wrote: Roland Scheidegger wrote: Keith Whitwell wrote: I think Rune is rather referring to the fact that you can't change (not with legal means at least) the constant you got from _mesa_add_unnamed_constant. Ah right. I missed that. I think there exist at least 2 solutions for that. The clean way would probably be to add some more INTERNAL_STATE (like the i965 driver uses) so you use _mesa_add_state_reference instead; in this case mesa's shader code would need to update the program parameter based on the drawable information - I'm not sure if accessing a driver's drawable information there would get messy. The easier solution would probably be to just directly manipulate the ParameterValues entry associated with the constant you added, easy, though it might be considered somewhat hackish. Just don't forget you not only have to update the constant within r300UpdateWindow (if the currently bound fp requires it), but also when the active fp is switched to another one (and make sure that a parameter upload is actually triggered if it isn't already upon drawable changes). I think the parameter approach is probably the right one. This would require that there be a callback into the driver to get this state, and more importantly, the driver would have to set a bit in ctx->NewState (perhaps _NEW_BUFFERS) to indicate that a statechange has occurred which would affect that internal state atom. Thank you. I've hit a bit of a problem: I was planning to have state flags returned from a callback make_state_flags(). Something like: ctx->Driver.GetGenericStateFlags(state); The problem being that the context ctx is not a parameter in make_state_flags(). Is there a smart way of solving this? Rune, I don't quite understand what you want to do here. Can you show me the code you'd like to have (ignoring the ctx argument issue)?
I would have thought that we could determine the state statically and just rely on the driver to set that state in ctx->NewState when necessary. I am trying to make generic state vars that the drivers can use. The way I read these functions: make_state_flags() returns the state flags that should trigger an update of the state var; _mesa_fetch_state() fetches the state var. In order to make generic state vars: - I need to get the flags via a callback to the driver from make_state_flags(). - I need to fetch the vars via a callback to the driver from _mesa_fetch_state().

make_state_flags()
{
   ...
   case STATE_INTERNAL:
      switch (state[1]) {
      case STATE_NORMAL_SCALE:
         return _NEW_MODELVIEW;
      case STATE_TEXRECT_SCALE:
         return _NEW_TEXTURE;
      case STATE_GENERIC1:
         assert(ctx->Driver.GetGenericStateFlags);
         return ctx->Driver.GetGenericStateFlags(state);
      }
}

_mesa_fetch_state()
{
   ...
   case STATE_INTERNAL:
      switch (state[1]) {
      case STATE_NORMAL_SCALE:
         ...
         break;
      case STATE_TEXRECT_SCALE:
         ...
         break;
      case STATE_GENERIC1:
         assert(ctx->Driver.FetchGenericState);
         ctx->Driver.FetchGenericState(ctx, state, value);
         break;
      }
}

I guess what I'm wondering is whether the flags you want to put into the driver as generics are actually things which are universal and should be supported across Mesa and the other drivers - is it just stuff like window position? I think it would be better to create a new STATE_WINDOW_POSITION keyed off something like _NEW_BUFFERS for that. It would still be the driver's responsibility to set _NEW_BUFFERS on window position changes though. Keith
Re: [r300] partly working fragment.position patch
Rune Petersen wrote: Keith Whitwell wrote: Roland Scheidegger wrote: Keith Whitwell wrote: I think Rune is rather referring to the fact that you can't change (not with legal means at least) the constant you got from _mesa_add_unnamed_constant. Ah right. I missed that. I think there exist at least 2 solutions for that. The clean way would probably be to add some more INTERNAL_STATE (like the i965 driver uses) so you use _mesa_add_state_reference instead; in this case mesa's shader code would need to update the program parameter based on the drawable information - I'm not sure if accessing a driver's drawable information there would get messy. The easier solution would probably be to just directly manipulate the ParameterValues entry associated with the constant you added, easy, though it might be considered somewhat hackish. Just don't forget you not only have to update the constant within r300UpdateWindow (if the currently bound fp requires it), but also when the active fp is switched to another one (and make sure that a parameter upload is actually triggered if it isn't already upon drawable changes). I think the parameter approach is probably the right one. This would require that there be a callback into the driver to get this state, and more importantly, the driver would have to set a bit in ctx->NewState (perhaps _NEW_BUFFERS) to indicate that a statechange has occurred which would affect that internal state atom. Thank you. I've hit a bit of a problem: I was planning to have state flags returned from a callback make_state_flags(). Something like: ctx->Driver.GetGenericStateFlags(state); The problem being that the context ctx is not a parameter in make_state_flags(). Is there a smart way of solving this? Rune, I don't quite understand what you want to do here. Can you show me the code you'd like to have (ignoring the ctx argument issue)?
I would have thought that we could determine the state statically and just rely on the driver to set that state in ctx->NewState when necessary. Keith
Re: Intel 965G: i915_dispatch_cmdbuffer failed (2.6.19-rc2)
Ryan Richter wrote: On Wed, Oct 18, 2006 at 07:54:41AM +0100, Keith Whitwell wrote: This is all a little confusing as the driver doesn't really use that path in normal operation except for a single command - MI_FLUSH, which is shared between the architectures. In normal operation the hardware does the validation for us for the bulk of the command stream. If there were missing functionality in that ioctl, it would be failing everywhere, not just in this one case. I guess the questions I'd have are - did the driver work before the kernel upgrade? - what path in userspace is seeing you end up in this ioctl? - and like Keith, what commands are you seeing? The final question is interesting not because we want to extend the ioctl to cover those, but because it will give a clue how you ended up there in the first place. Here's a list of all the failing commands I've seen so far: 3a440003 d70003 2d010003 e5b90003 2e730003 8d8c0003 c10003 d90003 be0003 1e3f0003 Ryan, Those don't look like any commands I can recognize. I'm still confused how you got onto this ioctl in the first place - it seems like something pretty fundamental is going wrong somewhere. What would be useful to me is if you can use GDB on your application and get a stacktrace for how you end up in this ioctl in the cases where it is failing. Additionally, if you're comfortable doing this, it would be helpful to see all the arguments that userspace thinks it's sending to the ioctl, compared to what the kernel ends up thinking it has to validate. There shouldn't ever be more than two dwords being validated at a time, and they should look more or less exactly like {0x0203, 0}, and be emitted from bmSetFence(). All of your other weird problems, like the assert failures, etc, make me wonder if there just hasn't been some sort of build problem that can only be resolved by clearing it out and restarting.
It wouldn't hurt to just nuke your current Mesa and libdrm builds and start from scratch - you'll probably have to do that to get debug symbols for gdb anyway. Keith
Re: Intel 965G: i915_dispatch_cmdbuffer failed (2.6.19-rc2)
Ryan Richter wrote: On Fri, Oct 20, 2006 at 06:51:01PM +0100, Keith Whitwell wrote: Ryan Richter wrote: On Fri, Oct 20, 2006 at 12:43:44PM +0100, Keith Whitwell wrote: Ryan Richter wrote: On Wed, Oct 18, 2006 at 07:54:41AM +0100, Keith Whitwell wrote: All of your other weird problems, like the assert failures, etc, make me wonder if there just hasn't been some sort of build problem that can only be resolved by clearing it out and restarting. It wouldn't hurt to just nuke your current Mesa and libdrm builds and start from scratch - you'll probably have to do that to get debug symbols for gdb anyway. I had heard something previously about i965_dri.so maybe getting miscompiled, but I hadn't followed up on it until now. I rebuilt it with an older gcc, and now it's all working great! Sorry for the wild goose chase. Out of interest, can you try again with the original GCC and see if the problem comes back? Which versions of GCC are you using? The two gcc versions are 4.1 (miscompiles) and 3.4 (OK) from Debian unstable. I had originally compiled it myself with gcc-4.1 because the Debian libgl1-mesa-dri package didn't build i965_dri.so until I submitted a build patch to them to have it built. They released a new package a few days ago with i965_dri.so included, presumably built with the same gcc-4.1, the default cc on Debian unstable. I had exactly the same problems with my own version and theirs. I rebuilt it again today with CC=gcc-3.4 and now everything works great. I saved a copy of the old i965_dri.so, so I can verify in the next few days that replacing it breaks things again. Let me know if you want copies of these files to examine. Sure, email me the 4.1 version offline. I'll also see about installing 4.1 here. Keith
Re: Intel 965G: i915_dispatch_cmdbuffer failed (2.6.19-rc2)
This is all a little confusing as the driver doesn't really use that path in normal operation except for a single command - MI_FLUSH, which is shared between the architectures. In normal operation the hardware does the validation for us for the bulk of the command stream. If there were missing functionality in that ioctl, it would be failing everywhere, not just in this one case. I guess the questions I'd have are - did the driver work before the kernel upgrade? - what path in userspace is seeing you end up in this ioctl? - and like Keith, what commands are you seeing? The final question is interesting not because we want to extend the ioctl to cover those, but because it will give a clue how you ended up there in the first place. Keith Keith Packard wrote: On Tue, 2006-10-17 at 13:40 -0400, Ryan Richter wrote: So do I want something like static int do_validate_cmd(int cmd) { return 1; } in i915_dma.c? that will certainly avoid any checks. Another alternative is to printk the cmd which fails validation so we can see what needs adding here.
Re: Intel 965G: i915_dispatch_cmdbuffer failed (2.6.19-rc2)
Your drm module is out of date. Keith Ryan Richter wrote: I have a new Intel 965G board, and I'm trying to get DRI working. Direct rendering is enabled, but all GL programs crash immediately. The message 'DRM_I830_CMDBUFFER: -22' is printed on the tty, and the kernel says: [drm:i915_cmdbuffer] *ERROR* i915_dispatch_cmdbuffer failed Additionally, glxinfo says (in addition to its normal output): glxinfo: bufmgr_fake.c:1245: bmReleaseBuffers: Assertion `intel->locked' failed. This is with 2.6.19-rc2 (and -rc1). Here's a .config and dmesg: # Automatically generated make config: don't edit # Linux kernel version: 2.6.19-rc2 # Fri Oct 13 13:42:19 2006 CONFIG_X86_64=y CONFIG_X86=y CONFIG_SMP=y [remainder of the x86-64 .config attachment and the dmesg output are truncated in the archive]
Re: Intel 965G: i915_dispatch_cmdbuffer failed (2.6.19-rc2)
Arjan van de Ven wrote: On Sat, 2006-10-14 at 09:55 +0100, Keith Whitwell wrote: Your drm module is out of date. Since the reporter is using the latest brand spanking new kernel, that is highly unlikely unless something else in the software universe is assuming newer-than-brand-spanking-new. Heh. I missed that in the title line. I'll retire quietly... Keith
Re: [r300] partly working fragment.position patch
Roland Scheidegger wrote: Keith Whitwell wrote: Now I remember why I can't use radeon->dri.drawable, at least not directly, when the shader code is added: when the window size changes the constants have to be updated. Is there a way for the driver to update a constant after construction? This is an age-old dilemma... The i965 driver gets around this by locking the hardware before validating and emitting state and drawing commands and unlocking again afterwards - so the window can't change size in the meantime. Other drivers tend to just deal with the occasional incorrectness. In general this is something we need to get a bit better at. APIs like DX9 and GL/ES do away with frontbuffer rendering, which gives the drivers a lot more flexibility in terms of dealing with window moves and resizes, allowing them to pick a time to respond to a resize. With private backbuffers we might get the same benefits, at least in the common case. I think Rune is rather referring to the fact that you can't change (not with legal means at least) the constant you got from _mesa_add_unnamed_constant. Ah right. I missed that. I think there exist at least 2 solutions for that. The clean way would probably be to add some more INTERNAL_STATE (like the i965 driver uses) so you use _mesa_add_state_reference instead; in this case mesa's shader code would need to update the program parameter based on the drawable information - I'm not sure if accessing a driver's drawable information there would get messy. The easier solution would probably be to just directly manipulate the ParameterValues entry associated with the constant you added, easy, though it might be considered somewhat hackish. Just don't forget you not only have to update the constant within r300UpdateWindow (if the currently bound fp requires it), but also when the active fp is switched to another one (and make sure that a parameter upload is actually triggered if it isn't already upon drawable changes).
I think the parameter approach is probably the right one. This would require that there be a callback into the driver to get this state, and more importantly, the driver would have to set a bit in ctx->NewState (perhaps _NEW_BUFFERS) to indicate that a statechange has occurred which would affect that internal state atom. Keith
Re: [r300] partly working fragment.position patch
Rune Petersen wrote: Keith Whitwell wrote: Rune Petersen wrote: Rune Petersen wrote: Roland Scheidegger wrote: Rune Petersen wrote: I hit a problem constructing this: - In order to do range mapping in the vertex shader (if I so choose) I will need a constant (0.5), but how do I add it? I think this might work similarly to what is used for position-invariant programs; instead of using _mesa_add_state_reference you could try _mesa_add_named_parameter. Otherwise, you could always construct 0.5 in the shader itself, since you always have the constants 0 and 1 available thanks to the powerful swizzling capabilities, though surprisingly it seems somewhat complicated. Either use 2 instructions (ADD 1+1, RCP), or try EX2/EXP, though I'm not sure about the performance of these, but I guess the approximated EXP should do (2^-1). The math in this patch appears sound. The doom3-demo issue appears unrelated to fragment.position. This version makes use of existing instructions to calculate result.position. Split into 2 parts: - the select_vertex_shader changes - the actual fragment.position changes This patch assumes: - that the temp used to calculate result.position is safe to use (true for std. use). - that fragment.position.x and y won't be used (mostly true, except for exotic programs). In order to fix this, I'll need to know the window size, but how? Surely it's right there in radeon->dri.drawable ? Now I remember why I can't use radeon->dri.drawable, at least not directly, when the shader code is added: when the window size changes the constants have to be updated. Is there a way for the driver to update a constant after construction? This is an age-old dilemma... The i965 driver gets around this by locking the hardware before validating and emitting state and drawing commands and unlocking again afterwards - so the window can't change size in the meantime. Other drivers tend to just deal with the occasional incorrectness. In general this is something we need to get a bit better at.
APIs like DX9 and GL/ES do away with frontbuffer rendering, which gives the drivers a lot more flexibility in terms of dealing with window moves and resizes, allowing them to pick a time to respond to a resize. With private backbuffers we might get the same benefits, at least in the common case. Keith
Re: [r300] partly working fragment.position patch
Rune Petersen wrote: Rune Petersen wrote: Roland Scheidegger wrote: Rune Petersen wrote: I hit a problem constructing this: - In order to do range mapping in the vertex shader (if I so choose) I will need a constant (0.5), but how do I add it? I think this might work similarly to what is used for position-invariant programs; instead of using _mesa_add_state_reference you could try _mesa_add_named_parameter. Otherwise, you could always construct 0.5 in the shader itself, since you always have the constants 0 and 1 available thanks to the powerful swizzling capabilities, though surprisingly it seems somewhat complicated. Either use 2 instructions (ADD 1+1, RCP), or try EX2/EXP, though I'm not sure about the performance of these, but I guess the approximated EXP should do (2^-1). The math in this patch appears sound. The doom3-demo issue appears unrelated to fragment.position. This version makes use of existing instructions to calculate result.position. Split into 2 parts: - the select_vertex_shader changes - the actual fragment.position changes This patch assumes: - that the temp used to calculate result.position is safe to use (true for std. use). - that fragment.position.x and y won't be used (mostly true, except for exotic programs). In order to fix this, I'll need to know the window size, but how? Surely it's right there in radeon->dri.drawable ? Keith
Re: [r300] partly working fragment.position patch
Rune Petersen wrote: It turns out I missed something obvious... The parameters are passed correctly; I have just not transformed the vertex.position to the fragment.position.

I guess that's the viewport transformation, or maybe a perspective divide followed by the viewport transformation. But I think there's a bigger problem here -- somehow you're going to have to arrange for that value to be interpolated over the triangle so that each fragment ends up with the correct position. Maybe they are being interpolated already? I guess it then depends on whether the interpolation is perspective-correct, so that once transformed you really get the right pixel coordinates rather than just a linear interpolation across the triangle.

Keith
Re: [r300] partly working fragment.position patch
Rune Petersen wrote: Keith Whitwell wrote: Rune Petersen wrote:

It turns out I missed something obvious... The parameters are passed correctly, I have just not transformed the vertex.position to the fragment.position.

I guess that's the viewport transformation, or maybe a perspective divide followed by viewport transformation.

I did do a viewport transformation, but I didn't map the z component from a range of [-1,1] to [0,1]. A perspective divide is also needed, but not in my test app (w=1); ATI appears to do the perspective divide in the fragment shader.

I hit a problem constructing this:
- In order to do range mapping in the vertex shader (if I so choose) I will need a constant (0.5), but how to add it?
- If I do the perspective divide in the fragment shader, I will need to remap WPOS from an INPUT to a TEMP.

But I think there's a bigger problem here -- somehow you're going to have to arrange for that value to be interpolated over the triangle so that each fragment ends up with the correct position. Maybe they are being interpolated already? I guess it then depends on whether the interpolation is perspective correct so that once transformed you really get the right pixel coordinates rather than just a linear interpolation across the triangle.

Is there a way to visually verify this?

Yes, of course - once you've got it working, emit the position as fragment.color and have a test program read it back. If it is correct on triangles that are 'flat' but incorrect on ones that are angled away from the viewer, then it is wrong. My guess is it'll probably be fine.

Keith
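For reference, the transformation being discussed - perspective divide followed by the viewport mapping, with z remapped from [-1,1] to [0,1] - can be sketched in C as follows (the struct and vp_* parameter names are hypothetical, not driver code):

```c
typedef struct { float x, y, z, w; } vec4;

/* Map a clip-space position to window coordinates: divide by w, then
 * apply the viewport (origin vp_x/vp_y, size vp_w/vp_h), remapping
 * z from [-1,1] to [0,1] as noted above. */
static vec4 clip_to_window(vec4 clip, float vp_x, float vp_y,
                           float vp_w, float vp_h)
{
    vec4 win;
    float inv_w = 1.0f / clip.w;        /* perspective divide */
    float nx = clip.x * inv_w;
    float ny = clip.y * inv_w;
    float nz = clip.z * inv_w;

    win.x = vp_x + (nx * 0.5f + 0.5f) * vp_w;
    win.y = vp_y + (ny * 0.5f + 0.5f) * vp_h;
    win.z = nz * 0.5f + 0.5f;           /* [-1,1] -> [0,1] */
    win.w = inv_w;                      /* 1/w, handy for the shader */
    return win;
}
```

With w=1 (as in the test app mentioned above) the divide is a no-op, which is why the missing z remap was the only visible symptom there.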
Re: tnl trouble, how can you do hw state changes depending on the primitive being rendered
Ian Romanick wrote: Roland Scheidegger wrote: Roland Scheidegger wrote:

I thought there was a mechanism that allowed the driver to be notified at glBegin (or similar) time. It seems like you ought to be able to emit some extra state at that time to change to / from point-sprite mode.

Ah, sounds like a plan. I thought the NotifyBegin would only be useful for vtxfmt-replace-like things. I'll look into that.

That was too fast. The NotifyBegin will only be called if there is actually new state; otherwise the tnl module will simply keep adding new primitives.

I think the core should be modified to call NotifyBegin if there is new state *or* the primitive type changes. Perhaps there should be a flag to request it being called in that case.

Basically, for the hwtnl case you need to look at where you're emitting the drawing commands and inject the state right at that point. For r200, and ignoring the vtxfmt stuff, that means you need to modify the loop in r200_run_tcl_render to emit the right state at the right time, depending on what primitive is about to be emitted. The i965 driver is quite involved at this level, as it has to change all sorts of stuff based on the primitive - the clipping algorithm obviously changes between points, lines and triangles, and so on. Regular swtnl drivers also turn stuff on/off based on primitive; there is quite a bit of mechanism in place for this already - have a look at e.g. r128ChooseRenderState and r128RenderPrimitive/r128RasterPrimitive for details.

Keith
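The approach Keith describes for the hwtnl emit loop can be sketched as follows. This is a simplified illustration of the idea (track the last raster primitive and inject state on a transition), not the actual r200_run_tcl_render code; all names here are hypothetical:

```c
enum prim { PRIM_POINTS, PRIM_LINES, PRIM_TRIANGLES };

struct hw_ctx {
    enum prim last_prim;    /* primitive the current raster state is for */
    int state_emits;        /* counts state changes, for illustration    */
};

/* Stand-in for emitting e.g. point-sprite enable/disable state. */
static void emit_raster_state(struct hw_ctx *ctx, enum prim p)
{
    ctx->state_emits++;
    ctx->last_prim = p;
}

/* Emit loop: inject new raster state right before any draw whose
 * primitive differs from the one the hardware is currently set up for. */
static void run_tcl_render(struct hw_ctx *ctx, const enum prim *prims, int n)
{
    for (int i = 0; i < n; i++) {
        if (prims[i] != ctx->last_prim)
            emit_raster_state(ctx, prims[i]);
        /* emit_draw_command(prims[i]); */
    }
}
```

The point is that the state change happens at draw-emission time, per primitive, rather than relying on NotifyBegin, which only fires when Mesa state is dirty.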
Re: tnl trouble, how can you do hw state changes depending on the primitive being rendered
Brian Paul wrote: Ian Romanick wrote: Roland Scheidegger wrote: Roland Scheidegger wrote:

I thought there was a mechanism that allowed the driver to be notified at glBegin (or similar) time. It seems like you ought to be able to emit some extra state at that time to change to / from point-sprite mode.

Ah, sounds like a plan. I thought the NotifyBegin would only be useful for vtxfmt-replace-like things. I'll look into that.

That was too fast. The NotifyBegin will only be called if there is actually new state; otherwise the tnl module will simply keep adding new primitives.

I think the core should be modified to call NotifyBegin if there is new state *or* the primitive type changes. Perhaps there should be a flag to request it being called in that case.

Brian, do you have an opinion on this? The tnl module is pretty much Keith's domain.

One thing to keep in mind is glPolygonMode. Depending on whether a triangle is front or back-facing, it may be rendered as a filled triangle, as lines, or with the vertices rendered as GL_POINTS (which may be sprites!). I think cases like that might be a fallback. Anyway, even if glBegin(GL_TRIANGLES) is called, you may wind up rendering lines, points or sprites instead of triangles. Off-hand I don't know how this is currently handled in the DRI drivers. Keith would know.

The i965 handles polygon mode in hardware by uploading programs that deal with all the possibilities. It's tempting to say it just works, but the reality is that it is pretty intricately coded. The r200 falls back to software tnl for unfilled triangles, and uses the same mechanisms as swtnl drivers for this. Regular swtnl drivers handle unfilled polygons by using the templates in tnl_dd/ to generate triangle functions which provide all the necessary logic for selecting the right sort of primitive and notifying the driver of transitions between the different primitive types. This is the RenderPrimitive/RasterPrimitive distinction via the callbacks that exist in most of the swtnl drivers.

Keith
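The glPolygonMode wrinkle Brian raises - that the primitive actually rasterized can differ from the one being rendered - can be illustrated with a small sketch. This is not driver code; the enums and function are hypothetical, showing only the reduction logic the tnl_dd/ templates encapsulate:

```c
enum prim { PRIM_POINTS, PRIM_LINES, PRIM_TRIANGLES };
enum poly_mode { MODE_FILL, MODE_LINE, MODE_POINT };

/* Given the front/back polygon modes and the triangle's facing,
 * return the primitive that will actually be rasterized. */
static enum prim raster_prim_for_triangle(enum poly_mode front_mode,
                                          enum poly_mode back_mode,
                                          int is_front_facing)
{
    enum poly_mode m = is_front_facing ? front_mode : back_mode;
    switch (m) {
    case MODE_LINE:  return PRIM_LINES;   /* unfilled: edges only        */
    case MODE_POINT: return PRIM_POINTS;  /* vertices; possibly sprites  */
    default:         return PRIM_TRIANGLES;
    }
}
```

This is why the driver must be notified of raster-primitive transitions (RasterPrimitive) separately from the primitive the app is rendering (RenderPrimitive): a single glBegin(GL_TRIANGLES) batch can rasterize as triangles, lines, and points.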