Re: [Mesa3d-dev] [mesa] svga: Fix error: cannot take address of bit-field 'texture_target' in svga_tgsi.h
This looks like my fault. It would be nice to have the r300 and nvidia drivers building by default (eg on linux-debug builds), even if they don't create full drivers, so that a single build can get greater coverage.

Keith

On Wed, 2010-01-06 at 09:09 -0800, Sedat Dilek wrote:

OK. That's the next one :-) ...

In file included from r300_emit.c:36:
r300_state_inlines.h: In function ‘r300_translate_tex_filters’:
r300_state_inlines.h:263: error: ‘is_anisotropic’ undeclared (first use in this function)
r300_state_inlines.h:263: error: (Each undeclared identifier is reported only once
r300_state_inlines.h:263: error: for each function it appears in.)
make: *** [r300_emit.o] Error 1
...

I am having dinner, now

- Sedat -

On Wed, Jan 6, 2010 at 6:07 PM, Brian Paul bri...@vmware.com wrote:

Sedat Dilek wrote:

Hi,

this patch fixes a build error in mesa GIT master after...

commit 251363e8f1287b54dc7734e690daf2ae96728faf
configs: set INTEL_LIBS, INTEL_CFLAGS, etc.

From my build log:
...
In file included from svga_pipe_fs.c:37:
svga_tgsi.h: In function 'svga_fs_key_size':
svga_tgsi.h:122: error: cannot take address of bit-field 'texture_target'
make[4]: *** [svga_pipe_fs.o] Error 1

Might be introduced in...

commit 955f51270bb60ad77dba049799587dc7c0fb4dda
Make sure we use only signed/unsigned ints with bitfields.

Kind Regards,
- Sedat -

I just fixed that.
-Brian

___
Mesa3d-dev mailing list
mesa3d-...@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mesa3d-dev

___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel
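For reference, the error Sedat hit is ordinary C semantics: a bit-field has no address, so `&key->texture_target` cannot compile once the member becomes a bit-field. A minimal sketch of the problem and the usual fix (the struct and function names here are invented, not the actual svga code):

```c
/* Hypothetical reduction of the svga_fs_key situation: after the
 * "signed/unsigned ints with bitfields" commit, 'texture_target'
 * is a bit-field, so '&key->texture_target' is a compile error. */
struct fs_key {
    unsigned texture_target : 4;
    unsigned other_state : 28;
};

/* The usual fix: copy the bit-field into an addressable temporary
 * before taking a pointer to it. */
static unsigned read_texture_target(const struct fs_key *key)
{
    unsigned tmp = key->texture_target; /* tmp has an address */
    const unsigned *p = &tmp;           /* legal; &key->texture_target is not */
    return *p;
}
```

The actual fix in Mesa may differ; this only illustrates why the diagnostic appears and the standard workaround.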
RE: [Patch VIA UniChrome DRM][2/5 Ver1] Add support for video command flush and interface for V4L kernel module
On Thu, 2009-10-08 at 02:35 -0700, brucech...@via.com.tw wrote:

Hello Thomas:

If I understand the code correctly, the user-space application prepares command buffers directly in AGP, and asks the drm module to submit them. We can't allow this for security reasons. The user-space application could for example fill the buffer with commands to texture from arbitrary system memory, getting hold of other users' private data. The whole ring-buffer stuff and the command verifier was once implemented to fix that security problem.

Thank you very much for your comment. What if we do a security check on these buffers before submit? Let me check if there is any way to work around this security issue.

Who would do that security check? Userspace? That doesn't work, as userspace is not trusted. The kernel? OK, but now it's reading commands out of a presumably write-combined AGP buffer, which is slow. You'd have been better off passing the commands to the kernel in regular memory, which is presumably exactly what the existing mechanism does.

Keith
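To make Keith's point concrete: a kernel-side check is only meaningful if the commands are read from memory userspace cannot rewrite after validation, i.e. copied into regular kernel memory first. A hedged sketch of such a verifier loop (the opcode, struct, and aperture names are all invented for illustration):

```c
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

/* Invented opcode: a command whose address operand the GPU will read. */
#define CMD_TEXTURE_SRC 0x2

struct cmd { uint32_t opcode; uint32_t addr; };

/* Scan a command buffer (already copied out of userspace reach) and
 * reject any DMA source address outside the allowed aperture, so the
 * GPU cannot be made to texture from arbitrary system memory. */
static bool verify_cmds(const struct cmd *buf, size_t n,
                        uint32_t apert_lo, uint32_t apert_hi)
{
    for (size_t i = 0; i < n; i++) {
        if (buf[i].opcode == CMD_TEXTURE_SRC &&
            (buf[i].addr < apert_lo || buf[i].addr >= apert_hi))
            return false;
    }
    return true;
}
```

Note the copy itself is the security boundary: verifying commands in place in AGP would let userspace modify them between check and execution.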
Re: preventing GPU reset DoS
On Tue, 2009-09-22 at 12:13 -0700, Pauli Nieminen wrote:

Hi!

I have been thinking about GPU reset as a possible DoS attack from user-space. The problem here is that the display doesn't work anymore at all if an attacker chooses to run an application that constantly causes GPU hangs.

It would of course be ideal to have the CS checker not let in any problematic combinations of commands. But in practice we can't assume that everything is safe with all hardware, so we need to take some actions to prevent possible problems.

So the first defense would be terminating the application that sent the command stream that caused the GPU hang. But an attacker could easily bypass this protection by forking new processes all the time. So we need a stronger defense if the same user account is causing multiple hangs in a short time frame. I would think temporarily denying new DRI access would let the user gain back control of the system and take actions to stop the problematic program from running.

OK, but you'd want to be able to turn it off for developers -- you've just described my normal workflow...

Keith
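Pauli's per-user rate limit, together with Keith's developer escape hatch, can be sketched as a simple windowed counter (thresholds, names, and the `enforcement` toggle are all invented for illustration; in a real driver the toggle would be a module parameter):

```c
#include <stdbool.h>

/* Deny new DRI opens for a user who caused more than MAX_HANGS GPU
 * hangs within WINDOW seconds; 'enforcement' lets developers turn the
 * whole mechanism off, since driver work triggers hangs constantly. */
#define MAX_HANGS 3
#define WINDOW    60

struct hang_state { long first_hang; int count; };
static bool enforcement = true;

static bool allow_open(struct hang_state *s, long now)
{
    if (!enforcement)
        return true;
    if (now - s->first_hang > WINDOW) { /* window expired: reset */
        s->first_hang = now;
        s->count = 0;
    }
    return s->count < MAX_HANGS;
}

static void record_hang(struct hang_state *s, long now)
{
    if (now - s->first_hang > WINDOW) {
        s->first_hang = now;
        s->count = 0;
    }
    s->count++;
}
```

Keying the state on uid rather than process is what defeats the fork-bomb bypass Pauli describes.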
Re: [PATCH 6/6] [drm/i915] implement drmmode overlay support v2
On Tue, 2009-09-01 at 02:20 -0700, Thomas Hellström wrote:

Stephane Marchesin wrote:

2009/8/31 Thomas Hellström tho...@shipmail.org:

The problem I see with Xv-API-in-kernel is that of the various hw constraints on the buffer layout. IMHO this has two solutions: a) complicated to communicate the constraints to userspace. This is either too generic or not suitable for everything. IIRC Xv exposes this all the way down to the user app, as format and then offset into buffer + stride for each plane?

Well, for example if your overlay can only do YUY16 in hardware, you still might want to expose YV12/I420 through Xv and do internal conversion. So you'd have to add format conversion somewhere in the stack (probably in user space though). The same happens for swapped components and planar/interlaced; does your hw do YV12, I420, NV12 or something else?

The hw does YV12, YUY2 and UYVY. Since the user of this interface (the Xorg state tracker) is generic, there's really no point (for us) in having driver-specific interfaces that expose every format the hardware can do. The situation might be different, I guess, for device-specific Xorg drivers. If we're doing this I think we should expose perhaps a reasonably small number of common formats, and if the hardware doesn't support any of them, the hardware is not going to be supported. That might unfortunately lead to having driver-specific interfaces for the device-specific Xorg driver and a generic interface for the Xorg state tracker, and I'm not sure people like that idea?

I'm coming to this late, but if the only difference between hw-specific and hw-independent interfaces is which formats are supported, that surely shouldn't be too hard to abstract? Just have an enum which gets expanded with new format names and query for supported formats in the API.

Keith
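Keith's "extensible enum plus a format query" suggestion in miniature (a sketch with invented names, standing in for what would really be an ioctl-style capability query): userspace asks which overlay formats the driver supports and picks from the intersection, so neither side hardcodes the other's format list:

```c
#include <stdbool.h>
#include <stddef.h>

/* The enum grows as new formats appear; old entries keep their values. */
enum overlay_format { FMT_YV12, FMT_I420, FMT_NV12, FMT_YUY2, FMT_UYVY };

struct overlay_caps {
    const enum overlay_format *formats; /* what this hw/driver supports */
    size_t count;
};

/* Query side: the generic state tracker probes for a format instead of
 * assuming a fixed set or needing a driver-specific interface. */
static bool overlay_supports(const struct overlay_caps *caps,
                             enum overlay_format f)
{
    for (size_t i = 0; i < caps->count; i++)
        if (caps->formats[i] == f)
            return true;
    return false;
}
```

A caller that finds no common format falls back to user-space conversion, which matches Stephane's point about exposing YV12/I420 over a narrower hardware format set.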
RE: [PATCH] Add modesetting pageflip ioctl and corresponding drm event
I think the bug in question was because somebody (Jon Smirl??) removed the empty, apparently unused poll implementation from the drm fd, only to discover that the X server was actually polling the fd. If this code adds to, extends or at least doesn't remove the ability to poll the drm fd, it should be fine.

Keith

From: Kristian Høgsberg [...@bitplanet.net]
Sent: Tuesday, August 18, 2009 8:31 AM
To: Thomas Hellström
Cc: Kristian Høgsberg; Jesse Barnes; dri-de...@lists.sf.net
Subject: Re: [PATCH] Add modesetting pageflip ioctl and corresponding drm event

That can't be the real problem. The X server polls on a ton of file descriptors already: sockets from clients, dbus, input devices. They all have poll implementations that don't return 0... I mean, otherwise they wouldn't work. Look at evdev_poll() in drivers/input/evdev.c for the evdev poll implementation, for example.

You're probably right, but we should probably find out what went wrong and make sure it doesn't happen again with non-modesetting drivers + dri1 before pushing this.

I really don't think that's necessary. As I wrote in my reply to Dave, there's nothing in this patch that can cause select(2) to return EINVAL that isn't already present in other poll fops implementations. Like the evdev one, which we already select on -- please compare that function with the poll implementation in my patch and tell me why the drm poll is cause for concern. I need a better, more specific reason why this is such a risk and why I should spend more time tracking this stuff down. And if select(2), for whatever reason, returns EINVAL because of the drm_poll() fops implementation, that's a bug in the kernel that needs to be fixed.

cheers,
Kristian

___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel
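Seen from the X server's side, the concern above is just whether the drm fd behaves like any other pollable descriptor. A small userspace illustration using poll(2), with a pipe standing in for the drm fd (the helper name is invented):

```c
#include <poll.h>

/* Wait for an event on a drm-like fd. Returns 1 if readable, 0 on
 * timeout, -1 on error -- a fd whose kernel poll hook misbehaves shows
 * up here as an error or a descriptor that never reports readable. */
static int wait_for_drm_event(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int n = poll(&pfd, 1, timeout_ms);
    if (n <= 0)
        return n;
    return (pfd.revents & POLLIN) ? 1 : 0;
}
```

This is exactly the pattern the X server main loop uses across client sockets, evdev fds, and (with the pageflip patch) drm events.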
RE: [PATCH] Add modesetting pageflip ioctl and corresponding drm event
This seems wrong to me -- the client doesn't need to sleep: all it's going to do is build a command buffer targeting the new backbuffer. There's no problem with that; it should be the preserve of the GPU scheduler (TTM or GEM) to ensure those commands (once submitted) don't get executed until the buffer is available -- otherwise you're potentially pausing your application for no good reason. The app should be throttled if it gets more than a couple of frames ahead, but there should be 100% overlap with hardware otherwise.

If you need a solution that doesn't rely on the buffer manager, perhaps resort to triple-buffering, or else create a new buffer and return that in DRI2GetBuffers (and let the scanout one be freed once the flip is done). It seems like arbitrating command execution against on-hardware buffers should be the preserve of the kernel memory manager; other actors shouldn't be second-guessing that.

Keith

From: Kristian Høgsberg [...@bitplanet.net]
Sent: Tuesday, August 18, 2009 11:46 AM
To: Thomas Hellström
Cc: Kristian Høgsberg; Jesse Barnes; dri-de...@lists.sf.net
Subject: Re: [PATCH] Add modesetting pageflip ioctl and corresponding drm event

We don't put clients to sleep until they try to render to the new backbuffer. For direct rendering this happens when the client calls DRI2GetBuffers() after having called DRI2SwapBuffers(). If the flip is not yet finished at that time, we restart the X request and suspend the client. When the drm event fires it is read by the ddx driver, which then calls DRI2SwapComplete(), which will wake the client up again. For AIGLX, we suspend the client in __glXForceCurrent(), but the wakeup happens the same way.
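Keith's "throttle only when more than a couple of frames ahead" policy reduces to a counter of outstanding swaps (struct and threshold invented for illustration):

```c
/* Allow the client to keep building command buffers until it gets
 * more than MAX_PENDING frames ahead of the hardware; only then sleep.
 * Anything stricter pauses the app for no good reason. */
#define MAX_PENDING 2

struct swap_queue {
    int submitted;  /* swaps the client has issued */
    int completed;  /* swaps the hardware has finished */
};

static int must_throttle(const struct swap_queue *q)
{
    return (q->submitted - q->completed) > MAX_PENDING;
}
```

The ordering of commands against the not-yet-available backbuffer is left to the kernel scheduler, per the argument above; this check only bounds latency.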
RE: [PATCH] Add modesetting pageflip ioctl and corresponding drm event
No, I'm fine. I don't have the patch in front of me, but it doesn't sound like it precludes these types of changes in the future.

Keith

From: Jesse Barnes [jbar...@virtuousgeek.org]
Sent: Tuesday, August 18, 2009 1:23 PM
To: Keith Whitwell
Cc: Kristian Høgsberg; Thomas Hellström; Kristian Høgsberg; dri-de...@lists.sf.net
Subject: Re: [PATCH] Add modesetting pageflip ioctl and corresponding drm event

Anyway, to me this discussion is more of a future directions one than a blocker for this particular patchset. AFAICT the only thing that needs fixing with this patch is my lock confusion (struct_mutex vs mode_config). Or would you like something substantial changed with these bits before they land?

--
Jesse Barnes, Intel Open Source Technology Center
Re: Doing better than CS ioctl ?
Dave,

The big problem with the (second) radeon approach of state objects was that we defined those objects statically and encoded them into the kernel interface. That meant that when new hardware functionality was needed (or discovered) we had to rev the kernel interface, usually in a fairly ugly way.

I think Jerome's approach could be a good improvement if the state objects it creates are defined by software at runtime, more like little display lists than pre-defined state atoms. The danger again is that you run into cases where you need to expand the objects the verifier will allow userspace to create, but at least in doing so you won't be breaking existing users of the interface. I think the key is that there should be no pre-defined format for these state objects, simply that they should be a sequence of legal commands/register writes that the kernel validates once and userspace can execute multiple times.

Keith

On Sat, 2009-08-08 at 05:43 -0700, Dave Airlie wrote:

On Sat, Aug 8, 2009 at 7:51 AM, Jerome Glisse gli...@freedesktop.org wrote:

Investigating where time is spent in the radeon/KMS world when doing rendering led me to question the design of the CS ioctl. As I am among the people behind it, I think I should give some historical background on the choices that were made.

I think this sounds quite like the original radeon interface, or maybe even a bit like the second one. The original one stored the registers in the sarea, updated the context under the lock, and had the kernel emit it. The second one had a bunch of state objects, containing ranges of registers that were safe to emit. Maybe Keith Whitwell can point out why these were a good/bad idea; not sure if anyone else remembers that far back.

Dave.

The first motivation behind the CS ioctl was to have a common language between userspace and kernel and between kernel and device. Of course, in an ideal world commands submitted through the CS ioctl could be forwarded directly to the GPU without much overhead.
Thing is, the world we live in isn't that good. There are two things the CS ioctl does before forwarding commands:

1- First it must rewrite any packet which supplies an offset to the GPU with the address at which the memory manager validated the buffer object associated with that packet. We can't get rid of this with the CS ioctl (we might do something very clever like a new microcode for the CP so that the CP can rewrite packets using some table of validated buffer offsets, but I am not even sure the CP would be powerful enough to do that).

2- In order to provide more advanced security than what we had in the past, I added a CS checker facility which is responsible for analyzing the command stream and making sure that the GPU won't read or write outside the supplied buffer object list. DRI1 didn't offer such advanced checking. This feature was added with GPU sharing in mind, where sensitive applications might run on the GPU and we might like to protect their memory.

We can obviously avoid the second item and things would work, but userspace would be able to abuse the GPU to access memory outside the GPU objects it owns (this doesn't mean it would be able to access any system RAM, but rather any RAM that is mapped to the GPU, which should for the time being only be pixmaps, textures, VBOs or things like that).

Bottom line is that with the CS ioctl we do the same work twice in different forms. In userspace we build a command stream understandable by the GPU, and in kernel space we decode this command stream to check it. Obviously this sounds wrong. That being said, the CS ioctl isn't that bad: it doesn't consume much on the benchmarks I have done, but I expect it might consume more on older CPUs or when many complex 3D apps run at the same time. So I am not proposing to trash it, but rather to discuss a better interface we could add at a later point to slowly replace CS. CS brings today features we needed yesterday, so we should focus our effort on getting the CS ioctl as smooth and good as possible.
So as a pet project I have been thinking these last few days about what would be a better interface between userspace and kernel, and I came up with something in between Gallium state objects and nvidia GPU objects (well, at least as far as I know each of these, my design sounds close to that).

The idea behind the design is that whenever userspace allocates a bo, userspace knows the properties of the bo. If it's a texture, userspace knows the size, the number of mipmap levels, the border, ... of the texture. If it's a vbo, it knows the layout, the size, the number of elements, ... The same for a rendering viewport: it knows the size and associated properties.

The design has two ioctls:

create_object, supplying:
- an object type id specific to the asic
- an object structure associated to the type id, fully describing the object
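Keith's "little display lists" framing of this design can be put in miniature (everything here is an illustrative sketch with invented names and an invented legality rule, not Jerome's actual proposal): validate a command sequence once at object creation, hand back a handle, and let later submissions reference the handle without re-decoding the stream each time.

```c
#include <stdint.h>
#include <stddef.h>

#define MAX_OBJECTS 16

/* A state object: a run of (reg, value) writes validated once. */
struct state_object { const uint32_t *cmds; size_t len; int valid; };

static struct state_object objects[MAX_OBJECTS];
static int next_handle;

/* Invented legality rule: only registers below 0x4000 may be written. */
static int cmds_legal(const uint32_t *cmds, size_t len)
{
    for (size_t i = 0; i + 1 < len; i += 2)
        if (cmds[i] >= 0x4000)
            return 0;
    return 1;
}

/* create_object analogue: validate once, return a handle or -1. */
static int create_object(const uint32_t *cmds, size_t len)
{
    if (next_handle >= MAX_OBJECTS || !cmds_legal(cmds, len))
        return -1;
    objects[next_handle] = (struct state_object){ cmds, len, 1 };
    return next_handle++;
}

/* Submission references the pre-validated handle: no re-checking. */
static int submit(int handle)
{
    if (handle < 0 || handle >= next_handle || !objects[handle].valid)
        return -1;
    return 0; /* would be queued to the GPU here */
}
```

The point of the split is exactly the double-work complaint above: the expensive decode/check happens once per object instead of once per submission.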
Re: DRM development process wiki page..
Major bumps once stuff went into the kernel weren't allowed at all. You'd need to fork the driver in any case. So we did this once or twice on drivers in devel trees like mach64. However, an upstream-first policy should avoid this need. I'd also prefer to see getparam for new features instead of version checks. The linear version check sucks.

This is an interesting concept that opens up some ideas for dealing with feature deprecation, etc. Think about OpenGL's extension mechanism -- features can be exposed through that mechanism without ever providing a guarantee of future availability -- in fact there is no guarantee of any availability outside the current session. Future versions of a GL driver might add or remove extensions as desired, within the constraints of the GL version number advertised.

What we could see is something similar for the DRM interface -- a base level of functionality specified by the major/minor numbers, but additional extensions that may be advertised according to the whim of the kernel module, which the driver can take advantage of if present but must otherwise function correctly without... Extensions that don't work out can be dropped; those that do can be incorporated into the next increment of the minor number, a la GL 1.5.

Keith
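The getparam-style probing preferred above over linear version checks looks roughly like this (a sketch: the parameter names and struct are invented, not the real drm getparam interface):

```c
#include <stdbool.h>

/* Each capability is queried individually, so the kernel can advertise
 * or drop features independently of a single version number. */
enum drm_param { PARAM_HAS_PAGEFLIP, PARAM_HAS_OVERLAY, PARAM_HAS_GEM };

struct dev_caps { bool pageflip, overlay, gem; };

/* getparam analogue: 0 on success with *value set, -1 for an unknown
 * parameter -- which callers treat as "feature absent", giving the
 * graceful-degradation property version checks lack. */
static int get_param(const struct dev_caps *caps, enum drm_param p,
                     int *value)
{
    switch (p) {
    case PARAM_HAS_PAGEFLIP: *value = caps->pageflip; return 0;
    case PARAM_HAS_OVERLAY:  *value = caps->overlay;  return 0;
    case PARAM_HAS_GEM:      *value = caps->gem;      return 0;
    }
    return -1;
}
```

A userspace driver probes each feature it wants and falls back when the probe fails, mirroring how GL clients check the extension string.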
Re: [RFC: 2.6 patch] remove the i830 driver
On Tue, Jul 15, 2008 at 3:35 PM, Simon Farnsworth [EMAIL PROTECTED] wrote:

Keith Whitwell wrote:

You can still buy new i865 boards: http://www.ebuyer.com/product/119412 So I think this isn't a great idea.

This won't remove all support for i865. It only removes support for the combination of i865 and old X servers. New X servers use the i915 driver to support the i865 chipset.

You're right -- removing the old module is fine by me.

Keith
Re: Combining Mesa3D and DRI mailing lists and/or sites? (was: Re: Wrapping up 7.4 (finally))
On Mon, Jun 16, 2008 at 8:31 AM, Timo Jyrinki [EMAIL PROTECTED] wrote:

2008/6/12 Keith Whitwell [EMAIL PROTECTED]:

In reality, what has happened is that most of this has already occurred -- whatever 3d driver-related traffic that hasn't been sucked into IRC is now occurring on the Mesa lists.

Right. I have now rearranged the DRI wiki's mailing list page http://dri.freedesktop.org/wiki/MailingLists by stating that fact. I also commented out the dri-announce mailing list, which hadn't been used for 5+ years.

I actually think the current structure makes a lot of sense - if we wanted a change, we could rename dri-devel to drm-devel, but it hardly seems worthwhile.

It'd be nice, but only if somehow automagic enough. Just documentation is mostly enough, too. What about the dri-users mailing list? From the users' point of view DRI/Mesa/DRM are mostly all the same (users want them all), and any users of DRM are likely to be halfway developers anyway. While DRI discussion has successfully migrated to the mesa3d-dev list, users are currently randomly posting to either mesa3d-users or dri-users and the discussion is not coherent. Could those two mailing lists be merged into mesa3d-users, or do you think that mentioning that dri-users is (nowadays) for DRM discussion is enough to fix the problem from now on?

I think dri-users is certainly redundant now, likewise -announce. If those could somehow get funneled into mesa-users or an appropriate Xorg list, that would be fine with me...

Regarding wikis, I also started reorganizing the front page http://dri.freedesktop.org/ a bit, including changing the title to include Mesa, too. I still think that it could be the wiki for both Mesa and DRI, and that mesa3d.org could include a link to the wiki (or the DRI wiki, given the current status) under e.g. the Resources title, instead of having the link to the DRI website only at the bottom of the navigation. What do you think?

I'm also ok with this general concept.

Keith
Re: GEM merging to master
If this was a test of just two memory manager implementations, the benchmarks would speak for themselves. However, there are at least two driver changes I caught on first review of gallium-i915-current's i915simple (which I assume is what you were testing, given that the last tests I've heard from you guys were using that) that would have an impact on performance:

As far as I know the gallium driver isn't involved in these tests -- this is a comparison between the original i915tex and newer versions of the driver. i915simple is missing support for tiling at this stage (private backbuffers), so any performance results on that driver are unlikely to be meaningful yet.

Keith
Re: Combining Mesa3D and DRI mailing lists and/or sites? (was: Re: Wrapping up 7.4 (finally))
On Thu, Jun 12, 2008 at 5:28 PM, Timo Jyrinki [EMAIL PROTECTED] wrote:

2008/6/12 Daniel Stone [EMAIL PROTECTED]:

On Thu, Jun 12, 2008 at 10:49:57AM +0300, Timo Jyrinki wrote:

Speaking of which, if you have any ideas how to better interlink and combine:
- http://dri.freedesktop.org/
- http://xorg.freedesktop.org/
- http://mesa3d.org/
...

I don't understand why DRI and Mesa have separate lists and websites, tbh, especially given the level of crosstalk. For the wikis, it should be possible to link between them, and I'll try to sort out how to make that happen.

Hi. Would it be beneficial to (either, both or neither):

1. Combine mailing lists as follows:
- mesa3d-dev / dri-devel
- mesa3d-users / dri-users
- mesa3d-announce / dri-announce
- mesa-commit / dri-patches

There are probably historical reasons for the separation, but are there any current ones that would be more important than the benefits of a single point of discussion about Mesa/DRI, which overlap so much anyway (especially from the users' perspective, but also development-wise)? At the same time, they might be moved to freedesktop.org from sourceforge.net.

2. Make DRI's wiki into a combined Mesa3D and DRI wiki. Mesa3D does not currently have a wiki of its own, but DRI has. Mesa3D certainly doesn't need yet another wiki in addition to the X.org wiki and the DRI wiki, so why not make it a common one officially? I think Mesa3D's current web site is quite nicely organized, and it could be evolved by integrating a bit more DRI stuff and the new (currently DRI) wiki into it. Certainly not throwing it away and replacing it with a wiki; a wiki would take a very big effort to make as navigable and organized as the Mesa3D homepage currently is.

If either sounds reasonable, is it acceptable for DRI as a project to be generally known (as it already mostly is known, I think) as a sub-project of Mesa3D, so the combined name would be simply Mesa3D? Or is there a need for clearer separation between the two? Mainly important from the perspective of naming the mailing lists, i.e. can they be mesa3d-devel or something else.

In reality, what has happened is that most of this has already occurred -- whatever 3d driver-related traffic that hasn't been sucked into IRC is now occurring on the Mesa lists. The DRI list has in effect become the list for development of the drm kernel module, libdrm, and the various memory manager implementations. While Mesa is an important client of these, it is far from being the only client. I actually think the current structure makes a lot of sense - if we wanted a change, we could rename dri-devel to drm-devel, but it hardly seems worthwhile. Another proposal would be to merge the DRI lists into LKML... I don't really want to do that either...

Keith
Re: i915 performance, master, i915tex gem
So possibilities are:

- batchbuffer starvation -- I was going to say 'has this changed significantly' -- and the answer is that it has, of course, with the bufmgr_fake changes... I can't tell by quick inspection if these are a likely culprit, but it's certainly a significant set of changes relative to the classic version of classic...

- over-throttling in swapbuffers -- I think we used to let it get two frames ahead - has this changed?

- something else...

Keith
Re: i915 performance, master, i915tex gem
* Classic is apparently doing suboptimal syncs that limit its performance in some cases (gears, teapot and perhaps openarena); one should not benchmark framerates against classic in those cases.

As I said elsewhere, I'd like to get to the bottom of this -- it wasn't always this way. Otherwise we should abandon 'classic' off the trunk and use one of the ye olde 7.0 versions.

Keith
Re: GEM discussion questions
On Tue, May 20, 2008 at 1:29 PM, Thomas Hellström [EMAIL PROTECTED] wrote:

Keith Packard wrote:

On Mon, 2008-05-19 at 12:13 -0700, Ian Romanick wrote:

The obvious overhead I was referring to is the extra malloc / free. That's why I went on to say "So, now I have to go back and spend time caching the buffer allocations and doing other things to make it fast." In that context, "I" is idr as an app developer. :)

You'd be wrong then -- the cost of the malloc/write/copy/free is cheaper than the cost of map/write/unmap.

One problem that we have here is that none of the benchmarks currently being used hit any of these paths. OpenArena, Enemy Territory (I assume this is the older Quake 3 engine game), and gears don't use MapBuffer at all. Unfortunately, any apps that would hit these paths are so fill-rate bound on i965 that they're useless for measuring CPU overhead.

The only place we see significant map/write/unmap vs malloc/write/copy/free is with batch buffers, and so far the measurements that I've taken which appear to show a benefit haven't been reproduced by others... We could certainly use texdown to test this out, if the GEM i915 driver implemented a pwrite-enabled struct dd_function_table::TextureMemCpy().

Double-copy texture uploads have been 'tested' in the past -- and their poor performance was one of the motivating factors for creating a single-copy scheme. The double-copy upload path isn't *that* bad, as long as the entire texture fits into cache... As soon as it exceeds the cache dimensions, it falls off a cliff. FWIW, Intel are making some CPUs with pretty small caches these days, and teaming them up with i945 GPUs, so this isn't completely theoretical.

Keith
Re: TTM vs GEM discussion questions
- Original Message -
From: Ian Romanick [EMAIL PROTECTED]
To: DRI dri-devel@lists.sourceforge.net
Sent: Monday, May 19, 2008 10:04:09 AM
Subject: Re: TTM vs GEM discussion questions

Ian Romanick wrote:

I've read the GEM documentation several times, and I think I have a good grasp of it. I don't have any non-trivial complaints about GEM, but I do have a couple comments / observations:

- I'm pretty sure that the read_domain = GPU, write_domain = CPU case needs to be handled. I know of at least one piece of hardware with a kooky command buffer that wants to be used that way.

- I suspect that in the (near) future we may want multiple read_domains. I can envision cases where applications using, for example, vertex feedback mode would want to read from a buffer while the GPU is also reading from the buffer.

- I think drm_i915_gem_relocation_entry should have a size field. There are a lot of cases in the current GL API (and more to come) where the entire object will trivially not be used. Clamped LOD on textures is a trivial example, but others exist as well.

Another question occurred to me. What happens on over-commit? Meaning, in order to draw 1 polygon, more memory must be accessible to the GPU than exists. This was a problem that I never solved in my 2004 proposal. At the time on R200 it was possible to have 6 maximum-size textures active, which would require more than the possible on-card + AGP memory.

I don't actually think the problem is solvable for buffer-based memory managers -- the best we can do is spot the failure and recover, either early as the commands are submitted by the API, or at some point later, and for some meaning of 'recover' (eg - fail cleanly, fallback, use smaller mipmaps, disable texturing, etc). The only real way to solve it is to move to a page-based virtualization of GPU memory, which requires hardware support and isn't possible on most cards.
Note that this is different from per-process GPU address spaces, and is a significantly tougher problem even on supporting hardware. Note there are two concepts with similar common names:

- virtual GPU memory -- ie per-context page tables, but still a buffer-based memory manager; textures pre-loaded into GPU memory prior to command execution.
- virtualized GPU memory -- as above, but with page faulting, typically IRQ-driven with kernel assistance. Parts of textures may be paged in/out as required, according to the memory access patterns of active shaders.

It's not clear to me which of the above the r300/nv people are aiming at, but in my opinion the latter is such a significant departure from what we have been thinking about that I have always believed it should be addressed by a new set of interfaces.

Keith

- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/ -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
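[Editor's note] Keith's over-commit point -- that a buffer-based manager can only detect the failure at validation time and recover by failing cleanly or falling back -- can be illustrated with a toy sketch. This is not DRM code; the aperture size and helper name are invented for illustration.

```python
# Toy sketch (not real DRM code): a buffer-based manager cannot make an
# oversized working set fit; the best it can do is detect the over-commit
# at validation time and fail cleanly so the driver can fall back.

APERTURE_BYTES = 256 * 1024 * 1024  # hypothetical on-card + AGP total

def validate_working_set(buffer_sizes, aperture=APERTURE_BYTES):
    """Return True if every buffer referenced by one draw can be
    resident simultaneously; False signals over-commit."""
    return sum(buffer_sizes) <= aperture

# Six maximum-size textures, as in the R200 example: 6 x 64 MB = 384 MB.
textures = [64 * 1024 * 1024] * 6
if not validate_working_set(textures):
    # 'Recover' here means fail cleanly, fall back to software,
    # or retry with smaller mipmaps -- the draw cannot simply proceed.
    print("over-commit: working set exceeds aperture")
```

The point of the sketch is that nothing inside `validate_working_set` can make the draw succeed; only a page-based scheme with hardware faulting avoids the problem entirely.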
Re: TTM vs GEM discussion questions
- Original Message From: Thomas Hellström [EMAIL PROTECTED] To: Stephane Marchesin [EMAIL PROTECTED] Cc: DRI dri-devel@lists.sourceforge.net Sent: Monday, May 19, 2008 9:49:21 AM Subject: Re: TTM vs GEM discussion questions

Stephane Marchesin wrote: On 5/18/08, Thomas Hellström wrote:

Yes, that was really my point. If the memory manager we use (whatever it is) does not allow this kind of behaviour, that'll force all cards to use a kernel-validated command submission model, which might not be too fast, and more difficult to implement on such hardware. I'm not in favor of having multiple memory managers, but if the chosen one is both slower and more complex to support in the future, that'll be a loss for everyone. Unless we want to have another memory manager implementation in 2 years from now... Stephane

First, TTM does not enforce kernel command submission, but it forces you to tell the kernel about command completion status in order for the kernel to be able to move and delete buffers.

Yes, emitting the moves from the kernel is not a necessity. If your card can do memory protection, you can set up the protection bits in the kernel and ask user space to do the moves. Doing so means in-order execution in the current context, which means that in the normal case rendering does not need to synchronize with fences at all.

I'm not sure how you could avoid that with ANY kernel-based memory manager, but I would be interested to know how you expect to solve that problem.

See above, if the kernel controls the memory protection bits, it can pretty much enforce things on user space anyway.

Well, the primary reason for the kernel to sync and move a buffer object would be to evict it from VRAM, in which case I don't think the user-space approach would be a valid solution, unless, of course, you plan to use VRAM as a cache and back it all with system memory.
Just out of interest (I think this is a valid thing to know, and I'm not being TTM / GEM specific here):

1) I've never seen a kernel round-trip per batchbuffer as a huge performance problem, and it surely simplifies things for an in-kernel memory manager. Do you have any data to back this?

2) What do the Nvidia proprietary drivers do w.r.t. this? What I understand is that each hardware context (and there are lots of hardware contexts) has a ringbuffer which is mapped into the address space of the driver assigned that context. The driver just inserts commands into that ringbuffer and the hardware itself schedules context-switches between rings.

Then the question is how does this interact with a memory manager. There still has to be some entity managing the global view of memory -- just as the kernel does for the regular vm system on the CPU. A context/driver shouldn't be able to rewrite its own page tables, for instance.

Keith
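[Editor's note] The split Keith describes -- userspace writing commands directly into a per-context ring while only the kernel may touch the page tables -- can be sketched as follows. All class and method names here are invented for illustration; this models the ownership boundary, not any real driver.

```python
# Illustrative sketch: each context owns a ring it writes commands into
# with no kernel round-trip, but only the kernel-side manager may touch
# the per-context page tables (the global view of memory).

class Kernel:
    def __init__(self):
        self.page_tables = {}          # context id -> {gpu_page: phys_page}

    def map_page(self, ctx_id, gpu_page, phys_page):
        # Only this entity may edit the mappings.
        self.page_tables.setdefault(ctx_id, {})[gpu_page] = phys_page

class Context:
    def __init__(self, ctx_id):
        self.ctx_id = ctx_id
        self.ring = []                 # user-mapped ringbuffer

    def emit(self, cmd):
        # Userspace inserts commands directly; nothing here can
        # rewrite the page tables.
        self.ring.append(cmd)

kernel = Kernel()
ctx = Context(1)
kernel.map_page(ctx.ctx_id, gpu_page=0, phys_page=0x1000)
ctx.emit("DRAW")
```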
Re: TTM vs GEM discussion questions
- Original Message From: Dave Airlie [EMAIL PROTECTED] To: Ian Romanick [EMAIL PROTECTED] Cc: DRI dri-devel@lists.sourceforge.net Sent: Monday, May 19, 2008 4:38:02 AM Subject: Re: TTM vs GEM discussion questions

All the good that's done us and our users. After more than *5 years* of various memory manager efforts we can't support basic OpenGL 1.0 (yes, 1.0) functionality in a performant manner (i.e., glCopyTexImage and friends). We have to get over this 'it has to be perfect or it will never get in' crap. Our 3D drivers are entirely irrelevant at this point. Except on Intel hardware, whose relevance may or may not be relevant. These can't do copyteximage with the in-kernel drm.

To say that userspace APIs cannot die once released is not a relevant counterpoint. We're not talking about a userspace API for general application use. This isn't futexes, sysfs, or anything that applications will directly depend upon. This is an interface between a kernel portion of a driver and a usermode portion of a driver. If we can't be allowed to change or deprecate those interfaces, we have no hope. Note that the closed source guys don't have this artificial handicap.

Ian, fine, you can take this up with Linus and Andrew Morton; I'm not making this up just to stop you from putting 50 unsupportable memory managers in the kernel. If you define any interface to userspace from the kernel (ioctls, syscalls), you cannot just make it go away. The rule is simple: if you install a distro with a kernel 2.6.x.distro, and it has Mesa 7.0 drivers on it, upgrading the kernel to kernel 2.6.x+n without touching userspace shouldn't break userspace, ever. If we can't follow this rule we can't put our code into Linus's kernel. So don't argue about it, deal with it; this isn't going to change. And yes, I've heard this crap about closed source guys, but we can't follow their route and be distributed by vendors. How many vendors ship the closed drivers?
This is also a completely orthogonal issue to maintaining any particular driver. Drivers are removed from the kernel just the same as they are removed from X.org. Assume we upstreamed either TTM or GEM today. Clearly that memory manager would continue to exist as long as some other driver continued to depend on it. I don't see how this is different from cfb or any of the other interfaces within the X server that we've gutted recently.

Drivers and pieces of the kernel aren't removed like you think. I think we nuked gamma (didn't have a working userspace anymore) and ffb (it sucked and couldn't be fixed). Someone is bound to bring up OSS-ALSA, but that doesn't count, as ALSA had an OSS emulation layer so userspace apps didn't just stop working. Removing chunks of X is vastly different to removing an exposed kernel userspace interface. Please talk to any IBM kernel person and clarify how this stuff works. (Maybe benh could chime in...??)

If you want to remove a piece of infrastructure, you have three choices. If nothing uses it, you gut it. If something uses it, you either fix that something to use different infrastructure (which puts you in the nothing uses it state) or you leave things as they are. In spite of all the fussing the kernel guys do in this respect, the kernel isn't different in this respect from any other large, complex piece of infrastructure.

So you are going to go around and fix the userspaces on machines that are already deployed? How? e.g. Andrew Morton has a Fedora Core 1 install on a laptop booting 2.6.x-mm kernels; when 3D stops working on that laptop we get to hear about it. So yes, you can redesign and move around the kernel internals as much as you like, but you damn well better expose the old interface and keep it working.

managers or that we may want to have N memory managers now that will be gutted later.
It seems that the real problem is that the memory managers have been exposed as a generic, directly usable, device-independent piece of infrastructure. Maybe the right answer is to punt on the entire concept of a general memory manager. At best we'll have some shared, optional-use infrastructure, and all of the interfaces that anything in userspace can ever see are driver dependent. That limits the exposure of the interfaces and lets us solve today's problems today.

As is trivially apparent, we don't know what the best (for whatever definition of best we choose) answer is for a memory manager interface. We're probably not going to know that answer in the near future. To not let our users have anything until we can give them the best thing is an incredible disservice to them, and it makes us look silly (at best).

Well the thing is I can't believe we don't know enough to do this in some way generically, but maybe the TTM vs GEM thing proves it's not possible. I don't
Re: TTM vs GEM discussion questions
- Original Message From: Dave Airlie [EMAIL PROTECTED] To: Jerome Glisse [EMAIL PROTECTED] Cc: Keith Whitwell [EMAIL PROTECTED]; Ian Romanick [EMAIL PROTECTED]; DRI dri-devel@lists.sourceforge.net Sent: Monday, May 19, 2008 12:16:57 PM Subject: Re: TTM vs GEM discussion questions

For radeon the plan was to return error from superioctl as during superioctl and validation i do know if there is enough gart/vram to do the things. Then i think it's up to upper level to properly handle such failure from superioctl.

You really want to work this out in advance; at superioctl stage it is too late. Have a look at the changes I made to the dri_bufmgr.c classic memory manager case to deal with this for Intel hw. If you got to superioctl and failed, unwrapping would be a real pain in the ass; you might have a number of pieces of app state you can't reconstruct. I think DirectX handled this with cut-points, where with the buffer you passed the kernel a set of places it could break the batch without too much effort. I think we are better just giving the mesa driver a limit, and when it hits that limit it submits the buffer. The kernel can give it a new optimal limit at any point and it should use that as soon as possible. Nothing can solve Ian's problem where the app gives you a single working set that is too large, at least with current GL. However you have to deal with the fact that a batchbuffer has many operations and the total working set needs to fit in RAM to be relocated. I've added all the hooks in dri_bufmgr.c for the non-TTM case; TTM shouldn't be a major effort to add.

My understanding of future hw is that we are heading to virtualized GPU memory (IRQ assistance for page fault). I think we'll have this for r700; not sure i965 does this; r500 has, I think, per-process GART. I don't think you can restart i9xx after a pagefault, may be wrong...

Note per-process GART != support for virtualized memory, though it gets you one step of the way.
You also need the support so the kernel can figure out what page needs to be swapped in, be able to restart the GPU after the pagefault, etc, and probably some way to have the hardware go off and do something useful on another context in the meantime.

I'd like to just try and get buffer-based memory management working well first, then draw a line under that and work on these more advanced concepts...

Keith
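[Editor's note] Dave's scheme -- give the mesa driver a limit, have it submit the batch when the working set hits that limit, and let the kernel publish a new limit at any time -- can be sketched as a toy batch builder. Class and method names are invented; the `flush` here stands in for the superioctl submission.

```python
# Sketch of the limit-driven submission scheme: userspace accumulates
# commands, tracks the relocation working set, and flushes *before*
# exceeding the kernel-supplied limit -- rather than failing at
# superioctl time when unwrapping app state would be painful.

class BatchBuilder:
    def __init__(self, limit):
        self.limit = limit             # kernel-supplied safe working set
        self.working_set = 0
        self.cmds = []
        self.flushes = 0

    def set_limit(self, new_limit):
        # The kernel may hand out a new optimal limit at any point.
        self.limit = new_limit

    def emit(self, cmd, buf_size):
        if self.working_set + buf_size > self.limit:
            self.flush()               # submit early instead of failing
        self.cmds.append(cmd)
        self.working_set += buf_size

    def flush(self):
        # Stand-in for the superioctl submission.
        self.flushes += 1
        self.cmds = []
        self.working_set = 0

b = BatchBuilder(limit=100)
for i in range(5):
    b.emit(f"draw{i}", buf_size=40)    # 5 x 40 against a limit of 100
```

Note that, as the thread says, this cannot help when a *single* operation's working set exceeds the limit -- that case still has to fail.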
Re: TTM vs GEM discussion questions
It's not clear to me which of the above the r300/nv people are aiming at, but in my opinion the latter is such a significant departure from what we have been thinking about that I have always believed it should be addressed by a new set of interfaces.

My understanding of future hw is that we are heading to virtualized GPU memory (IRQ assistance for page fault).

Yes, of course. This is the vista advanced scheduler and I guess it will be enforced by whql or some other mandatory scheme. Here's a post from 2006 that lays out the concepts: http://blogs.msdn.com/greg_schechter/archive/2006/04/02/566767.aspx The graphics rumour sites suggest that one or more of the IHVs failed to achieve this for the vista deadlines, so it might be a bit of a tough technical problem...

My belief is that there are two different problems -- buffer-based memory management and page-based virtualized GPU memory -- and they should be solved with different implementations and probably different interfaces. Moreover, we should try and get a workable buffer-based scheme for current hardware and then commence navel-gazing to support future cards... delaying an adequate buffer-based memory manager (ttm+cleaner-interface or gem+performance-fixes) to wait for a page-based one doesn't make any sense, as the page-based one won't ever work on current cards. The opposite is true, however -- a decent set of buffer-based interfaces will keep working for a long time, giving breathing room to create a page-based manager later.

Keith
i915 performance, master, i915tex gem
Just reposting this with a new subject line and less preamble.

- Original Message

Well the thing is I can't believe we don't know enough to do this in some way generically, but maybe the TTM vs GEM thing proves it's not possible.

I don't think there's anything particularly wrong with the GEM interface -- I just need to know that the implementation can be fixed so that performance doesn't suck as hard as it does in the current one, and that people's political views on basic operations like mapping buffers don't get in the way of writing a decent driver.

We've run a few benchmarks against i915 drivers in all their permutations, and to summarize, the results look like:

- for GPU-bound apps, there are small differences, perhaps up to 10%. I'm really not concerned about these (yet).
- for CPU-bound apps, the overheads introduced by Intel's approach to buffer handling impose a significant penalty in the region of 50-100%.

I think the latter is the significant result -- none of these experiments in memory management significantly change the command stream the hardware has to operate on, so what we're varying essentially is the CPU behaviour to achieve that command stream. And it is in CPU usage where GEM (and Keith/Eric's now-abandoned TTM driver) significantly disappoint. Or to put it another way, GEM and master/TTM seem to burn huge amounts of CPU just running the memory manager. This isn't true for master/no-ttm or for i915tex using userspace sub-allocation, where the CPU penalty for getting decent memory management seems to be minimal relative to the non-ttm baseline.

If there's a political desire to not use userspace sub-allocation, then whatever kernel-based approach you want to investigate should nonetheless make some effort to hit reasonable performance goals -- and neither of the current two kernel-allocation-based approaches is at all impressive.
Keith

==

And on an i945G, dual core Pentium D 3GHz, 2MB cache, FSB 800MHz, single-channel ram:

Openarena timedemo at 640x480:
- master w/o TTM: 840 frames, 17.1 seconds: 49.0 fps, 12.24s user 1.02s system 63% cpu 20.880 total
- master with TTM: 840 frames, 15.8 seconds: 53.1 fps, 13.51s user 5.15s system 95% cpu 19.571 total
- i915tex_branch: 840 frames, 13.8 seconds: 61.0 fps, 12.54s user 2.34s system 85% cpu 17.506 total
- gem: 840 frames, 15.9 seconds: 52.8 fps, 11.96s user 4.44s system 83% cpu 19.695 total

KW: It's less obvious here than in some of the tests below, but the pattern is still clear -- compared to master/no-ttm, i915tex is getting about the same ratio of fps to CPU usage, whereas both master/ttm and gem are significantly worse, burning much more CPU per fps, with a large chunk of the extra CPU being spent in the kernel. The particularly worrying thing about GEM is that it isn't hitting *either* 100% cpu *or* maximum framerates from the hardware -- that's really not very good, as it implies hardware is being left idle unnecessarily.

glxgears:
- A: ~1029 fps, 20.63user 2.88system 1:00.00elapsed 39%CPU (master, no ttm)
- B: ~1072 fps, 23.97user 18.06system 1:00.00elapsed 70%CPU (master, ttm)
- C: ~1128 fps, 22.38user 5.21system 1:00.00elapsed 45%CPU (i915tex, new)
- D: ~1167 fps, 23.14user 9.07system 1:00.00elapsed 53%CPU (i915tex, old)
- F: ~1112 fps, 24.70user 21.95system 1:00.00elapsed 77%CPU (gem)

KW: The high CPU overhead imposed by GEM and (non-suballocating) master/TTM should be pretty clear here. master/TTM burns 30% of CPU just running the memory manager!! GEM gets slightly higher framerates but uses even more CPU than master/TTM.
fgl_glxgears -fbo:
- A: n/a
- B: ~244 fps, 7.03user 5.30system 1:00.01elapsed 20%CPU (master, ttm)
- C: ~255 fps, 6.24user 1.71system 1:00.00elapsed 13%CPU (i915tex, new)
- D: ~260 fps, 6.60user 2.44system 1:00.00elapsed 15%CPU (i915tex, old)
- F: ~258 fps, 7.56user 6.44system 1:00.00elapsed 23%CPU (gem)

KW: GEM and master/ttm burn more cpu to build/submit the same command streams.

openarena 1280x1024:
- A: 840 frames, 44.5 seconds: 18.9 fps (master, no ttm)
- B: 840 frames, 40.8 seconds: 20.6 fps (master, ttm)
- C: 840 frames, 40.4 seconds: 20.8 fps (i915tex, new)
- D: 840 frames, 37.9 seconds: 22.2 fps (i915tex, old)
- F: 840 frames, 40.3 seconds: 20.8 fps (gem)

KW: no cpu measurements taken here, but almost certainly GPU bound. A lot of similar numbers; I don't believe the deltas have anything in particular to do with memory management interface choices...

ipers:
- A: ~285000 Poly/sec (master, no ttm)
- B: ~217000 Poly/sec (master, ttm)
- C: ~298000 Poly/sec (i915tex, new)
- D: ~227000 Poly/sec (i915tex, old)
- F: ~125000 Poly/sec (gem, GPU lockup on first attempt)

KW: no cpu measurements in this run, but all are almost certainly 100% pinned on CPU.

- i915tex (in particular i915tex, new) shows similar performance to classic - ie low cpu
Re: TTM vs GEM discussion questions
On Mon, May 19, 2008 at 2:06 PM, Jerome Glisse [EMAIL PROTECTED] wrote: On Mon, 19 May 2008 12:16:57 +0100 (IST) Dave Airlie [EMAIL PROTECTED] wrote:

For radeon the plan was to return error from superioctl as during superioctl and validation i do know if there is enough gart/vram to do the things. Then i think it's up to upper level to properly handle such failure from superioctl.

You really want to work this out in advance; at superioctl stage it is too late. Have a look at the changes I made to the dri_bufmgr.c classic memory manager case to deal with this for Intel hw. If you got to superioctl and failed, unwrapping would be a real pain in the ass; you might have a number of pieces of app state you can't reconstruct. I think DirectX handled this with cut-points, where with the buffer you passed the kernel a set of places it could break the batch without too much effort. I think we are better just giving the mesa driver a limit, and when it hits that limit it submits the buffer. The kernel can give it a new optimal limit at any point and it should use that as soon as possible. Nothing can solve Ian's problem where the app gives you a single working set that is too large, at least with current GL. However you have to deal with the fact that a batchbuffer has many operations and the total working set needs to fit in RAM to be relocated. I've added all the hooks in dri_bufmgr.c for the non-TTM case; TTM shouldn't be a major effort to add.

Splitting the cmds before they get submitted is the way to go; likely we can ask the kernel for an estimate of available memory so userspace can stop building the cmd stream, but this isn't easy. Well, anyway, this would be a userspace problem. Anyway we still will have to fail in superioctl if, for instance, memory fragmentation gets in the way. It's as good as we can do...

We need more than an estimate, of course -- if the estimate is optimistic, then you're back in the same situation -- trying to deal with it after the fact...
For userspace splitting to work, there needs to be a memory number given by the kernel which is a *guarantee* that this amount of vram (or equivalent) is available, and that as long as userspace sticks within that, the kernel must guarantee that commands submitted will run...

Unfortunately, this doesn't interact well with the pinning of buffers, eg for scanout, which may be happening asynchronously in other processes/threads. There are some possibilities to have pinning co-exist with these guarantees, eg by partitioning VRAM into a pinnable pool and a 'normal' pool, and using the size of the 'normal' pool as the guaranteed vram number.

Keith
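[Editor's note] Keith's partitioning idea reduces to simple arithmetic: set aside a fixed pool for pinned buffers so the remaining pool size can be handed out as a guarantee. A minimal sketch with invented sizes:

```python
# Sketch of the pinnable/normal partitioning: reserve a fixed pool for
# pinned buffers (scanout etc.) so that the size of the remaining
# 'normal' pool can be handed to userspace as a *guarantee*, unaffected
# by other processes pinning asynchronously.

def guaranteed_vram(total_vram, pinnable_pool):
    assert pinnable_pool < total_vram
    return total_vram - pinnable_pool

# Hypothetical 256 MB card with 32 MB set aside for scanout/cursor pinning:
limit = guaranteed_vram(256 * 1024 * 1024, 32 * 1024 * 1024)
# Userspace that keeps each submission within 'limit' can then rely on
# the kernel never failing the submission for lack of VRAM.
```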
Re: i915 performance, master, i915tex gem
glxgears uses 40% of the CPU in both classic and gem. Note that the gem version takes about 20 seconds to reach a steady state -- the gem driver isn't clearing the gtt actively and so glxgears gets far ahead of the gpu. My theory is that this shows that using cache-aware copies from a single static batch buffer (as gem does now) improves cache performance and write bandwidth.

I'm still confused by your test setup... Stepping back from cache metaphysics, why doesn't classic pin the hardware, if it's still got 60% cpu to burn? I think getting reproducible results makes a lot of sense. What hardware are you actually using -- ie. what is this laptop?

Keith
Re: TTM merging?
I do worry that TTM is not Linux enough; it seems you have decided that we can never do in-kernel allocations at any useable speed and punted the work into userspace, which makes life easier for Gallium as it's more like what Windows does, but I'm not sure this is a good solution for Linux.

I have no idea where this set of ideas comes from, and it's a little disturbing to me. On a couple of levels, it's clearly bogus. Firstly, TTM and its libdrm interfaces predate gallium by years. Secondly, the windows work we've done with gallium to date has been on XP and _entirely_ in kernel space, so the whole issue of user/kernel allocation strategies never came up. Thirdly, Gallium's backend interfaces are all about abstracting away from the OS, so that drivers can be picked up and dumped down in multiple places. It's ludicrous to suggest that the act of abstracting away from TTM has in itself skewed TTM -- the point is that the driver has been made independent of TTM. The point of Gallium is that it should work on top of *anything* -- if we had had to skew TTM in some way to achieve that, then we would have already failed right at the starting point...

Lastly, and most importantly, I believe that using TTM kernel allocations to back a user-space sub-allocator *is the right strategy*. This has nothing to do with Gallium. No matter how fast you make a kernel allocator (and I applaud efforts to make it fast), it is always going to be quicker to do allocations locally. This is the reason we have malloc() and not just mmap() or brk/sbrk. Also, sub-allocation doesn't imply massive preallocation. That bug is well fixed by Thomas' user-space slab allocator code.

Keith
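[Editor's note] The malloc-vs-mmap analogy above can be made concrete with a toy bump sub-allocator: one kernel allocation backs many userspace allocations, so most allocations involve no kernel round-trip at all. Everything here is invented for illustration; a real sub-allocator (like Thomas' slab code) also handles freeing and fragmentation.

```python
# Toy bump sub-allocator over one large kernel allocation, illustrating
# why userspace sub-allocation wins: after the initial slab is obtained,
# allocations are pure userspace bookkeeping.

class KernelAllocs:
    calls = 0
    @classmethod
    def alloc_slab(cls, size):
        cls.calls += 1                 # each call models an ioctl round-trip
        return bytearray(size)

class SubAllocator:
    def __init__(self, slab_size=4096):
        self.slab = KernelAllocs.alloc_slab(slab_size)
        self.offset = 0

    def alloc(self, size):
        # No freeing in this toy version: just bump the offset.
        off = self.offset
        self.offset += size
        return off

sub = SubAllocator()
offsets = [sub.alloc(64) for _ in range(16)]
# 16 allocations, a single kernel call.
```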
Re: TTM merging?
- Original Message From: Jerome Glisse [EMAIL PROTECTED] To: Thomas Hellström [EMAIL PROTECTED] Cc: Dave Airlie [EMAIL PROTECTED]; Keith Packard [EMAIL PROTECTED]; DRI dri-devel@lists.sourceforge.net; Dave Airlie [EMAIL PROTECTED] Sent: Wednesday, May 14, 2008 6:08:55 PM Subject: Re: TTM merging? On Wed, 14 May 2008 16:36:54 +0200 Thomas Hellström wrote: Jerome Glisse wrote:

I don't agree with you here. EXA is much faster for small composite operations and even small fill blits if fallbacks are used. Even to write-combined memory, but that of course depends on the hardware. This is going to be even more pronounced with acceleration architectures like Glucose and similar, that don't have an optimized path for small hardware composite operations. My personal feeling is that pwrites are a workaround for a workaround for a very bad decision: to avoid user-space allocators on device-mapped memory. This led to a hack to avoid caching-policy changes, which led to cache trashing problems, which put us in the current situation. How far are we going to follow this path before people wake up? What's wrong with the performance of good old i915tex, which even beats classic i915 in many cases? Having to go through potentially (and even probably) paged-out memory to access buffers that are present in VRAM sounds like a very odd approach (to say the least) to me. Even if it's a single page, and implementing per-page dirty checks for domain flushing isn't very appealing either.

I don't have numbers or benchmarks to check how fast the pread/pwrite path might be in this use, so i am just expressing my feeling, which happens to be to avoid vma tlb flushes as much as we can. I got the feeling that the kernel goes through numerous tricks to avoid tlb flushing for a good reason, and also i am pretty sure that with the number of cores keeping growing, anything that needs broad cpu synchronization is to be avoided.
Hopefully once i get a decent amount of time to do benchmarks with gem i will check out my theory. I think a simple benchmark can be done on intel hw: just return false in EXA prepare access to force use of download-from-screen, and in download-from-screen use pread; then comparing benchmarks of this hacked intel ddx with a normal one should already give some numbers.

Why should we have to, when we can do it right?

Well my point was that mapping vram is not right; i am not saying that i know the truth. It's just a feeling based on my experiments with ttm and on the bar restriction stuff and other considerations of the same kind.

No. Gem can't cope with it. Let's say you have a 512M system with two 1G video cards, 4G swap space, and you want to fill both cards' videoram with render-and-forget textures for whatever purpose. What happens? After you've generated the first say 300M, the system mysteriously starts to page, and when, after a couple of minutes of crawling texture upload speeds, you're done, the system is using and has written almost 2G of swap. Now, you want to update the textures and expect fast texsubimage... So having a backing object that you have to access to get things into VRAM is not the way to go. The correct way to do this is to reserve, but not use, swap space. Then you can start using it on suspend, provided that the swapping system is still up (which it has to be with the current GEM approach anyway). If pwrite is used in this case, it must not dirty any backing-object pages.

For a normal desktop i don't expect VRAM amount > RAM amount; people with 1G VRAM are usually hard gamers with 4G of ram :). Also most objects in the 3d world are stored in memory; if programs are not stupid and trust gl to keep their textures, then you just have the usual ram copy and possibly a vram copy, so i don't see any waste in the normal use case.
Of course we can always come up with crazy weird setups, but i am more interested in dealing well with average Joe than dealing mostly well with every use case.

It's always been a big win to go to single-copy texturing. Textures tend to be large and nobody has so much memory that doubling up on textures has ever been appealing... And there are obvious use-cases like textured video where only having a single copy is a big performance win. It certainly makes things easier for the driver to duplicate textures -- which is why all the old DRI drivers did it -- but it doesn't make it right... And the old DRI drivers also copped out on things like render-to-texture, etc, so whatever gains you make in simplicity by treating VRAM as a cache, some of those will be lost because you'll have to keep track of which one of the two copies of a texture is up-to-date, and you'll still have to preserve (modified) texture contents on eviction, which old DRI never had to.

Ultimately it boils down to a choice between making your life easier as a
Re: fake bufmgr and aperture sizing.
The problem remains how to avoid this situation completely. I guess the drm driver can reserve a global safe aperture size and communicate that to the 3D client, but the current TTM drivers don't deal with this situation. My first idea would probably be your first alternative: flush and re-do the state-emit if the combined buffer size is larger than the safe aperture size.

I think a dynamically sized safe aperture size that can be used per batch submission is probably the best plan; this might also allow throttling in multi-app situations to help avoid thrashing, by reducing the per-app limits. For cards with per-process apertures we could make it the size of the per-process aperture. The case where an app manages to submit a working set for a single operation that is larger than the GPU can deal with should be considered a bug in the driver, I suppose.

The trouble with the safe limit is that it can change in a timeframe that is inconvenient for the driver -- ie, if it changes when a driver has already constructed most of a scene, what happens? This is a lot like the old cliprect problem, where driver choices can be invalidated later on, leaving it in a difficult position. Trying to chop an already-constructed command stream up after the fact is unappealing, even on simple architectures like the i915 in classic mode. Add zone rendering or some other wrinkle and it loses appeal fast.

What about two limits -- hard and soft? If the hard limit can avoid changing, that makes things a lot nicer for the driver. When the soft one changes, the driver can respect that next frame, but submit the current command stream as is.

Keith
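[Editor's note] The two-limit idea can be sketched as a small state machine: the soft limit may move at any time, but the in-flight frame keeps the limit it started with and only picks up the change at the frame boundary. Names and numbers are invented for illustration.

```python
# Sketch of the hard/soft limit scheme: a mid-frame soft-limit change
# never invalidates driver choices already made for the current frame.

class ApertureLimits:
    def __init__(self, hard, soft):
        self.hard = hard
        self.soft = soft
        self.frame_limit = soft        # limit the in-flight frame uses

    def kernel_updates_soft(self, new_soft):
        assert new_soft <= self.hard   # soft never exceeds the stable hard limit
        self.soft = new_soft           # takes effect at the next frame

    def end_frame(self):
        self.frame_limit = self.soft   # pick up any change here

lim = ApertureLimits(hard=128, soft=96)
lim.kernel_updates_soft(64)            # mid-frame change...
assert lim.frame_limit == 96           # ...does not disturb this frame
lim.end_frame()
assert lim.frame_limit == 64           # respected from the next frame on
```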
Re: Writing a state tracker for Gallium3D/SoftPipe
There are three components that you'll need:

- state tracker -- which is API dependent.
- hw driver -- HW dependent (softpipe is an example), which implements the p_context.h interface.
- winsys -- which is dependent on API, HW, OS, etc.

The winsys is basically the glue that holds it all together. The intention is for it to be as small as possible, and over time we'll improve the concept to help make it smaller still. In Mesa/Gallium/DRI drivers, the winsys is the only component with an overall view of the structure of the driver; all the other components see only one aspect of it, but the winsys is what puts all the pieces together, and provides the glue/services code to make it all work.

At minimum, the winsys will implement the interface in p_winsys.h, which provides surface/buffer management functions to both the state tracker and hardware driver. In addition, the HW drivers each propose a command submission interface which is specific to the particular piece of hardware. As the winsys currently implements both these interfaces, it by definition becomes hardware specific -- though internally there is usually a separation between these pieces.

Regarding the AUB stuff in the Xlib winsys, yes, you can ignore that. It's a hack to get a simulator running without hardware -- at some point I'll try and restructure things to make that clearer.

What I'm guessing you want to know is how to break things down in your proposed state tracker. The overriding principle is to put as little in the winsys as possible. At first, it's clear that anything in p_winsys.h must be done in the winsys, similarly for whatever functionality the hw driver requests in its backend interface -- eg softpipe/sp_winsys.h. Beyond that, the winsys needs to implement some sort of 'create_screen' and 'create_context' functionality, but would ideally hand off as much as possible after that to shared code in the state tracker or HW driver.
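[Editor's note] The three-way split above can be sketched structurally. This is not the real p_winsys.h/p_context.h API -- all names are invented -- it only illustrates which component sees which: the winsys is the one piece with the whole picture, while the state tracker and hw driver each see one side.

```python
# Structural sketch of state tracker / hw driver / winsys (invented names).

class Winsys:
    """Glue layer: buffer/surface management for both other components,
    plus the screen/context creation entry points."""
    def buffer_create(self, size):
        return bytearray(size)

class HwDriver:                        # e.g. a softpipe-like driver
    def __init__(self, winsys):
        self.winsys = winsys           # relies on the winsys for buffers

class StateTracker:                    # API-dependent (GL, XvMC, ...)
    def __init__(self, hw):
        self.hw = hw                   # talks to the hw driver interface

def create_context(winsys):
    # The winsys-owned function that ties the pieces together;
    # ideally it hands off everything else to shared code.
    return StateTracker(HwDriver(winsys))

ctx = create_context(Winsys())
```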
What's missing to some extent from the gallium interfaces is a fully-developed device/screen/global entity. Much of the work so far has been around the per-context entity (pipe_context), but the per-screen component that does surface management, etc, has been less well developed. That will change over time, but for the moment there's more of it in the winsys than I'd like. Keith - Original Message From: Younes M [EMAIL PROTECTED] To: dri-devel@lists.sourceforge.net Sent: Wednesday, April 16, 2008 6:47:48 PM Subject: Writing a state tracker for Gallium3D/SoftPipe I'm trying to get up and running writing a state tracker for libXvMC using SoftPipe, but looking at the Mesa src a few things are unclear to me. Looking at root/src/gallium/winsys/xlib I can't quite figure out where Mesa ends and where Gallium starts. It's my understanding that the winsys creates the pipe_context that I need, but is there a generic X winsys driver? The xm_* sources where softpipe_create() is eventually called (in xm_winsys.c) look very tied to Mesa, even though they appear to be in a generic directory, so do all state trackers have to implement something similar to use SoftPipe? Is this also the case for using hw drivers? Looking at the aub src files and the dri directory it looks like that stuff is tied to the hw and does not have to be provided for each state tracker.
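To picture the winsys described in this thread: it is essentially a small struct of function pointers that both the state tracker and the hw driver call into. The sketch below is hypothetical -- the real interface lives in p_winsys.h (and per-driver headers like sp_winsys.h); `sw_winsys_create` and the field names are made up for illustration, with a trivial malloc-backed implementation standing in for real buffer management.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* The glue interface: buffer management offered to both the state tracker
 * and the hw driver, plus hw-specific command submission. */
struct winsys {
    void *(*buffer_create)(struct winsys *ws, size_t size, unsigned align);
    void  (*buffer_destroy)(struct winsys *ws, void *buf);
    void  (*batch_flush)(struct winsys *ws);
};

/* A software-only implementation, roughly what an xlib/softpipe winsys
 * reduces to when there is no hardware to submit commands to. */
static void *sw_buffer_create(struct winsys *ws, size_t size, unsigned align)
{
    (void)ws; (void)align;          /* no alignment constraints in sw */
    return malloc(size);
}

static void sw_buffer_destroy(struct winsys *ws, void *buf)
{
    (void)ws;
    free(buf);
}

static void sw_batch_flush(struct winsys *ws)
{
    (void)ws;                       /* nothing to submit in software */
}

struct winsys *sw_winsys_create(void)
{
    struct winsys *ws = malloc(sizeof *ws);
    ws->buffer_create  = sw_buffer_create;
    ws->buffer_destroy = sw_buffer_destroy;
    ws->batch_flush    = sw_batch_flush;
    return ws;
}
```

The point of the shape is the one Keith makes: only this struct knows about the OS and hardware, so the state tracker and driver stay portable.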
Re: Gallium: Fix for tgsi_emit_sse2()
Sorry, this slipped through the net a little... Given how much is hardcoded with rtasm, I'd prefer to use a single calling convention everywhere; whether that's STDCALL or CDECL or something else I don't mind. Probably STDCALL, because some compilers are too dumb to use anything else? In which case, a little comment documenting what stdcall really means would help a lot - I've hit similar issues with the differences between calling conventions, and it basically boiled down to disassembling gcc's output to figure out what we're supposed to be doing... If we switch to stdcall, there are a couple of other platform-specific variants in the generated code that can be removed. It's probably going to be the cleanest solution from the point of view of actually working on this code. Keith - Original Message From: Stephane Marchesin [EMAIL PROTECTED] To: Victor Stinner [EMAIL PROTECTED] Cc: dri-devel@lists.sourceforge.net Sent: Wednesday, April 2, 2008 12:18:33 PM Subject: Re: Gallium: Fix for tgsi_emit_sse2() So, we should really fix this. The two options are: - Keep a different calling convention under linux (cdecl by default, which requires saving esi by hand in the shader) and apply Victor's patch which saves/restores this register - Use the same calling convention on all platforms, that is, change include/pipe/p_compiler.h to define XSTDCALL to stdcall on linux, because for now it's empty, which is _not_ stdcall but cdecl. In any case, this is a serious issue, as under linux esi gets corrupted on return from the SSE call. Which, of course, causes crashes. Stephane - Check out the new SourceForge.net Marketplace. It's the best place to buy or sell services for just about anything Open Source. http://ad.doubleclick.net/clk;164216239;13503038;w?http://sf.net/marketplace -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
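Stephane's second option amounts to making XSTDCALL expand to a real stdcall attribute on every platform, so the generated SSE code and the C code that calls it agree on who pops the arguments and which registers (e.g. %esi) are caller-saved. A hedged sketch of what such a p_compiler.h definition could look like -- the macro name is from the thread, but the exact conditionals and the demo function are illustrative, not the actual Mesa code:

```c
#include <assert.h>

/* Calling conventions only differ on 32-bit x86; elsewhere the macro can
 * safely expand to nothing. */
#if defined(_WIN32)
#define XSTDCALL __stdcall
#elif defined(__GNUC__) && defined(__i386__)
#define XSTDCALL __attribute__((stdcall))
#else
#define XSTDCALL
#endif

/* A generated shader entry point would be typed like this, so the compiler
 * knows the callee cleans the stack and which registers it may clobber. */
typedef void (XSTDCALL *tgsi_sse2_fs_func)(const float *input, float *output);

/* Stand-in for rtasm-generated code, just to show the convention in use. */
static void XSTDCALL demo_fs(const float *input, float *output)
{
    output[0] = input[0] * 2.0f;
}
```

Calling `demo_fs` through a plain (cdecl) function pointer type instead would be exactly the mismatch the thread describes: it compiles, then corrupts registers at runtime.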
Re: Gallium code reorganization
OK, I found I had to merge rather than rebase in order to get my changes into the new organization -- apologies for the bubble in the history. Keith José Fonseca wrote: Just to let you know that the first step, file shuffling, is finished. The rest will take more time but the changes are less pervasive. Once you update any private branches to the new directory layout, you should be able to keep working as usual. Here's a quick summary of the changes you might need to do: - move your source files to the directory layout described below; - update the TOP dirs in your Makefiles; - update the include paths, replacing -I src/mesa/pipe with -I src/gallium/include -I src/gallium/drivers -I src/gallium/aux; - remove the pipe/ prefix from all includes *except* pipe/p_*.h includes. Jose On Thu, 2008-02-14 at 15:38 +0900, José Fonseca wrote: I'll dedicate some time now to reorganizing gallium's code build process. This is stuff which has been discussed internally at TG several times, but this time I want to get it done. My objectives are: - a leaner and easier to understand/navigate source tree - reduce (or even eliminate) merges between private branches of the common gallium parts - help keep the gallium tree portable, by keeping things separate. My plan is: 1. Physically separate gallium source code from mesa code. This will be the final layout:
- src/mesa
- src/gallium
  - state_tracker
    - ogl
    - ...
  - drivers
    - i915simple
    - i965simple
    - cell
    - ...
  - winsys
    - dri
      - intel
      - ...
    - xlib
    - ...
  - aux
    - tgsi
    - draw
    - pipebuffer
    - llvm
    - cso_cache
    - ...
i.e., give a subdir in src/gallium to each gallium architectural layer. 2. Eliminate mesa includes from the gallium source code in everything but mesa's state_tracker (and eventually some winsys). 3. Using scons, enhance the build system to support all platforms we are interested in (i.e., linux and win32, atm). 4. Teach the build system how to pick up and build pipe/winsys drivers outside of the tree. 
Jose - This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/ -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: redesigning the DRM internal logic..
Alex Deucher wrote: On Feb 13, 2008 9:09 PM, Keith Packard [EMAIL PROTECTED] wrote: On Wed, 2008-02-13 at 19:22 -0500, Alex Deucher wrote: How about a compat node for old clients and a new render node that handles both new clients and GPGPU? Then the backwards compat stuff could just be a shim layer and everything else could use the same code instead of dealing with separate render and gpgpu nodes. Recall that one of the goals is to support multiple user sessions running at the same time, so we really do want to have per-session 'devices' which relate the collection of applications running within that session and reflect the access permissions of the various objects and methods within that session. Any 'compat' node would eventually have to deal with this new environment, and I'm not sure it's entirely practical, nor do I think it entirely necessary. As for GPGPU usage, that would presumably look an awful lot like a separate session, although I can imagine there being further limits on precisely which operations a GPGPU application could perform. I guess I just don't see a difference between X/DRI rendering and GPGPU; it's just command submission. It seems like the only reason for the render/gpgpu split is for backwards compatibility. I think we need to differentiate between display and rendering rather than visual rendering and compute applications. Yes, though maybe GPGPU is just a convenient phrase for 'rendering facility divorced from display'. I'm not sure. There are real cases where you want to render images yet never have an interest in display - for example scientific visualization and gpu-accelerated offline rendering. From the point of view of the DRM, these should fall into the same bucket as GPGPU. Keith
Re: [Mesa3d-dev] Gallium code reorganization
Michel Dänzer wrote: On Thu, 2008-02-14 at 20:05 +0900, José Fonseca wrote: On 2/14/08, Keith Whitwell [EMAIL PROTECTED] wrote: José Fonseca wrote: 1. Physically separate gallium source code from mesa code. This will be the final layout: - src/mesa - src/gallium - state_tracker - ogl - ... I think the one thing I'd say is that the GL state tracker is really a part of Mesa -- it's effectively a Mesa driver which targets the Gallium interfaces rather than some piece of hardware. Given that the gallium interface is fairly water-tight (ie. you can't reach round it to some driver internals), compared to the Mesa driver interface which is basically just 'include all the mesa internal headers', I think it will become clear if you try to do this that the state_tracker will sit pretty uncomfortably anywhere other than inside mesa... So src/mesa/driver/state_tracker then? src/mesa/driver/gallium? The trouble with this is you now have two things in the stack which can be called 'the gallium driver', ie: GL API -- Core Mesa -- Gallium Driver (formerly State Tracker) -- Gallium API -- Gallium Driver -- Winsys, DRM, HW. I'd be happy to either leave this piece out of the proposed changes for now, or to move it to mesa/drivers/state_tracker. Basically I think we have a clear idea what to do with the rest of the stack; probably we should just move ahead on that and either leave the Mesa state tracker alone or only make minimal changes to it. Leaving it out of this round of changes doesn't mean that we can't move/rename it later -- because it's a part of mesa, changing it later won't break or affect any other Gallium clients. It's really an internal matter for Mesa where that code lives and what it's called. Keith
Re: Intel releases 965 documentation under CC license
Philipp Klaus Krause wrote: It seems Intel has released complete documentation for the 965: http://intellinuxgraphics.com/documentation.html Hmm, some of the tables seem to be a bit messed up (presumably after the conversion to pdf). Still readable though, and definitely good to see... Keith
Re: RFC: render buffer
Jerome Glisse wrote: Hi all, There have been discussions on irc and elsewhere about the need (or not) for an object describing a render buffer (could be a scan-out buffer or other specialized card render buffer: color buffer, zbuffer, ...). This would mostly act as a wrapper around a BO. Here is what I can imagine as an interface for this: - RenderCreate(properties) - RenderMapLinear - RenderMap - RenderUnmap - RenderUnreference Properties would be a set of common properties like: - width - height - bit per pixel - pitch - scan out buffer And also driver private properties like: - compressed in this weird specific format - ... At creation you give a set of properties and you can't change them afterward (well, I think that's a good enough rule). What could this be useful for? Well, first it would be very useful for mode setting, as we would then have a place where constraints on scan-out buffer memory layout are properly checked. Right now we just blindly trust user space to provide a big enough buffer. This could also be useful for creating render buffers, making cmd checking a lot easier (as we would have access to width, height, pitch or whatever information is needed to make sure that we have a proper target for our rendering commands). Also we could offer a common interface for scan-out buffers, where the driver should allocate a proper buffer using only default properties (width, height, bit per pixel) and it would be up to the driver to fill in other properties with good safe default values (alignment, pitch, compressed, ...). I believe having a render buffer object concept in the kernel makes sense for the above mentioned reasons, and because in the graphics world render buffers are a key concept and everything in the end has to deal with them. So to sum up: - easy checking of proper render buffers in the kernel - easy checking of proper scan-out buffers in the kernel - make user space easier and safer (things other than a dri driver can allocate proper buffers without having to duplicate code). 
A few more words on map: I think it could be useful to provide two different map methods. One, linear, specifically means that you want a linear mapping of the buffer (according to width, height, pitch and bpp); the other just asks for a mapping, and the driver could pass back some information on the layout of the mapped memory (tiled, compressed, ...). For implementation I see two possible ways: - wrap the render buffer around a BO - make the render buffer a specialized BO, by adding a flag into the BO and a ptr to a render buffer structure The second solution is likely the easier one. Anyway, what are people's thoughts on all that? Pretty much every buffer is potentially a render target, for instance all texture buffers when generating mipmaps, etc. In the example above, different parts of individual buffers may be rendered to with different pitches, etc, ie when targeting different mipmaps. Intel hardware uses the same pitch for all mipmaps, but this is not universal. Furthermore things like GL's pixel buffers may be used with different pitches etc according to the user's whim. In general one of the nicest things about the current memory manager is that it does *not* impose this type of thing on regular buffer management. I've worked with systems that do, and it can be very burdensome. It's not like this presents a security issue to the system at large, so the question then is why make it a kernel function? You just end up running into the limitations you've encoded into the kernel in generation n when you're trying to do the work for generation n+1. One motivation for this sort of thing might be making allocation of fenced memory regions easier (fenced in the sense used in Intel HW, referring to tiled memory). I think that might be better handled specially, without encumbering the system as a whole with a fixed interpretation of buffer layout. Is there a specific issue that this proposal is trying to address? Keith
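To make the discussion concrete, Jerome's property list could be sketched as a C struct, with the kind of bounds check he wants the kernel to perform as a helper. All names here are hypothetical illustrations, not a real kernel interface -- only the property list itself comes from the mail:

```c
#include <assert.h>
#include <stdint.h>

/* The common (non driver-private) properties from the proposal,
 * fixed at creation time. */
struct render_buffer_props {
    uint32_t width;
    uint32_t height;
    uint32_t bpp;        /* bits per pixel */
    uint32_t pitch;      /* bytes per row, already covering width/bpp/alignment */
    int      scanout;    /* intended as a scan-out buffer */
};

/* The validation the kernel could do instead of blindly trusting user
 * space: the backing bo must hold at least pitch * height bytes. */
int render_buffer_fits_bo(const struct render_buffer_props *p,
                          uint64_t bo_size)
{
    return (uint64_t)p->pitch * p->height <= bo_size;
}
```

Keith's objection in the reply is essentially that baking a single `props` interpretation into the kernel forbids the driver from reusing the same bo with a different pitch later, e.g. for another mipmap level.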
Re: RFC: render buffer
Jerome Glisse wrote: On Wed, 2008-01-16 at 17:35 +, Keith Whitwell wrote: Pretty much every buffer is potentially a render target, for instance all texture buffers when generating mipmaps, etc. In the example above, different parts of individual buffers may be rendered to with different pitches, etc, ie when targeting different mipmaps. Intel hardware uses the same pitch for all mipmaps, but this is not universal. Furthermore things like GL's pixel buffers may be used with different pitches etc according to the user's whim. In general one of the nicest things about the current memory manager is that it does *not* impose this type of thing on regular buffer management. I've worked with systems that do, and it can be very burdensome. It's not like this presents a security issue to the system at large, so the question then is why make it a kernel function? You just end up running into the limitations you've encoded into the kernel in generation n when you're trying to do the work for generation n+1. One motivation for this sort of thing might be making allocation of fenced memory regions easier (fenced in the sense used in Intel HW, referring to tiled memory). I think that might be better handled specially, without encumbering the system as a whole with a fixed interpretation of buffer layout. Is there a specific issue that this proposal is trying to address? Keith Well, the main motivation was mode setting and command checking; for radeon, proper command checking will need to do a lot of (width|pitch)*height*bpp + alignment checking against bo size. I see render buffer objects as a way of greatly simplifying this. But I won't fight for it; I am well aware that the current bo interface is really nice because it doesn't enforce a policy. I guess my main concern is more about how to ask mode setting to program the card to use one kind of layout or another for the scan-out buffer. 
Modesetting and scanout buffers are a different kettle of fish - it may be reasonable to have more policy there than we currently do, and I don't think that the negatives I'm worried about apply so much to this area. It's quite reasonable to expect that *somebody* in the display stack may have more information than the 3d client driver about the necessary format, layout, etc of a scanout buffer, and that information would be necessary in order to get eg. page flipping to work correctly. It *may* be that the memory manager/kernel module has a role to play in this -- I don't really know one way or another. I guess the argument is stronger when you're talking about cases where the drm module does modesetting itself. It should be possible to put together a proposal in this area that doesn't negatively affect the 3d driver's ability to use buffers as render targets in new innovative ways. I'm not sure what it would look like exactly, but I'd be happy to evaluate it in the above terms. Keith
Re: [PATCH] Clean up and document drm_ttm.c APIs. drm_bind_ttm - drm_ttm_bind.
Keith Packard wrote: Here are some proposed cleanups and documentation for the drm_ttm.c APIs. One thing I didn't change was the name of drm_ttm_fixup_caching, which is clearly a badly named function. Can anyone explain why you wouldn't just always use drm_ttm_unbind instead? The only difference is that drm_ttm_unbind makes sure the object is evicted before flushing caches and marking it as unbound. Looks good, Keith. There are a couple of places where you need s/flat/flag, but otherwise it's looking great. I can't help with the question above, unfortunately... Keith
Re: [PATCH] Change drm_bo_type_dc to drm_bo_type_device and comment usage of this value.
Keith, This looks good to me too. Keith

Keith Packard wrote:

commit 9856a00ee5e6de30ba3040749583b2eafdf2dfc1
Author: Keith Packard [EMAIL PROTECTED]
Date: Sun Dec 16 22:00:45 2007 -0800

Change drm_bo_type_dc to drm_bo_type_device and comment usage of this value.

I couldn't figure out what drm_bo_type_dc was for; Dave Airlie finally clued me in that it was the 'normal' buffer objects with kernel allocated pages that could be mmapped from the drm device file. I thought that 'drm_bo_type_device' was a more descriptive name. I also added a bunch of comments describing the use of the type enum values and the functions that use them.

diff --git a/linux-core/drm_bo.c b/linux-core/drm_bo.c
index 171c074..df10e12 100644
--- a/linux-core/drm_bo.c
+++ b/linux-core/drm_bo.c
@@ -146,7 +146,7 @@ static int drm_bo_add_ttm(struct drm_buffer_object *bo)
 		page_flags |= DRM_TTM_PAGE_WRITE;
 
 	switch (bo->type) {
-	case drm_bo_type_dc:
+	case drm_bo_type_device:
 	case drm_bo_type_kernel:
 		bo->ttm = drm_ttm_create(dev, bo->num_pages << PAGE_SHIFT,
 					 page_flags, dev->bm.dummy_read_page);
@@ -1155,7 +1155,12 @@ static void drm_bo_fill_rep_arg(struct drm_buffer_object *bo,
 	rep->size = bo->num_pages * PAGE_SIZE;
 	rep->offset = bo->offset;
 
-	if (bo->type == drm_bo_type_dc)
+	/*
+	 * drm_bo_type_device buffers have user-visible
+	 * handles which can be used to share across
+	 * processes. Hand that back to the application
+	 */
+	if (bo->type == drm_bo_type_device)
 		rep->arg_handle = bo->map_list.user_token;
 	else
 		rep->arg_handle = 0;
@@ -1786,7 +1791,12 @@ int drm_buffer_object_create(struct drm_device *dev,
 	if (ret)
 		goto out_err;
 
-	if (bo->type == drm_bo_type_dc) {
+	/*
+	 * For drm_bo_type_device buffers, allocate
+	 * address space from the device so that applications
+	 * can mmap the buffer from there
+	 */
+	if (bo->type == drm_bo_type_device) {
 		mutex_lock(&dev->struct_mutex);
 		ret = drm_bo_setup_vm_locked(bo);
 		mutex_unlock(&dev->struct_mutex);
@@ -1849,7 +1859,12 @@ int drm_bo_create_ioctl(struct drm_device *dev, void *data, struct drm_file *fil
 		return -EINVAL;
 	}
 
-	bo_type = (req->buffer_start) ? drm_bo_type_user : drm_bo_type_dc;
+	/*
+	 * If the buffer creation request comes in with a starting address,
+	 * that points at the desired user pages to map. Otherwise, create
+	 * a drm_bo_type_device buffer, which uses pages allocated from the kernel
+	 */
+	bo_type = (req->buffer_start) ? drm_bo_type_user : drm_bo_type_device;
 
 	/*
 	 * User buffers cannot be shared
@@ -2607,6 +2622,14 @@ void drm_bo_unmap_virtual(struct drm_buffer_object *bo)
 	unmap_mapping_range(dev->dev_mapping, offset, holelen, 1);
 }
 
+/**
+ * drm_bo_takedown_vm_locked:
+ *
+ * @bo: the buffer object to remove any drm device mapping
+ *
+ * Remove any associated vm mapping on the drm device node that
+ * would have been created for a drm_bo_type_device buffer
+ */
 static void drm_bo_takedown_vm_locked(struct drm_buffer_object *bo)
 {
 	struct drm_map_list *list;
@@ -2614,7 +2637,7 @@ static void drm_bo_takedown_vm_locked(struct drm_buffer_object *bo)
 	struct drm_device *dev = bo->dev;
 	DRM_ASSERT_LOCKED(&dev->struct_mutex);
 
-	if (bo->type != drm_bo_type_dc)
+	if (bo->type != drm_bo_type_device)
 		return;
 
 	list = &bo->map_list;
@@ -2637,6 +2660,16 @@ static void drm_bo_takedown_vm_locked(struct drm_buffer_object *bo)
 	drm_bo_usage_deref_locked(bo);
 }
 
+/**
+ * drm_bo_setup_vm_locked:
+ *
+ * @bo: the buffer to allocate address space for
+ *
+ * Allocate address space in the drm device so that applications
+ * can mmap the buffer and access the contents. This only
+ * applies to drm_bo_type_device objects as others are not
+ * placed in the drm device address space.
+ */
 static int drm_bo_setup_vm_locked(struct drm_buffer_object *bo)
 {
 	struct drm_map_list *list = &bo->map_list;
diff --git a/linux-core/drm_objects.h b/linux-core/drm_objects.h
index 98421e4..a2d10b5 100644
--- a/linux-core/drm_objects.h
+++ b/linux-core/drm_objects.h
@@ -404,9 +404,31 @@ struct drm_bo_mem_reg {
 };
 
 enum drm_bo_type {
-	drm_bo_type_dc,
+	/*
+	 * drm_bo_type_device are 'normal' drm allocations,
+	 * pages are allocated from within the kernel automatically
+	 * and the objects can be mmap'd from the drm device. Each
+	 * drm_bo_type_device object has a unique name which can be
+	 * used by other processes to share access to the underlying
+	 * buffer.
+	 */
+	drm_bo_type_device,
+	/*
+	 *
Re: [PATCH] Rename inappropriately named 'mask' fields to 'proposed_flags' instead.
Keith Packard wrote:

commit 32acf53eefa64cd41cc9bf45705b0825fc8a0eef
Author: Keith Packard [EMAIL PROTECTED]
Date: Sun Dec 16 20:16:50 2007 -0800

Rename inappropriately named 'mask' fields to 'proposed_flags' instead.

Flags pending validation were stored in a misleadingly named field, 'mask'. As 'mask' is already used to indicate pieces of a flags field which are changing, it seems better to use a name reflecting the actual purpose of this field. I chose 'proposed_flags' as they may not actually end up in 'flags', and in any case will be modified when they are moved over. This affects the API, but not ABI, of the user-mode interface.

Keith, I think this makes sense too. I'm hopeful Thomas would agree.

+/*
+ * drm_bo_propose_flags:
+ *
+ * @bo: the buffer object getting new flags
+ *
+ * @new_flags: the new set of proposed flag bits
+ *
+ * @new_mask: the mask of bits changed in new_flags
+ *
+ * Modify the proposed_flag bits in @bo
+ */

Looks like this comment has already started to drift from the function it is documenting??

+static int drm_bo_modify_proposed_flags (struct drm_buffer_object *bo,
+					 uint64_t new_flags, uint64_t new_mask)

Keith
Re: Proposal for a few minor internal API changes.
Keith Packard wrote:

I'm writing up some documentation for internal DRM interfaces and came across a couple of interface inconsistencies that seem like they should get fixed before they start getting used a lot more. If these look like good changes, I'll continue to search out other similar issues. I'll just include the header changes in this message.

Make ttm create/destroy APIs consistent. Pass page_flags in create.

Creating a ttm was done with drm_ttm_init while destruction was done with drm_destroy_ttm. Renaming these to drm_ttm_create and drm_ttm_destroy makes their use clearer. Passing page_flags to the create function will allow that to know whether user or kernel pages are needed, with the goal of allowing kernel ttms to be saved for later reuse.

--- linux-core/drm_objects.h ---
index 1dc61fd..66611f6 100644
@@ -297,7 +297,7 @@ struct drm_ttm {
 };
 
-extern struct drm_ttm *drm_ttm_init(struct drm_device *dev, unsigned long size);
+extern struct drm_ttm *drm_ttm_create(struct drm_device *dev, unsigned long size, uint32_t page_flags);
 extern int drm_bind_ttm(struct drm_ttm *ttm, struct drm_bo_mem_reg *bo_mem);
 extern void drm_ttm_unbind(struct drm_ttm *ttm);
 extern void drm_ttm_evict(struct drm_ttm *ttm);
@@ -318,7 +318,7 @@ extern int drm_ttm_set_user(struct drm_ttm *ttm,
  * Otherwise it is called when the last vma exits.
  */
 
-extern int drm_destroy_ttm(struct drm_ttm *ttm);
+extern int drm_ttm_destroy(struct drm_ttm *ttm);
 
 #define DRM_FLAG_MASKED(_old, _new, _mask) {\
 (_old) ^= (((_old) ^ (_new)) & (_mask)); \

Document drm_bo_do_validate. Remove spurious 'do_wait' parameter.

Add comments about the parameters to drm_bo_do_validate, along with comments for the DRM_BO_HINT options. Remove the 'do_wait' parameter as it is duplicated by DRM_BO_HINT_DONT_BLOCK. 
--- linux-core/drm_objects.h ---
index 66611f6..1c6ca79 100644
@@ -546,7 +546,6 @@ extern struct drm_buffer_object *drm_lookup_buffer_object(struct drm_file *file_
 extern int drm_bo_do_validate(struct drm_buffer_object *bo,
 			      uint64_t flags, uint64_t mask, uint32_t hint,
 			      uint32_t fence_class,
-			      int no_wait,
 			      struct drm_bo_info_rep *rep);
 
 /*

Document drm_bo_handle_validate. Match drm_bo_do_validate parameter order.

Document parameters and usage for drm_bo_handle_validate. Change parameter order to match drm_bo_do_validate (fence_class has been moved to after flags, hint and mask values). Existing users of this function have been changed, but out-of-tree users must be modified separately.

--- linux-core/drm_objects.h ---
index 1c6ca79..0926b47 100644
@@ -535,9 +535,8 @@ extern int drm_bo_clean_mm(struct drm_device *dev, unsigned mem_type);
 extern int drm_bo_init_mm(struct drm_device *dev, unsigned type,
 			  unsigned long p_offset, unsigned long p_size);
 extern int drm_bo_handle_validate(struct drm_file *file_priv, uint32_t handle,
-				  uint32_t fence_class, uint64_t flags,
-				  uint64_t mask, uint32_t hint,
-				  int use_old_fence_class,
+				  uint64_t flags, uint64_t mask, uint32_t hint,
+				  uint32_t fence_class, int use_old_fence_class,
 				  struct drm_bo_info_rep *rep,
 				  struct drm_buffer_object **bo_rep);

These all look sensible. It's a pity that the change above looks like it will allow users of the old argument order to continue to compile without error despite the change. It's a bit hard to know how to achieve that, though. When you say 'document xyz', and the documentation doesn't appear in the patch to the header, where *will* the documentation live??

Keith
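As background for the flags/mask terminology running through these patches: the DRM_FLAG_MASKED macro from drm_objects.h updates only the bits selected by _mask, leaving the rest of _old untouched. A minimal self-contained illustration follows; apply_proposed_flags is a made-up wrapper for demonstration, not a DRM function.

```c
#include <assert.h>
#include <stdint.h>

/* Copy into _old only those bits of _new that are selected by _mask:
 * xor-ing in the masked difference flips exactly the differing bits. */
#define DRM_FLAG_MASKED(_old, _new, _mask) {                 \
        (_old) ^= (((_old) ^ (_new)) & (_mask));             \
}

/* Hypothetical helper mirroring the 'proposed_flags' idea: merge the
 * proposed bits covered by mask into the current flags. */
uint64_t apply_proposed_flags(uint64_t flags, uint64_t proposed, uint64_t mask)
{
    DRM_FLAG_MASKED(flags, proposed, mask);
    return flags;
}
```

This is why reusing the name 'mask' for pending flags was misleading: the mask argument here selects *which* bits change, while the proposed flags are *what* they change to.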
Re: Proposal for a few minor internal API changes.
Keith Packard wrote: On Sat, 2007-12-15 at 10:59 -0700, Brian Paul wrote: Could a temporary/dummy parameter be added for a while? Callers that weren't updated would get an error/warning about too few arguments. Then remove the dummy at some point in the future. We could change the use_old_fence_class into a HINT bit; that would reduce the number of parameters by one and cause compile errors for existing code. I'd rather not intentionally damage the API temporarily though; that seems fairly harsh. Ultimately it's not that big of a deal - if this change makes sense on its own, then sure, go ahead. Otherwise it's only Poulsbo that I can think of being out-of-tree, and we should be able to figure out what's going on there fairly quickly (though obviously we'll forget all about this conversation until after the next merge starts making it behave weirdly). Keith
Re: Memory manager, sub allocator partial validation
Jerome Glisse wrote: Hi, I am wondering if allowing the user to ask for partial validation (ie only validate a part of a bo) might be useful in the case of a userspace sub-allocator, and likely in other cases where you know, for instance, that you only need to access a small part of a bo. Such a partial approach could also be useful for mapping, asking only to map a part of a bo. I am throwing out the idea; I don't yet have enough code to test whether this kind of optimization might be worthwhile for radeon hw (and possibly others). I think if you want this, the way to get it is to abandon the userspace suballocator and use kernel buffers directly. To have the kernel manage partial buffers doesn't make sense - if this is what you want, tell the kernel about the little buffers and let it manage them directly. Keith
Re: i915: wait for buffer idle before writing relocations
Keith Packard wrote: On Fri, 2007-12-07 at 11:15 +, Keith Whitwell wrote: Keith, Thomas has just left for two weeks of (well deserved!) holiday, so he may be slow to respond. Thanks for taking the time to have a look while he's away; we're finishing up the 965 TTM work, and it is posing some challenges with the existing kernel interface. In the meantime, have you considered how this will interact with userspace buffer pools? No, I hadn't considered that as we're not considering a two-level allocation strategy at this point. However, if you consider the blocking patch in conjunction with the presumed_offset optimization, I think you'll find that userspace buffer pools will not actually be affected negatively by this change. The presumed_offset optimization allows the application to compute all relocations itself for target buffers which have been mapped to the hardware. The kernel relocations are purely a back-up, for cases where buffers move between EXECBUFFER invocations. I know you guys aren't using them at this point, but I'm of the opinion that they are an important facility which needs to be preserved. At worst it may be that some additional flag is needed to control this behaviour. We could do this, but I believe this would actually require more blocking by the client -- it doesn't know when objects are moving in the kernel, so it doesn't know when relocation data will need to be rewritten. Secondly I wonder whether this isn't already caught by other aspects of the buffer manager behaviour? ie, if the buffer to which the relocation points to is being moved, doesn't that imply all hardware activity related to that buffer must have concluded? IE, if the buffer itself is free to move, surely all commands containing relocations (or chains of relocations) which point to the buffer must themselves have completed?? Yes, if the target buffer is moving, then the operation related to the relocatee will have been completed and waited for. 
But, re-writing relocations doesn't require that the buffers have moved. Consider the case of the binding table on 965 which points at surface state structures. Executing a command that uses the binding table will require that relocations be evaluated for the entries in the table; even if nothing moves (ignoring my presumed_offset optimization), those relocations will need to be evaluated and the surface state pointers stored to the binding table. For the application to guarantee that the binding table relocations can be written without the kernel needing to wait for the binding table buffer to be idle, the application would have to wait every time, not just when the buffer actually moves. OK, it sounds like you're talking about situations where the driver is modifying state in buffers *only* through changes to the relocations? It's probably not surprising the fence is not implemented as I'd normally think that those relocation changes would be associated with some changes to the other data, and that would imply mapping the buffer (and hence the wait). I do understand the examples though and can see where you're trying to take this. Anyway, I'm hopeful that this won't break other usages... Keith
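For readers following along, the presumed_offset optimization mentioned above can be sketched roughly as follows: userspace records the offset it assumed when it wrote a pointer into a buffer, and the kernel only needs to rewrite (and therefore wait on) the buffer if the target actually moved. The struct and function names here are illustrative, not the actual DRM interface of the day:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative model of a relocation entry with a presumed offset. */
struct reloc {
    uint32_t batch_dword;     /* index into batch where the pointer lives */
    uint64_t presumed_offset; /* target offset userspace assumed */
    uint64_t delta;           /* offset within the target bo */
};

/* Returns 1 if the kernel had to rewrite (and thus wait on) the batch. */
static int apply_reloc(uint32_t *batch, const struct reloc *r,
                       uint64_t actual_offset)
{
    if (actual_offset == r->presumed_offset)
        return 0;                       /* no move: relocation is a no-op */
    batch[r->batch_dword] = (uint32_t)(actual_offset + r->delta);
    return 1;
}
```

This also shows why the binding-table case is different: there the pointers must be written into the table regardless, so the wait cannot be skipped merely because nothing moved.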
Re: i915: wait for buffer idle before writing relocations
Yeah, I'm pretty interested to come up with an 'append' type of semantic for buffer usage, particularly for things like the state pools you guys are probably playing with at the moment. It's not something that's ever going to be a *requirement* for a driver, and may not necessarily be a big win or even particularly difficult, but at this point nobody's really dug into it enough to know one way or another. Ignoring relocation issues, an 'append' mapping semantic, as opposed to the existing read/write maps, is probably an interesting concept also as it could allow mapping a state pool buffer to add new states as required by the application, but not require a fence as the old ones won't be interfered with. Keith - Original Message From: Keith Packard [EMAIL PROTECTED] To: Keith Whitwell [EMAIL PROTECTED] Cc: [EMAIL PROTECTED]; dri-devel dri-devel@lists.sourceforge.net Sent: Monday, December 10, 2007 4:44:44 PM Subject: Re: i915: wait for buffer idle before writing relocations [...] I think the interesting usage that you point out is where the application knows that a wait isn't necessary as the previously referenced data will not be re-used, and only new portions of the buffer need relocations. Given the choice between avoiding waits for cases we have today vs avoiding waits for cases we may try in the future, it seems reasonable to solve what we're using now. -- [EMAIL PROTECTED]
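The append semantic Keith sketches can be illustrated with a toy model: writes strictly past the region already referenced by queued commands never need an idle wait, because no in-flight command reads them. All names here are hypothetical, a sketch of the idea rather than any real interface:

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of an 'append' mapping on a state pool buffer. */
struct state_pool {
    size_t submitted;  /* bytes already referenced by queued commands */
    size_t used;       /* bytes written so far */
};

/* Returns 1 if a mapping for this write may skip the idle wait. */
static int append_needs_no_wait(const struct state_pool *p,
                                size_t write_off)
{
    return write_off >= p->submitted;
}

/* Appends always land past everything written so far. */
static size_t append_alloc(struct state_pool *p, size_t n)
{
    size_t off = p->used;
    p->used += n;
    return off;
}
```

A read/write map, by contrast, may touch already-submitted state and therefore has to fence first.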
Re: i915: wait for buffer idle before writing relocations
Keith, Thomas has just left for two weeks of (well deserved!) holiday, so he may be slow to respond. In the meantime, have you considered how this will interact with userspace buffer pools? I know you guys aren't using them at this point, but I'm of the opinion that they are an important facility which needs to be preserved. At worst it may be that some additional flag is needed to control this behaviour. Secondly I wonder whether this isn't already caught by other aspects of the buffer manager behaviour? ie, if the buffer to which the relocation points to is being moved, doesn't that imply all hardware activity related to that buffer must have concluded? IE, if the buffer itself is free to move, surely all commands containing relocations (or chains of relocations) which point to the buffer must themselves have completed?? Keith Keith Packard wrote: Here's a patch I believe is necessary for the i915 DRM kernel driver; right now, the i915 mesa driver never re-uses batch buffers, so there can never be an outstanding fence for a buffer with relocations. On 965, buffers other than the batch buffer will contain relocations, and may be reused (we'll avoid this because of the performance costs). In any case, this is a correctness fix, as the kernel must not presume that user space isn't reusing buffers with relocations. commit 6f5816b45d62c5b29eb6997885f103c21c92bed1 Author: Keith Packard [EMAIL PROTECTED] Date: Thu Dec 6 15:12:21 2007 -0800 i915: wait for buffer idle before writing relocations When writing a relocation entry, make sure the target buffer is idle, otherwise the GPU may see inconsistent data. 
diff --git a/shared-core/i915_dma.c b/shared-core/i915_dma.c
index 8791af6..42a2216 100644
--- a/shared-core/i915_dma.c
+++ b/shared-core/i915_dma.c
@@ -756,6 +756,13 @@ int i915_apply_reloc(struct drm_file *file_priv, int num_buffers,
 	    !drm_bo_same_page(relocatee->offset, new_cmd_offset)) {
 		drm_bo_kunmap(&relocatee->kmap);
 		relocatee->offset = new_cmd_offset;
+		mutex_lock(&relocatee->buf->mutex);
+		ret = drm_bo_wait(relocatee->buf, 0, 0, FALSE);
+		mutex_unlock(&relocatee->buf->mutex);
+		if (ret) {
+			DRM_ERROR("Could not wait for buffer to apply relocs\n %08lx", new_cmd_offset);
+			return ret;
+		}
 		ret = drm_bo_kmap(relocatee->buf, new_cmd_offset >> PAGE_SHIFT,
 				  1, &relocatee->kmap);
 		if (ret) {
Re: DRI2 and lock-less operation
Kristian Høgsberg wrote: On Nov 27, 2007 2:30 PM, Keith Packard [EMAIL PROTECTED] wrote: ... In both cases, the kernel will need to know how to activate a given context and the context handle should be part of the super ioctl arguments. We needn't expose the contexts to user-space, just provide a virtual consistent device and manage contexts in the kernel. We could add the ability to manage contexts from user space for cases where that makes sense (like, perhaps, in the X server where a context per client may be useful). Oh, right, we don't need one per GLContext, just per DRI client; mesa handles switching between GL contexts. What about multithreaded rendering sharing the same drm fd? I imagine one optimization you could do with a fixed number of contexts is to assume that losing the context will be very rare, and just fail the super-ioctl when it happens, and then expect user space to resubmit with the state emission preamble. In fact it may work well for single context hardware... I recall having the same discussion in the past; having the superioctl fail so that the client needn't constantly compute the full state restore on every submission may be a performance win for some hardware. All that this requires is a flag to the kernel that says 'this submission reinitializes the hardware', and an error return that says 'lost context'. Exactly. But the super-ioctl is chipset specific and we can decide on the details there on a chipset to chipset basis. If you have input on how the super-ioctl for intel should look to support lockless operation for current and future intel chipsets, I'd love to hear it. And of course we can version our way out of trouble as a last resort. We should do the lockless and context stuff at the same time; doing re-submit would be a bunch of work in the driver that would be wasted. Is it that bad? We will still need the resubmit code for older chipsets that don't have the hardware context support. 
The drivers already have the code to emit state in case of context loss, that code just needs to be repurposed to generate a batch buffer to prepend to the rendering code. And the rendering code doesn't need to change when resubmitting. Will that work? Right now, we're just trying to get 965 running with ttm; once that's limping along, figuring out how to make it reasonable will be the next task, and hardware context support is clearly a big part of that. Yeah - I'm trying to limit the scope of DRI2 so that we can have redirected direct rendering and GLX1.4 in the tree sooner rather than later (before the end of the year). Maybe the best way to do that is to keep the lock around for now and punt on the lock-less super-ioctl if that really blocks on hardware context support. How far back is hardware contexts supported? There are three ways to support lockless operation - hardware contexts - a full preamble emit per batchbuffer - passing a pair of preamble, payload batchbuffers per ioctl I think all hardware is capable of supporting at least one of these. That said, if the super-ioctl is per-device, then you can make a choice per-device in terms of whether the lock is required or not, which makes things easier. The reality is that most ttm based drivers will do very little differently on a regular lock compared to a contended one -- at most they could decide whether or not to emit a preamble they computed earlier. Keith - SF.Net email is sponsored by: The Future of Linux Business White Paper from Novell. From the desktop to the data center, Linux is going mainstream. Let it simplify your IT future. http://altfarm.mediaplex.com/ad/ck/8857-50307-18918-4 -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
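The fail-and-resubmit scheme discussed in this thread can be sketched as follows. The error code, ioctl shape, and names are all invented for illustration; the point is only the control flow: the kernel rejects a submission from a client whose context was lost, and the client resubmits with the state-emission preamble prepended.

```c
#include <assert.h>

/* Hypothetical error code for "your hardware context was lost". */
#define ELOSTCTX 1

struct fake_hw { int current_ctx; };

/* Toy stand-in for the super-ioctl: refuses a preamble-less batch
 * when the submitting context is no longer current. */
static int super_ioctl(struct fake_hw *hw, int ctx, int has_preamble)
{
    if (hw->current_ctx != ctx && !has_preamble)
        return -ELOSTCTX;     /* context lost: client must resubmit */
    hw->current_ctx = ctx;    /* preamble (or matching ctx) restores state */
    return 0;
}

/* Client side: try the cheap path, fall back to prepending the preamble. */
static int submit(struct fake_hw *hw, int ctx)
{
    int ret = super_ioctl(hw, ctx, 0);
    if (ret == -ELOSTCTX)
        ret = super_ioctl(hw, ctx, 1);
    return ret;
}
```

As Kristian notes, the resubmit path reuses the state-emission code drivers already have; only the decision about when to play it moves.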
Re: DRI2 and lock-less operation
Kristian Høgsberg wrote: Another problem is that it's not just swapbuffer - anything that touches the front buffer have to respect the cliprects - glCopyPixels and glXCopySubBufferMESA - and thus have to happen in the kernel. These don't touch the real swapbuffer, just the fake one. Hence they don't care about cliprects and certainly don't have to happen in the kernel... Keith
Re: DRI2 and lock-less operation
Michel Dänzer wrote: On Wed, 2007-11-28 at 09:30 +, Keith Whitwell wrote: Kristian Høgsberg wrote: Another problem is that it's not just swapbuffer - anything that touches the front buffer have to respect the cliprects - glCopyPixels and glXCopySubBufferMESA - and thus have to happen in the kernel. These don't touch the real swapbuffer, just the fake one. Hence they don't care about cliprects and certainly don't have to happen in the kernel... I'm not sure about glCopyPixels, but glXCopySubBufferMESA would most definitely be useless if it didn't copy to the real frontbuffer. Yes, wasn't paying attention... glxCopySubBufferMESA would do both - copy to the fake front buffer and then trigger a damage-induced update of the real frontbuffer. Neither operation requires the 3d driver know about cliprects, and the damage operation is basically a generalization of the swapbuffer stuff we've been talking about. Keith
Swapbuffers [was: Re: DRI2 and lock-less operation]
Kristian Høgsberg wrote: On Nov 27, 2007 11:48 AM, Stephane Marchesin [EMAIL PROTECTED] wrote: On 11/22/07, Kristian Høgsberg [EMAIL PROTECTED] wrote: ... It's all delightfully simple, but I'm starting to reconsider whether the lockless bullet point is realistic. Note, the drawable lock is gone, we always render to private back buffers and do swap buffers in the kernel, so I'm only concerned with the DRI lock here. The idea is that since we have the memory manager and the super-ioctl and the X server now can push cliprects into the kernel in one atomic operation, we would be able to get rid of the DRI lock. My overall question, here is, is that feasible? How do you plan to ensure that X didn't change the cliprects after you emitted them to the DRM ? The idea was that the buffer swap happens in the kernel, triggered by an ioctl. The kernel generates the command stream to execute the swap against the current set of cliprects. The back buffers are always private so the cliprects only come into play when copying from the back buffer to the front buffer. Single buffered visuals are secretly double buffered and implemented the same way. I'm trying to figure now whether it makes more sense to keep cliprects and swapbuffer out of the kernel, which wouldn't change the above much, except the swapbuffer case. I described the idea for swapbuffer in this case in my reply to Thomas: the X server publishes cliprects to the clients through a shared ring buffer, and clients parse the clip rect changes out of this buffer as they need it. When posting a swap buffer request, the buffer head should be included in the super-ioctl so that the kernel can reject stale requests. When that happens, the client must parse the new cliprect info and resubmit an updated swap buffer request. In my ideal world, the entity which knows and cares about cliprects should be the one that does the swapbuffers, or at least is in control of the process. That entity is the X server. 
Instead of tying ourselves into knots trying to figure out how to get some other entity a sufficiently up-to-date set of cliprects to make this work (which is what was wrong with DRI 1.0), maybe we should try and figure out how to get the X server to efficiently orchestrate swapbuffers. In particular it seems like we have: 1) The X server knows about cliprects. 2) The kernel knows about IRQ reception. 3) The kernel knows how to submit rendering commands to hardware. 4) Userspace is where we want to craft rendering commands. Given the above, what do we think about swapbuffers: a) Swapbuffers is a rendering command b) which depends on cliprect information c) that needs to be fired as soon as possible after an IRQ receipt. So: swapbuffers should be crafted from userspace (a, 4) ... by the X server (b, 1) ... and should be actually fired by the kernel (c, 2, 3) I propose something like: 0) 3D client submits rendering to the kernel and receives back a fence. 1) 3D client wants to do swapbuffers. It sends a message to the X server asking it please do me a swapbuffers after this fence has completed. 2) X server crafts (somehow) commands implementing swapbuffers for this drawable under the current set of cliprects and passes it to the kernel along with the fence. 3) The kernel keeps that batchbuffer to the side until a) the commands associated with the fence have been submitted to hardware. b) the next vblank IRQ arrives. when both of these are true, the kernel simply submits the prepared swapbuffer commands through the lowest latency path to hardware. But what happens if the cliprects change? The 100% perfect solution looks like: The X server knows all about cliprect changes, and can use fences or other mechanisms to keep track of which swapbuffers are outstanding. At the time of a cliprect change, it must create new swapbuffer commandsets for all pending swapbuffers and re-submit those to the kernel. 
These new sets of commands must be tied to the progress of the X server's own rendering command stream so that the kernel fires the appropriate one to land the swapbuffers to the correct destination as the X server's own rendering flies by. In the simplest case, where the kernel puts commands onto the one true ring as it receives them, the kernel can simply discard the old swapbuffer command. Indeed this is true also if the kernel has a ring-per-context and uses one of those rings to serialize the X server rendering and swapbuffers commands. Note that condition 3a) above is always true in the current i915.o one-true-ring/single-fifo approach to hardware serialization. I think the above can work and seems more straight-forward than many of the proposed alternatives. Keith
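Step 3 of the proposal (hold the prepared swapbuffer commands until both the client's fence has completed and a vblank IRQ has arrived, then submit) amounts to a small piece of kernel bookkeeping. A sketch with invented names:

```c
#include <assert.h>

/* Toy model of a parked swapbuffer: the kernel holds the pre-built
 * commands aside until both conditions from step 3 are true. */
struct pending_swap {
    int fence_done;    /* 3a: client's rendering has been submitted */
    int vblank_seen;   /* 3b: a vblank IRQ has arrived */
    int fired;
};

static void on_fence(struct pending_swap *s)  { s->fence_done = 1; }
static void on_vblank(struct pending_swap *s) { s->vblank_seen = 1; }

/* Called after either event; fires exactly once when both are true. */
static int maybe_fire(struct pending_swap *s)
{
    if (s->fence_done && s->vblank_seen && !s->fired) {
        s->fired = 1;   /* submit prepared commands to hardware here */
        return 1;
    }
    return 0;
}
```

The order of the two events doesn't matter, which is what lets the kernel take the lowest-latency path in either case.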
Re: Clip Lists
Stephane Marchesin wrote: On 28 Nov 2007 06:19:39 +0100, Soeren Sandmann [EMAIL PROTECTED] wrote: Stephane Marchesin [EMAIL PROTECTED] writes: I fail to see how this works with a lockless design. How do you ensure the X server doesn't change cliprects between the time it has written those in the shared ring buffer and the time the DRI client picks them up and has the command fired and actually executed ? Do you lock out the server during that time ? The scheme I have been advocating is this: - A new extension is added to the X server, with a PixmapFromBufferObject request. - Clients render into a private back buffer object, for which they used the new extension to generate a pixmap. - When a client wishes to copy something to the frontbuffer (for whatever reason - glXSwapBuffers(), glCopyPixels(), etc), it uses plain old XCopyArea() with the generated pixmap. The X server is then responsible for any clipping necessary. This scheme puts all clip list management in the X server. No cliprects in shared memory or in the kernel would be required. And no locking is required since the X server is already processing requests in sequence. Yes, that is the idea I want to do for nvidia hardware. Although I'm not sure if we can/want to implement it in terms of X primitives or a new X extension. To synchronize with vblank, a new SYNC counter is introduced that records the number of vblanks since some time in the past. The clients can then issue SyncAwait requests before any copy they want synchronized with vblank. This allows the client to do useful processing while it waits, which I don't believe is the case now. Since we can put a wait until vblank on crtc #X command to a fifo on nvidia hardware, the vblank issue is non-existent for us. We get precise vblank without CPU intervention. You still have some issues... The choice is: do you put the wait-until-vblank command in the same fifo as the X server rendering or not? 
If yes -- you end up with nasty latency for X as its rendering is blocked by swapbuffers. If no -- you face the question of what to do when cliprects change. The only way to make 'no' work is to effectively block the X server from changing cliprects while such a command is outstanding -- which leads you back to latency issues - probably juddery window moves when 3d is active. I don't think hardware gives you a way out of jail for swapbuffers in the presence of changing cliprects. Keith
Re: Clip Lists
Stephane Marchesin wrote: On 11/28/07, Keith Whitwell [EMAIL PROTECTED] wrote: Stephane Marchesin wrote: On 28 Nov 2007 06:19:39 +0100, Soeren Sandmann [EMAIL PROTECTED] wrote: Stephane Marchesin [EMAIL PROTECTED] writes: I fail to see how this works with a lockless design. How do you ensure the X server doesn't change cliprects between the time it has written those in the shared ring buffer and the time the DRI client picks them up and has the command fired and actually executed ? Do you lock out the server during that time ? The scheme I have been advocating is this: - A new extension is added to the X server, with a PixmapFromBufferObject request. - Clients render into a private back buffer object, for which they used the new extension to generate a pixmap. - When a client wishes to copy something to the frontbuffer (for whatever reason - glXSwapBuffers(), glCopyPixels(), etc), it uses plain old XCopyArea() with the generated pixmap. The X server is then responsible for any clipping necessary. This scheme puts all clip list management in the X server. No cliprects in shared memory or in the kernel would be required. And no locking is required since the X server is already processing requests in sequence. Yes, that is the idea I want to do for nvidia hardware. Although I'm not sure if we can/want to implement it in term of X primitives or a new X extension. To synchronize with vblank, a new SYNC counter is introduced that records the number of vblanks since some time in the past. The clients can then issue SyncAwait requests before any copy they want synchronized with vblank. This allows the client to do useful processing while it waits, which I don't believe is the case now. 
Since we can put a wait until vblank on crtc #X command to a fifo on nvidia hardware, the vblank issue is non-existent for us. We get precise vblank without CPU intervention. You still have some issues... The choice is: do you put the wait-until-vblank command in the same fifo as the X server rendering or not? If yes -- you end up with nasty latency for X as its rendering is blocked by swapbuffers. Yes, I want to go for that simpler approach first and see if the blocking gets bad (I can't really say until I've tried). I'm all for experiments such as this. Although I have a strong belief how it will turn out, nothing is better at changing these sorts of beliefs than actual results. Keith
Re: Swapbuffers [was: Re: DRI2 and lock-less operation]
Stephane Marchesin wrote: On 11/28/07, Keith Whitwell [EMAIL PROTECTED] wrote: In my ideal world, the entity which knows and cares about cliprects should be the one that does the swapbuffers, or at least is in control of the process. That entity is the X server. Instead of tying ourselves into knots trying to figure out how to get some other entity a sufficiently up-to-date set of cliprects to make this work (which is what was wrong with DRI 1.0), maybe we should try and figure out how to get the X server to efficiently orchestrate swapbuffers. In particular it seems like we have: 1) The X server knows about cliprects. 2) The kernel knows about IRQ reception. 3) The kernel knows how to submit rendering commands to hardware. 4) Userspace is where we want to craft rendering commands. Given the above, what do we think about swapbuffers: a) Swapbuffers is a rendering command b) which depends on cliprect information c) that needs to be fired as soon as possible after an IRQ receipt. So: swapbuffers should be crafted from userspace (a, 4) ... by the X server (b, 1) ... and should be actually fired by the kernel (c, 2, 3) Well, on nvidia hw, you don't even need to fire from the kernel (thanks to a special fifo command that waits for vsync). So I'd love it if going through the kernel for swapbuffers was abstracted by the interface. As I suggested elsewhere, I think that you're probably going to need this even on nvidia hardware. I propose something like: 0) 3D client submits rendering to the kernel and receives back a fence. 1) 3D client wants to do swapbuffers. It sends a message to the X server asking it please do me a swapbuffers after this fence has completed. 2) X server crafts (somehow) commands implementing swapbuffers for this drawable under the current set of cliprects and passes it to the kernel along with the fence. 
3) The kernel keeps that batchbuffer to the side until a) the commands associated with the fence have been submitted to hardware. b) the next vblank IRQ arrives. when both of these are true, the kernel simply submits the prepared swapbuffer commands through the lowest latency path to hardware. But what happens if the cliprects change? The 100% perfect solution looks like: The X server knows all about cliprect changes, and can use fences or other mechanisms to keep track of which swapbuffers are outstanding. At the time of a cliprect change, it must create new swapbuffer commandsets for all pending swapbuffers and re-submit those to the kernel. These new sets of commands must be tied to the progress of the X server's own rendering command stream so that the kernel fires the appropriate one to land the swapbuffers to the correct destination as the X server's own rendering flies by. Yes that was the basis for my thinking as well. By inserting the swapbuffers into the normal flow of X commands, we remove the need for syncing with the X server at swapbuffer time. The very simplest way would be just to have the X server queue it up like normal blits and not even involve the kernel. I'm not proposing this. I believe such an approach will fail for the sync-to-vblank case due to latency issues - even (I suspect) with hardware-wait-for-vblank. Rather, I'm describing a mechanism that allows a pre-prepared swapbuffer command to be injected into the X command stream (one way or another) with the guarantee that it will encode the correct cliprects, but which will avoid stalling the command queue in the meantime. In the simplest case, where the kernel puts commands onto the one true ring as it receives them, the kernel can simply discard the old swapbuffer command. Indeed this is true also if the kernel has a ring-per-context and uses one of those rings to serialize the X server rendering and swapbuffers commands. 
Come on, admit that's a hack to get 100'000 fps in glxgears :) I'm not talking about discarding the whole swap operation, just the version of the swap command buffer that pertained to the old cliprects. Every swap is still being performed. You do raise a good point though -- we currently throttle the 3d driver based on swapbuffer fences. There would need to be some equivalent mechanism to achieve this. Note that condition 3a) above is always true in the current i915.o one-true-ring/single-fifo approach to hardware serialization. Yes, I think those details of how to wait should be left driver-dependent and abstracted in user space. So that we have the choice of calling the kernel, doing it from user space only, relying on a single fifo
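The "discard the old swapbuffer command" idea above, where a swap rebuilt against new cliprects replaces the stale version rather than queueing a second copy, might look like this in kernel bookkeeping terms. All names are invented; every swap is still performed exactly once, only the command blob encoding it is replaced:

```c
#include <assert.h>

/* Toy model: at most one prepared swap blob per drawable. */
#define MAX_DRAWABLES 8

struct swap_slot {
    int valid;         /* a prepared swap is parked for this drawable */
    int cliprect_gen;  /* cliprect generation the blob was built against */
};

static struct swap_slot pending[MAX_DRAWABLES];

/* Returns 1 if a stale version was replaced rather than newly queued. */
static int queue_swap(int drawable, int cliprect_gen)
{
    int replaced = pending[drawable].valid;
    pending[drawable].valid = 1;
    pending[drawable].cliprect_gen = cliprect_gen;
    return replaced;
}
```

The throttling concern Keith raises still applies: if the 3d client paces itself on swap fences, replacing a blob must not lose the fence the client is waiting on.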
Re: DRI2 and lock-less operation
In general the problem with the superioctl returning 'fail' is that the client has to then go back in time and figure out what the state preamble would have been at the start of the batchbuffer. Of course the easiest way to do this is to actually precompute the preamble at batchbuffer start time and store it in case the superioctl fails -- in which case, why not pass it to the kernel along with the rest of the batchbuffer and have the kernel decide whether or not to play it? Keith - Original Message From: Kristian Høgsberg [EMAIL PROTECTED] To: Keith Packard [EMAIL PROTECTED] Cc: Jerome Glisse [EMAIL PROTECTED]; Dave Airlie [EMAIL PROTECTED]; dri-devel@lists.sourceforge.net; Keith Whitwell [EMAIL PROTECTED] Sent: Tuesday, November 27, 2007 8:44:48 PM Subject: Re: DRI2 and lock-less operation On Nov 27, 2007 2:30 PM, Keith Packard [EMAIL PROTECTED] wrote: ... In both cases, the kernel will need to know how to activate a given context and the context handle should be part of the super ioctl arguments. We needn't expose the contexts to user-space, just provide a virtual consistent device and manage contexts in the kernel. We could add the ability to manage contexts from user space for cases where that makes sense (like, perhaps, in the X server where a context per client may be useful). Oh, right we don't need one per GLContext, just per DRI client, mesa handles switching between GL contexts. What about multithreaded rendering sharing the same drm fd? I imagine one optimization you could do with a fixed number of contexts is to assume that losing the context will be very rare, and just fail the super-ioctl when it happens, and then expect user space to resubmit with state emission preamble. In fact it may work well for single context hardware... I recall having the same discussion in the past; having the superioctl fail so that the client needn't constantly compute the full state restore on every submission may be a performance win for some hardware. 
All that this requires is a flag to the kernel that says 'this submission reinitializes the hardware', and an error return that says 'lost context'. Exactly. But the super-ioctl is chipset specific and we can decide on the details there on a chipset to chipset basis. If you have input on how the super-ioctl for intel should look to support lockless operation for current and future intel chipsets, I'd love to hear it. And of course we can version our way out of trouble as a last resort. We should do the lockless and context stuff at the same time; doing re-submit would be a bunch of work in the driver that would be wasted. Is it that bad? We will still need the resubmit code for older chipsets that don't have the hardware context support. The drivers already have the code to emit state in case of context loss, that code just needs to be repurposed to generate a batch buffer to prepend to the rendering code. And the rendering code doesn't need to change when resubmitting. Will that work? Right now, we're just trying to get 965 running with ttm; once that's limping along, figuring out how to make it reasonable will be the next task, and hardware context support is clearly a big part of that. Yeah - I'm trying to limit the scope of DRI2 so that we can have redirected direct rendering and GLX1.4 in the tree sooner rather than later (before the end of the year). Maybe the best way to do that is to keep the lock around for now and punt on the lock-less super-ioctl if that really blocks on hardware context support. How far back is hardware contexts supported? Kristian
Re: DRI2 and lock-less operation
Kristian Høgsberg wrote: On Nov 22, 2007 4:23 AM, Keith Whitwell [EMAIL PROTECTED] wrote: ... My guess for one way is to store a buffer object with the current state emission in it, and submit it with the superioctl maybe, and if we have lost context emit it before the batchbuffer..

The way drivers actually work at the moment is to emit a full state as a preamble to each batchbuffer. Depending on the hardware, this can be pretty low overhead, and it seems that the trend in hardware is to make this operation cheaper and cheaper. This works fine without the lock. There is another complementary trend to support, one way or another, multiple hardware contexts (obviously nvidia have done this for years), meaning that the hardware effectively does the context switches. This is probably how most cards will end up working in the future, if not already. Neither of these needs a lock for detecting context switches.

Sure enough, but the problem is that without the lock userspace can't say "oops, I lost the context, let me prepend this state emission preamble to the batchbuffer" in a race-free way. If we want conditional state emission, we need to make that decision in the kernel.

The cases I describe above don't try to do this, but if you really wanted to, the way to do it would be to have userspace always emit the preamble but pass two offsets to the kernel, one at the start of the preamble, the other after it. Then the kernel can choose. I don't think there's a great deal to be gained from this optimization, though.

For example, the super ioctl could send the state emission code as a separate buffer and also include the expected context handle. This lets the kernel compare the context handle supplied in the super ioctl with the most recently active context handle, and if they differ, the kernel queues the state emission buffer first and then the rendering buffer. If the context handles match, the kernel just queues the rendering batch buffer.
However, this means that user space must prepare the state emission code for each submission, whether or not it will actually be used. I'm not sure if this is too much overhead or if it's negligible.

I think both preparing it on the CPU and executing it on the GPU are likely to be pretty negligible, but some experimentation on a system with just a single app running should show this quickly one way or another.

Keith
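[Editor's sketch] The two-offset variant Keith describes -- userspace always builds [preamble | rendering] in one buffer and passes both start offsets, and the kernel picks the entry point by comparing context handles -- reduces to a tiny decision function. Names are illustrative, not a real kernel API.

```c
#include <assert.h>

/* Toy model: the kernel chooses where execution starts in the batch.
 * preamble_off points at the state preamble, render_off just past it. */
static int last_active_ctx = -1;

static unsigned pick_start_offset(int ctx, unsigned preamble_off,
                                  unsigned render_off)
{
    unsigned start = (ctx == last_active_ctx) ? render_off : preamble_off;
    last_active_ctx = ctx;
    return start;
}
```

The cost userspace pays is building the preamble every time; the kernel only decides whether the GPU executes it.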
Re: DRI2 and lock-less operation
Dave Airlie wrote: I'm trying to figure out how context switches actually work... the DRI lock is overloaded as context switcher, and there is code in the kernel to call out to a chipset-specific context switch routine when the DRI lock is taken... but only ffb uses it... So I'm guessing the way context switches work today is that the DRI driver grabs the lock and, after potentially updating the cliprects through X protocol, it emits all the state it depends on to the card. Is the state emission done by just writing out a bunch of registers? Is this how the X server works too, except XAA/EXA acceleration doesn't depend on a lot of state and thus the DDX driver can emit everything for each operation?

So yes, userspace notices the context has changed and just re-emits everything into the batchbuffer it is going to send. For XAA/EXA stuff on Intel, at least, there is an invariant state emission function that notices what the context was and what the last server-side 3D user was (EXA or Xv texturing) and just dumps the state into the batchbuffer (or currently into the ring).

How would this work if we didn't have a lock? You can't emit the state and then start rendering without a lock to keep the state in place... If the kernel doesn't restore any state, what's the point of the drm_context_t we pass to the kernel in drmLock? Should the kernel know how to restore state (this ties in to the email from jglisse on state tracking in drm and all the gallium jazz, I guess)? How do we identify state to the kernel, and how do we pass it in the super-ioctl? Can we add a list of registers to be written and the values? I talked to Dave about it and we agreed that adding a drm_context_t to the super-ioctl would work, but now I'm thinking if the kernel doesn't track any state it won't really work.
Maybe cross-client state sharing isn't useful for performance, as Keith and Roland argue, but if the kernel doesn't restore state when it sees a super-ioctl coming from a different context, who does?

My guess for one way is to store a buffer object with the current state emission in it, and submit it with the superioctl maybe, and if we have lost context emit it before the batchbuffer..

The way drivers actually work at the moment is to emit a full state as a preamble to each batchbuffer. Depending on the hardware, this can be pretty low overhead, and it seems that the trend in hardware is to make this operation cheaper and cheaper. This works fine without the lock. There is another complementary trend to support, one way or another, multiple hardware contexts (obviously nvidia have done this for years), meaning that the hardware effectively does the context switches. This is probably how most cards will end up working in the future, if not already. Neither of these needs a lock for detecting context switches.

Keith
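[Editor's sketch] The "full state preamble per batchbuffer" approach Keith describes needs no context-loss detection at all: every batch simply begins with the complete state emission. A toy model, with an invented buffer layout and made-up register stand-ins:

```c
#include <assert.h>

#define BATCH_MAX 64

/* Toy batchbuffer: the full state preamble is emitted unconditionally
 * at the start of every batch, then rendering commands are appended. */
struct batch {
    unsigned dwords[BATCH_MAX];
    int len;
};

static void emit(struct batch *b, unsigned dw) { b->dwords[b->len++] = dw; }

static void emit_full_state(struct batch *b)
{
    /* Stand-ins for "write out a bunch of registers". */
    emit(b, 0x1000); /* e.g. viewport state */
    emit(b, 0x2000); /* e.g. blend state */
    emit(b, 0x3000); /* e.g. texture state */
}

static void begin_batch(struct batch *b)
{
    b->len = 0;
    emit_full_state(b);  /* preamble: safe regardless of who ran last */
}
```

The trade-off is exactly the one in the thread: a few redundant dwords per batch, in exchange for working without the lock.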
Re: mapped cached memory and pre-fetching.
Thomas Hellström wrote: Dave Airlie wrote: On 10/31/07, Thomas Hellström [EMAIL PROTECTED] wrote: Dave Airlie wrote: On 10/31/07, Thomas Hellström [EMAIL PROTECTED] wrote:

Dave, when starting out with TTM I did look a little at AGP caching issues, and there was an issue with cached memory and speculative pre-fetching that may affect mapped-cached memory, and that we need to know about but perhaps ignore. Suppose you bind a page to the AGP aperture, but don't change the kernel linear map caching policy. Then a speculatively prefetching processor may read the memory into its cache, decide it doesn't want to use it, and actually write it back. Meanwhile the GPU may have changed the contents of the page, and that change will be overwritten. Apparently there were big problems with AMD Athlons actually doing this, with Linux people claiming it was an Athlon bug and AMD people claiming it was within spec.

http://www.linuxjournal.com/files/linuxjournal.com/linuxjournal/articles/061/6148/6148s1.html is what I believe you are talking about; I'll add something to the comment mentioning this.

Yup. In the end I believe the change_page_attr(), global_flush_tlb() sequence was the final outcome of this, but as I understand it, with your new code we never write through the GTT, which makes the only possible problem overwrites of GPU-written data.

Well, so far we've only dealt with Intel CPU/GPU combinations, which hopefully don't suffer from this issue. I'll put a comment in, but tbh there are lots of ways to mess things up with the current APIs. Try allocating a snooped batchbuffer, or a snooped private back buffer, or anything involved in a blit. I'm going to add checks for some of the more stupid things in the Intel superioctl code...

Well, i915 (and friends') snooped memory is, as you say, not very useful, but it works fine AFAICT for things like the flip move from VRAM (which has been disabled ATM due to HW lock issues).
Should also be fine for readpixels and zero-copy texturing, although I doubt that there is any performance gain in the latter.

FWIW, zero-copy texturing is a good win when the texture is used only once, eg for video. The streamingrect application (when it was working) showed a very good improvement with the cow-pbo hack (also when it was working). To make it work transparently is quite difficult and fragile, though.

Keith
Re: intel hw and caching interface to TTM..
Dave Airlie wrote: Dave, I'd like to see the flag DRM_BO_FLAG_CACHED really mean cache-coherent memory, that is, cache-coherent also while visible to the GPU. There are HW implementations out there (Poulsbo at least) where this option actually seems to work, although it's considerably slower for things like texturing. It's also a requirement for user BOs, since they will have VMAs that we can't kill and remap.

Most PCIE cards will be cache-coherent; however, AGP cards not so much, so we need to think about whether a generic _CACHED makes sense. Especially for something like radeon, will I have to pass different flags depending on the GART type? This seems like uggh.. so maybe a separate flag makes more sense..

Could we perhaps change the flag DRM_BO_FLAG_READ_CACHED to mean DRM_BO_FLAG_MAPPED_CACHED to implement the behaviour you describe? This will also indicate that the buffer cannot be used for user-space sub-allocators, as we in that case must be able to guarantee that the CPU can access parts of the buffer while other parts are validated for the GPU.

Yes; to be honest, sub-allocators for most use-cases should be avoided if possible. We should be able to make the kernel interface fast enough for most things if we don't have to switch caching flags on the fly at map/destroy etc..

Hmm - if that was true, why do we have malloc() and friends - aren't they just sub-allocators for brk() and mmap()? There is more to this than performance - applications out there can allocate extraordinarily large numbers of small textures, which can only sanely be dealt with as light-weight userspace suballocations of a sensibly-sized buffer. (We don't do this yet, but will need to at some point!) The reasons for this are granularity (ie wasted space in the allocation), the memory overhead of managing all these allocations, and only third, performance.
If you think about what goes on in a 3d driver, you are always doing sub-allocations of some sort or another, though that's more obvious when you start doing state objects that have an independent lifecycle, as opposed to just emitting state linearly into a command buffer. For managing objects of a few dozen bytes, obviously you are going to want to do that in userspace.

So there is a continuum where successively larger buffers increasingly justify whatever additional cost there is to go directly to the kernel to allocate them. But for sufficiently small or frequently allocated buffers, there will always be a crossover point where it is faster to do it in userspace. It certainly makes sense to speed up the kernel paths, but that won't make the crossover point go away - it'll just shift it more or less depending on how successful you are.

Keith
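[Editor's sketch] The userspace suballocation Keith argues for can be as simple as a bump allocator carving small objects out of one kernel-allocated buffer. Names are invented for illustration; a real driver would also track object lifetimes and hardware alignment rules.

```c
#include <stddef.h>

/* Toy suballocator: hand out aligned slices of one large buffer that
 * was allocated from the kernel once, avoiding a kernel call per object. */
struct suballoc {
    size_t size;   /* size of the backing buffer */
    size_t head;   /* bump pointer */
};

static size_t sub_alloc(struct suballoc *s, size_t n, size_t align)
{
    size_t off = (s->head + align - 1) & ~(align - 1);
    if (off + n > s->size)
        return (size_t)-1;  /* full: caller must get a new backing buffer */
    s->head = off + n;
    return off;             /* offset of the new object within the buffer */
}
```

For objects of a few dozen bytes this costs a handful of instructions, which is the crossover-point argument in concrete form.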
Re: [patch] post superioctl interface removal.
Thomas Hellström wrote: Hi, Eric. Eric Anholt wrote: ... Can you clarify the operation being done where you move scanout buffers before unpinning them? That seems contradictory to me -- how are you scanning out while the object is being moved, and how are you considering it pinned during that time?

Actually it's very similar to Dave's issue, and the buffers aren't pinned when they are thrown out. What we'd want to do is the following:

1) Turn off CRTCs.
2) Unpin the buffer.
3) Destroy the buffer, leaving its memory area free.
4) Create and pin a new buffer (skipping the copy).
5) Turn on CRTCs.

However, with DRI clients operating, 3) will fail. As they may maintain a reference on the front buffer, the old buffer won't immediately get destroyed, and its aperture / VRAM memory area isn't freed up unless it gets evicted by a new allocation.

Is there really a long-term need for DRI clients to maintain a reference to the front buffer? We're moving away from this in lots of ways, and if it is a benefit to the TTM work, we could look at severing that tie sooner rather than later...

This will in many cases lead to fragmentation where it is really possible to avoid it. The best thing we can do at 3) is to move it out, and then unreference it. When the DRI client recognizes through the SAREA that there's a new front buffer, it will immediately release its reference on the old one, but basically, the old front buffer may be hanging around for quite some time (paused DRI clients...) and we don't want it to be present in the aperture, even if it's evictable. This won't stop fragmentation in all cases, but will certainly reduce it.

At the very least, current DRI/ttm clients could be modified to only use the frontbuffer reference in locked regions, and to have some way of getting the correct handle for the current frontbuffer at that point. Longer term, it's easy to imagine DRI clients not touching the front buffer independently and not requiring a reference to that buffer...
Keith
Re: Merging DRI interface changes
Michel Dänzer wrote: On Fri, 2007-10-12 at 10:19 +0100, Keith Whitwell wrote: Michel Dänzer wrote: On Thu, 2007-10-11 at 18:44 -0400, Kristian Høgsberg wrote: On 10/11/07, Keith Whitwell [EMAIL PROTECTED] wrote:

3) Share buffers with a reference counting scheme. When a client notices the buffer needs a resize, do the resize and adjust refcounts - other clients continue with the older buffer. What happens when a client on the older buffer calls swapbuffers -- I'm sure we can figure out what the correct behaviour should be.

3) sounds like the best solution, and it's basically what I'm proposing.

I agree; it looks like this can provide the benefits of shared drawable-private renderbuffers (support for cooperative rendering schemes, no waste of renderbuffer resources) without compromising the general benefits of private renderbuffers.

Yes, I'm just interested to understand what happens when one of the clients on the old, orphaned buffer calls swapbuffers... All the buffers should be swapped, right? Large and small? How does that work? If the answer is that we just do the swap on the largest buffer, then you have to wonder what the point of keeping the smaller ones around is?

To make 3D drivers nice and simple by not having to deal with fun stuff like cliprects? :)

Understood. I'm thinking about a further simplification - rather than keep the old buffers around after the first client requests a resize, just free them. If/when other clients submit commands targeting the old-sized buffers, throw those commands away.

Seriously though, as I understand Kristian's planned scheme, all buffer swaps will be done by the DRM, and I presume it'll only take the currently registered back renderbuffer into account, so the contents of any previous back renderbuffers will be lost. I think that's fine, and should address your concerns?

See above -- if the contents of the previous back renderbuffers are going to be lost, what is the point in keeping those buffers around?
Or doing any further rendering into them?

Keith
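[Editor's sketch] Option 3 above -- refcounted buffers with resize -- can be modelled roughly as follows. Structure and function names are invented: a resize installs a new current buffer, while clients still holding the old one keep it alive until they drop their reference.

```c
#include <stdlib.h>

/* Toy model of option 3: the drawable points at the current backbuffer;
 * clients hold references, so an orphaned buffer survives a resize until
 * the last client using it lets go. */
struct buffer {
    int width, height;
    int refcount;
};

static struct buffer *buffer_get(struct buffer *b) { b->refcount++; return b; }

static int buffer_put(struct buffer *b)   /* returns 1 if freed */
{
    if (--b->refcount == 0) { free(b); return 1; }
    return 0;
}

/* Resize: create a new current buffer; the old one lives on via client refs. */
static struct buffer *drawable_resize(struct buffer **current, int w, int h)
{
    struct buffer *nb = malloc(sizeof(*nb));
    nb->width = w; nb->height = h; nb->refcount = 1;  /* drawable's ref */
    buffer_put(*current);   /* drawable drops its ref to the old buffer */
    *current = nb;
    return nb;
}
```

The open question from the thread -- what swapbuffers on the orphaned buffer should do -- is policy layered on top of this; the refcounting only guarantees the memory stays valid.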
Re: Merging DRI interface changes
Brian Paul wrote: Kristian Høgsberg wrote: Hi, I have this branch with DRI interface changes that I've been threatening to merge on several occasions: http://cgit.freedesktop.org/~krh/mesa/log/?h=dri2 I've just rebased to today's mesa and it's ready to merge. Ian reviewed the changes a while back and gave his OK, and from what we discussed at XDS 2007, I believe the changes there are compatible with the Gallium plans.

What's been keeping me from merging this is that it breaks the DRI interface. I wanted to make sure that the new interface will work for redirected direct rendering and GLXPixmaps and GLXPbuffers, which I now know that it does. The branch above doesn't include these changes yet; it still uses the SAREA and the old shared, static back buffer setup. This is all isolated to the createNewScreen entry point, though, and my plan is to introduce a new createNewScreen entry point that enables all the TTM features. This new entry point can co-exist with the old entry point, and a driver should be able to support one or the other, and probably also both at the same time. The AIGLX and libGL loaders will look for the new entry point when initializing the driver, if they have a new enough DRI/DRM available. If the loader has an old-style DRI/DRM available, it will look for the old entry point. I'll wait a day or so to let people chime in, but if I don't hear any stop-the-press type of comments, I'll merge it tomorrow.

This is basically what's described in the DRI2 wiki at http://wiki.x.org/wiki/DRI2, right? The first thing that grabs my attention is the fact that front color buffers are allocated by the X server but back/depth/stencil/etc buffers are allocated by the app/DRI client. If two GLX clients render to the same double-buffered GLX window, each is going to have a different/private back color buffer, right? That doesn't really obey the GLX spec.
The renderbuffers which compose a GLX drawable should be accessible/shared by any number of separate GLX clients (like an X window is shared by multiple X clients).

I guess I want to know what this really means in practice. Suppose 2 clients render to the same backbuffer in a race starting at time=0, doing something straightforward like (clear, draw, swapbuffers). There's nothing in the spec that says to me that they actually have to have been rendering to the same surface in memory, because the serialization could just be (clear-a, draw-a, swap-a, clear-b, draw-b, swap-b), so that potentially only one client's rendering ends up visible. So I would say that, at least between a fullscreen clear and either swap-buffers or some appropriate flush (glXWaitGL??), we can treat the rendering operations as atomic and have a lot of flexibility in terms of how we schedule actual rendering and whether we actually share a buffer or not. Note that swapbuffers is as good as a clear from this perspective, as it can leave the backbuffer in an undefined state.

I'm not just splitting hairs for no good reason - the ability for the 3d driver to know the size of the window it is rendering to while it is emitting commands, and to know that it won't change size until it is ready for it to, is really crucial to building a solid driver. The trouble with sharing a backbuffer is what to do about the situation where two clients end up with different ideas about what size the buffer should be. So, if it is necessary to share backbuffers, then what I'm saying is that it's also necessary to dig into the real details of the spec and figure out how to avoid having the drivers being forced to change the size of their backbuffer halfway through rendering a frame. I see a few options:

0) The old DRI semantics - buffers change shape whenever they feel like it, drivers are buggy, window resizes cause mis-rendered frames.
1) The current truly private backbuffer semantics - clean drivers but some deviation from GLX specs - maybe less deviation than we actually think.

2) Alternate semantics where the X server allocates the buffers but drivers just throw away frames when they find the buffer has changed shape at the end of rendering. I'm sure this would be nonconformant; at any rate it seems nasty. (The i915 swz driver is forced to do this.)

3) Share buffers with a reference counting scheme. When a client notices the buffer needs a resize, do the resize and adjust refcounts - other clients continue with the older buffer. What happens when a client on the older buffer calls swapbuffers -- I'm sure we can figure out what the correct behaviour should be.

etc. All of these are superficial approaches. My belief is that if we really make an attempt to understand the sharing semantics encoded in the GLX spec, and interpret that in terms of allowable ordering of rendering operations of separate clients, a favorable implementation is possible.

Kristian - I apologize that I
Re: Merging DRI interface changes
Allen Akin wrote: On Thu, Oct 11, 2007 at 10:35:28PM +0100, Keith Whitwell wrote: | Suppose 2 clients render to the same backbuffer...

The (rare) cases in which I've seen this used, the clients are aware of one another, and restrict their rendering to non-overlapping portions of the drawable. A master client is responsible for swap and clear. I believe the intent of the spec was to allow CPU-bound apps to make use of multiple processors. Rendering to a single drawable, rather than multiple drawables, allowed swap to be synchronized. I recall discussions about ways to coordinate multiple command streams so that rendering to overlapping areas of the drawable could be handled effectively, but I don't remember any apps that used such methods.

Allen, just to clarify, would things look a bit like this:

Master: clear, glFlush, signal slaves somehow
Slave0..n: wait for signal, don't clear, just draw triangles, glFlush, signal master
Master: wait for all slaves, glXSwapBuffers

This is fairly sensible and clearly requires a shared buffer. It's also quite a controlled situation that sidesteps some of the questions about what happens when two clients are issuing swapbuffers willy-nilly on the same drawable at the same time as the user is frantically resizing it...

Keith
Re: Merging DRI interface changes
Kristian Høgsberg wrote: On 10/11/07, Keith Whitwell [EMAIL PROTECTED] wrote: Brian Paul wrote: ... If two GLX clients render to the same double-buffered GLX window, each is going to have a different/private back color buffer, right? That doesn't really obey the GLX spec. The renderbuffers which compose a GLX drawable should be accessible/shared by any number of separate GLX clients (like an X window is shared by multiple X clients).

I guess I want to know what this really means in practice. Suppose 2 clients render to the same backbuffer in a race starting at time=0, doing something straightforward like (clear, draw, swapbuffers). There's nothing in the spec that says to me that they actually have to have been rendering to the same surface in memory, because the serialization could just be (clear-a, draw-a, swap-a, clear-b, draw-b, swap-b), so that potentially only one client's rendering ends up visible.

I've read the GLX specification a number of times to try to figure this out. It is very vague, but the only way I can make sense of multiple clients rendering to the same drawable is if they coordinate between them somehow. Maybe the scene graph is split between several processes: one client draws the backdrop, then passes a token to another process which then draws the player characters, and then a third draws a heads-up display, calls glXSwapBuffers() and passes the token back to the first process. Or maybe they render in parallel, but to different areas of the drawable, synchronize when they're all done, and then one does glXSwapBuffers() and they start over on the next frame.

... So, if it is necessary to share backbuffers, then what I'm saying is that it's also necessary to dig into the real details of the spec and figure out how to avoid having the drivers being forced to change the size of their backbuffer halfway through rendering a frame.

This is a bigger issue to figure out than the shared buffer one.
I know you're looking to reduce the number of changing factors during rendering (clip rects, buffer sizes and locations), but the driver needs to be able to pick up new buffers in a few more places than just swap buffers. But I think we agree that we can add that polling in a couple of places in the driver (before starting a new batch buffer, on flush, and maybe other places) and it should work.

Yes, there are a few places, but they are very few. Basically I think it is possible to cut a rendering stream up into chunks which are effectively atomic. Drivers do this all the time anyway - just by building a dma buffer that is then submitted atomically to hardware for processing. It isn't too hard to figure out where the boundaries of these regions are - if we think about a driver with effectively infinite dma space, then such a driver only flushes when required to satisfy requirements placed on it by the spec.

I also believe that the only sane time to check the size of the destination drawable is when the driver is *entering* such an atomic region (let's call it a scene). Swapbuffers terminates a scene, it doesn't really start the next one - that doesn't happen until actual rendering starts. I would even say that fullscreen clears don't start a scene, but that's another story... The things that terminate a scene are:

- swapbuffers
- readpixels and similar
- maybe glFlush() - though I'm sometimes naughty and no-op it for backbuffer rendering

Basically any API-generated event that implies a flush. Internally generated events, like running out of some resource and having to fire buffers to recover, generally don't count.

I see a few options:

0) The old DRI semantics - buffers change shape whenever they feel like it, drivers are buggy, window resizes cause mis-rendered frames.

1) The current truly private backbuffer semantics - clean drivers but some deviation from GLX specs - maybe less deviation than we actually think.
2) Alternate semantics where the X server allocates the buffers but drivers just throw away frames when they find the buffer has changed shape at the end of rendering. I'm sure this would be nonconformant; at any rate it seems nasty. (The i915 swz driver is forced to do this.)

3) Share buffers with a reference counting scheme. When a client notices the buffer needs a resize, do the resize and adjust refcounts - other clients continue with the older buffer. What happens when a client on the older buffer calls swapbuffers -- I'm sure we can figure out what the correct behaviour should be.

3) sounds like the best solution, and it's basically what I'm proposing. For the first implementation (pre-gallium), I'm looking to just reuse the existing getDrawableInfo polling for detecting whether new buffers are available. It won't be more or less broken than the current SAREA scheme. When gallium starts to land, we can fine-tune the polling
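[Editor's sketch] Keith's "check drawable size only at scene entry" rule can be modelled simply. Names here are illustrative, not driver code: the size is latched when rendering starts and stays fixed until an API-level flush ends the scene.

```c
/* Toy model: the drawable may be resized at any time, but the driver
 * latches the size once, on entering a scene, and renders the whole
 * scene against that latched size. */
struct drawable { int w, h; };            /* may change underneath us */
struct scene    { int w, h; int active; };

static void scene_enter(struct scene *s, const struct drawable *d)
{
    if (!s->active) {                     /* only latch at scene entry */
        s->w = d->w; s->h = d->h;
        s->active = 1;
    }
}

static void scene_end(struct scene *s)    /* swapbuffers, readpixels, ... */
{
    s->active = 0;
}
```

A mid-scene resize is thus invisible to the driver until the next scene begins, which is exactly the stability property the discussion is after.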
Re: [patch] post superioctl interface removal.
Dave Airlie wrote: Dave, As mentioned previously to Eric, I think we should keep the single buffer validate interface, with the exception that the hint DRM_BO_HINT_DONT_FENCE is implied, and use that instead of the set-pin interface. We can perhaps rename it to drmBOSetStatus or something more suitable. This will get rid of the user-space unfenced list access (which I believe was the main motivation behind the set-pin interface?) while keeping the currently heavily used (at least in Poulsbo) functionality to move out NO_EVICT scanout buffers to local memory before unpinning them, to avoid VRAM and TT fragmentation (as DRI clients may still reference those buffers, so they won't get destroyed before a new one is allocated). It would also allow us to specify where we want to pin buffers. If we remove the memory flag specification from drmBOCreate there's no other way to do that, except running the buffer through a superioctl, which isn't very nice. Also it would make it much easier to unbreak i915 zone rendering and derived work. If we can agree on this, I'll come up with a patch.

I'm quite happy to have this type of interface; I can definitely see its uses. We also need to investigate some sort of temporary NO_MOVE-like interface (NO_MOVE until next fencing...) in order to avoid relocations, but it might be possible to make this driver-specific. Keith P also had an idea for relocation avoidance in the simple case which I've allowed for in my interface: we could use the 4th uint32_t in the relocation to pass in the value we've already written, and only relocate it if the buffer's location changes. So after doing one superioctl, the validated offsets would be passed back to userspace and used by it, and we only have to relocate future buffers if the buffers move..
Theoretically the kernel could keep the relocation lists for each buffer hanging around after use and do this automatically if a buffer is reused and the buffers that its relocations point to have been moved. That would be a good approach for one sort of buffer reuse, ie persistent state object buffers that are reused over multiple frames but contain references to other buffers.

Note that it only makes sense to reuse relocations in situations where those relocations target a small number of buffers - probably other command buffers or buffers containing state objects which themselves make no further reference to other buffers. Trying to go beyond that, eg to reuse buffers of state objects that contain references to texture images, can lead to a major waste of resources. If you think about a situation with a buffer of 50 texture state objects, each referencing 4 texture images, and you just want to reuse one of those state objects -- you will emit a relocation to that state buffer, which will need to be validated and then should recursively require all 200 texture images to be validated, even if you only needed access to 4 of them...

The pre-validate/no-move/whatever thing is a useful optimization, but it only makes sense up to a certain level - a handful of command, indirect state and/or vertex buffers is pretty much it.

Keith
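[Editor's sketch] Keith P's "pass in the value we've already written" idea is essentially a presumed-offset scheme (the later i915 GEM interface adopted a similar presumed_offset field). A toy model with invented names: userspace records the offset it already wrote into the batch, and the kernel patches the dword only if the target buffer actually moved.

```c
/* Toy relocation: userspace records the offset it already wrote
 * (presumed_offset); the kernel patches the batch only if the target
 * buffer actually moved since then. */
struct reloc {
    unsigned batch_index;      /* dword in the batch to patch */
    unsigned presumed_offset;  /* value userspace already wrote there */
};

/* Returns 1 if the dword had to be rewritten, 0 if it was left alone. */
static int apply_reloc(unsigned *batch, struct reloc *r, unsigned actual_offset)
{
    if (r->presumed_offset == actual_offset)
        return 0;                       /* buffer didn't move: no write */
    batch[r->batch_index] = actual_offset;
    r->presumed_offset = actual_offset; /* remember for next submission */
    return 1;
}
```

In the steady state, where buffers stay put, every relocation takes the early-out and the kernel never touches the batch.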
Re: Initial 915 superioctl patch.
Neither 42 nor 256 is very good - the number needs to be dynamic. Think about situations where an app has eg one glyph per texture and is doing font rendering... or any reasonably complex game might use 256 textures in a frame. Sorry for top-posting -- webmail.

Keith

- Original Message From: Jesse Barnes [EMAIL PROTECTED] To: Dave Airlie [EMAIL PROTECTED] Cc: dri-devel@lists.sourceforge.net Sent: Monday, October 8, 2007 6:04:42 PM Subject: Re: Initial 915 superioctl patch.

I don't know if 42 is better than 256... do we have any measurements that would help us pick a good number or that would tell us we need to make it a runtime option? Or maybe just part of the argument structure that's passed in?

Jesse
Re: Initial 915 superioctl patch.
Dave Airlie wrote: On Monday, October 8, 2007 10:13 am Keith Whitwell wrote: Neither 42 nor 256 are very good - the number needs to be dynamic. Think about situations where an app has eg. one glyph per texture and is doing font rendering... Or any reasonably complex game might use 256 textures in a frame. So maybe the buffer count should be part of the execbuffer request object? Or does it have to be a separate settable parameter? I would think the kernel needs to limit this in some way... as otherwise we are trusting a userspace number and allocating memory according to it... So I'll make it dynamic but I'll have to add a kernel limit.. keithw: btws poulsbo uses 256 I think also.. Yes but I suspect we'll need to increase or make it dynamic before we're done. As with most hard limits, you can work around it by flushing prematurely, but there is a cost to that, one way or another. Keith
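A minimal sketch of the direction agreed here (field names invented, not the real i915 ABI): the count travels in the request itself, and the kernel clamps it against its own limit before trusting it for an allocation:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical execbuffer request carrying a dynamic buffer count. */
struct exec_request {
    uint64_t buffers_ptr;    /* userspace array of buffer handles */
    uint32_t buffer_count;   /* supplied by userspace, not trusted */
    uint32_t batch_len;
};

/* Illustrative kernel-side cap so userspace can't force huge allocations. */
#define KERNEL_MAX_BUFFERS 4096u

static uint32_t clamp_buffer_count(const struct exec_request *req)
{
    return req->buffer_count > KERNEL_MAX_BUFFERS
               ? KERNEL_MAX_BUFFERS
               : req->buffer_count;
}
```

A real implementation would likely reject an over-limit request with an error rather than silently clamping, but the point is the same: the limit lives in the kernel while the count is per-submission.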
Re: DRI2 Design Wiki Page
Kristian Høgsberg wrote: Hi, I wrote up the DRI2 design on a wiki page here: http://wiki.x.org/wiki/DRI2 It's the result of the discussions we had during my redirected rendering talk and several follow up discussions with Keith Whitwell and Michel Daenzer. Relative to the design I presented, the significant change is that we now track clip rects and buffer attachments in the drm as part of the drm_drawable_t. We always have private back buffers and swap buffers is implemented in the drm. All this taken together (also with the super-ioctl) means that we don't need an SAREA or a drm or drawable lock. There is an issue with the design, though, related to how and when the DRI driver discovers that the front buffer has changed (typically resizing). The problem is described in detail on the page, but in short, the problem is that we were hoping to only check for this at glXSwapBuffers() time, but we need to discover this earlier in many cases. Keith W alluded to a beginning of frame synchronization point in a previous mail, which may work, but I'm not sure of the details there. I added a couple of comments, but I'm not sure about the issues around contexts sharing a drawable/backbuffer and the effects of glXSwapBuffers in that case. Brian may be able to help with this a little. Keith
Re: DRI2 Design Wiki Page
Keith Whitwell wrote: Kristian Høgsberg wrote: On 10/4/07, Keith Packard [EMAIL PROTECTED] wrote: On Thu, 2007-10-04 at 01:27 -0400, Kristian Høgsberg wrote: There is an issue with the design, though, related to how and when the DRI driver discovers that the front buffer has changed (typically resizing). Why would the rendering application even need to know the size of the front buffer? The swap should effectively be under the control of the front buffer owner, not the rendering application. Ok, I phrased that wrong: what the DRI driver needs to look out for is when size of the rendering buffers change. For a redirected window, this does involve resizing the front buffer, but that's not the case for a non-redirected window. The important part, though, is that the drawable size changes and before submitting rendering, the DRI driver has to allocate new private backbuffers that are big enough to hold the contents. As far as figuring out how big to make the rendering buffers, that's outside the scope of DRM in my book. The GLX interface can watch for ConfigureNotify events on the associated window and resize the back buffers as appropriate. I guess you're proposing libGL should transparently listen for ConfigureNotify events? I don't see how that can work, there is no guarantee that an OpenGL application handles events. For example, glxgears without an event loop, just rendering. If the rendering extends outside the window bounds and you increase the window size, the next frame should include those parts that were clipped by the window in previous frames. X events aren't reliable for this kind of notification. And regardless, the issue isn't so much how to get the resize notification from the X server to the direct rendering client, but rather that the Gallium design doesn't expect these kinds of interruptions while rendering a frame. 
So while libGL (or AIGLX) may be able to notice that the window size changed, what I'm missing is a mechanism to ask the DRI driver to reallocate its back buffers. I think basically we just need a tweak to what we're already doing for private backbuffers to cope with the periodic rendering case you've highlighted. So basically checking before the first draw and again before swapbuffers, rather than just before swapbuffers. This doesn't address the question about contexts in potentially different processes sharing a backbuffer, but I'm not 100% convinced it's possible, and if it is possible under glx, I'm not 100% convinced that it's a sensible thing to support anyway... Basically what I'm saying above is that 1) I haven't had a chance to dig into the shared-context issue, 2) in my experience GL and GLX specs provide a good amount of wiggle room to allow for a variety of implementation strategies, and 3) we should be careful not to jump to an unfavourable interpretation of the spec that ties us into a non-optimal architecture. I don't think we're looking at a particularly unique or unusual strategy - quite a few GL stacks end up with private backbuffers it seems, so these are problems that have all been faced and solved before. Keith
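The two-checkpoint idea above (check before the first draw of a frame and again before swapbuffers) might look roughly like this; the types and names are invented for illustration, not taken from the DRI code:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical private backbuffer state. */
struct backbuffer { int w, h; };

/* Called before the first draw of a frame and again before swapbuffers;
 * returns true if the buffer had to be (re)allocated for a resize. */
static bool ensure_backbuffer(struct backbuffer *bb, int draw_w, int draw_h)
{
    if (bb->w == draw_w && bb->h == draw_h)
        return false;      /* size unchanged, keep rendering */
    bb->w = draw_w;        /* stands in for a real reallocation */
    bb->h = draw_h;
    return true;
}
```

Checking at both points covers the glxgears-style client that never processes X events: even if it misses the resize mid-frame, the next frame starts with a correctly sized buffer.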
Re: DRI2 Design Wiki Page
Kristian Høgsberg wrote: On 10/4/07, Keith Packard [EMAIL PROTECTED] wrote: On Thu, 2007-10-04 at 01:27 -0400, Kristian Høgsberg wrote: There is an issue with the design, though, related to how and when the DRI driver discovers that the front buffer has changed (typically resizing). Why would the rendering application even need to know the size of the front buffer? The swap should effectively be under the control of the front buffer owner, not the rendering application. Ok, I phrased that wrong: what the DRI driver needs to look out for is when size of the rendering buffers change. For a redirected window, this does involve resizing the front buffer, but that's not the case for a non-redirected window. The important part, though, is that the drawable size changes and before submitting rendering, the DRI driver has to allocate new private backbuffers that are big enough to hold the contents. As far as figuring out how big to make the rendering buffers, that's outside the scope of DRM in my book. The GLX interface can watch for ConfigureNotify events on the associated window and resize the back buffers as appropriate. I guess you're proposing libGL should transparently listen for ConfigureNotify events? I don't see how that can work, there is no guarantee that an OpenGL application handles events. For example, glxgears without an event loop, just rendering. If the rendering extends outside the window bounds and you increase the window size, the next frame should include those parts that were clipped by the window in previous frames. X events aren't reliable for this kind of notification. And regardless, the issue isn't so much how to get the resize notification from the X server to the direct rendering client, but rather that the Gallium design doesn't expect these kinds of interruptions while rendering a frame. 
So while libGL (or AIGLX) may be able to notice that the window size changed, what I'm missing is a mechanism to ask the DRI driver to reallocate its back buffers. I think basically we just need a tweak to what we're already doing for private backbuffers to cope with the periodic rendering case you've highlighted. So basically checking before the first draw and again before swapbuffers, rather than just before swapbuffers. This doesn't address the question about contexts in potentially different processes sharing a backbuffer, but I'm not 100% convinced it's possible, and if it is possible under glx, I'm not 100% convinced that it's a sensible thing to support anyway... Keith
Re: drm: Branch 'master'
Alan Hourihane wrote: linux-core/drm_drv.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) New commits: diff-tree 6671ad1917698b6174a1af314b63b3800d75248c (from 03c47f1420bf17a1e0f2b86be500656ae5a4c95b) Author: Alan Hourihane [EMAIL PROTECTED] Date: Wed Sep 26 15:38:54 2007 +0100 don't copy back if an error was returned. diff --git a/linux-core/drm_drv.c b/linux-core/drm_drv.c index cedb6d5..8513a28 100644 --- a/linux-core/drm_drv.c +++ b/linux-core/drm_drv.c @@ -645,7 +645,7 @@ long drm_unlocked_ioctl(struct file *fil retcode = func(dev, kdata, file_priv); } - if (cmd & IOC_OUT) { + if ((retcode == 0) && cmd & IOC_OUT) { Hmm, brackets around the == but not around the &? Could you humour me and change that to something like: if (retcode == 0 && (cmd & IOC_OUT)) { Keith
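The reason for insisting on the extra brackets: in C, `==` and `!=` bind tighter than `&`, so an unbracketed mix of the two silently tests the wrong thing. A quick standalone demonstration (not the DRM code itself; `IOC_OUT`'s value here is just illustrative):

```c
#include <assert.h>

#define IOC_OUT 0x40000000u

/* != binds tighter than &, so this parses as cmd & (IOC_OUT != 0),
 * i.e. cmd & 1 - almost certainly not what was meant. */
static int out_flag_wrong(unsigned cmd) { return cmd & IOC_OUT != 0; }

/* Bracketing the & gives the intended flag test. */
static int out_flag_right(unsigned cmd) { return (cmd & IOC_OUT) != 0; }
```

Alan's patch happens to be safe because `&&` binds looser than `&`, but writing the brackets out makes the intent obvious to the next reader.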
XDS: Intel i965 docs
Just FYI, one of the things that came up at xds is that Intel is now making a scrubbed version of the i965 docs available under NDA. Dave Airlie has been working with them at redhat, for instance. Keith
Re: Vblanks, CRTCs and GLX, oh my!
Jesse Barnes wrote: Both the generic DRM vblank-rework and Intel specific pipe/plane swapping have uncovered some vblank related problems which we discussed at XDS last week. Unfortunately, no matter what we do (including the do nothing option), some applications will break some of the time in the new world order. Basically we have a few vblank related bits of code: 1) DRM_IOCTL_WAIT_VBLANK - core DRM vblank wait ioctl 2) driver interrupt code - increments appropriate vblank counter 3) DRM_I915_VBLANK_SWAP - Intel specific scheduled swap ioctl 4) SAREA private data - used for tracking which gfx plane to swap 5) glX*VideoSyncSGI - GL interfaces for sync'ing to vblank events As it stands, DRM_IOCTL_WAIT_VBLANK is downright broken in the new world of dynamically controlled outputs and CRTCs (at least for i915 and radeon): a client trying to sync against the second CRTC that doesn't pass _DRM_VBLANK_SECONDARY will only work if one CRTC is enabled, due to the way current interrupt handlers increment the respective vblank counters (i.e. they increment the correct counter if both CRTCs are generating events, but only the primary counter if only one CRTC vblank interrupt is enabled). The Intel specific DRM_I915_VBLANK_SWAP is a really nice interface, and is the only reliable way to get tear free vblank swap on a loaded system. However, what it really cares about is display planes (in the Intel sense), so it uses the _DRM_VBLANK_SECONDARY flag to indicate whether it wants to flip plane A or B. Whether or not to pass the _DRM_VBLANK_SECONDARY flag is determined by DRI code based on the SAREA private data that describes how much of a given client's window is visible on either pipe. This should work fine as of last week's mods and only the DDX and DRM code have to be aware of potential pipe-plane swapping due to hardware limitations. 
The vblank-rework branch of the DRM tree tries to address (1) and (2) by splitting the logic for handling CRTCs and their associated vblank interrupts into discrete paths, but this defeats the original purpose of the driver interrupt code that tries to fall back to a single counter, which is due to limitations in (5), namely that the glX*VideoSyncSGI APIs can only handle a single pipe. So, what to do? One way of making the glX*VideoSyncSGI interfaces behave more or less as expected would be to make them more like DRM_I915_VBLANK_SWAP internally, i.e. using SAREA values to determine which pipe needs to be sync'd against by passing in the display plane the client is most tied to (this would imply making the Intel specific SAREA plane info more generic), letting the DRM take care of the rest. If the SGI glx extensions aren't matching the hardware capabilities, I think it's appropriate to start talking about new extensions exposing behaviour we can support... It might be worth taking a look over the fence at the wgl world and see if there's anything useful there that might be adapted. Keith
ttm --- dealing with (more) limited memory pools
I've got a couple of things that are bothering me if we're looking at finalizing the TTM interface in the near term. Specifically I'm concerned that we don't have a recoverable way to deal with out-of-memory situations. Consider a driver that tries to submit whole frames of q3, which is running on say a 32mb card. There's nothing to stop the driver specifying textures in excess of what is available to the memory manager. If the driver does do this, there's no feedback from the kernel that this is a bad idea until after it's done. Also, all the geometry is now gone, so it's too late to restructure the command stream or even fall back to software. Note this is a different situation to using 8 huge textures to draw a single triangle, where nothing can be done to help. In the case above, the scene could be split and rendered on hardware, although at the expense of texture thrashing. We've benefited from the flexibility with IGPs to avoid this so far, but we do want to cope with VRAM and it seems like we are currently missing some of the necessary mechanisms... So the issues are: - how does the driver know ahead of time it is running out of texture space? - if the answer to the above is it doesn't, then how do we rescue submitted command streams that exceed texture space? - relatedly, on cards with texture-from-vram and texture-from-agp, how does the driver know which pool to use for a particular texture? At worst, I can imagine something like the kernel pushing out to userspace a size for each pool which is guaranteed to be available for validated buffers. Eg, on a 32 mb card, we could say that there is a maximum no-evict space of 8mb, meaning that at all times there is at least 24 mb available for validated buffers. There may be more than that. The userspace driver would be responsible for ensuring that all the buffers it wants to validate to that pool do not exceed 24mb (given some alignment constraints???). 
When it approaches that limit, it can either switch to other pools or flush the command stream. If some of the 8mb is free, it can be used by the memory manager to avoid evicts. In the worst case, it just means that the userspace driver flushes more often than it strictly has to. It can even try and exceed the 24mb if it wants to, but has to live with the possibility of the memory manager saying 'no'. It still has access to the full amount of memory on the card not taken by no-evict buffers. So summarizing: - Enforce a limit on no-evict buffers. Keep these to a contiguous region of the address space (XXX: note this makes pageflipping private backbuffers more complex). - Advertise the size of the remaining space. - Drivers monitor the total size of buffers referenced by relocations, and flush before it reaches the available space in any pool. - Drivers may try to reference more as long as they are prepared for failure. - The memory manager uses any extra space to avoid evicts. This seems like it can be implemented in the time available with minimal kernel changes. It also seems like it will probably work, and pushes most of the responsibility into the userspace driver, and allows it to make decisions as the stream is being built rather than trying to fix it up afterwards... Keith
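The driver-side bookkeeping in the summary above might be sketched like this (names invented; in a real driver the guaranteed size would come from the kernel advertisement described in the proposal):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-pool tracker for the scheme sketched above. */
struct pool_tracker {
    size_t guaranteed;    /* size advertised by the kernel, e.g. 24mb */
    size_t referenced;    /* bytes referenced by relocations so far */
};

/* Account for a buffer about to be referenced by a relocation.
 * Returns true if the driver must flush the command stream first;
 * in that case 'referenced' is left untouched so the caller can
 * flush, reset the tracker, and retry. */
static bool pool_needs_flush(struct pool_tracker *p, size_t buf_size)
{
    if (p->referenced + buf_size > p->guaranteed)
        return true;
    p->referenced += buf_size;
    return false;
}
```

A driver that wants to gamble on extra space being free could keep going past the guarantee, as the text allows, but then has to be prepared for validation to fail.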
TTM BOF topics
Looks like we've got a slot at XDS to talk about all our adventures with buffer management and plans for the future. It might make the session more productive if we do a little groundwork first... If you've been working with TTM and have things you'd like to talk about, please reply to this email and let's try and knock out the easy stuff ahead of time... Keith
Re: DRM enhancements document
Michel Dänzer wrote: On Thu, 2007-08-02 at 17:44 +0200, Jerome Glisse wrote: Btw i think that some GPU can wait on vblank using cmd ie you don't need to ask the card to emit irq you just insert a cmd in stream which stall further cmd execution until vblank happen, this might be good for power consumption. It's generally a bad idea because it prevents the GPU from doing anything else that could be done before vertical blank. This is true on cards with a single command stream - if you had per-context ringbuffers and a hardware scheduler, it might be better. Unfortunately, you still end up forcing the cliprects not to change, unless you also have some hardware mechanism for that, which I think is pretty rare nowadays. Hmm. Maybe you could use the frontbuffer alpha bits as a window id. You'd still need to either lock the window position, or find some way of telling the hardware about window position changes, or... something else... Anyway, it gets complicated... Keith
Re: Merging DRI changes
Kristian Høgsberg wrote: Hi, I've finished the changes to the DRI interface that I've been talking about for a while (see #5714). Ian had a look at the DRI driver side of things, and ACK'ed those changes. I've done the X server changes now plus a couple of GLX module cleanups, and I think it's all ready to push: http://gitweb.freedesktop.org/?p=users/krh/mesa.git;a=shortlog;h=dri2 and http://gitweb.freedesktop.org/?p=users/krh/xserver.git;a=shortlog;h=dri2 One thing that's still missing is Alan H's changes to how DRI/DDX maps the front buffer. While the changes above break the DRI interface, they only require an X server and a Mesa update. Alan's patches change the device private shared between the DDX and DRI driver and thus require updating every DRI capable DDX driver in a non-compatible way. Kristian, Just letting you know Alan's on holidays this week, back on Monday. Keith
Re: drm-ttm-cleanup-branch
Dave Airlie wrote: On 5/9/07, Thomas Hellström [EMAIL PROTECTED] wrote: Dave Airlie wrote: I'll try it out as soon as there is time. I've just tested glxgears and a few mesa tests on it and it seems to be working fine We should probably think about pulling this over into the DRM sooner rather than later, there are also some changes to the DDX i830_driver.c compat code to deal with... Yup. I've attached a patch (against the cleanup branch) with things I think may be needed. 1) 64-bit reordering. 64-bit scalars, structs and unions should probably be 64-bit aligned in parent structs. I had to insert padding in two cases, but this probably needs to be double-checked. 2) A magic member in the init ioctl. Checking this allows for verbose and friendly failure of code that uses the old interface. 3) Init major / minor versioning of the memory manager interface in case we need changes in the future. 4) expand_pads are 64-bit following Jesse Barnes' recommendations. 5) The info_req carries a fence class for validation for a particular command submission mechanism. 6) The info_rep arg carries tile_strides and tile_info. The argument tile_strides is ((actual_tile_stride << 16) | (desired_tile_stride)) Any reason you don't just separate actual_tile_stride and desired_tile_stride to 2 x unsigned int? not sure why merging them really gives us anything... The argument tile_info is driver-specific. (Could be tile width, x-major, y-major etc.) Finally, should we perhaps allow for 64-bit buffer object flags / mask at this point? Possibly, the rest all seems like good ideas, I know we hit the 64-bit alignment on nouveau so good to get it fixed early I haven't done any user-space or kernel coding for this yet. Just want to know what you think. Well I'll be offline for a few weeks so I'll get the odd chance to read mail but no development... 
If kernel-space relocations are on the agenda, I should flag up one issue that we are currently pretending doesn't really exist, namely that not all relocations apply to the main dma/batch buffer. More precisely, state structures stored for use and reuse outside of the command stream can contain pointers to things like draw surfaces and textures. Having pointers in those structs limits their ability to be reused as intended, but there's not much to do about that. The important thing to note is that they *do* need to be fixed up with relocation information at the same time as the batchbuffer. Or in other words, that while the batch buffer may be special, it is not unique - there are a (small) number of buffers that will require the same treatment. Keith
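For reference, the packed tile_strides encoding from point 6 above, reconstructed on the assumption that the operator lost in the archive was a 16-bit shift (the helper names here are invented):

```c
#include <assert.h>
#include <stdint.h>

/* ((actual_tile_stride << 16) | desired_tile_stride), as described
 * in the patch notes above. */
static uint32_t pack_tile_strides(uint16_t actual, uint16_t desired)
{
    return ((uint32_t)actual << 16) | desired;
}

static uint16_t actual_tile_stride(uint32_t packed)
{
    return (uint16_t)(packed >> 16);
}

static uint16_t desired_tile_stride(uint32_t packed)
{
    return (uint16_t)(packed & 0xffffu);
}
```

Dave's counter-suggestion of two plain unsigned ints would make helpers like these unnecessary, at the cost of a slightly larger ioctl struct.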
Re: R200 minor cleanups
Oliver McFadden wrote: My thoughts are, we should unify the really common stuff... but I don't think it's possible to unify r200_tex.c and r300_tex.c. The hardware is different, and the file would end up with an #ifdef on every 3rd line; it doesn't make sense here. Just for really common code it does. I don't know what is going to happen with TTM. Maybe we should hack the r300 driver for TTM (and someone else can do R200 and R128 (radeon) if they like) or maybe we should start a new DRI driver completely from scratch, with TTM and good state handling in mind from the beginning. Then we just take the code we need from R300 and merge it into the new DRI driver. Regarding indenting, I indented the driver with the Linux kernel style because that is what matched most (but not all) of the code. The indenting was a little inconsistent. If you like, feel free to indent the R200 or R128 (radeon) code, too. I guess for TTM we'll have to wait and see what happens... Just letting you know I've been doing a bit of thinking about the Mesa driver model lately, and I think there's room to pull a lot of stuff up and out of the drivers. Regarding textures - almost all of the texture handling for *all* of the drivers could effectively be handled by the miptree mechanism in the i915tex driver - basically everything could be made device-independent, but with the driver specifying how to lay mipmaps out within rectangular regions that are then managed in a device-independent way. The driver also provides some utilities like CopyBlit, etc, and helpers for choosing formats. I think there are a bunch of similar pieces of functionality that have built up in the drivers, simply because we aren't providing the right level of support in core mesa. Keith
Re: R300 cleanup questions
Keith Whitwell wrote: Oliver McFadden wrote: I'd like some input on the VBO stuff in r300. In r300_context.h we have the following. /* KW: Disable this code. Driver should hook into vbo module * directly, see i965 driver for example. */ /* #define RADEON_VTXFMT_A */ #ifdef RADEON_VTXFMT_A #define HW_VBOS #endif So the VTXFMT (radeon_vtxfmt_a.c) code is disabled anyway. This also disables hardware VBOs. I guess this has been done since the new VBO branch was merged. So, the question is, should this dead code be removed? I think all drivers are (or should be) moving to the new VBO code anyway. I've already made a patch for this, but I'm not committing until I get the okay from a few people. Yes, the old code should go. I guess there might be some starting points in there for beginning the vbo work, that's about the only reason to keep it. Hmm, I just took a look through the r300 code, and was surprised to see myself listed as the author of several of the files?? I'm pretty sure I haven't done any work on that driver... I think I'd prefer to have a line that says based on xxx by Keith Whitwell, or even just remove my name from the r300_* files and give credit instead to the people who've really been working on that code... Keith
Re: R300 cleanup questions
Oliver McFadden wrote: I'd like some input on the VBO stuff in r300. In r300_context.h we have the following. /* KW: Disable this code. Driver should hook into vbo module * directly, see i965 driver for example. */ /* #define RADEON_VTXFMT_A */ #ifdef RADEON_VTXFMT_A #define HW_VBOS #endif So the VTXFMT (radeon_vtxfmt_a.c) code is disabled anyway. This also disables hardware VBOs. I guess this has been done since the new VBO branch was merged. So, the question is, should this dead code be removed? I think all drivers are (or should be) moving to the new VBO code anyway. I've already made a patch for this, but I'm not committing until I get the okay from a few people. Yes, the old code should go. I guess there might be some starting points in there for beginning the vbo work, that's about the only reason to keep it. Keith
Re: [RFC] [PATCH] DRM TTM Memory Manager patch
Keith Packard wrote: OTOH, letting DRM resolve the deadlock by unmapping and remapping shared buffers in the correct order might not be the best one either. It will certainly mean some CPU overhead and what if we have to do the same with buffer validation? (Yes for some operations with thousands and thousands of relocations, the user space validation might need to stay). I do not want to do relocations in user space. I don't see why doing thousands of these requires moving this operation out of the kernel. Agreed. The original conception for this was to have validation plus relocations be a single operation, and by implication in the kernel. Although the code as it stands doesn't do this, I think that should still be the approach. The issue with thousands of relocations from my point of view isn't a problem - that's just a matter of getting appropriate data structures in place. Where things get a bit more interesting is with hardware where you are required to submit a whole scene's worth of rendering before the hardware will kick off, and with the expectation that the texture placement will remain unchanged throughout the scene. This is a very easy way to hit any upper limit on texture memory - the agp aperture size in the case of integrated chipsets. That's a special case of the general problem of what to do when a client submits any validation list that can't be satisfied. Failing to render isn't really an option; either the client or the memory manager has to prevent it happening in the first place, or have some mechanism for chopping up the dma buffer into segments which are satisfiable... Neither of which I can see an absolutely reliable way to do. I think that any memory manager we can propose will have flaws of some sort - either it is prone to failures that aren't really allowed by the API, is excessively complex or somewhat pessimistic. We've chosen a design that is simple, optimistic, but can potentially say no unexpectedly. 
It would then be up to the client to somehow pick up the pieces and potentially submit a smaller list. So far we just haven't touched on how that might work. The way to get around this is to mandate that hardware supports paged virtual memory... But that seems to be a difficult trick. Keith
Re: Lockups with Googleearth
Adam K Kirchhoff wrote: Michel Dänzer wrote: On Tue, 2007-03-06 at 04:32 -0500, Adam K Kirchhoff wrote: commit 09e4df2c65c1bca0d04c6ffd076ea7808e61c4ae causes the lockup.. If I'm reading the git log properly this is right after the merge from vbo-0.2. However, commit 47d463e954efcd15d20ab2c96a455aa16ddffdcc also causes the lockup, and this is right before the merge from vbo-0.2. No, that's still on vbo-0.2. The last commit on master before the merge is 325196f548f8e46aa8fcc7b030e81ba939e7f6b7. I really recommend gitk. :) Sorry about that. I turned back to the log after browsing through gitk last night well past after I should have been asleep :-) Anyway, your suspicion was correct, this problem did not exist prior to the merge of the vbo-0.2 branch, but did start immediately after the merge. Does this need to be narrowed down further on the vbo-0.2 branch? You can try, but that branch was a wholesale replacement of some existing functionality, so you may just end up at the commit where things switched over. It may be better than that, so possibly worth trying. It may make sense to try and narrow things down in the driver to a certain operation or set of operations, ie the commit history may be too coarse and you just have to attack the bug from first principles. Keith
Re: [PATCH] R300 early Z cleanup
Jerome Glisse wrote: On 2/26/07, Roland Scheidegger [EMAIL PROTECTED] wrote: Christoph Brill wrote: Attached is a mini-patch to add the address of early Z to r300_reg.h and use it. Jerome Glisse helped me with this patch. Thanks. :-) Not really related directly to the patch itself, but it seems to me that the conditions for when to enable early-z are a bit wrong. First, I'd think you need to disable early-z when writing to the depth output (or using texkill) in fragment programs. Second, why disable early-z specifically if the depth function is gl_never? Is that some workaround because stencil updates don't work otherwise (in that case it should certainly rather check for that) or something similar? Roland Yes, we need to disable early z when the fragment program writes to the z buffer; so far there isn't real support for depth writing in the driver (it's on the todo list). I fail to see why we should disable early z when there is a texkill instruction (and stencil is disabled): if we disable early z, then a fragment that doesn't pass the test in the early pass won't pass it after texkill either, so it will be killed anyway (and better to kill early than at the end). If you don't disable early z, you can end up writing values to the depth buffer for fragments that are later killed. Keith
Re: [PATCH] R300 early Z cleanup
Jerome Glisse wrote: On 2/26/07, Keith Whitwell [EMAIL PROTECTED] wrote: Jerome Glisse wrote: On 2/26/07, Roland Scheidegger [EMAIL PROTECTED] wrote: Christoph Brill wrote: Attached is a mini-patch to add the address of early Z to r300_reg.h and use it. Jerome Glisse helped me with this patch. Thanks. :-) Not really related directly to the patch itself, but it seems to me that the conditions for when to enable early-z are a bit wrong. First, I'd think you need to disable early-z when writing to the depth output (or using texkill) in fragment programs. Second, why disable early-z specifically if the depth function is gl_never? Is that some workaround because stencil updates don't work otherwise (in that case it should certainly rather check for that) or something similar? Roland Yes, we need to disable early z when the fragment program writes to the z buffer; so far there isn't real support for depth writing in the driver (it's on the todo list). I fail to see why we should disable early z when there is a texkill instruction (and stencil is disabled): if we disable early z, then a fragment that doesn't pass the test in the early pass won't pass it after texkill either, so it will be killed anyway (and better to kill early than at the end). If you don't disable early z, you can end up writing values to the depth buffer for fragments that are later killed. Keith Doesn't early z only discard fragments that fail the z test without writing the z value, deferring the write until after the fragment operations? I guess it depends on the hardware - at least some do both the test and the write early. You'd have to test somehow. If it does do the writeback early, you need to also look at disabling it when alpha test is enabled. Keith
Re: mesa: Branch 'master'
Stephane Marchesin wrote: Keith Whitwell wrote: configs/linux-dri-debug | 16 1 files changed, 16 insertions(+) New commits: diff-tree 3bfbe63806cee1c44da2625daf069b719f2a6097 (from 747c9129c0b592941b14c290ff3d8ab22ad66acb) Author: Keith Whitwell [EMAIL PROTECTED] Date: Wed Jan 17 08:44:13 2007 + New debug config for linux-dri Also, isn't it time that we use -O2 by default for linux-dri ? It brings quite a bit of performance increase. Failing that, can we add -O2 somewhere in the nouveau makefile (so that only our code gets built with it) ? I don't have a problem with doing this in configs/linux-dri, providing the '-fno-strict-aliasing' flag is set in there somewhere too. Keith
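Concretely, Keith's condition amounts to a one-line change in the build config. This is a sketch; the exact variable name (OPT_FLAGS here) is an assumption about Mesa's config conventions of the time and should be checked against configs/default:

```make
# configs/linux-dri (sketch): enable -O2, but only together with
# -fno-strict-aliasing, since Mesa is not strict-aliasing clean and
# gcc's type-based aliasing optimizations can miscompile it at -O2.
OPT_FLAGS = -O2 -fno-strict-aliasing
```

The pairing matters: -O2 alone turns on strict-aliasing assumptions, which is precisely the class of miscompilation discussed in the gcc-4.1 i965 thread below.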
Re: [r300] VBO broken by changes in mesa
Rune Petersen wrote: Keith Whitwell wrote: I've fixed some typos in this code - with luck this should be solved now? sorry no... If you can't find it in the next day, would you mind disabling it for the 6.5.2 release? OK, I've made some progress on the i915 at least, can you retry over there? Keith
Re: i915: Xserver crash and restart fails
Tino Keitel wrote: On Fri, Nov 17, 2006 at 22:12:09 +0100, Tino Keitel wrote: Hi folks, I use the TV application MythTV, which uses OpenGL to draw its GUI. For a while now I have been able to crash my Xserver very easily just by switching to the workspace that shows the MythTV GUI. A restart of the Xserver fails. [...] If this helps: I can stop the display manager, suspend to disk using suspend2, resume, and restart X. It seems to work again after this. Which version of the i915 driver are you using? Is it i915tex? If not, please upgrade to i915tex and retry. Keith
Re: [r300] VBO broken by changes in mesa
Rune Petersen wrote: Hi, A patch for making sure VBO's are mapped breaks r300: http://marc.theaimsgroup.com/?l=mesa3d-cvsm=116364446305536w=2 It would appear we just need to add _ae_(un)map_vbos() in the right places in radeon_vtxfmt_a.c. Rune, my expectation was that the change wouldn't break drivers, but that doing the _ae_map/unmap externally would reduce the performance impact of the change. I can't debug r300 unfortunately, so if adding the explicit map/unmap helps, go ahead and do so, but could you also post me stacktraces of the crash (I assume it's a crash?) so I can figure out what the underlying problem might be? Keith
Re: [r300] partly working fragment.position patch
Rune Petersen wrote: Keith Whitwell wrote: Rune Petersen wrote: Keith Whitwell wrote: Roland Scheidegger wrote: Keith Whitwell wrote: I think Rune is rather referring to the fact that you can't change (not with legal means at least) the constant you got from _mesa_add_unnamed_constant. Ah right. I missed that. I think there exist at least 2 solutions for that. The clean way would probably be to add some more INTERNAL_STATE (like the i965 driver uses) so you use _mesa_add_state_reference instead; in this case mesa's shader code would need to update the program parameter based on the drawable information - I'm not sure if accessing a driver's drawable information there would get messy. The easier solution would probably be to just directly manipulate the ParameterValues entry associated with the constant you added, easy, though it might be considered somewhat hackish. Just don't forget you not only have to update the constant within r300UpdateWindow (if the currently bound fp requires it), but also when the active fp is switched to another one (and make sure that a parameter upload is actually triggered if it isn't already upon drawable changes). I think the parameter approach is probably the right one. This would require that there be a callback into the driver to get this state, and more importantly, the driver would have to set a bit in ctx->NewState (perhaps _NEW_BUFFERS) to indicate that a statechange has occurred which would affect that internal state atom. Thank you. I've hit a bit of a problem: I was planning to have state flags returned from a callback make_state_flags(). Something like: ctx->Driver.GetGenericStateFlags(state); The problem being that the context ctx is not a parameter in make_state_flags(). Is there a smart way of solving this? Rune, I don't quite understand what you want to do here. Can you show me the code you'd like to have (ignoring the ctx argument issue)?
I would have thought that we could determine the state statically and just rely on the driver to set that state in ctx->NewState when necessary. I am trying to make generic state vars that the drivers can use. The way I read these functions: make_state_flags() returns the state flags that should trigger an update of the state var; _mesa_fetch_state() fetches the state var. In order to make generic state vars: - I need to get the flags via a callback to the driver from make_state_flags(). - I need to fetch the vars via a callback to the driver from _mesa_fetch_state().

make_state_flags()
{
   ...
   case STATE_INTERNAL:
      switch (state[1]) {
      case STATE_NORMAL_SCALE:
         return _NEW_MODELVIEW;
      case STATE_TEXRECT_SCALE:
         return _NEW_TEXTURE;
      case STATE_GENERIC1:
         assert(ctx->Driver.GetGenericStateFlags);
         return ctx->Driver.GetGenericStateFlags(state);
      }
}

_mesa_fetch_state()
{
   ...
   case STATE_INTERNAL:
      switch (state[1]) {
      case STATE_NORMAL_SCALE:
         ...
         break;
      case STATE_TEXRECT_SCALE:
         ...
         break;
      case STATE_GENERIC1:
         assert(ctx->Driver.FetchGenericState);
         ctx->Driver.FetchGenericState(ctx, state, value);
         break;
      }
}

I guess what I'm wondering is whether the flags you want to put into the driver as generics are actually things which are universal and should be supported across Mesa and the other drivers - is it just stuff like window position? I think it would be better to create a new STATE_WINDOW_POSITION keyed off something like _NEW_BUFFERS for that. It would still be the driver's responsibility to set _NEW_BUFFERS on window position changes though. Keith
Re: [r300] partly working fragment.position patch
Rune Petersen wrote: Keith Whitwell wrote: Roland Scheidegger wrote: Keith Whitwell wrote: I think Rune is rather referring to the fact that you can't change (not with legal means at least) the constant you got from _mesa_add_unnamed_constant. Ah right. I missed that. I think there exist at least 2 solutions for that. The clean way would probably be to add some more INTERNAL_STATE (like the i965 driver uses) so you use _mesa_add_state_reference instead; in this case mesa's shader code would need to update the program parameter based on the drawable information - I'm not sure if accessing a driver's drawable information there would get messy. The easier solution would probably be to just directly manipulate the ParameterValues entry associated with the constant you added, easy, though it might be considered somewhat hackish. Just don't forget you not only have to update the constant within r300UpdateWindow (if the currently bound fp requires it), but also when the active fp is switched to another one (and make sure that a parameter upload is actually triggered if it isn't already upon drawable changes). I think the parameter approach is probably the right one. This would require that there be a callback into the driver to get this state, and more importantly, the driver would have to set a bit in ctx->NewState (perhaps _NEW_BUFFERS) to indicate that a statechange has occurred which would affect that internal state atom. Thank you. I've hit a bit of a problem: I was planning to have state flags returned from a callback make_state_flags(). Something like: ctx->Driver.GetGenericStateFlags(state); The problem being that the context ctx is not a parameter in make_state_flags(). Is there a smart way of solving this? Rune, I don't quite understand what you want to do here. Can you show me the code you'd like to have (ignoring the ctx argument issue)?
I would have thought that we could determine the state statically and just rely on the driver to set that state in ctx->NewState when necessary. Keith
Re: Intel 965G: i915_dispatch_cmdbuffer failed (2.6.19-rc2)
Ryan Richter wrote: On Wed, Oct 18, 2006 at 07:54:41AM +0100, Keith Whitwell wrote: This is all a little confusing as the driver doesn't really use that path in normal operation except for a single command - MI_FLUSH, which is shared between the architectures. In normal operation the hardware does the validation for us for the bulk of the command stream. If there were missing functionality in that ioctl, it would be failing everywhere, not just in this one case. I guess the questions I'd have are - did the driver work before the kernel upgrade? - what path in userspace is seeing you end up in this ioctl? - and like Keith, what commands are you seeing? The final question is interesting not because we want to extend the ioctl to cover those, but because it will give a clue how you ended up there in the first place. Here's a list of all the failing commands I've seen so far: 3a440003 d70003 2d010003 e5b90003 2e730003 8d8c0003 c10003 d90003 be0003 1e3f0003 Ryan, Those don't look like any commands I can recognize. I'm still confused how you got onto this ioctl in the first place - it seems like something pretty fundamental is going wrong somewhere. What would be useful to me is if you can use GDB on your application and get a stacktrace for how you end up in this ioctl in the cases where it is failing. Additionally, if you're comfortable doing this, it would be helpful to see all the arguments that userspace thinks it's sending to the ioctl, compared to what the kernel ends up thinking it has to validate. There shouldn't ever be more than two dwords being validated at a time, and they should look more or less exactly like {0x0203, 0}, and be emitted from bmSetFence(). All of your other weird problems, like the assert failures, etc, make me wonder if there just hasn't been some sort of build problem that can only be resolved by clearing it out and restarting.
It wouldn't hurt to just nuke your current Mesa and libdrm builds and start from scratch - you'll probably have to do that to get debug symbols for gdb anyway. Keith
Re: Intel 965G: i915_dispatch_cmdbuffer failed (2.6.19-rc2)
Ryan Richter wrote: On Fri, Oct 20, 2006 at 06:51:01PM +0100, Keith Whitwell wrote: Ryan Richter wrote: On Fri, Oct 20, 2006 at 12:43:44PM +0100, Keith Whitwell wrote: Ryan Richter wrote: On Wed, Oct 18, 2006 at 07:54:41AM +0100, Keith Whitwell wrote: All of your other weird problems, like the assert failures, etc, make me wonder if there just hasn't been some sort of build problem that can only be resolved by clearing it out and restarting. It wouldn't hurt to just nuke your current Mesa and libdrm builds and start from scratch - you'll probably have to do that to get debug symbols for gdb anyway. I had heard something previously about i965_dri.so maybe getting miscompiled, but I hadn't followed up on it until now. I rebuilt it with an older gcc, and now it's all working great! Sorry for the wild goose chase. Out of interest, can you try again with the original GCC and see if the problem comes back? Which versions of GCC are you using? The two gcc versions are 4.1 (miscompiles) and 3.4 (OK) from Debian unstable. I had originally compiled it myself with gcc-4.1 because the Debian libgl1-mesa-dri package didn't build i965_dri.so until I submitted a build patch to them to have it built. They released a new package a few days ago with i965_dri.so included, presumably built with the same gcc-4.1, the default cc on Debian unstable. I had exactly the same problems with my own version and theirs. I rebuilt it again today with CC=gcc-3.4 and now everything works great. I saved a copy of the old i965_dri.so, so I can verify in the next few days that replacing it breaks things again. Let me know if you want copies of these files to examine. Sure, email me the 4.1 version offline. I'll also see about installing 4.1 here. Keith
Re: Intel 965G: i915_dispatch_cmdbuffer failed (2.6.19-rc2)
This is all a little confusing as the driver doesn't really use that path in normal operation except for a single command - MI_FLUSH, which is shared between the architectures. In normal operation the hardware does the validation for us for the bulk of the command stream. If there were missing functionality in that ioctl, it would be failing everywhere, not just in this one case. I guess the questions I'd have are - did the driver work before the kernel upgrade? - what path in userspace is seeing you end up in this ioctl? - and like Keith, what commands are you seeing? The final question is interesting not because we want to extend the ioctl to cover those, but because it will give a clue how you ended up there in the first place. Keith Keith Packard wrote: On Tue, 2006-10-17 at 13:40 -0400, Ryan Richter wrote: So do I want something like static int do_validate_cmd(int cmd) { return 1; } in i915_dma.c? that will certainly avoid any checks. Another alternative is to printk the cmd which fails validation so we can see what needs adding here.
Re: Intel 965G: i915_dispatch_cmdbuffer failed (2.6.19-rc2)
Your drm module is out of date. Keith Ryan Richter wrote: I have a new Intel 965G board, and I'm trying to get DRI working. Direct rendering is enabled, but all GL programs crash immediately. The message 'DRM_I830_CMDBUFFER: -22' is printed on the tty, and the kernel says: [drm:i915_cmdbuffer] *ERROR* i915_dispatch_cmdbuffer failed Additionally, glxinfo says (in addition to its normal output): glxinfo: bufmgr_fake.c:1245: bmReleaseBuffers: Assertion `intel->locked' failed. This is with 2.6.19-rc2 (and -rc1). Here's a .config and dmesg: # Automatically generated make config: don't edit # Linux kernel version: 2.6.19-rc2 # Fri Oct 13 13:42:19 2006 CONFIG_X86_64=y CONFIG_X86=y CONFIG_SMP=y [remainder of the x86-64 .config attachment and the dmesg output are truncated in the archive]
Re: Intel 965G: i915_dispatch_cmdbuffer failed (2.6.19-rc2)
Arjan van de Ven wrote: On Sat, 2006-10-14 at 09:55 +0100, Keith Whitwell wrote: Your drm module is out of date. Since the reporter is using the latest brand spanking new kernel, that is highly unlikely unless something else in the software universe is assuming newer-than-brand-spanking-new. Heh. I missed that in the title line. I'll retire quietly... Keith
Re: [r300] partly working fragment.position patch
Roland Scheidegger wrote: Keith Whitwell wrote: Now I remember why I can't use radeon->dri.drawable, at least not directly, when the shader code is added: when the window size changes the constants have to be updated. Is there a way for the driver to update a constant after construction? This is an age-old dilemma... The i965 driver gets around this by locking the hardware before validating and emitting state and drawing commands and unlocking again afterwards - so the window can't change size in the meantime. Other drivers tend to just deal with the occasional incorrectness. In general this is something we need to get a bit better at. APIs like DX9 and GL/ES do away with frontbuffer rendering, which gives the drivers a lot more flexibility in terms of dealing with window moves and resizes, allowing them to pick a time to respond to a resize. With private backbuffers we might get the same benefits, at least in the common case. I think Rune is rather referring to the fact that you can't change (not with legal means at least) the constant you got from _mesa_add_unnamed_constant. Ah right. I missed that. I think there exist at least 2 solutions for that. The clean way would probably be to add some more INTERNAL_STATE (like the i965 driver uses) so you use _mesa_add_state_reference instead; in this case mesa's shader code would need to update the program parameter based on the drawable information - I'm not sure if accessing a driver's drawable information there would get messy. The easier solution would probably be to just directly manipulate the ParameterValues entry associated with the constant you added, easy, though it might be considered somewhat hackish. Just don't forget you not only have to update the constant within r300UpdateWindow (if the currently bound fp requires it), but also when the active fp is switched to another one (and make sure that a parameter upload is actually triggered if it isn't already upon drawable changes).
I think the parameter approach is probably the right one. This would require that there be a callback into the driver to get this state, and more importantly, the driver would have to set a bit in ctx->NewState (perhaps _NEW_BUFFERS) to indicate that a statechange has occurred which would affect that internal state atom. Keith
Re: [r300] partly working fragment.position patch
Rune Petersen wrote: Keith Whitwell wrote: Rune Petersen wrote: Rune Petersen wrote: Roland Scheidegger wrote: Rune Petersen wrote: I hit a problem constructing this: - In order to do range mapping in the vertex shader (if I so choose) I will need a constant (0.5), but how do I add it? I think this might work similarly to what is used for position-invariant programs; instead of using _mesa_add_state_reference you could try _mesa_add_named_parameter. Otherwise, you could always construct 0.5 in the shader itself, since you always have the constants 0 and 1 available thanks to the powerful swizzling capabilities, though surprisingly it seems somewhat complicated. Either use 2 instructions (ADD 1+1, RCP), or try EX2/EXP, though I'm not sure about the performance of these, but I guess the approximated EXP should do (2^-1). The math in this patch appears sound. The doom3-demo issue appears unrelated to fragment.position. This version makes use of existing instructions to calculate result.position. Split into 2 parts: - the select_vertex_shader changes - the actual fragment.position changes This patch assumes: - that the temp used to calculate result.position is safe to use (true for std. use). - that fragment.position.x and y won't be used (mostly true, except for exotic programs). In order to fix this, I'll need to know the window size, but how? Surely it's right there in radeon->dri.drawable ? Now I remember why I can't use radeon->dri.drawable, at least not directly, when the shader code is added: when the window size changes the constants have to be updated. Is there a way for the driver to update a constant after construction? This is an age-old dilemma... The i965 driver gets around this by locking the hardware before validating and emitting state and drawing commands and unlocking again afterwards - so the window can't change size in the meantime. Other drivers tend to just deal with the occasional incorrectness. In general this is something we need to get a bit better at.
APIs like DX9 and GL/ES do away with frontbuffer rendering, which gives the drivers a lot more flexibility in terms of dealing with window moves and resizes, allowing them to pick a time to respond to a resize. With private backbuffers we might get the same benefits, at least in the common case. Keith
Re: [r300] partly working fragment.position patch
Rune Petersen wrote: Rune Petersen wrote: Roland Scheidegger wrote: Rune Petersen wrote: I hit a problem constructing this: - In order to do range mapping in the vertex shader (if I so choose) I will need a constant (0.5), but how do I add it? I think this might work similarly to what is used for position-invariant programs; instead of using _mesa_add_state_reference you could try _mesa_add_named_parameter. Otherwise, you could always construct 0.5 in the shader itself, since you always have the constants 0 and 1 available thanks to the powerful swizzling capabilities, though surprisingly it seems somewhat complicated. Either use 2 instructions (ADD 1+1, RCP), or try EX2/EXP, though I'm not sure about the performance of these, but I guess the approximated EXP should do (2^-1). The math in this patch appears sound. The doom3-demo issue appears unrelated to fragment.position. This version makes use of existing instructions to calculate result.position. Split into 2 parts: - the select_vertex_shader changes - the actual fragment.position changes This patch assumes: - that the temp used to calculate result.position is safe to use (true for std. use). - that fragment.position.x and y won't be used (mostly true, except for exotic programs). In order to fix this, I'll need to know the window size, but how? Surely it's right there in radeon->dri.drawable ? Keith
Re: [r300] partly working fragment.position patch
Rune Petersen wrote: It turns out I missed something obvious... The parameters are passed correctly; I have just not transformed the vertex.position to the fragment.position.

I guess that's the viewport transformation, or maybe a perspective divide followed by the viewport transformation. But I think there's a bigger problem here -- somehow you're going to have to arrange for that value to be interpolated over the triangle so that each fragment ends up with the correct position. Maybe they are being interpolated already? I guess it then depends on whether the interpolation is perspective-correct, so that once transformed you really get the right pixel coordinates rather than just a linear interpolation across the triangle.

Keith
Re: [r300] partly working fragment.position patch
Rune Petersen wrote: Keith Whitwell wrote: Rune Petersen wrote:

It turns out I missed something obvious... The parameters are passed correctly, I have just not transformed the vertex.position to the fragment.position.

I guess that's the viewport transformation, or maybe a perspective divide followed by viewport transformation.

I did do a viewport transformation, but I didn't map the z component from a range of [-1,1] to [0,1]. A perspective divide is also needed, but not in my test app (w=1); ATI appears to do the perspective divide in the fragment shader.

I hit a problem constructing this:
- In order to do range mapping in the vertex shader (if I so choose) I will need a constant (0.5), but how to add it?
- If I do the perspective divide in the fragment shader, I will need to remap WPOS from an INPUT to a TEMP.

But I think there's a bigger problem here -- somehow you're going to have to arrange for that value to be interpolated over the triangle so that each fragment ends up with the correct position. Maybe they are being interpolated already? I guess it then depends on whether the interpolation is perspective correct so that once transformed you really get the right pixel coordinates rather than just a linear interpolation across the triangle.

Is there a way to visually verify this?

Yes, of course - once you've got it working, emit the position as fragment.color and have a test program read it back. If it is correct on triangles that are 'flat' but incorrect on ones that are angled away from the viewer, then it is wrong. My guess is it'll probably be fine.

Keith
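For reference, the transformation being discussed - perspective divide followed by the viewport mapping, with z remapped from [-1,1] to [0,1] - can be sketched in C as follows (the struct and vp_* parameter names are hypothetical, not driver code):

```c
typedef struct { float x, y, z, w; } vec4;

/* Map a clip-space position to window coordinates: divide by w, then
 * apply the viewport (origin vp_x/vp_y, size vp_w/vp_h), remapping
 * z from [-1,1] to [0,1] as noted above. */
static vec4 clip_to_window(vec4 clip, float vp_x, float vp_y,
                           float vp_w, float vp_h)
{
    vec4 win;
    float inv_w = 1.0f / clip.w;        /* perspective divide */
    float nx = clip.x * inv_w;
    float ny = clip.y * inv_w;
    float nz = clip.z * inv_w;

    win.x = vp_x + (nx * 0.5f + 0.5f) * vp_w;
    win.y = vp_y + (ny * 0.5f + 0.5f) * vp_h;
    win.z = nz * 0.5f + 0.5f;           /* [-1,1] -> [0,1] */
    win.w = inv_w;                      /* 1/w, handy for the shader */
    return win;
}
```

With w=1 (as in the test app mentioned above) the divide is a no-op, which is why the missing z remap was the only visible symptom there.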
Re: tnl trouble, how can you do hw state changes depending on the primitive being rendered
Ian Romanick wrote: Roland Scheidegger wrote: Roland Scheidegger wrote:

I thought there was a mechanism that allowed the driver to be notified at glBegin (or similar) time. It seems like you ought to be able to emit some extra state at that time to change to / from point-sprite mode.

Ah, sounds like a plan. I thought the NotifyBegin would only be useful for vtxfmt-replace-like things. I'll look into that.

That was too fast. The NotifyBegin will only be called if there is actually new state; otherwise the tnl module will simply keep adding new primitives.

I think the core should be modified to call NotifyBegin if there is new state *or* the primitive type changes. Perhaps there should be a flag to request it being called in that case.

Basically, for the hwtnl case you need to look at where you're emitting the drawing commands and inject the state right at that point. For r200, and ignoring the vtxfmt stuff, that means you need to modify the loop in r200_run_tcl_render to emit the right state at the right time, depending on what primitive is about to be emitted. The i965 driver is quite involved at this level, as it has to change all sorts of stuff based on the primitive - the clipping algorithm obviously changes between points, lines and triangles, and so on. Regular swtnl drivers also turn stuff on/off based on primitive; there is quite a bit of mechanism in place for this already - have a look at e.g. r128ChooseRenderState and r128RenderPrimitive/r128RasterPrimitive for details.

Keith
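The approach Keith describes for the hwtnl emit loop can be sketched as follows. This is a simplified illustration of the idea (track the last raster primitive and inject state on a transition), not the actual r200_run_tcl_render code; all names here are hypothetical:

```c
enum prim { PRIM_POINTS, PRIM_LINES, PRIM_TRIANGLES };

struct hw_ctx {
    enum prim last_prim;    /* primitive the current raster state is for */
    int state_emits;        /* counts state changes, for illustration    */
};

/* Stand-in for emitting e.g. point-sprite enable/disable state. */
static void emit_raster_state(struct hw_ctx *ctx, enum prim p)
{
    ctx->state_emits++;
    ctx->last_prim = p;
}

/* Emit loop: inject new raster state right before any draw whose
 * primitive differs from the one the hardware is currently set up for. */
static void run_tcl_render(struct hw_ctx *ctx, const enum prim *prims, int n)
{
    for (int i = 0; i < n; i++) {
        if (prims[i] != ctx->last_prim)
            emit_raster_state(ctx, prims[i]);
        /* emit_draw_command(prims[i]); */
    }
}
```

The point is that the state change happens at draw-emission time, per primitive, rather than relying on NotifyBegin, which only fires when Mesa state is dirty.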
Re: tnl trouble, how can you do hw state changes depending on the primitive being rendered
Brian Paul wrote: Ian Romanick wrote: Roland Scheidegger wrote: Roland Scheidegger wrote:

I thought there was a mechanism that allowed the driver to be notified at glBegin (or similar) time. It seems like you ought to be able to emit some extra state at that time to change to / from point-sprite mode.

Ah, sounds like a plan. I thought the NotifyBegin would only be useful for vtxfmt-replace-like things. I'll look into that.

That was too fast. The NotifyBegin will only be called if there is actually new state; otherwise the tnl module will simply keep adding new primitives.

I think the core should be modified to call NotifyBegin if there is new state *or* the primitive type changes. Perhaps there should be a flag to request it being called in that case.

Brian, do you have an opinion on this? The tnl module is pretty much Keith's domain.

One thing to keep in mind is glPolygonMode. Depending on whether a triangle is front or back-facing, it may be rendered as a filled triangle, as lines, or with the vertices rendered as GL_POINTS (which may be sprites!). I think cases like that might be a fallback. Anyway, even if glBegin(GL_TRIANGLES) is called, you may wind up rendering lines, points or sprites instead of triangles. Off-hand I don't know how this is currently handled in the DRI drivers. Keith would know.

The i965 handles polygon mode in hardware by uploading programs that deal with all the possibilities. It's tempting to say it just works, but the reality is that it is pretty intricately coded. The r200 falls back to software tnl for unfilled triangles, and uses the same mechanisms as swtnl drivers for this. Regular swtnl drivers handle unfilled polygons by using the templates in tnl_dd/ to generate triangle functions which provide all the necessary logic for selecting the right sort of primitive and notifying the driver of transitions between the different primitive types. This is the RenderPrimitive/RasterPrimitive distinction via the callbacks that exist in most of the swtnl drivers.

Keith
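The glPolygonMode wrinkle Brian raises - that the primitive actually rasterized can differ from the one being rendered - can be illustrated with a small sketch. This is not driver code; the enums and function are hypothetical, showing only the reduction logic the tnl_dd/ templates encapsulate:

```c
enum prim { PRIM_POINTS, PRIM_LINES, PRIM_TRIANGLES };
enum poly_mode { MODE_FILL, MODE_LINE, MODE_POINT };

/* Given the front/back polygon modes and the triangle's facing,
 * return the primitive that will actually be rasterized. */
static enum prim raster_prim_for_triangle(enum poly_mode front_mode,
                                          enum poly_mode back_mode,
                                          int is_front_facing)
{
    enum poly_mode m = is_front_facing ? front_mode : back_mode;
    switch (m) {
    case MODE_LINE:  return PRIM_LINES;   /* unfilled: edges only        */
    case MODE_POINT: return PRIM_POINTS;  /* vertices; possibly sprites  */
    default:         return PRIM_TRIANGLES;
    }
}
```

This is why the driver must be notified of raster-primitive transitions (RasterPrimitive) separately from the primitive the app is rendering (RenderPrimitive): a single glBegin(GL_TRIANGLES) batch can rasterize as triangles, lines, and points.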