Re: BUG: unable to handle kernel NULL pointer dereference, i915_gem_object_move_to_active
(cc dri-devel) On Wed, 5 May 2010 00:22:16 +0200 Nils Radtke l...@think-future.de wrote: Hi, It happens quite often that X crashes for unknown reasons but this time there it's left us a note, this BUG: BUG: unable to handle kernel NULL pointer dereference at 01e4 IP: [c134ddbc] i915_gem_object_move_to_active+0x1c/0xa0 *pde = Oops: [#4] PREEMPT last sysfs file: /sys/devices/pci:00/:00:1c.1/:04:00.0/net/wlan0/wireless/link Modules linked in: wlan_scan_sta ath_rate_sample ath_pci ath_hal option usbserial usb_storage snd_usb_audio snd_usb_lib snd_seq_midi snd_rawmidi uvcvideo wlan_ccmp video1394 raw1394 dv1394 firewire_ohci firewire_core wlan ohci1394 tg3 libphy ieee1394 [last unloaded: ath_hal] Pid: 3328, comm: Xorg Tainted: G D2.6.33 #1 Columbia /Extensa 5220 EIP: 0060:[c134ddbc] EFLAGS: 00213282 CPU: 0 EIP is at i915_gem_object_move_to_active+0x1c/0xa0 EAX: EBX: c6679450 ECX: c679fd50 EDX: 00b10a13 ESI: EDI: EBP: 00b10a13 ESP: f51c1cb0 DS: 007b ES: 007b FS: GS: 0033 SS: 0068 Process Xorg (pid: 3328, ti=f51c task=f6be5bf0 task.ti=f51c) Stack: c6679450 0004 c679fd50 c134e83f 0002 c17d42ab c16e3b29 0 c17d71ee 00b10a13 0004 f7395800 f7318000 f7318e10 c1326450 f7318df8 0 f6baff20 00b10a13 f7318de8 f7318e10 f7318000 0004 0001 f7395800 Call Trace: [c134e83f] ? i915_add_request+0x25f/0x370 [c1326450] ? agp_flush_chipset+0x10/0x20 [c1352d55] ? i915_gem_do_execbuffer+0x1325/0x13a0 [c102e087] ? check_preempt_wakeup+0x87/0x160 [c13530ae] ? i915_gem_execbuffer+0xee/0x4b0 [c13531fb] ? i915_gem_execbuffer+0x23b/0x4b0 [c1334056] ? drm_ioctl+0x186/0x380 [c1352fc0] ? i915_gem_execbuffer+0x0/0x4b0 [c10d79dd] ? do_sync_read+0xad/0xf0 [c1333ed0] ? drm_ioctl+0x0/0x380 [c10e5fcb] ? vfs_ioctl+0x2b/0xa0 [c10e61a3] ? do_vfs_ioctl+0x73/0x5f0 [c1038229] ? do_setitimer+0xd9/0x220 [c1058a93] ? ktime_get_ts+0xe3/0x100 [c104e1e0] ? posix_ktime_get_ts+0x0/0x10 [c10e6786] ? sys_ioctl+0x66/0x70 [c1002c8c] ? sysenter_do_call+0x12/0x22 Code: 8a 8d b4 26 00 00 00 00 8d bc 27 00 00 00 00 83 ec 10 89 c1 89 6c 24 0c 89 d5 89 1c 24 89 74 24 04 89 7c 24 08 8b 40 08 8b 71 50 8b b8 e4 01 00 00 8b 46 20 85 c0 74 5f 89 e1 81 e1 00 e0 ff ff EIP: [c134ddbc] i915_gem_object_move_to_active+0x1c/0xa0 SS:ESP 0068:f51c1cb0 CR2: 01e4 ---[ end trace 13931f32db9a26c3 ]--- - Is there any particular meaning for last sysfs file shown? I mean, could there be some link between the crash and last sysfs file? - X crashing happens actually a lot on this specific notebook. If it's not X crashing, then unmotivated kde log-offs or frequent firefox crashes contribute their share. It's most probably not faulty RAM, it's new, it's been tested thoroughly. Thanks for some answers, Nils -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: DRM Error on Acer Aspire One
On Wed, 12 May 2010 08:22:49 +1000 Dave Airlie airl...@gmail.com wrote: On Wed, May 12, 2010 at 5:57 AM, Chris Wilson ch...@chris-wilson.co.uk wrote: On Tue, 11 May 2010 12:10:01 -0700, Andrew Morton a...@linux-foundation.org wrote: On Tue, 11 May 2010 19:52:31 +0100 Chris Wilson ch...@chris-wilson.co.uk wrote: On Tue, 11 May 2010 11:35:55 -0400, Andrew Morton a...@linux-foundation.org wrote: No, io_mapping_map_atomic_wc() cannot be used from [soft]irq context: it hardwires use of KM_USER0. __I suggest that io_mapping_create_wc(), io_mapping_map_atomic_wc() etc be changed so that the caller passes in the KM_foo kmap slot index. Argh, sorry for the noise, read the mail in the wrong order. Thanks for the review. It would be sensible to go with your simpler patch whilst io_mapping_map_atomic_wc() is improved. OK. __I'll be sending a bunch of fixes Linuswards in an hour or two. Should I include this? Yes. Acked-by: Chris Wilson ch...@chris-wilson.co.uk I'm not sure pushing this in at this point is a good idea, if I'm reading it correctly we've no idea what KM_IRQ is being used for, It's used for taking kmaps from IRQ contexts. and this codepath is called from non-irq contexts just as much as irq contexts. That's fine. As long as we do a local_irq_disable(), KM_IRQ0 can be used from both irq- and non-irq contexts. All we need to do is to ensure that some interrupt cannot come along on this CPU and corrupt the slot. I'd rather we just backout the hangcheck stuff touching copies at all at this point, and try again doing it properly with a slow work or something for later. Dave. ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [Bug 16148] page allocation failure. order:1, mode:0x50d0
(switched to email. Please respond via emailed reply-to-all, not via the bugzilla web interface). (switching back to email, actually) On Sun, 13 Jun 2010 13:01:57 GMT bugzilla-dae...@bugzilla.kernel.org wrote: https://bugzilla.kernel.org/show_bug.cgi?id=16148 Mikko C. mikko@gmail.com changed: What|Removed |Added CC||mikko@gmail.com --- Comment #8 from Mikko C. mikko@gmail.com 2010-06-13 13:01:53 --- I have been getting this with 2.6.35-rc2 and rc3. Could it be the same problem? X: page allocation failure. order:0, mode:0x4 Pid: 1514, comm: X Not tainted 2.6.35-rc3 #1 Call Trace: [8108ce49] ? __alloc_pages_nodemask+0x629/0x680 [8108c920] ? __alloc_pages_nodemask+0x100/0x680 [a00db8f3] ? ttm_get_pages+0x2c3/0x448 [ttm] [a00d4658] ? __ttm_tt_get_page+0x98/0xc0 [ttm] [a00d4988] ? ttm_tt_populate+0x48/0x90 [ttm] [a00d4a26] ? ttm_tt_bind+0x56/0xa0 [ttm] [a00d5230] ? ttm_bo_handle_move_mem+0x1d0/0x430 [ttm] [a00d76d6] ? ttm_bo_move_buffer+0x166/0x180 [ttm] [a00b9736] ? drm_mm_kmalloc+0x26/0xc0 [drm] [81030ea9] ? get_parent_ip+0x9/0x20 [a00d7786] ? ttm_bo_validate+0x96/0x130 [ttm] [a00d7b35] ? ttm_bo_init+0x315/0x390 [ttm] [a0122eb8] ? radeon_bo_create+0x118/0x210 [radeon] [a0122fb0] ? radeon_ttm_bo_destroy+0x0/0xb0 [radeon] [a013704c] ? radeon_gem_object_create+0x8c/0x110 [radeon] [a013711f] ? radeon_gem_create_ioctl+0x4f/0xe0 [radeon] [a00b10e6] ? drm_ioctl+0x3d6/0x470 [drm] [a01370d0] ? radeon_gem_create_ioctl+0x0/0xe0 [radeon] [810b965f] ? do_sync_read+0xbf/0x100 [810c8965] ? vfs_ioctl+0x35/0xd0 [810c8b28] ? do_vfs_ioctl+0x88/0x530 [81031ed7] ? sub_preempt_count+0x87/0xb0 [810c9019] ? sys_ioctl+0x49/0x80 [810ba4fe] ? sys_read+0x4e/0x90 [810024ab] ? system_call_fastpath+0x16/0x1b That's different. ttm_get_pages() looks pretty busted to me. It's not using __GFP_WAIT and it's not using __GFP_FS. It's using a plain GFP_DMA32 so it's using atomic allocations even though it doesn't need to. IOW, it's shooting itself in the head. Given that it will sometimes use GFP_HIGHUSER which includes __GFP_FS and __GFP_WAIT, I assume it can always include __GFP_FS and __GFP_WAIT. If so, it should very much do so. If not then the function is misdesigned and should be altered to take a gfp_t argument so the caller can tell ttm_get_pages() which is the strongest allocation mode which it may use. [TTM] Unable to allocate page. radeon :01:05.0: object_init failed for (7827456, 0x0002) [drm:radeon_gem_object_create] *ERROR* Failed to allocate GEM object (7827456, 2, 4096, -12) This bug actually broke stuff for you. Something like this: --- a/drivers/gpu/drm/ttm/ttm_page_alloc.c~a +++ a/drivers/gpu/drm/ttm/ttm_page_alloc.c @@ -677,7 +677,7 @@ int ttm_get_pages(struct list_head *page /* No pool for cached pages */ if (pool == NULL) { if (flags TTM_PAGE_FLAG_DMA32) - gfp_flags |= GFP_DMA32; + gfp_flags |= GFP_KERNEL|GFP_DMA32; else gfp_flags |= GFP_HIGHUSER; _ although I wonder whether it should be using pool-gfp_flags. It's a shame that this code was developed and merged in secret :( Had we known, we could have looked at enhancing mempools to cover the requirement, or at implementing this in some generic fashion rather than hiding it down in drivers/gpu/drm. ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: Report for 2.6.35-rc3-00262-g984bc96
On Thu, 1 Jul 2010 09:40:52 +0200 Nico Schottelius nico-linux-2010-07...@schottelius.org wrote: Good morning! A short report on what's broken in 2.6.35-rc3-00262-g984bc96 with the Lenovo X201: So you see two post-2.6.34 regressions? == xrandr == After using xrandr several times in xorg, the screen gets fancy: blue/white/black changing patterns. Getting even more weired when changing to a conosle. The only thing that keeps on working is the mouse cursor in xorg. This did not happen with 2.6.34-rcsomething, neither with 2.6.33 iirc. This is a Intel Corporation Core Processor Integrated Graphics Controller (rev 02), attached xorg.log. (cc dri-devel) == netlink/carrier messag == dhcpcd does not get updated when the link is established, need to restart it. (cc netdev) == suspend/resume (to ram) == So far no real crash! Only xorg is sometimes not coming up again, but this may be related to the first issue. == wifi / intel 6000 == Works! Works! (Besides the netlink issue, which is annoying, but for all devices). ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [Bugme-new] [Bug 16488] New: [i915] Framebuffer ID error after suspend/hibernate leading to X crash
(switched to email. Please respond via emailed reply-to-all, not via the bugzilla web interface). On Sun, 1 Aug 2010 08:55:49 GMT bugzilla-dae...@bugzilla.kernel.org wrote: https://bugzilla.kernel.org/show_bug.cgi?id=16488 Innocuous-looking one-liner is said to have made Milan's X server even worse than normal. Summary: [i915] Framebuffer ID error after suspend/hibernate leading to X crash Product: Drivers Version: 2.5 Platform: All OS/Version: Linux Tree: Mainline Status: NEW Severity: high Priority: P1 Component: Video(DRI - Intel) AssignedTo: drivers_video-dri-in...@kernel-bugs.osdl.org ReportedBy: nalimi...@club.fr CC: ch...@chris-wilson.co.uk Regression: Yes I've been experiencing X freezes and crashes for more than a year, and with every kernel version the cause of the bug changes. After Linus pushed 985b823b919273fe1327d56d2196b4f92e5d0fae to 2.6.35rc6 (see below [2]), I'm now getting an invalid framebuffer id error that kills my X server. Before that commit, I was getting an oops, which was reported in bugs.fd.o as [1]. /var/log/kern.log: [ 1467.408347] PM: Finishing wakeup. [ 1467.408350] Restarting tasks ... done. [ 1467.434616] [drm:drm_mode_getfb] *ERROR* invalid framebuffer id [ 1467.747233] sky2 :02:00.0: eth0: enabling interface [...] [ 1512.204160] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung [ 1512.205452] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 11072 at 11071) At this point, the X server is killed, and won't restart: Fatal server error: Failed to submit batchbuffer: Input/output error Excerpt from lspci -vnn: 00:02.1 Display controller [0380]: Intel Corporation Mobile 915GM/GMS/910GML Express Graphics Controller [8086:2792] (rev 03) Subsystem: Toshiba America Info Systems Device [1179:ff00] Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast TAbort- TAbort- MAbort- SERR- PERR- INTx- Region 0: Memory at 6400 (32-bit, non-prefetchable) [disabled] [size=512K] Capabilities: [d0] Power Management version 2 Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 PME-Enable- DSel=0 DScale=0 PME- 1: https://bugs.freedesktop.org/show_bug.cgi?id=26974 2: commit 985b823b919273fe1327d56d2196b4f92e5d0fae Author: Linus Torvalds torva...@linux-foundation.org Date: Fri Jul 2 10:04:42 2010 +1000 drm/i915: fix hibernation since i915 self-reclaim fixes Since commit 4bdadb9785696439c6e2b3efe34aa76df1149c83 (drm/i915: Selectively enable self-reclaim), we've been passing GFP_MOVABLE to the i915 page allocator where we weren't before due to some over-eager removal of the page mapping gfp_flags games the code used to play. This caused hibernate on Intel hardware to result in a lot of memory corruptions on resume. See for example http://bugzilla.kernel.org/show_bug.cgi?id=13811 -- Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are on the CC list for the bug. ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: regression in 2.6.35.4 'load is to heavy (video subsystem?)'
On Mon, 30 Aug 2010 10:02:36 +0200 Karsten Mehrhoff kaw...@gmx.de wrote: Using the same .config from 2.6.35.3 to compile 2.5.36.4 results in a heavy load with 2.6.35.4. A regression within -stable is rather bad. Example: Difference between 2.6.35.1/2/3 and 2.6.35.4 while watching some videos: 2.6.35.4 switches the cpu for flash videos in the browser (opera or iceweasel) or other video outputs to 2200/2400/2600 MHz meanwhile 2.6.35.3 (or older) stays at 1000 Mhz. That results in a higher cpu temperature, more power consumption and so one. Using other GUI program results in nearly the same problems with 2.6.35.4, so this kernel is unusable for me. Results to see the difference for the same action 2.6.35.4 Core0 Temp: +45.0__C Core1 Temp: +43.0__C cpu MHz: 2200.000 or higher 2.6.35.3 Core0 Temp: +32.0__C Core1 Temp: +31.0__C cpu MHz: 1000.000 (max. 1800, but falling back to 1000) kernel compiled with 'CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND=y' results for me in 1000, 1800, 2000, 2200, 2400, 2600 MHz. I'm not the only one with this problem, other users experienced the same behavior on other systems on 386 systems, i.e. a regression for glxgears about 30% on slower systems. We all uses differnet AMD cpus and nNida graphic controllers. Same results for the nvidia-kernel from the repos or the nVidia driver from nvidia.com. There must something be wrong in the video subsystem, which is causing this regression. My system (overview using 2.5.35.3): = Processor:2x AMD Athlon(tm) 64 X2 Dual Core Processor 5000+ Memory:4060MB Display Resolution: 1920x1080 pixels OpenGL Renderer: GeForce 9500 GT/PCI/SSE2 X11 Vendor: The X.Org Foundation Version: 1.7.7 Version Kernel: Linux 2.6.35.3-kmt (x86_64) Compiled: SMP Mon Aug 23 00:58:37 CEST 2010 C Library: GNU C Library version 2.11.2 (stable) Default C Compiler: GNU C Compiler version 4.4.5 20100824 (prerelease) (Debian 4.4.4-11) Distribution: Debian GNU/Linux squeeze/sid OpenGL Vendor:NVIDIA Corporation Renderer: GeForce 9500 GT/PCI/SSE2 Version: 3.3.0 NVIDIA 256.44 Direct Rendering: Yes I'm not seeing any relevant cpufreq changes in 2.6.35.3 - 2.6.35.4 (ftp://ftp.kernel.org/pub/linux/kernel/v2.6/ChangeLog-2.6.35.4). There were a lot of DRM changes. ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: idr_remove called for id=0 which is not allocated
On Wed, 22 Sep 2010 23:10:10 +0200 Alessandro Guido a...@alessandroguido.name wrote: I have this traces in my logs (full dmesg attached). idr_remove called for id=0 which is not allocated. Pid: 1136, comm: Xorg Not tainted 2.6.36-rc5-49-gc79bd89 #1 Call Trace: [c1379e16] ? printk+0x18/0x1a [c113b8f3] idr_remove+0x73/0x1c0 [c11b8d6f] drm_mode_object_put+0x2f/0x50 [c11b8f6e] drm_mode_destroy+0xe/0x20 [c11eb24b] nouveau_connector_get_modes+0x2b/0x390 [c1185b6f] ? acpi_lid_open+0x22/0x3c [c103800b] ? queue_delayed_work+0x1b/0x30 [c11abf34] drm_helper_probe_single_connector_modes+0xc4/0x360 [c11bb6a7] drm_mode_getconnector+0x2a7/0x350 [c11b00d2] drm_ioctl+0x1c2/0x4b0 [c1068071] ? filemap_fault+0x81/0x3c0 [c11bb400] ? drm_mode_getconnector+0x0/0x350 [c107bedf] ? handle_mm_fault+0x13f/0x670 [c11aff10] ? drm_ioctl+0x0/0x4b0 [c109c90d] do_vfs_ioctl+0x7d/0x5f0 [c101a14c] ? do_page_fault+0x17c/0x3c0 [c108fc6d] ? vfs_write+0xfd/0x140 [c108f1c0] ? do_sync_write+0x0/0xe0 [c109ceb9] sys_ioctl+0x39/0x60 [c1002b90] sysenter_do_call+0x12/0x26 I assume this is a regression. 2.6.35 didn't do this? ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: DRM-related kmalloc-32 memory leak in 2.6.35
On Wed, 25 Aug 2010 15:59:09 -0500 Matt Mackall m...@selenic.com wrote: On Tue, 2010-08-24 at 10:37 -0500, Christoph Lameter wrote: On Tue, 24 Aug 2010, Matt Mackall wrote: kmalloc-321113344 1113344 32 1281 : tunables00 0 : slabdata 8698 8698 0 That's /proc/slabinfo on my laptop with SLUB. It looks like my last reboot popped me back to 2.6.33 so it may also be old news, but I couldn't spot any reports with Google. Boot with slub_debug as a kernel parameter and then do a cat /sys/kernel/slab/kmalloc-32/alloc_calls to find the caller allocating the objets. Still present in 2.6.35. Appears to be DRM: 845 drm_vm_open_locked+0x72/0x109 age=43/37572/59269 pid=2089 cpus=0-1 That's after about a minute of uptime. Grows to 100k in about a day. dmesg bits: [0.834653] [drm] Initialized drm 1.1.0 20060810 [0.834986] pci :00:02.0: PCI INT A - GSI 16 (level, low) - IRQ 16 [0.834995] pci :00:02.0: setting latency timer to 64 [1.002572] mtrr: type mismatch for e000,1000 old: write-back new: write-combining [1.002580] [drm] MTRR allocation failed. Graphics performance may suffer. [1.019880] acpi device:03: registered as cooling_device2 [1.021520] input: Video Bus as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/LNXVIDEO:00/input/input3 [1.021543] ACPI: Video Device [GFX0] (multi-head: yes rom: no post: no) [1.021855] [drm] Initialized i915 1.6.0 20080730 for :00:02.0 on minor 0 This is with: 00:02.0 VGA compatible controller: Intel Corporation Mobile GM965/GL960 Integrated Graphics Controller (rev 03) 00:02.1 Display controller: Intel Corporation Mobile GM965/GL960 Integrated Graphics Controller (rev 03) Matt tells me that this (drop-dead box-killing!) bug is still present in current kernels. Could someone take a look please? The code seems very simple, and I couldn't spot a problem. Probably this means that I'm even simpler. ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH] drm: Prune GEM vma entries
That was quick, thanks. On Mon, 27 Sep 2010 21:08:36 +0100 Chris Wilson ch...@chris-wilson.co.uk wrote: Hook the GEM vm open/close ops into the generic drm vm open/close so that the vma entries are created and destroy appropriately. Please update the changelog to indicate that it fixes a memory leak. Reported-by: Matt Mackall m...@selenic.com Cc: Dave Airlie airl...@redhat.com Cc: Jesse Barnes jbar...@virtuousgeek.org Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk And here please add Cc: sta...@kernel.org so that earlier kernels get reliably fixed. Thanks. ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH RESEND] drm: include missing types header to drm_mode.h
On Fri, 22 Oct 2010 10:13:19 -0300 Davidlohr Bueso d...@gnu.org wrote: drm: include missing types header to drm_mode.h Signed-off-by: Davidlohr Bueso d...@gnu.org --- include/drm/drm_mode.h |2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) diff --git a/include/drm/drm_mode.h b/include/drm/drm_mode.h index 0fc7397..eddd7f4 100644 --- a/include/drm/drm_mode.h +++ b/include/drm/drm_mode.h @@ -24,6 +24,8 @@ * IN THE SOFTWARE. */ +#include linux/types.h + #ifndef _DRM_MODE_H #define _DRM_MODE_H Does this fix a build error? If so, please send along the compiler error output. ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH 1/5] Backlight: Add backlight type
On Fri, 19 Nov 2010 10:53:52 -0500 Matthew Garrett m...@redhat.com wrote: There may be multiple ways of controlling the backlight on a given machine. Allow drivers to expose the type of interface they are providing, making it possible for userspace to make appropriate policy decisions. ... 60 files changed, 102 insertions(+), 0 deletions(-) This patch has a pretty short half-life. ... --- a/include/linux/backlight.h +++ b/include/linux/backlight.h @@ -32,6 +32,12 @@ enum backlight_update_reason { BACKLIGHT_UPDATE_SYSFS, }; +enum backlight_type { + BACKLIGHT_RAW, + BACKLIGHT_PLATFORM, + BACKLIGHT_FIRMWARE, +}; + struct backlight_device; struct fb_info; @@ -62,6 +68,8 @@ struct backlight_properties { /* FB Blanking active? (values as for power) */ /* Due to be removed, please use (state BL_CORE_FBBLANK) */ int fb_blank; + /* Backlight type */ + enum backlight_type type; /* Flags used to signal drivers of state changes */ /* Upper 4 bits are reserved for driver internal use */ unsigned int state; And if/when the half-life expires, we'll have drivers in-tree which forget to set backlight_properties.type. I haven't checked, but if we're lucky they will default to 0. What will be the runtime effects upon such unconverted drivers? Ideally we'd like them to continue to work OK, and to emit a runtime warning. In which case you'll need BACKLIGHT_RAW=1 so the unconverted driver can be detected, warned about and fixed up by the core code. ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH 1/5] Backlight: Add backlight type
On Fri, 19 Nov 2010 20:25:59 + Matthew Garrett mj...@srcf.ucam.org wrote: On Fri, Nov 19, 2010 at 12:05:23PM -0800, Andrew Morton wrote: On Fri, 19 Nov 2010 10:53:52 -0500 Matthew Garrett m...@redhat.com wrote: There may be multiple ways of controlling the backlight on a given machine. Allow drivers to expose the type of interface they are providing, making it possible for userspace to make appropriate policy decisions. ... 60 files changed, 102 insertions(+), 0 deletions(-) This patch has a pretty short half-life. Well, ideally it would have landed in the backlight tree when I sent it months ago. Then we'd have the opportunity to ensure that everything was fixed up before it went in in the merge window. Richard got distracted. At present I'm grabbing the leds and backlight patches and Richard is reviewing them as they fly past. I don't see there's much point in me merging this patch series so if it survives review, I'd suggest that you put it into an mjg tree and thence into linux-next and mainline? @@ -62,6 +68,8 @@ struct backlight_properties { /* FB Blanking active? (values as for power) */ /* Due to be removed, please use (state BL_CORE_FBBLANK) */ int fb_blank; + /* Backlight type */ + enum backlight_type type; /* Flags used to signal drivers of state changes */ /* Upper 4 bits are reserved for driver internal use */ unsigned int state; And if/when the half-life expires, we'll have drivers in-tree which forget to set backlight_properties.type. I haven't checked, but if we're lucky they will default to 0. Depends entirely on whether they kzalloc the structure or not before calling backlight_device_register(). Well. Even if it's uninitialised, the chances of the value being 1, 2 or 3 for all users are pretty small, so we'll still get to hear about it if the runtime check is appropriately implemented. What will be the runtime effects upon such unconverted drivers? Ideally we'd like them to continue to work OK, and to emit a runtime warning. In which case you'll need BACKLIGHT_RAW=1 so the unconverted driver can be detected, warned about and fixed up by the core code. The worst case I can think of is that we walk off the array - I guess there's an argument for sanity checking that in backlight_show_type(). OK, well please have a think about it, see what you can do to handle unconverted (and possibly out-of-tree) drivers in a friendly fashion. ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH 1/4] drm: fix headers to include linux/types.h
On Wed, 1 Dec 2010 17:54:18 +0100 Julien Cristau jcris...@debian.org wrote: On Wed, Dec 1, 2010 at 17:10:42 +0200, Alexander Shishkin wrote: For headers that get exported to userland and make use of u32 style type names, it is advised to include linux/types.h. This fixes 5 headers_check warnings. How many times does this need to be NAKed? Until someone gets a clue and puts comments in there explaining this? ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH 3/5] radeon: Expose backlight class device for legacy LVDS encoder
On Fri, 14 Jan 2011 14:24:23 -0500 Matthew Garrett m...@redhat.com wrote: From: Michel D__nzer mic...@daenzer.net Allows e.g. power management daemons to control the backlight level. Inspired by the corresponding code in radeonfb. (Updated to add backlight type and make the connector the parent device - mjg) x86_64 allmodconfig: drivers/gpu/drm/radeon/radeon_legacy_encoders.c: In function 'radeon_legacy_lvds_update': drivers/gpu/drm/radeon/radeon_legacy_encoders.c:64: error: 'struct radeon_encoder_atom_dig' has no member named 'bl_dev' drivers/gpu/drm/radeon/radeon_legacy_encoders.c:65: error: 'struct radeon_encoder_atom_dig' has no member named 'backlight_level' drivers/gpu/drm/radeon/radeon_legacy_encoders.c:69: error: 'struct radeon_encoder_lvds' has no member named 'bl_dev' drivers/gpu/drm/radeon/radeon_legacy_encoders.c:70: error: 'struct radeon_encoder_lvds' has no member named 'backlight_level' drivers/gpu/drm/radeon/radeon_legacy_encoders.c: In function 'radeon_legacy_lvds_dpms': drivers/gpu/drm/radeon/radeon_legacy_encoders.c:144: error: 'struct radeon_encoder_atom_dig' has no member named 'dpms_mode' drivers/gpu/drm/radeon/radeon_legacy_encoders.c:147: error: 'struct radeon_encoder_lvds' has no member named 'dpms_mode' ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH] change acquire/release_console_sem() to console_lock/unlock()
On Thu, 20 Jan 2011 17:55:02 +0100 torbenh torb...@gmx.de wrote: On Thu, Jan 20, 2011 at 08:34:48AM -0800, Greg KH wrote: On Thu, Jan 20, 2011 at 04:58:13PM +0100, Torben Hohn wrote: the -rt patches change the console_semaphore to console_mutex. so a quite large chunk of the patches changes all acquire/release_console_sem() to acquire/release_console_mutex() Why not just change the functionality of the existing function to be a mutex in the rt patches, instead of having to rename it everywhere? i hope that Thomas already did this in his upcoming -rt series. this commit makes things use more neutral function names which dont make implications about the underlying lock. the only real change is the return value of console_trylock which is inverted from try_acquire_console_sem() Signed-off-by: Torben Hohn torb...@gmx.de CC: Thomas Gleixner t...@tglx.de I don't mind this rename, but is it really going to help anything out? What's the odds of the -rt portion of this patch ever making it to mainline? the -rt portion only changes the semaphore to a mutex. since the console_sem is used with mutex semantics, i dont see any reason, not to merge that portion too. i am just trying to shrink the -rt patch to make it more maintanable :) Yeah, I think it's a better name and if we can indeed switch that semaphore to a mutex then that's a good thing to do. ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [Intel-gfx] [PATCH 2/5] i915: Add native backlight control
On Fri, 21 Jan 2011 00:43:59 + Matthew Garrett mj...@srcf.ucam.org wrote: On Thu, Jan 20, 2011 at 03:13:49PM -0800, Andrew Morton wrote: On Fri, 21 Jan 2011 00:43:46 +0330 Ali Gholami Rudi aliqr...@gmail.com wrote: Ali Gholami Rudi aliqr...@gmail.com wrote: I tried to apply this patch but I get: drivers/gpu/drm/i915/intel_panel.c: In function 'intel_panel_setup_backlight': drivers/gpu/drm/i915/intel_panel.c:319: error: 'struct backlight_properties' has no member named 'type' drivers/gpu/drm/i915/intel_panel.c:319: error: 'BACKLIGHT_RAW' undeclared (first use in this function) drivers/gpu/drm/i915/intel_panel.c:319: error: (Each undeclared identifier is reported only once drivers/gpu/drm/i915/intel_panel.c:319: error: for each function it appears in.) After applying patch 1/5, the patch applied cleanly. Now I can change the brightness using /sys/class/backlight/intel_backlight/brightness. So it does fix my issue. So we have a patch ordering issue and the radeon-expose-backlight-class-device-for-legacy-lvds-encoder.patch build error. He applied 2/5, it didn't build, he applied 1/5 and it built? I don't think that's a patch ordering issue :) What, you expect reading skills? I think Michel's patch should fix the Radeon one. If not, can you drop that and keep the rest of the set? I'm travelling at the moment and won't have proper build access until the weekend. OK, I resurrected everything. I updated the new drivers/video/backlight/adp5520_bl.c, but there's a decent chance that unconverted drivers will sneak in. I trust they will still work OK? ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [Intel-gfx] [PATCH 2/5] i915: Add native backlight control
On Fri, 21 Jan 2011 06:36:54 +0100 Sedat Dilek sedat.di...@googlemail.com wrote: ( Original posting from [1] ) I have the backlight-type patchset for months in my patch-series (and maintained them if necessary against daily linux-next). Also the last series from 14-Jan-2011 (see 1-5 patch from [2] and the following ones at dri-devel ML). If you couldn't find the updated v2 radeon-backlight-type patch, here the extract from my patch-series: ... # Patch from https://patchwork.kernel.org/patch/491351/ + backlight-type/v2-drm-radeon-kms-Expose-backlight-class-device-for-legacy-LVDS-encoder.patch ... I can only speak for the radeon(-KMS) part with CONFIG_BACKLIGHT_CLASS_DEVICE=y set and against linux-next: It worked, it works. I am a bit angry that someone of the big 5 gets immediately an answer where mine is ignored [3]. Well, who were they sent to? If it was dri-devel then perhaps the people there felt it was inappropriate to their tree, or they were all busy fixing 100 regressions. If it was Richard then he's been distracted by other things for some time, which is why I recently started handling backlight and leds patches. If it was me then kick me, but I don't think it was. In general, if you think that patches aren't getting attention then let me know and send them to me - harassing people for you is part of my job description. So dear Mr. AKPM, if you can standstill for a few moments to answer only one of my questions, through which kernel-tree will this patchset walk trough? leds and backlight patches are presently getting merged into my tree and I'm sending them into Linus. I make sure that Richard sees them all and when he finds time he helps review them for us. They should turn up in linux-next too - we're working on that. This particular patch series is theoretically a bit late for 2.6.38, but if anyone thinks that's a wrong decision, feel free to explain why. ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH] change acquire/release_console_sem() to console_lock/unlock()
On Fri, 21 Jan 2011 09:10:06 +0100 Geert Uytterhoeven ge...@linux-m68k.org wrote: include/linux/mutex.h: /* * NOTE: mutex_trylock() follows the spin_trylock() convention, * not the down_trylock() convention! * * Returns 1 if the mutex has been acquired successfully, and 0 on contention. */ extern int mutex_trylock(struct mutex *lock); So that's why the return value was inverted (when treating it as a boolean). I can understand that. However: +/** + * console_trylock - try to lock the console system for exclusive use. + * + * Tried to acquire a lock which guarantees that the caller has + * exclusive access to the console system and the console_drivers list. + * + * returns -1 on success, and 0 on failure to acquire the lock. + */ +int console_trylock(void) So this one returns -1 on success, not 1? Why? Yup. All callers just test for non-zero, so... --- a/kernel/printk.c~change-acquire-release_console_sem-to-console_lock-unlock-fix-2 +++ a/kernel/printk.c @@ -1058,7 +1058,7 @@ EXPORT_SYMBOL(console_lock); * Tried to acquire a lock which guarantees that the caller has * exclusive access to the console system and the console_drivers list. * - * returns -1 on success, and 0 on failure to acquire the lock. + * returns 1 on success, and 0 on failure to acquire the lock. */ int console_trylock(void) { @@ -1070,7 +1070,7 @@ int console_trylock(void) } console_locked = 1; console_may_schedule = 0; - return -1; + return 1; } EXPORT_SYMBOL(console_trylock); _ ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [2.6.38-rc6] G965: i915 Hangcheck timer elapsed... GPU hung (not reproducible)
(cc dri-devel) A post-2.6.37 regression. On Sun, 27 Feb 2011 10:10:41 +0100 Paolo Ornati orn...@gmail.com wrote: Today I got this while starting a video in SMplayer (MPlayer) with 2.6.38-rc6-00113-g4662db4: [ 830.880014] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung [ 830.880736] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -11 (awaiting 174895 at 174857, next 174896) [ 830.881093] [drm:init_ring_common] *ERROR* render ring initialization failed ctl head tail start [ 831.379079] [drm:i915_do_wait_request] *ERROR* something (likely vbetool) disabled interrupts, re-enabling [ 831.399099] [drm:i915_do_wait_request] *ERROR* something (likely vbetool) disabled interrupts, re-enabling ... [ 837.392012] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung [ 837.392038] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -11 (awaiting 175016 at 174857, next 175022) [ 837.392491] [drm:init_ring_common] *ERROR* render ring initialization failed ctl head tail start [ 837.537479] [drm:i915_do_wait_request] *ERROR* something (likely vbetool) disabled interrupts, re-enabling [ 837.543285] [drm:i915_do_wait_request] *ERROR* something (likely vbetool) disabled interrupts, re-enabling ... [ 839.040011] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung [ 839.040034] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -11 (awaiting 175022 at 174857, next 175122) [ 839.040364] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged! [ 839.040367] [drm:i915_reset] *ERROR* Failed to reset chip. Screen was almost freezed, cursor was stuck but the machine was alive and I was able to use SysRq to kill X and try to restart it (but that didn't help). I don't remember anything similar in recent kernels (= 2.6.37) and got this only once with 2.6.38-rcX. Environment at the time of the GPU crash: KDE4 (without Desktop Effects) Chromium Claws Mail Dolphin ccached make -j3 on a just pulled linux-tree (so I/O bound) SMPlayer/Mplayer (just launched) Assorted logs attached. ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: 2.6.37.2 : Font corruption and [drm:i915_gem_object_unbind] *ERROR* Attempting to unbind pinned buffer
(cc dri-devel) On Mon, 28 Feb 2011 12:26:08 +0100 Paul Rolland r...@as2917.net wrote: Hello, I'm using 2.6.37.2 and I'm getting loads of this error in my messages, while at the same time I can observe font display corruption in a GTK application (Claws-Mail) while running X, compiz and this app. I was previously using 2.6.36 and never experienced such a problem. Video is : 00:02.0 VGA compatible controller: Intel Corporation Mobile 4 Series Chipset Integrated Graphics Controller (rev 07) 00:02.1 Display controller: Intel Corporation Mobile 4 Series Chipset Integrated Graphics Controller (rev 07) I can provide additional information as needed (such a .config or details on userland env. if asked for precise details). If this message warns about a badly acting userland, then why was it ok with 2.6.36 and the same userland env. ? Best, Paul -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: + drivers-base-platformc-dont-mark-platform_device_register_resndata-as-__init_or_module.patch added to -mm tree
On Thu, 19 May 2011 08:21:36 +0200 Uwe Kleine-König u.kleine-koe...@pengutronix.de wrote: I'm not sure that the things that radeon_cp_init does are sane. Some more details here might help someone fix it. Maybe add a comment that it is the only known stopper to make platform_device_register_resndata __init_or_module and a similar comment to platform_device_register_resndata itself? Send patch :) ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH] drm: avoid switching to text console if there is no panic timeout
On Thu, 10 Nov 2011 13:15:04 -0800 Mandeep Singh Baines m...@chromium.org wrote: David Rientjes (rient...@google.com) wrote: On Mon, 17 Oct 2011, David Rientjes wrote: On Mon, 17 Oct 2011, Mandeep Singh Baines wrote: From: Hugh Dickins hu...@chromium.org Add a check for panic_timeout in the drm_fb_helper_panic() notifier: if we're going to reboot immediately, the user will not be able to see the messages anyway, and messing with the video mode may display artifacts, and certainly get into several layers of complexity (including mutexes and memory allocations) which we shall be much safer to avoid. Signed-off-by: Hugh Dickins hu...@google.com [ Edited commit message and modified to short-circuit panic_timeout 0 instead of testing panic_timeout = 0. -Mandeep ] Signed-off-by: Mandeep Singh Baines m...@chromium.org Cc: Dave Airlie airl...@redhat.com Cc: Andrew Morton a...@linux-foundation.org Cc: dri-devel@lists.freedesktop.org Acked-by: David Rientjes rient...@google.com Dave, where do we stand on this? I haven't seen it hit Linus' tree and I don't see it in git://people.freedesktop.org/~airlied/linux. The last status I have is Andrew pulling it into mmotm on 10/18/11. Subject: + drm-avoid-switching-to-text-console-if-there-is-no-panic-timeout.patch added to -mm tree From: a...@linux-foundation.org Date: Tue, 18 Oct 2011 15:42:46 -0700 The patch titled Subject: drm: avoid switching to text console if there is no panic timeout has been added to the -mm tree. Its filename is drm-avoid-switching-to-text-console-if-there-is-no-panic-timeout.patch I need to do another round of sending patches to maintainers. It's a depressing exercise because the great majority of patches are simply ignored. Last time I even added please don't ignore to the email Subject: on the more important ones. Sigh. Where is mmotm hosted these days? On my disk, until kernel.org ftp access returns. But I regularly email tarballs to Stephen, so it's all in linux-next. The mmotm tree is largely unneeded now - use linux-next to get at -mm patches. ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: 3.2-rc2+: Reported regressions from 3.0 and 3.1
On Tue, 22 Nov 2011 11:19:24 +0530 Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com wrote: Subject: khugepaged blocks suspend2ram (3.2.0-rc1-00252-g8f042aa) Submitter : Sergei Trofimovich sly...@gmail.com Date : 2011-11-12 10:48 Message-ID : 2012104859.7744b...@sf.home References : https://lkml.org/lkml/2011/11/12/11 Andrea's patch already fixes this issue, which was reported first by Jiri Slaby. https://lkml.org/lkml/2011/11/11/93 I remember Andrew Morton taking this patch in his -mm tree. But it is not in mainline yet. So can we consider this closed or not? grr, nothing in that patch's changelog indicates that it fixed a regression nor that it fixed an infinite blockage of suspend. I moved it to my 3.2 queue, thanks. ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH] mm: extend prefault helpers to fault in more than PAGE_SIZE
On Thu, 16 Feb 2012 13:01:36 +0100 Daniel Vetter daniel.vet...@ffwll.ch wrote: drm/i915 wants to read/write more than one page in its fastpath and hence needs to prefault more than PAGE_SIZE bytes. I've checked the callsites and they all already clamp size when calling fault_in_pages_* to the same as for the subsequent __copy_to|from_user and hence don't rely on the implicit clamping to PAGE_SIZE. Also kill a copypasted spurious space in both functions while at it. ... --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -408,6 +408,7 @@ extern void add_page_wait_queue(struct page *page, wait_queue_t *waiter); static inline int fault_in_pages_writeable(char __user *uaddr, int size) { int ret; + char __user *end = uaddr + size - 1; if (unlikely(size == 0)) return 0; @@ -416,17 +417,20 @@ static inline int fault_in_pages_writeable(char __user *uaddr, int size) * Writing zeroes into userspace here is OK, because we know that if * the zero gets there, we'll be overwriting it. */ - ret = __put_user(0, uaddr); + while (uaddr = end) { + ret = __put_user(0, uaddr); + if (ret != 0) + return ret; + uaddr += PAGE_SIZE; + } The callsites in filemap.c are pretty hot paths, which is why this thing remains explicitly inlined. I think it would be worth adding a bit of code here to avoid adding a pointless test-n-branch and larger cache footprint to read() and write(). A way of doing that is to add another argument to these functions, say bool multipage. Change the code to do if (multipage) { while (uaddr = end) { ... } } and change the callsites to pass in constant true or false. Then compile it up and manually check that the compiler completely removed the offending code from the filemap.c callsites. Wanna have a think about that? If it all looks OK then please be sure to add code comments explaining why we did this. if (ret == 0) { - char __user *end = uaddr + size - 1; - /* * If the page was already mapped, this will get a cache miss * for sure, so try to avoid doing it. */ - if (((unsigned long)uaddr PAGE_MASK) != + if (((unsigned long)uaddr PAGE_MASK) == ((unsigned long)end PAGE_MASK)) Maybe I'm having a dim day, but I don't immediately see why != got turned into ==. Once we have this settled I'd suggest that the patch be carried in whatever-git-tree-needs-it. ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH] mm: extend prefault helpers to fault in more than PAGE_SIZE
On Fri, 24 Feb 2012 14:34:31 +0100 Daniel Vetter dan...@ffwll.ch wrote: --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -408,6 +408,7 @@ extern void add_page_wait_queue(struct page *page, wait_queue_t *waiter); static inline int fault_in_pages_writeable(char __user *uaddr, int size) { int ret; + char __user *end = uaddr + size - 1; if (unlikely(size == 0)) return 0; @@ -416,17 +417,20 @@ static inline int fault_in_pages_writeable(char __user *uaddr, int size) * Writing zeroes into userspace here is OK, because we know that if * the zero gets there, we'll be overwriting it. */ - ret = __put_user(0, uaddr); + while (uaddr = end) { + ret = __put_user(0, uaddr); + if (ret != 0) + return ret; + uaddr += PAGE_SIZE; + } The callsites in filemap.c are pretty hot paths, which is why this thing remains explicitly inlined. I think it would be worth adding a bit of code here to avoid adding a pointless test-n-branch and larger cache footprint to read() and write(). A way of doing that is to add another argument to these functions, say bool multipage. Change the code to do if (multipage) { while (uaddr = end) { ... } } and change the callsites to pass in constant true or false. Then compile it up and manually check that the compiler completely removed the offending code from the filemap.c callsites. Wanna have a think about that? If it all looks OK then please be sure to add code comments explaining why we did this. I wasn't really happy with the added branch either, but failed to come up with a trick to avoid it. Imho adding new _multipage variants of these functions instead of adding a constant argument is simpler because the functions don't really share much thanks to the block below. I'll see what it looks like (and obviously add a comment explaining what's going on). well... that's just syntactic sugar: static inline int __fault_in_pages_writeable(char __user *uaddr, int size, bool multipage) { ... } static inline int fault_in_pages_writeable(char __user *uaddr, int size) { return __fault_in_pages_writeable(uaddr, size, false); } static inline int fault_in_multipages_writeable(char __user *uaddr, int size) { return __fault_in_pages_writeable(uaddr, size, true); } which I don't think is worth bothering with given the very small number of callsites. ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH] mm: extend prefault helpers to fault in more than PAGE_SIZE
On Wed, 29 Feb 2012 15:03:31 +0100 Daniel Vetter daniel.vet...@ffwll.ch wrote: drm/i915 wants to read/write more than one page in its fastpath and hence needs to prefault more than PAGE_SIZE bytes. I've checked the callsites and they all already clamp size when calling fault_in_pages_* to the same as for the subsequent __copy_to|from_user and hence don't rely on the implicit clamping to PAGE_SIZE. Also kill a copypasted spurious space in both functions while at it. v2: As suggested by Andrew Morton, add a multipage parameter to both functions to avoid the additional branch for the pagemap.c hotpath. My gcc 4.6 here seems to dtrt and indeed reap these branches where not needed. I don't think this produced a very good result :( ... -static inline int fault_in_pages_writeable(char __user *uaddr, int size) +static inline int fault_in_pages_writeable(char __user *uaddr, int size, +bool multipage) { int ret; + char __user *end = uaddr + size - 1; if (unlikely(size == 0)) return 0; @@ -416,36 +419,46 @@ static inline int fault_in_pages_writeable(char __user *uaddr, int size) * Writing zeroes into userspace here is OK, because we know that if * the zero gets there, we'll be overwriting it. */ - ret = __put_user(0, uaddr); - if (ret == 0) { - char __user *end = uaddr + size - 1; + do { + ret = __put_user(0, uaddr); + if (ret != 0) + return ret; + uaddr += PAGE_SIZE; + } while (multipage uaddr = end); + if (ret == 0) { /* * If the page was already mapped, this will get a cache miss * for sure, so try to avoid doing it. */ - if (((unsigned long)uaddr PAGE_MASK) != + if (((unsigned long)uaddr PAGE_MASK) == ((unsigned long)end PAGE_MASK)) - ret = __put_user(0, end); + ret = __put_user(0, end); } return ret; } One effect of this change for the filemap.c callsite is that `uaddr' now gets incremented by PAGE_SIZE. That happens to not break anything because we then mask `uaddr' with PAGE_MASK, and if gcc were really smart, it could remove that addition. But it's a bit ugly. Ideally the patch would have no effect upon filemap.o size, but with an allmodconfig config I'm seeing textdata bss dec hex filename 22876 1187344 303387682 mm/filemap.o(before) 22925 1187392 3043576e3 mm/filemap.o(after) so we are adding read()/write() overhead, and bss mysteriously got larger. Can we improve on this? Even if it's some dumb static inline int fault_in_pages_writeable(char __user *uaddr, int size, bool multipage) { if (multipage) { do-this } else { do-that } } the code duplication between do-this and do-that is regrettable, but at least it's all in the same place in the same file, so we won't accidentally introduce skew later on. Alternatively, add a separate fault_in_multi_pages_writeable() to pagemap.h. I have a bad feeling this is what your original patch did! (But we *should* be able to make this work! Why did this version of the patch go so wrong?) ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH] mm: extend prefault helpers to fault in more than PAGE_SIZE
On Thu, 1 Mar 2012 00:14:53 +0100 Daniel Vetter dan...@ffwll.ch wrote: I'll redo this patch by adding _multipage versions of these 2 functions for i915. OK, but I hope for i915 doesn't mean private to! Put 'em in pagemap.h, for maintenance reasons and because they are generic. Making them inline is a bit sad, but that's OK for now - they have few callsites. ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH] mm: extend prefault helpers to fault in more than PAGE_SIZE
On Thu, 1 Mar 2012 20:22:59 +0100 Daniel Vetter daniel.vet...@ffwll.ch wrote: drm/i915 wants to read/write more than one page in its fastpath and hence needs to prefault more than PAGE_SIZE bytes. Add new functions in filemap.h to make that possible. Also kill a copypasted spurious space in both functions while at it. ... +/* Multipage variants of the above prefault helpers, useful if more than + * PAGE_SIZE of date needs to be prefaulted. These are separate from the above + * functions (which only handle up to PAGE_SIZE) to avoid clobbering the + * filemap.c hotpaths. */ Like this please: /* * Multipage variants of the above prefault helpers, useful if more than * PAGE_SIZE of date needs to be prefaulted. These are separate from the above * functions (which only handle up to PAGE_SIZE) to avoid clobbering the * filemap.c hotpaths. */ and s/date/data/ +static inline int fault_in_multipages_writeable(char __user *uaddr, int size) +{ + int ret; + const char __user *end = uaddr + size - 1; + + if (unlikely(size == 0)) + return 0; + + /* + * Writing zeroes into userspace here is OK, because we know that if + * the zero gets there, we'll be overwriting it. + */ Yeah, like that. + while (uaddr = end) { + ret = __put_user(0, uaddr); + if (ret != 0) + return ret; + uaddr += PAGE_SIZE; + } + + /* Check whether the range spilled into the next page. */ + if (((unsigned long)uaddr PAGE_MASK) == + ((unsigned long)end PAGE_MASK)) + ret = __put_user(0, end); + + return ret; +} + +static inline int fault_in_multipages_readable(const char __user *uaddr, +int size) +{ + volatile char c; + int ret; + const char __user *end = uaddr + size - 1; + + if (unlikely(size == 0)) + return 0; + + while (uaddr = end) { + ret = __get_user(c, uaddr); + if (ret != 0) + return ret; + uaddr += PAGE_SIZE; + } + + /* Check whether the range spilled into the next page. */ + if (((unsigned long)uaddr PAGE_MASK) == + ((unsigned long)end PAGE_MASK)) { + ret = __get_user(c, end); + (void)c; + } + + return ret; +} Please merge it via the DRI tree. ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH v3] scatterlist: add sg_alloc_table_from_pages function
On Tue, 08 May 2012 11:50:33 +0200 Tomasz Stanislawski t.stanisl...@samsung.com wrote: This patch adds a new constructor for an sg table. The table is constructed from an array of struct pages. All contiguous chunks of the pages are merged into a single sg nodes. A user may provide an offset and a size of a buffer if the buffer is not page-aligned. The function is dedicated for DMABUF exporters which often perform conversion from an page array to a scatterlist. Moreover the scatterlist should be squashed in order to save memory and to speed-up the process of DMA mapping using dma_map_sg. The code is based on the patch 'v4l: vb2-dma-contig: add support for scatterlist in userptr mode' and hints from Laurent Pinchart. ... /** + * sg_alloc_table_from_pages - Allocate and initialize an sg table from + * an array of pages + * @sgt: The sg table header to use + * @pages: Pointer to an array of page pointers + * @n_pages: Number of pages in the pages array + * @offset: Offset from start of the first page to the start of a buffer + * @size: Number of valid bytes in the buffer (after offset) + * @gfp_mask:GFP allocation mask + * + * Description: + *Allocate and initialize an sg table from a list of pages. Continuous s/Continuous/Contiguous/ + *ranges of the pages are squashed into a single scatterlist node. A user + *may provide an offset at a start and a size of valid data in a buffer + *specified by the page array. The returned sg table is released by + *sg_free_table. + * + * Returns: + * 0 on success, negative error on failure + **/ nit: Use */, not **/ here. +int sg_alloc_table_from_pages(struct sg_table *sgt, + struct page **pages, unsigned int n_pages, + unsigned long offset, unsigned long size, + gfp_t gfp_mask) I guess a 32-bit n_pages is OK. A 16TB IO seems enough ;) +{ + unsigned int chunks; + unsigned int i; erk, please choose a different name for this. When a C programmer sees i, he very much assumes it has type int. Making it unsigned causes surprise. And don't rename it to u! Let's give it a nice meaningful name. pageno? + unsigned int cur_page; + int ret; + struct scatterlist *s; + + /* compute number of contiguous chunks */ + chunks = 1; + for (i = 1; i n_pages; ++i) + if (page_to_pfn(pages[i]) != page_to_pfn(pages[i - 1]) + 1) This assumes that if two pages have contiguous pfn's then they are physically contiguous. Is that true for all architectures and memory models, including sparsemem? See sparse_encode_mem_map(). + ++chunks; + + ret = sg_alloc_table(sgt, chunks, gfp_mask); + if (unlikely(ret)) + return ret; + + /* merging chunks and putting them into the scatterlist */ + cur_page = 0; + for_each_sg(sgt-sgl, s, sgt-orig_nents, i) { + unsigned long chunk_size; + unsigned int j; j is an int, too. + + /* looking for the end of the current chunk */ s/looking/look/ + for (j = cur_page + 1; j n_pages; ++j) + if (page_to_pfn(pages[j]) != + page_to_pfn(pages[j - 1]) + 1) + break; + + chunk_size = ((j - cur_page) PAGE_SHIFT) - offset; + sg_set_page(s, pages[cur_page], min(size, chunk_size), offset); + size -= chunk_size; + offset = 0; + cur_page = j; + } + + return 0; +} +EXPORT_SYMBOL(sg_alloc_table_from_pages); ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH v3] scatterlist: add sg_alloc_table_from_pages function
On Mon, 21 May 2012 16:01:50 +0200 Tomasz Stanislawski t.stanisl...@samsung.com wrote: +int sg_alloc_table_from_pages(struct sg_table *sgt, + struct page **pages, unsigned int n_pages, + unsigned long offset, unsigned long size, + gfp_t gfp_mask) I guess a 32-bit n_pages is OK. A 16TB IO seems enough ;) Do you think that 'unsigned long' for offset is too big? Ad n_pages. Assuming that Moore's law holds it will take circa 25 years before the limit of 16 TB is reached :) for high-end scatterlist operations. Or I can change the type of n_pages to 'unsigned long' now at no cost :). By then it will be Someone Else's Problem ;) +{ + unsigned int chunks; + unsigned int i; erk, please choose a different name for this. When a C programmer sees i, he very much assumes it has type int. Making it unsigned causes surprise. And don't rename it to u! Let's give it a nice meaningful name. pageno? The problem is that 'i' is a natural name for a loop counter. It's also the natural name for an integer. If a C programmer sees i, he thinks int. It's a Fortran thing ;) AFAIK, in the kernel code developers try to avoid Hungarian notation. A name of a variable should reflect its purpose, not its type. I can change the name of 'i' to 'pageno' and 'j' to 'pageno2' (?) but I think it will make the code less reliable. Well, one could do something radical such as using p. ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH] backlight: initialize struct backlight_properties properly
On Mon, 21 May 2012 09:24:35 +0200 Corentin Chary corentin.ch...@gmail.com wrote: In all these files, the .power field was never correctly initialized. ... --- a/drivers/gpu/drm/i915/intel_panel.c +++ b/drivers/gpu/drm/i915/intel_panel.c @@ -342,6 +342,7 @@ int intel_panel_setup_backlight(struct drm_device *dev) else return -ENODEV; + memset(props, 0, sizeof(props)); This code is all still quite fragile. What happens if we add a new field to backlight_properties? We need to go edit a zillion drivers? There are two ways of fixing this: a) write and use void backlight_properties_init(struct backlight_properties *props, int type, int max_brightness, etc); or b) edit all sites to do struct backlight_properties bl_props = { .type = BACKLIGHT_RAW, .max_brighness = intel_panel_get_max_backlight(dev), .power = FB_BLANK_UNBLANK, }; they're largely equivalent - the struct will be zeroed out and then certain fields will be initialised. a) is a bit better because some future field may not want the all-zeroes pattern (eg, a spinlock). But they're both better than what we have now! ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: linux-next: Tree for Nov 14 (gpu/drm/i915)
On Wed, 14 Nov 2012 11:41:49 -0800 Randy Dunlap rdun...@infradead.org wrote: On 11/13/2012 09:30 PM, Stephen Rothwell wrote: Hi all, News: next-20121115 (i.e. tomorrow) will be the last release until next-20121126 (which should be just be after -rc7, I guess - assuming that Linus does not release v3.7 before then), so if you want something in linux-next for a reasonable amount of testing, it should probably be committed tomorrow. Changes since 20121113: on i386: ERROR: __build_bug_on_failed [drivers/gpu/drm/i915/i915.ko] undefined! Reference to that symbol is found in i915_gem_execbuffer.o. Reference to BUILD_BUG_ON() is found in i915_gem_execbuffer.c: static struct eb_objects * eb_create(int size) { struct eb_objects *eb; int count = PAGE_SIZE / sizeof(struct hlist_head) / 2; BUILD_BUG_ON(!is_power_of_2(PAGE_SIZE / sizeof(struct hlist_head))); Where to start. - eb_create() has no business assuming that the hlist_head has any particular size. We could easily add some conditionally-compiled debug fields in there, and drm blows up. - The assertion is obviously true at present, so I assume that gcc screwed up and failed to reduce all that to a compile-time constant. - We have a BUILD_BUG_ON_NOT_POWER_OF_2(). Use it? - This code is using PAGE_SIZE to scale a kernel data structure. PAGE_SIZE can vary by a factor of 16, depending on Kconfig. This can result in 64k PAGE_SIZE machines exhibiting different beahviour from that which the testers saw. Don't do it. It's better to hard-wire 4096. ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [git pull] fbcon locking fixes.
On Fri, 25 Jan 2013 00:42:37 + (GMT) Dave Airlie airl...@linux.ie wrote: These patches have been sailing around long enough, waiting for a maintainer to reappear, so I've decided enough is enough, lockdep is kinda useful to have. Thanks to Daniel for annoying me enough :-) Me too, but the patches broke Maarten's EFI driver setup: https://lkml.org/lkml/2013/1/15/203. ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
[PATCH v2 1/2] mm: fix cache mode of dax pmd mappings
On Wed, 07 Sep 2016 15:26:14 -0700 Dan Williams wrote: > track_pfn_insert() in vmf_insert_pfn_pmd() is marking dax mappings as > uncacheable rendering them impractical for application usage. DAX-pte > mappings are cached and the goal of establishing DAX-pmd mappings is to > attain more performance, not dramatically less (3 orders of magnitude). > > track_pfn_insert() relies on a previous call to reserve_memtype() to > establish the expected page_cache_mode for the range. While memremap() > arranges for reserve_memtype() to be called, devm_memremap_pages() does > not. So, teach track_pfn_insert() and untrack_pfn() how to handle > tracking without a vma, and arrange for devm_memremap_pages() to > establish the write-back-cache reservation in the memtype tree. Acked-by: Andrew Morton I'll grab [2/2].
[PATCH v4 00/13] Support non-lru page migration
On Wed, 27 Apr 2016 16:48:13 +0900 Minchan Kim wrote: > Recently, I got many reports about perfermance degradation in embedded > system(Android mobile phone, webOS TV and so on) and easy fork fail. > > The problem was fragmentation caused by zram and GPU driver mainly. > With memory pressure, their pages were spread out all of pageblock and > it cannot be migrated with current compaction algorithm which supports > only LRU pages. In the end, compaction cannot work well so reclaimer > shrinks all of working set pages. It made system very slow and even to > fail to fork easily which requires order-[2 or 3] allocations. > > Other pain point is that they cannot use CMA memory space so when OOM > kill happens, I can see many free pages in CMA area, which is not > memory efficient. In our product which has big CMA memory, it reclaims > zones too exccessively to allocate GPU and zram page although there are > lots of free space in CMA so system becomes very slow easily. > > To solve these problem, this patch tries to add facility to migrate > non-lru pages via introducing new functions and page flags to help > migration. I'm seeing some rejects here against Mel's changes and our patch bandwidth is getting waaay way ahead of our review bandwidth. So I think I'll loadshed this patchset at this time, sorry.
[PATCH v7 00/12] Support non-lru page migration
On Wed, 1 Jun 2016 08:21:09 +0900 Minchan Kim wrote: > Recently, I got many reports about perfermance degradation in embedded > system(Android mobile phone, webOS TV and so on) and easy fork fail. > > The problem was fragmentation caused by zram and GPU driver mainly. > With memory pressure, their pages were spread out all of pageblock and > it cannot be migrated with current compaction algorithm which supports > only LRU pages. In the end, compaction cannot work well so reclaimer > shrinks all of working set pages. It made system very slow and even to > fail to fork easily which requires order-[2 or 3] allocations. > > Other pain point is that they cannot use CMA memory space so when OOM > kill happens, I can see many free pages in CMA area, which is not > memory efficient. In our product which has big CMA memory, it reclaims > zones too exccessively to allocate GPU and zram page although there are > lots of free space in CMA so system becomes very slow easily. But this isn't presently implemented for GPU drivers or for CMA, yes? What's the story there?
[PATCH] tree-wide: replace config_enabled() with IS_ENABLED()
On Mon, 6 Jun 2016 21:20:56 +0900 Masahiro Yamada wrote: > The use of config_enabled() against config options is ambiguous. > In practical terms, config_enabled() is equivalent to IS_BUILTIN(), > but the author might have used it for the meaning of IS_ENABLED(). > Using IS_ENABLED(), IS_BUILTIN(), IS_MODULE() etc. makes the > intention clearer. > > This commit replaces config_enabled() with IS_ENABLED() where > possible. This commit is only touching bool config options. > > I noticed two cases where config_enabled() is used against a tristate > option: > > - config_enabled(CONFIG_HWMON) > [ drivers/net/wireless/ath/ath10k/thermal.c ] > > - config_enabled(CONFIG_BACKLIGHT_CLASS_DEVICE) > [ drivers/gpu/drm/gma500/opregion.c ] > > I did not touch them because they should be converted to IS_BUILTIN() > in order to keep the logic, but I was not sure it was the authors' > intention. Well, we do want to be able to remove config_enabled() altogether if we're going to do this. So please later send along a patch which makes a best-effort fix of the unclear usages and let's zap the thing. If those fixes weren't quite correct then there will be a build error (drivers/net/wireless/ath/ath10k/thermal.c) or no change in behaviour (drivers/gpu/drm/gma500/opregion.c).
[RFC 00/10] implement alternative and much simpler id allocator
On Thu, 8 Dec 2016 02:22:55 +0100 Rasmus Villemoes wrote: > TL;DR: these patches save 250 KB of memory, with more low-hanging > fruit ready to pick. > > While browsing through the lib/idr.c code, I noticed that the code at > the end of ida_get_new_above() probably doesn't work as intended: Most > users of ida use it via ida_simple_get(), and that starts by > unconditionally calling ida_pre_get(), ensuring that ida->idr has > 8==MAX_IDR_FREE idr_layers in its free list id_free. In the common > case, none (or at most one) of these get used during > ida_get_new_above(), and we only free one, leaving at least 6 (usually > 7) idr_layers in the free list. Please be aware of http://ozlabs.org/~akpm/mmots/broken-out/reimplement-idr-and-ida-using-the-radix-tree.patch http://lkml.kernel.org/r/1480369871-5271-68-git-send-email-mawilcox at linuxonhyperv.com I expect we'll be merging patches 1-32 of that series into 4.10-rc1 and the above patch (#33) into 4.11-rc1.
[PATCH] fbdev: remove current maintainer
On Thu, 8 Dec 2016 10:34:12 +0200 Tomi Valkeinen wrote: > Remove Tomi Valkeinen from fbdev maintainer and mark fbdev as orphan. > > ... > > I just don't have time to even pretend I maintain fbdev, so mark it as Orphan. > > Anyone want to pick fbdev maintainership? I'll keep an eye on the mailing list, try to ensure that things keep ticking along.
[PATCH] fbdev: remove current maintainer
On Mon, 19 Dec 2016 14:34:12 +0100 Bartlomiej Zolnierkiewicz wrote: > > Hi, > > On Thursday, December 15, 2016 12:12:52 PM Andrew Morton wrote: > > On Thu, 8 Dec 2016 10:34:12 +0200 Tomi Valkeinen > > wrote: > > > > > Remove Tomi Valkeinen from fbdev maintainer and mark fbdev as orphan. > > > > > > ... > > > > > > I just don't have time to even pretend I maintain fbdev, so mark it as > > > Orphan. > > > > > > Anyone want to pick fbdev maintainership? > > http://marc.info/?l=linux-fbdev=148181317415549=2 > > > I'll keep an eye on the mailing list, try to ensure that things keep > > ticking along. > > Andrew, do you want to manage patches yourself or should I step > in and try to do it? No, please go ahead.
[PATCH 28/83] mm: Change timing of notification to IOMMUs about a page to be invalidated
On Fri, 11 Jul 2014 00:53:26 +0300 Oded Gabbay wrote: > From: Andrew Lewycky > > This patch changes the location of the mmu_notifier_invalidate_page function > call inside try_to_unmap_one. The mmu_notifier_invalidate_page function > call tells the IOMMU that a pgae should be invalidated. > > The location is changed from after releasing the physical page to > before releasing the physical page. > > This change should prevent the bug that would occur in the > (rare) case where the GPU attempts to access a page while the CPU > attempts to swap out that page (or discard it if it is not dirty). um OK, but what is the effect on all the other mmu_notifier_ops.invalidate_page() implementations? Please spell this out in full detail within the changelog and be sure to cc the affected maintainers.
[PATCH] list: Fix order of arguments for hlist_add_after(_rcu)
On Fri, 6 Jun 2014 10:22:51 -0700 "Paul E. McKenney" wrote: > On Fri, Jun 06, 2014 at 03:56:52PM +, David Laight wrote: > > From: Behalf Of Ken Helias > > > All other add functions for lists have the new item as first argument and > > > the > > > position where it is added as second argument. This was changed for no > > > good > > > reason in this function and makes using it unnecessary confusing. > > > > > > Also the naming of the arguments in hlist_add_after was confusing. It was > > > changed to use the same names as hlist_add_after_rcu. > > ... > > > -static inline void hlist_add_after_rcu(struct hlist_node *prev, > > > -struct hlist_node *n) > > > +static inline void hlist_add_after_rcu(struct hlist_node *n, > > > +struct hlist_node *prev) > > > > It is rather a shame that the change doesn't generate a compilation > > error for old source files. > > I am also a bit concerned by this. > yup. hlist_add_behind()?
[Bug 87891] New: kernel BUG at mm/slab.c:2625!
(switched to email. Please respond via emailed reply-to-all, not via the bugzilla web interface). On Thu, 06 Nov 2014 17:28:41 + bugzilla-daemon at bugzilla.kernel.org wrote: > https://bugzilla.kernel.org/show_bug.cgi?id=87891 > > Bug ID: 87891 >Summary: kernel BUG at mm/slab.c:2625! >Product: Memory Management >Version: 2.5 > Kernel Version: 3.17.2 > Hardware: i386 > OS: Linux > Tree: Mainline > Status: NEW > Severity: blocking > Priority: P1 > Component: Slab Allocator > Assignee: akpm at linux-foundation.org > Reporter: luke-jr+linuxbugs at utopios.org > Regression: No Well this is interesting. > [359782.842112] kernel BUG at mm/slab.c:2625! > ... > [359782.843008] Call Trace: > [359782.843017] [] __kmalloc+0xdf/0x200 > [359782.843037] [] ? ttm_page_pool_free+0x35/0x180 [ttm] > [359782.843060] [] ttm_page_pool_free+0x35/0x180 [ttm] > [359782.843084] [] ttm_pool_shrink_scan+0xae/0xd0 [ttm] > [359782.843108] [] shrink_slab_node+0x12b/0x2e0 > [359782.843129] [] ? fragmentation_index+0x14/0x70 > [359782.843150] [] ? zone_watermark_ok+0x1a/0x20 > [359782.843171] [] shrink_slab+0xc8/0x110 > [359782.843189] [] do_try_to_free_pages+0x300/0x410 > [359782.843210] [] try_to_free_pages+0xbb/0x190 > [359782.843230] [] __alloc_pages_nodemask+0x696/0xa90 > [359782.843253] [] do_huge_pmd_anonymous_page+0xfa/0x3f0 > [359782.843278] [] ? debug_smp_processor_id+0x17/0x20 > [359782.843300] [] ? __lru_cache_add+0x57/0xa0 > [359782.843321] [] handle_mm_fault+0x37e/0xdd0 It went pagefault ->__alloc_pages_nodemask ->shrink_slab ->ttm_pool_shrink_scan ->ttm_page_pool_free ->kmalloc ->cache_grow ->BUG_ON(flags & GFP_SLAB_BUG_MASK); And I don't really know why - I'm not seeing anything in there which can set a GFP flag which is outside GFP_SLAB_BUG_MASK. However I see lots of nits. Core MM: __alloc_pages_nodemask() does if (unlikely(!page)) { /* * Runtime PM, block IO and its error handling path * can deadlock because I/O on the device might not * complete. */ gfp_mask = memalloc_noio_flags(gfp_mask); page = __alloc_pages_slowpath(gfp_mask, order, zonelist, high_zoneidx, nodemask, preferred_zone, classzone_idx, migratetype); } so it permanently alters the value of incoming arg gfp_mask. This means that the following trace_mm_page_alloc() will print the wrong value of gfp_mask, and if we later do the `goto retry_cpuset', we retry with a possibly different gfp_mask. Isn't this a bug? Also, why are we even passing a gfp_t down to the shrinkers? So they can work out the allocation context - things like __GFP_IO, __GFP_FS, etc? Is it even appropriate to use that mask for a new allocation attempt within a particular shrinker? ttm: I think it's a bad idea to be calling kmalloc() in the slab shrinker function. We *know* that the system is low on memory and is trying to free things up. Trying to allocate *more* memory at this time is asking for trouble. ttm_page_pool_free() could easily be tweaked to use a fixed-size local array of page*'s t avoid that allocation. Could someone implement this please? slab: There's no point in doing #define GFP_SLAB_BUG_MASK (__GFP_DMA32|__GFP_HIGHMEM|~__GFP_BITS_MASK) because __GFP_DMA32|__GFP_HIGHMEM are already part of ~__GFP_BITS_MASK. What's it trying to do here? And it's quite infuriating to go BUG when the code could easily warn and fix it up. And it's quite infuriating to go BUG because one of the bits was set, but not tell us which bit it was! Could the slab guys please review this? From: Andrew Morton <a...@linux-foundation.org> Subject: slab: improve checking for invalid gfp_flags - The code goes BUG, but doesn't tell us which bits were unexpectedly set. Print that out. - The code goes BUG when it could jsut fix things up and proceed. Do that. - ~__GFP_BITS_MASK already includes __GFP_DMA32 and __GFP_HIGHMEM, so remove those from the GFP_SLAB_BUG_MASK definition. Cc: Christoph Lameter Cc: Pekka Enberg Cc: David Rientjes Cc: Joonsoo Kim Signed-off-by: Andrew Morton --- include/linux/gfp.h |2 +- mm/slab.c |5 - mm/slub.c |5 - 3 files changed, 9 insertions(+), 3 deletions(-) diff -puN include/linux/gfp.h~slab-improve-checking-for-invalid-gfp_flags include/linux/gfp.h --- a/include/linux/gfp.h~slab-improve-checking-for-invalid-gfp_flags +++ a/include/linux/gfp.h @@ -145,7 +145,7 @@ str
[Bug 87891] New: kernel BUG at mm/slab.c:2625!
On Tue, 11 Nov 2014 18:36:28 -0600 (CST) Christoph Lameter wrote: > On Tue, 11 Nov 2014, Andrew Morton wrote: > > > There's no point in doing > > > > #define GFP_SLAB_BUG_MASK (__GFP_DMA32|__GFP_HIGHMEM|~__GFP_BITS_MASK) > > > > because __GFP_DMA32|__GFP_HIGHMEM are already part of ~__GFP_BITS_MASK. > > ?? ~__GFP_BITS_MASK means bits 25 to 31 are set. > > __GFP_DMA32 is bit 2 and __GFP_HIGHMEM is bit 1. Ah, yes, OK. I suppose it's possible that __GFP_HIGHMEM was set. do_huge_pmd_anonymous_page ->pte_alloc_one ->alloc_pages(__userpte_alloc_gfp==__GFP_HIGHMEM) but I haven't traced that through and that's 32-bit. But anyway - Luke, please attach your .config to https://bugzilla.kernel.org/show_bug.cgi?id=87891?
[Bug 87891] New: kernel BUG at mm/slab.c:2625!
On Wed, 12 Nov 2014 09:44:19 +0900 Joonsoo Kim wrote: > > > > And it's quite infuriating to go BUG when the code could easily warn > > and fix it up. > > If user wants memory on HIGHMEM, it can be easily fixed by following > change because all memory is compatible for HIGHMEM. But, if user wants > memory on DMA32, it's not easy to fix because memory on NORMAL isn't > compatible with DMA32. slab could return object from another slab page > even if cache_grow() is successfully called. So BUG_ON() here > looks right thing to me. We cannot know in advance whether ignoring this > flag cause more serious result or not. Well, attempting to fix it up and continue is nice, but we can live with the BUG. Not knowing which bit was set is bad. diff -puN mm/slab.c~slab-improve-checking-for-invalid-gfp_flags mm/slab.c --- a/mm/slab.c~slab-improve-checking-for-invalid-gfp_flags +++ a/mm/slab.c @@ -2590,7 +2590,10 @@ static int cache_grow(struct kmem_cache * Be lazy and only check for valid flags here, keeping it out of the * critical path in kmem_cache_alloc(). */ - BUG_ON(flags & GFP_SLAB_BUG_MASK); + if (unlikely(flags & GFP_SLAB_BUG_MASK)) { + pr_emerg("gfp: %u\n", flags & GFP_SLAB_BUG_MASK); + BUG(); + } local_flags = flags & (GFP_CONSTRAINT_MASK|GFP_RECLAIM_MASK); /* Take the node list lock to change the colour_next on this node */ --- a/mm/slub.c~slab-improve-checking-for-invalid-gfp_flags +++ a/mm/slub.c @@ -1377,7 +1377,10 @@ static struct page *new_slab(struct kmem int order; int idx; - BUG_ON(flags & GFP_SLAB_BUG_MASK); + if (unlikely(flags & GFP_SLAB_BUG_MASK)) { + pr_emerg("gfp: %u\n", flags & GFP_SLAB_BUG_MASK); + BUG(); + } page = allocate_slab(s, flags & (GFP_RECLAIM_MASK | GFP_CONSTRAINT_MASK), node); _
[Bug 87891] New: kernel BUG at mm/slab.c:2625!
On Wed, 12 Nov 2014 00:54:01 + Luke Dashjr wrote: > On Wednesday, November 12, 2014 12:49:13 AM Andrew Morton wrote: > > But anyway - Luke, please attach your .config to > > https://bugzilla.kernel.org/show_bug.cgi?id=87891? > > Done: https://bugzilla.kernel.org/attachment.cgi?id=157381 > OK, thanks. No CONFIG_HIGHMEM of course. I'm stumped. It might just have been a random memory bitflip or other corruption of course. Is it repeatable at all? If it is, please add the below and retest? --- a/mm/slab.c~slab-improve-checking-for-invalid-gfp_flags +++ a/mm/slab.c @@ -2590,7 +2590,10 @@ static int cache_grow(struct kmem_cache * Be lazy and only check for valid flags here, keeping it out of the * critical path in kmem_cache_alloc(). */ - BUG_ON(flags & GFP_SLAB_BUG_MASK); + if (unlikely(flags & GFP_SLAB_BUG_MASK)) { + pr_emerg("gfp: %u\n", flags & GFP_SLAB_BUG_MASK); + BUG(); + } local_flags = flags & (GFP_CONSTRAINT_MASK|GFP_RECLAIM_MASK); /* Take the node list lock to change the colour_next on this node */ diff -puN mm/slub.c~slab-improve-checking-for-invalid-gfp_flags mm/slub.c --- a/mm/slub.c~slab-improve-checking-for-invalid-gfp_flags +++ a/mm/slub.c @@ -1377,7 +1377,10 @@ static struct page *new_slab(struct kmem int order; int idx; - BUG_ON(flags & GFP_SLAB_BUG_MASK); + if (unlikely(flags & GFP_SLAB_BUG_MASK)) { + pr_emerg("gfp: %u\n", flags & GFP_SLAB_BUG_MASK); + BUG(); + } page = allocate_slab(s, flags & (GFP_RECLAIM_MASK | GFP_CONSTRAINT_MASK), node); _
[Bug 87891] New: kernel BUG at mm/slab.c:2625!
On Wed, 12 Nov 2014 10:22:45 +0900 Joonsoo Kim wrote: > On Tue, Nov 11, 2014 at 05:02:43PM -0800, Andrew Morton wrote: > > On Wed, 12 Nov 2014 00:54:01 + Luke Dashjr wrote: > > > > > On Wednesday, November 12, 2014 12:49:13 AM Andrew Morton wrote: > > > > But anyway - Luke, please attach your .config to > > > > https://bugzilla.kernel.org/show_bug.cgi?id=87891? > > > > > > Done: https://bugzilla.kernel.org/attachment.cgi?id=157381 > > > > > > > OK, thanks. No CONFIG_HIGHMEM of course. I'm stumped. > > Hello, Andrew. > > I think that the cause is GFP_HIGHMEM. > GFP_HIGHMEM is always defined regardless CONFIG_HIGHMEM. > Please look at the do_huge_pmd_anonymous_page(). > It calls alloc_hugepage_vma() and then alloc_pages_vma() is called > with alloc_hugepage_gfpmask(). This gfpmask includes GFP_TRANSHUGE > and then GFP_HIGHUSER_MOVABLE. OK. So where's the bug? I'm inclined to say that it's in ttm. It's taking a gfp_mask which means "this is the allocation attempt which we are attempting to satisfy" and uses that for its own allocation. But ttm has no business using that gfp_mask for its own allocation attempt. If anything it should use something like, err, GFP_KERNEL & ~__GFP_IO & ~__GFP_FS | __GFP_HIGH although as I mentioned earlier, it would be better to avoid allocation altogether. Poor ttm guys - this is a bit of a trap we set for them.
[Bug 87891] New: kernel BUG at mm/slab.c:2625!
On Wed, 12 Nov 2014 03:47:03 +0200 "Kirill A. Shutemov" wrote: > On Wed, Nov 12, 2014 at 03:22:41AM +0200, Kirill A. Shutemov wrote: > > On Tue, Nov 11, 2014 at 04:49:13PM -0800, Andrew Morton wrote: > > > On Tue, 11 Nov 2014 18:36:28 -0600 (CST) Christoph Lameter > > linux.com> wrote: > > > > > > > On Tue, 11 Nov 2014, Andrew Morton wrote: > > > > > > > > > There's no point in doing > > > > > > > > > > #define GFP_SLAB_BUG_MASK > > > > > (__GFP_DMA32|__GFP_HIGHMEM|~__GFP_BITS_MASK) > > > > > > > > > > because __GFP_DMA32|__GFP_HIGHMEM are already part of > > > > > ~__GFP_BITS_MASK. > > > > > > > > ?? ~__GFP_BITS_MASK means bits 25 to 31 are set. > > > > > > > > __GFP_DMA32 is bit 2 and __GFP_HIGHMEM is bit 1. > > > > > > Ah, yes, OK. > > > > > > I suppose it's possible that __GFP_HIGHMEM was set. > > > > > > do_huge_pmd_anonymous_page > > > ->pte_alloc_one > > > ->alloc_pages(__userpte_alloc_gfp==__GFP_HIGHMEM) > > > > do_huge_pmd_anonymous_page > > alloc_hugepage_vma > > alloc_pages_vma(GFP_TRANSHUGE) > > > > GFP_TRANSHUGE contains GFP_HIGHUSER_MOVABLE, which has __GFP_HIGHMEM. > > Looks like it's reasonable to sanitize flags in shrink_slab() by dropping > flags incompatible with slab expectation. Like this: > > diff --git a/mm/vmscan.c b/mm/vmscan.c > index dcb47074ae03..eb165d29c5e5 100644 > --- a/mm/vmscan.c > +++ b/mm/vmscan.c > @@ -369,6 +369,8 @@ unsigned long shrink_slab(struct shrink_control > *shrinkctl, > if (nr_pages_scanned == 0) > nr_pages_scanned = SWAP_CLUSTER_MAX; > > + shrinkctl->gfp_mask &= ~(__GFP_DMA32 | __GFP_HIGHMEM); > + > if (!down_read_trylock(_rwsem)) { > /* > * If we would return 0, our callers would understand that we Well no, because nobody is supposed to be passing this gfp_mask back into a new allocation attempt anyway. It would be better to do shrinkctl->gfp_mask |= __GFP_IMMEDIATELY_GO_BUG; ?
[Bug 87891] New: kernel BUG at mm/slab.c:2625!
On Wed, 12 Nov 2014 13:08:55 +0900 Tetsuo Handa wrote: > Andrew Morton wrote: > > Poor ttm guys - this is a bit of a trap we set for them. > > Commit a91576d7916f6cce (\"drm/ttm: Pass GFP flags in order to avoid > deadlock.\") > changed to use sc->gfp_mask rather than GFP_KERNEL. > > - pages_to_free = kmalloc(npages_to_free * sizeof(struct page *), > - GFP_KERNEL); > + pages_to_free = kmalloc(npages_to_free * sizeof(struct page *), gfp); > > But this bug is caused by sc->gfp_mask containing some flags which are not > in GFP_KERNEL, right? Then, I think > > - pages_to_free = kmalloc(npages_to_free * sizeof(struct page *), gfp); > + pages_to_free = kmalloc(npages_to_free * sizeof(struct page *), gfp & > GFP_KERNEL); > > would hide this bug. > > But I think we should use GFP_ATOMIC (or drop __GFP_WAIT flag) Well no - ttm_page_pool_free() should stop calling kmalloc altogether. Just do struct page *pages_to_free[16]; and rework the code to free 16 pages at a time. Easy. Apart from all the other things we're discussing here, it should do this because kmalloc() isn't very reliable within a shrinker. > for > two reasons when __alloc_pages_nodemask() is called from shrinker functions. > > (1) Stack usage by __alloc_pages_nodemask() is large. If we unlimitedly allow > recursive __alloc_pages_nodemask() calls, kernel stack could overflow > under extreme memory pressure. > > (2) Some shrinker functions are using sleepable locks which could make kswapd > sleep for unpredictable duration. If kswapd is unexpectedly blocked inside > shrinker functions and somebody is expecting that kswapd is running for > reclaiming memory, it is a memory allocation deadlock. > > Speak of ttm module, commit 22e71691fd54c637 (\"drm/ttm: Use mutex_trylock() > to > avoid deadlock inside shrinker functions.\") prevents unlimited recursive > __alloc_pages_nodemask() calls. Yes, there are such problems. Shrinkers do all sorts of surprising things - some of the filesystem ones do disk writes! And these involve all sorts of locking and memory allocations. But they won't be directly using scan_control.gfp_mask. They may be using open-coded __GFP_NOFS for the allocations. The complicated ones pass the IO over to kernel threads and wait for them to complete, which addresses the stack consumption concerns (at least).
[PATCH] mm: extend prefault helpers to fault in more than PAGE_SIZE
On Thu, 1 Mar 2012 20:22:59 +0100 Daniel Vetter wrote: > drm/i915 wants to read/write more than one page in its fastpath > and hence needs to prefault more than PAGE_SIZE bytes. > > Add new functions in filemap.h to make that possible. > > Also kill a copy spurious space in both functions while at it. > > > ... > > +/* Multipage variants of the above prefault helpers, useful if more than > + * PAGE_SIZE of date needs to be prefaulted. These are separate from the > above > + * functions (which only handle up to PAGE_SIZE) to avoid clobbering the > + * filemap.c hotpaths. */ Like this please: /* * Multipage variants of the above prefault helpers, useful if more than * PAGE_SIZE of date needs to be prefaulted. These are separate from the above * functions (which only handle up to PAGE_SIZE) to avoid clobbering the * filemap.c hotpaths. */ and s/date/data/ > +static inline int fault_in_multipages_writeable(char __user *uaddr, int size) > +{ > + int ret; > + const char __user *end = uaddr + size - 1; > + > + if (unlikely(size == 0)) > + return 0; > + > + /* > + * Writing zeroes into userspace here is OK, because we know that if > + * the zero gets there, we'll be overwriting it. > + */ Yeah, like that. > + while (uaddr <= end) { > + ret = __put_user(0, uaddr); > + if (ret != 0) > + return ret; > + uaddr += PAGE_SIZE; > + } > + > + /* Check whether the range spilled into the next page. */ > + if (((unsigned long)uaddr & PAGE_MASK) == > + ((unsigned long)end & PAGE_MASK)) > + ret = __put_user(0, end); > + > + return ret; > +} > + > +static inline int fault_in_multipages_readable(const char __user *uaddr, > +int size) > +{ > + volatile char c; > + int ret; > + const char __user *end = uaddr + size - 1; > + > + if (unlikely(size == 0)) > + return 0; > + > + while (uaddr <= end) { > + ret = __get_user(c, uaddr); > + if (ret != 0) > + return ret; > + uaddr += PAGE_SIZE; > + } > + > + /* Check whether the range spilled into the next page. */ > + if (((unsigned long)uaddr & PAGE_MASK) == > + ((unsigned long)end & PAGE_MASK)) { > + ret = __get_user(c, end); > + (void)c; > + } > + > + return ret; > +} Please merge it via the DRI tree.
[PATCH v3] scatterlist: add sg_alloc_table_from_pages function
On Mon, 21 May 2012 16:01:50 +0200 Tomasz Stanislawski wrote: > >> +int sg_alloc_table_from_pages(struct sg_table *sgt, > >> + struct page **pages, unsigned int n_pages, > >> + unsigned long offset, unsigned long size, > >> + gfp_t gfp_mask) > > > > I guess a 32-bit n_pages is OK. A 16TB IO seems enough ;) > > > > Do you think that 'unsigned long' for offset is too big? > > Ad n_pages. Assuming that Moore's law holds it will take > circa 25 years before the limit of 16 TB is reached :) for > high-end scatterlist operations. > Or I can change the type of n_pages to 'unsigned long' now at > no cost :). By then it will be Someone Else's Problem ;) > >> +{ > >> + unsigned int chunks; > >> + unsigned int i; > > > > erk, please choose a different name for this. When a C programmer sees > > "i", he very much assumes it has type "int". Making it unsigned causes > > surprise. > > > > And don't rename it to "u"! Let's give it a nice meaningful name. pageno? > > > > The problem is that 'i' is a natural name for a loop counter. It's also the natural name for an integer. If a C programmer sees "i", he thinks "int". It's a Fortran thing ;) > AFAIK, in the kernel code developers try to avoid Hungarian notation. > A name of a variable should reflect its purpose, not its type. > I can change the name of 'i' to 'pageno' and 'j' to 'pageno2' (?) > but I think it will make the code less reliable. Well, one could do something radical such as using "p".
[PATCH] backlight: initialize struct backlight_properties properly
On Mon, 21 May 2012 09:24:35 +0200 Corentin Chary wrote: > In all these files, the .power field was never correctly initialized. > > ... > > --- a/drivers/gpu/drm/i915/intel_panel.c > +++ b/drivers/gpu/drm/i915/intel_panel.c > @@ -342,6 +342,7 @@ int intel_panel_setup_backlight(struct drm_device *dev) > else > return -ENODEV; > > + memset(, 0, sizeof(props)); This code is all still quite fragile. What happens if we add a new field to backlight_properties? We need to go edit a zillion drivers? There are two ways of fixing this: a) write and use void backlight_properties_init(struct backlight_properties *props, int type, int max_brightness, etc); or b) edit all sites to do struct backlight_properties bl_props = { .type = BACKLIGHT_RAW, .max_brighness = intel_panel_get_max_backlight(dev), .power = FB_BLANK_UNBLANK, }; they're largely equivalent - the struct will be zeroed out and then certain fields will be initialised. a) is a bit better because some future field may not want the all-zeroes pattern (eg, a spinlock). But they're both better than what we have now!
linux-next: Tree for Nov 14 (gpu/drm/i915)
On Wed, 14 Nov 2012 11:41:49 -0800 Randy Dunlap wrote: > On 11/13/2012 09:30 PM, Stephen Rothwell wrote: > > > Hi all, > > > > News: next-20121115 (i.e. tomorrow) will be the last release until > > next-20121126 (which should be just be after -rc7, I guess - assuming > > that Linus does not release v3.7 before then), so if you want something > > in linux-next for a reasonable amount of testing, it should probably be > > committed tomorrow. > > > > Changes since 20121113: > > > > > > on i386: > > ERROR: "__build_bug_on_failed" [drivers/gpu/drm/i915/i915.ko] undefined! > > Reference to that symbol is found in > i915_gem_execbuffer.o. Reference to BUILD_BUG_ON() is found in > i915_gem_execbuffer.c: > > static struct eb_objects * > eb_create(int size) > { > struct eb_objects *eb; > int count = PAGE_SIZE / sizeof(struct hlist_head) / 2; > BUILD_BUG_ON(!is_power_of_2(PAGE_SIZE / sizeof(struct hlist_head))); > Where to start. - eb_create() has no business assuming that the hlist_head has any particular size. We could easily add some conditionally-compiled debug fields in there, and drm blows up. - The assertion is obviously true at present, so I assume that gcc screwed up and failed to reduce all that to a compile-time constant. - We have a BUILD_BUG_ON_NOT_POWER_OF_2(). Use it? - This code is using PAGE_SIZE to scale a kernel data structure. PAGE_SIZE can vary by a factor of 16, depending on Kconfig. This can result in 64k PAGE_SIZE machines exhibiting different beahviour from that which the testers saw. Don't do it. It's better to hard-wire 4096.
[Bug 64521] New: BUG kmalloc-4096 (Tainted: G W ): Poison overwritten
(switched to email. Please respond via emailed reply-to-all, not via the bugzilla web interface). On Wed, 06 Nov 2013 19:12:29 + bugzilla-daemon at bugzilla.kernel.org wrote: > https://bugzilla.kernel.org/show_bug.cgi?id=64521 > > Bug ID: 64521 >Summary: BUG kmalloc-4096 (Tainted: GW ): Poison > overwritten >Product: Memory Management >Version: 2.5 > Kernel Version: 3.11.7 > Hardware: All > OS: Linux > Tree: Mainline > Status: NEW > Severity: normal > Priority: P1 > Component: Slab Allocator > Assignee: akpm at linux-foundation.org > Reporter: mikhail.v.gavrilov at gmail.com > Regression: No > > Created attachment 113671 > --> https://bugzilla.kernel.org/attachment.cgi?id=113671=edit > dmesg output > > $ uname -a > Linux localhost.localdomain 3.11.7-300.fc20.x86_64+debug #1 SMP Mon Nov 4 > 14:54:17 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux > > [14065.872585] perf samples too long (2552 > 2500), lowering > kernel.perf_event_max_sample_rate to 5 > [23825.627427] > = > [23825.627438] BUG kmalloc-4096 (Tainted: GW ): Poison overwritten > [23825.627442] > - > > [23825.627449] INFO: 0x880311f7a5ee-0x880311f7a5ee. First byte 0x7b > instead of 0x6b > [23825.627495] INFO: Allocated in i915_gem_execbuffer2+0x51/0x290 [i915] > age=89 > cpu=6 pid=2995 > [23825.627503] __slab_alloc+0x45f/0x526 > [23825.627509] __kmalloc+0x322/0x3b0 > [23825.627534] i915_gem_execbuffer2+0x51/0x290 [i915] > [23825.627554] drm_ioctl+0x542/0x680 [drm] > [23825.627561] do_vfs_ioctl+0x305/0x530 > [23825.627566] SyS_ioctl+0x81/0xa0 > [23825.627576] tracesys+0xdd/0xe2 > [23825.627601] INFO: Freed in i915_gem_execbuffer2+0xed/0x290 [i915] age=89 > cpu=6 pid=2995 > [23825.627607] __slab_free+0x3a/0x382 > [23825.627611] kfree+0x2c0/0x2f0 > [23825.627632] i915_gem_execbuffer2+0xed/0x290 [i915] > [23825.627649] drm_ioctl+0x542/0x680 [drm] > [23825.627653] do_vfs_ioctl+0x305/0x530 > [23825.627657] SyS_ioctl+0x81/0xa0 > [23825.627663] tracesys+0xdd/0xe2 > [23825.627668] INFO: Slab 0xea000c47de00 objects=7 used=7 fp=0x > (null) flags=0x5ff0004080 > [23825.627672] INFO: Object 0x880311f7a290 @offset=8848 > fp=0x880311f79148 > > [23825.627679] Bytes b4 880311f7a280: d6 ea 65 01 01 00 00 00 5a 5a 5a 5a > 5a 5a 5a 5a ..e. > [23825.627684] Object 880311f7a290: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b > 6b 6b 6b > [23825.627688] Object 880311f7a2a0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b > 6b 6b 6b > [23825.627692] Object 880311f7a2b0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b > 6b 6b 6b > [23825.627696] Object 880311f7a2c0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b > 6b 6b 6b > [23825.627700] Object 880311f7a2d0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b > 6b 6b 6b > [23825.627704] Object 880311f7a2e0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b > 6b 6b 6b > [23825.627708] Object 880311f7a2f0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b > 6b 6b 6b > [23825.627711] Object 880311f7a300: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b > 6b 6b 6b > [23825.627715] Object 880311f7a310: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b > 6b 6b 6b > [23825.627719] Object 880311f7a320: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b > 6b 6b 6b > [23825.627723] Object 880311f7a330: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b > 6b 6b 6b > [23825.627727] Object 880311f7a340: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b > 6b 6b 6b > [23825.627731] Object 880311f7a350: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b > 6b 6b 6b > [23825.627735] Object 880311f7a360: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b > 6b 6b 6b > [23825.627739] Object 880311f7a370: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b > 6b 6b 6b > [23825.627743] Object 880311f7a380: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b > 6b 6b 6b > [23825.627747] Object 880311f7a390: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b > 6b 6b 6b > [23825.627751] Object 880311f7a3a0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b > 6b 6b 6b > [23825.627755] Object 880311f7a3b0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b > 6b 6b 6b > [23825.627759] Object 880311f7a3c0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b > 6b 6b 6b > [23825.627762] Object 880311f7a3d0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b > 6b 6b 6b
[git pull] fbcon locking fixes.
On Fri, 25 Jan 2013 00:42:37 + (GMT) Dave Airlie wrote: > These patches have been sailing around long enough, waiting for a maintainer > to reappear, so I've decided enough is enough, lockdep is kinda useful to > have. > > Thanks to Daniel for annoying me enough :-) Me too, but the patches broke Maarten's EFI driver setup: https://lkml.org/lkml/2013/1/15/203.
[PATCH 2/9] mm: Provide new get_vaddr_frames() helper
On Wed, 13 May 2015 15:08:08 +0200 Jan Kara wrote: > Provide new function get_vaddr_frames(). This function maps virtual > addresses from given start and fills given array with page frame numbers of > the corresponding pages. If given start belongs to a normal vma, the function > grabs reference to each of the pages to pin them in memory. If start > belongs to VM_IO | VM_PFNMAP vma, we don't touch page structures. Caller > must make sure pfns aren't reused for anything else while he is using > them. > > This function is created for various drivers to simplify handling of > their buffers. > > Acked-by: Mel Gorman > Acked-by: Vlastimil Babka > Signed-off-by: Jan Kara > --- > include/linux/mm.h | 44 +++ > mm/gup.c | 226 > + That's a lump of new code which many kernels won't be needing. Can we put all this in a new .c file and select it within drivers/media Kconfig?
[PATCH 2/9] mm: Provide new get_vaddr_frames() helper
On Tue, 2 Jun 2015 17:23:00 +0200 Jan Kara wrote: > > That's a lump of new code which many kernels won't be needing. Can we > > put all this in a new .c file and select it within drivers/media > > Kconfig? > So the attached patch should do what you had in mind. OK? lgtm. > drivers/gpu/drm/exynos/Kconfig | 1 + > drivers/media/platform/omap/Kconfig | 1 + > drivers/media/v4l2-core/Kconfig | 1 + > mm/Kconfig | 3 + > mm/Makefile | 1 + > mm/frame-vec.c | 233 > But frame_vector.c would be a more pleasing name. For `struct frame_vector'.
drivers/gpu/drm/i915/i915_gem_gtt.c
I'm not sure what's happened to the drm code in linux-next - it's exploding all over the place. Did someone turn on -Werror without doing anywhere near enough testing? Anyway, I don't know how to fix this i386 build error: drivers/gpu/drm/i915/i915_gem_gtt.c: In function 'gen8_ppgtt_init': drivers/gpu/drm/i915/i915_gem_gtt.c:954:2: error: large integer implicitly truncated to unsigned type [-Werror=overflow] ppgtt->base.total = 1ULL << 32; i915_address_space.total is a ulong: 32-bit.
[Intel-gfx] drivers/gpu/drm/i915/i915_gem_gtt.c
On Sat, 23 May 2015 14:30:09 +0100 Damien Lespiau wrote: > On Fri, May 22, 2015 at 02:17:32PM -0700, Andrew Morton wrote: > > I'm not sure what's happened to the drm code in linux-next - it's > > exploding all over the place. Did someone turn on -Werror without > > doing anywhere near enough testing? > > > > Anyway, I don't know how to fix this i386 build error: > > Seems like you have CONFIG_DRM_I915_WERROR set? Yes. > We explicitely made sure to not enable -Werror by default, `make allmodconfig' enables CONFIG_DRM_I915_WERROR. I'm not sure what is the approved way of fixing this. Perhaps disabling CONFIG_DRM_I915_WERROR when CONFIG_COMPILE_TEST=y.
[PATCH] mm: extend prefault helpers to fault in more than PAGE_SIZE
On Thu, 16 Feb 2012 13:01:36 +0100 Daniel Vetter wrote: > drm/i915 wants to read/write more than one page in its fastpath > and hence needs to prefault more than PAGE_SIZE bytes. > > I've checked the callsites and they all already clamp size when > calling fault_in_pages_* to the same as for the subsequent > __copy_to|from_user and hence don't rely on the implicit clamping > to PAGE_SIZE. > > Also kill a copy spurious space in both functions while at it. > > ... > > --- a/include/linux/pagemap.h > +++ b/include/linux/pagemap.h > @@ -408,6 +408,7 @@ extern void add_page_wait_queue(struct page *page, > wait_queue_t *waiter); > static inline int fault_in_pages_writeable(char __user *uaddr, int size) > { > int ret; > + char __user *end = uaddr + size - 1; > > if (unlikely(size == 0)) > return 0; > @@ -416,17 +417,20 @@ static inline int fault_in_pages_writeable(char __user > *uaddr, int size) >* Writing zeroes into userspace here is OK, because we know that if >* the zero gets there, we'll be overwriting it. >*/ > - ret = __put_user(0, uaddr); > + while (uaddr <= end) { > + ret = __put_user(0, uaddr); > + if (ret != 0) > + return ret; > + uaddr += PAGE_SIZE; > + } The callsites in filemap.c are pretty hot paths, which is why this thing remains explicitly inlined. I think it would be worth adding a bit of code here to avoid adding a pointless test-n-branch and larger cache footprint to read() and write(). A way of doing that is to add another argument to these functions, say "bool multipage". Change the code to do if (multipage) { while (uaddr <= end) { ... } } and change the callsites to pass in constant "true" or "false". Then compile it up and manually check that the compiler completely removed the offending code from the filemap.c callsites. Wanna have a think about that? If it all looks OK then please be sure to add code comments explaining why we did this. > if (ret == 0) { > - char __user *end = uaddr + size - 1; > - > /* >* If the page was already mapped, this will get a cache miss >* for sure, so try to avoid doing it. >*/ > - if (((unsigned long)uaddr & PAGE_MASK) != > + if (((unsigned long)uaddr & PAGE_MASK) == > ((unsigned long)end & PAGE_MASK)) Maybe I'm having a dim day, but I don't immediately see why != got turned into ==. Once we have this settled I'd suggest that the patch be carried in whatever-git-tree-needs-it.
[PATCH] mm: extend prefault helpers to fault in more than PAGE_SIZE
On Fri, 24 Feb 2012 14:34:31 +0100 Daniel Vetter wrote: > > > --- a/include/linux/pagemap.h > > > +++ b/include/linux/pagemap.h > > > @@ -408,6 +408,7 @@ extern void add_page_wait_queue(struct page *page, > > > wait_queue_t *waiter); > > > static inline int fault_in_pages_writeable(char __user *uaddr, int size) > > > { > > > int ret; > > > + char __user *end = uaddr + size - 1; > > > > > > if (unlikely(size == 0)) > > > return 0; > > > @@ -416,17 +417,20 @@ static inline int fault_in_pages_writeable(char > > > __user *uaddr, int size) > > >* Writing zeroes into userspace here is OK, because we know that if > > >* the zero gets there, we'll be overwriting it. > > >*/ > > > - ret = __put_user(0, uaddr); > > > + while (uaddr <= end) { > > > + ret = __put_user(0, uaddr); > > > + if (ret != 0) > > > + return ret; > > > + uaddr += PAGE_SIZE; > > > + } > > > > The callsites in filemap.c are pretty hot paths, which is why this > > thing remains explicitly inlined. I think it would be worth adding a > > bit of code here to avoid adding a pointless test-n-branch and larger > > cache footprint to read() and write(). > > > > A way of doing that is to add another argument to these functions, say > > "bool multipage". Change the code to do > > > > if (multipage) { > > while (uaddr <= end) { > > ... > > } > > } > > > > and change the callsites to pass in constant "true" or "false". Then > > compile it up and manually check that the compiler completely removed > > the offending code from the filemap.c callsites. > > > > Wanna have a think about that? If it all looks OK then please be sure > > to add code comments explaining why we did this. > > I wasn't really happy with the added branch either, but failed to come up > with a trick to avoid it. Imho adding new _multipage variants of these > functions instead of adding a constant argument is simpler because the > functions don't really share much thanks to the block below. I'll see what > it looks like (and obviously add a comment explaining what's going on). well... that's just syntactic sugar: static inline int __fault_in_pages_writeable(char __user *uaddr, int size, bool multipage) { ... } static inline int fault_in_pages_writeable(char __user *uaddr, int size) { return __fault_in_pages_writeable(uaddr, size, false); } static inline int fault_in_multipages_writeable(char __user *uaddr, int size) { return __fault_in_pages_writeable(uaddr, size, true); } which I don't think is worth bothering with given the very small number of callsites.
[PATCH] mm: extend prefault helpers to fault in more than PAGE_SIZE
On Wed, 29 Feb 2012 15:03:31 +0100 Daniel Vetter wrote: > drm/i915 wants to read/write more than one page in its fastpath > and hence needs to prefault more than PAGE_SIZE bytes. > > I've checked the callsites and they all already clamp size when > calling fault_in_pages_* to the same as for the subsequent > __copy_to|from_user and hence don't rely on the implicit clamping > to PAGE_SIZE. > > Also kill a copy spurious space in both functions while at it. > > v2: As suggested by Andrew Morton, add a multipage parameter to both > functions to avoid the additional branch for the pagemap.c hotpath. > My gcc 4.6 here seems to dtrt and indeed reap these branches where not > needed. > I don't think this produced a very good result :( > > ... > > -static inline int fault_in_pages_writeable(char __user *uaddr, int size) > +static inline int fault_in_pages_writeable(char __user *uaddr, int size, > +bool multipage) > { > int ret; > + char __user *end = uaddr + size - 1; > > if (unlikely(size == 0)) > return 0; > @@ -416,36 +419,46 @@ static inline int fault_in_pages_writeable(char __user > *uaddr, int size) >* Writing zeroes into userspace here is OK, because we know that if >* the zero gets there, we'll be overwriting it. >*/ > - ret = __put_user(0, uaddr); > - if (ret == 0) { > - char __user *end = uaddr + size - 1; > + do { > + ret = __put_user(0, uaddr); > + if (ret != 0) > + return ret; > + uaddr += PAGE_SIZE; > + } while (multipage && uaddr <= end); > > + if (ret == 0) { > /* >* If the page was already mapped, this will get a cache miss >* for sure, so try to avoid doing it. >*/ > - if (((unsigned long)uaddr & PAGE_MASK) != > + if (((unsigned long)uaddr & PAGE_MASK) == > ((unsigned long)end & PAGE_MASK)) > - ret = __put_user(0, end); > + ret = __put_user(0, end); > } > return ret; > } One effect of this change for the filemap.c callsite is that `uaddr' now gets incremented by PAGE_SIZE. That happens to not break anything because we then mask `uaddr' with PAGE_MASK, and if gcc were really smart, it could remove that addition. But it's a bit ugly. Ideally the patch would have no effect upon filemap.o size, but with an allmodconfig config I'm seeing textdata bss dec hex filename 22876 1187344 303387682 mm/filemap.o(before) 22925 1187392 3043576e3 mm/filemap.o(after) so we are adding read()/write() overhead, and bss mysteriously got larger. Can we improve on this? Even if it's some dumb static inline int fault_in_pages_writeable(char __user *uaddr, int size, bool multipage) { if (multipage) { do-this } else { do-that } } the code duplication between do-this and do-that is regrettable, but at least it's all in the same place in the same file, so we won't accidentally introduce skew later on. Alternatively, add a separate fault_in_multi_pages_writeable() to pagemap.h. I have a bad feeling this is what your original patch did! (But we *should* be able to make this work! Why did this version of the patch go so wrong?)
[PATCH] mm: extend prefault helpers to fault in more than PAGE_SIZE
On Thu, 1 Mar 2012 00:14:53 +0100 Daniel Vetter wrote: > I'll redo this patch by adding _multipage versions of these 2 functions > for i915. OK, but I hope "for i915" doesn't mean "private to"! Put 'em in pagemap.h, for maintenance reasons and because they are generic. Making them inline is a bit sad, but that's OK for now - they have few callsites.
[PATCH 3/5] radeon: Expose backlight class device for legacy LVDS encoder
On Fri, 14 Jan 2011 14:24:23 -0500 Matthew Garrett wrote: > From: Michel D__nzer > > Allows e.g. power management daemons to control the backlight level. Inspired > by the corresponding code in radeonfb. > > (Updated to add backlight type and make the connector the parent device - mjg) > x86_64 allmodconfig: drivers/gpu/drm/radeon/radeon_legacy_encoders.c: In function 'radeon_legacy_lvds_update': drivers/gpu/drm/radeon/radeon_legacy_encoders.c:64: error: 'struct radeon_encoder_atom_dig' has no member named 'bl_dev' drivers/gpu/drm/radeon/radeon_legacy_encoders.c:65: error: 'struct radeon_encoder_atom_dig' has no member named 'backlight_level' drivers/gpu/drm/radeon/radeon_legacy_encoders.c:69: error: 'struct radeon_encoder_lvds' has no member named 'bl_dev' drivers/gpu/drm/radeon/radeon_legacy_encoders.c:70: error: 'struct radeon_encoder_lvds' has no member named 'backlight_level' drivers/gpu/drm/radeon/radeon_legacy_encoders.c: In function 'radeon_legacy_lvds_dpms': drivers/gpu/drm/radeon/radeon_legacy_encoders.c:144: error: 'struct radeon_encoder_atom_dig' has no member named 'dpms_mode' drivers/gpu/drm/radeon/radeon_legacy_encoders.c:147: error: 'struct radeon_encoder_lvds' has no member named 'dpms_mode'
[PATCH 1/5] Backlight: Add backlight type
On Fri, 19 Nov 2010 10:53:52 -0500 Matthew Garrett wrote: > There may be multiple ways of controlling the backlight on a given machine. > Allow drivers to expose the type of interface they are providing, making > it possible for userspace to make appropriate policy decisions. > > ... > > 60 files changed, 102 insertions(+), 0 deletions(-) This patch has a pretty short half-life. > > ... > > --- a/include/linux/backlight.h > +++ b/include/linux/backlight.h > @@ -32,6 +32,12 @@ enum backlight_update_reason { > BACKLIGHT_UPDATE_SYSFS, > }; > > +enum backlight_type { > + BACKLIGHT_RAW, > + BACKLIGHT_PLATFORM, > + BACKLIGHT_FIRMWARE, > +}; > + > struct backlight_device; > struct fb_info; > > @@ -62,6 +68,8 @@ struct backlight_properties { > /* FB Blanking active? (values as for power) */ > /* Due to be removed, please use (state & BL_CORE_FBBLANK) */ > int fb_blank; > + /* Backlight type */ > + enum backlight_type type; > /* Flags used to signal drivers of state changes */ > /* Upper 4 bits are reserved for driver internal use */ > unsigned int state; And if/when the half-life expires, we'll have drivers in-tree which forget to set backlight_properties.type. I haven't checked, but if we're lucky they will default to "0". What will be the runtime effects upon such unconverted drivers? Ideally we'd like them to continue to work OK, and to emit a runtime warning. In which case you'll need BACKLIGHT_RAW=1 so the unconverted driver can be detected, warned about and fixed up by the core code.
[PATCH 1/5] Backlight: Add backlight type
On Fri, 19 Nov 2010 20:25:59 + Matthew Garrett wrote: > On Fri, Nov 19, 2010 at 12:05:23PM -0800, Andrew Morton wrote: > > On Fri, 19 Nov 2010 10:53:52 -0500 > > Matthew Garrett wrote: > > > > > There may be multiple ways of controlling the backlight on a given > > > machine. > > > Allow drivers to expose the type of interface they are providing, making > > > it possible for userspace to make appropriate policy decisions. > > > > > > ... > > > > > > 60 files changed, 102 insertions(+), 0 deletions(-) > > > > This patch has a pretty short half-life. > > Well, ideally it would have landed in the backlight tree when I sent it > months ago. Then we'd have the opportunity to ensure that everything was > fixed up before it went in in the merge window. Richard got distracted. At present I'm grabbing the leds and backlight patches and Richard is reviewing them as they fly past. I don't see there's much point in me merging this patch series so if it survives review, I'd suggest that you put it into an mjg tree and thence into linux-next and mainline? > > > @@ -62,6 +68,8 @@ struct backlight_properties { > > > /* FB Blanking active? (values as for power) */ > > > /* Due to be removed, please use (state & BL_CORE_FBBLANK) */ > > > int fb_blank; > > > + /* Backlight type */ > > > + enum backlight_type type; > > > /* Flags used to signal drivers of state changes */ > > > /* Upper 4 bits are reserved for driver internal use */ > > > unsigned int state; > > > > And if/when the half-life expires, we'll have drivers in-tree which > > forget to set backlight_properties.type. I haven't checked, but if > > we're lucky they will default to "0". > > Depends entirely on whether they kzalloc the structure or not before > calling backlight_device_register(). Well. Even if it's uninitialised, the chances of the value being 1, 2 or 3 for all users are pretty small, so we'll still get to hear about it if the runtime check is appropriately implemented. > > What will be the runtime effects upon such unconverted drivers? > > Ideally we'd like them to continue to work OK, and to emit a runtime > > warning. In which case you'll need BACKLIGHT_RAW=1 so the unconverted > > driver can be detected, warned about and fixed up by the core code. > > The worst case I can think of is that we walk off the array - I guess > there's an argument for sanity checking that in backlight_show_type(). OK, well please have a think about it, see what you can do to handle unconverted (and possibly out-of-tree) drivers in a friendly fashion.
[PATCH RESEND] drm: include missing types header to drm_mode.h
On Fri, 22 Oct 2010 10:13:19 -0300 Davidlohr Bueso wrote: > drm: include missing types header to drm_mode.h > > Signed-off-by: Davidlohr Bueso > --- > include/drm/drm_mode.h |2 ++ > 1 files changed, 2 insertions(+), 0 deletions(-) > > diff --git a/include/drm/drm_mode.h b/include/drm/drm_mode.h > index 0fc7397..eddd7f4 100644 > --- a/include/drm/drm_mode.h > +++ b/include/drm/drm_mode.h > @@ -24,6 +24,8 @@ > * IN THE SOFTWARE. > */ > > +#include > + > #ifndef _DRM_MODE_H > #define _DRM_MODE_H > Does this fix a build error? If so, please send along the compiler error output.
regression in 2.6.35.4 'load is to heavy (video subsystem?)'
On Mon, 30 Aug 2010 10:02:36 +0200 Karsten Mehrhoff wrote: > Using the same .config from 2.6.35.3 to compile 2.5.36.4 results in a > heavy load with 2.6.35.4. A regression within -stable is rather bad. > Example: > > Difference between 2.6.35.1/2/3 and 2.6.35.4 while watching some videos: > 2.6.35.4 switches the cpu for flash videos in the browser (opera or > iceweasel) or other video outputs to 2200/2400/2600 MHz meanwhile 2.6.35.3 > (or older) stays at 1000 Mhz. That results in a higher cpu temperature, > more power consumption and so one. > > Using other GUI program results in nearly the same problems with 2.6.35.4, > so this kernel is unusable for me. > > Results to see the difference for the same action > 2.6.35.4 > Core0 Temp: +45.0__C > Core1 Temp: +43.0__C > cpu MHz: 2200.000 or higher > > 2.6.35.3 > Core0 Temp: +32.0__C > Core1 Temp: +31.0__C > cpu MHz: 1000.000 (max. 1800, but falling back to 1000) > > kernel compiled with 'CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND=y' > results for me in 1000, 1800, 2000, 2200, 2400, 2600 MHz. > > I'm not the only one with this problem, other users experienced the same > behavior on other systems on 386 systems, i.e. a regression for glxgears > about 30% on slower systems. We all uses differnet AMD cpus and nNida > graphic controllers. Same results for the nvidia-kernel from the repos or > the nVidia driver from nvidia.com. > > There must something be wrong in the video subsystem, which is causing > this regression. > > My system (overview using 2.5.35.3): > = > Processor:2x AMD Athlon(tm) 64 X2 Dual Core Processor 5000+ > Memory:4060MB > > Display > Resolution: 1920x1080 pixels > OpenGL Renderer: GeForce 9500 GT/PCI/SSE2 > X11 Vendor: The X.Org Foundation > Version: 1.7.7 > > Version > Kernel: Linux 2.6.35.3-kmt (x86_64) > Compiled: SMP Mon Aug 23 00:58:37 CEST 2010 > C Library: GNU C Library version 2.11.2 (stable) > Default C Compiler: GNU C Compiler version 4.4.5 20100824 (prerelease) > (Debian 4.4.4-11) > Distribution: Debian GNU/Linux squeeze/sid > > OpenGL > Vendor:NVIDIA Corporation > Renderer: GeForce 9500 GT/PCI/SSE2 > Version: 3.3.0 NVIDIA 256.44 > Direct Rendering: Yes I'm not seeing any relevant cpufreq changes in 2.6.35.3 -> 2.6.35.4 (ftp://ftp.kernel.org/pub/linux/kernel/v2.6/ChangeLog-2.6.35.4). There were a lot of DRM changes.
idr_remove called for id=0 which is not allocated
On Wed, 22 Sep 2010 23:10:10 +0200 Alessandro Guido wrote: > I have this traces in my logs (full dmesg attached). > > idr_remove called for id=0 which is not allocated. > Pid: 1136, comm: Xorg Not tainted 2.6.36-rc5-49-gc79bd89 #1 > Call Trace: > [] ? printk+0x18/0x1a > [] idr_remove+0x73/0x1c0 > [] drm_mode_object_put+0x2f/0x50 > [] drm_mode_destroy+0xe/0x20 > [] nouveau_connector_get_modes+0x2b/0x390 > [] ? acpi_lid_open+0x22/0x3c > [] ? queue_delayed_work+0x1b/0x30 > [] drm_helper_probe_single_connector_modes+0xc4/0x360 > [] drm_mode_getconnector+0x2a7/0x350 > [] drm_ioctl+0x1c2/0x4b0 > [] ? filemap_fault+0x81/0x3c0 > [] ? drm_mode_getconnector+0x0/0x350 > [] ? handle_mm_fault+0x13f/0x670 > [] ? drm_ioctl+0x0/0x4b0 > [] do_vfs_ioctl+0x7d/0x5f0 > [] ? do_page_fault+0x17c/0x3c0 > [] ? vfs_write+0xfd/0x140 > [] ? do_sync_write+0x0/0xe0 > [] sys_ioctl+0x39/0x60 > [] sysenter_do_call+0x12/0x26 I assume this is a regression. 2.6.35 didn't do this?
DRM-related kmalloc-32 memory leak in 2.6.35
On Wed, 25 Aug 2010 15:59:09 -0500 Matt Mackall wrote: > On Tue, 2010-08-24 at 10:37 -0500, Christoph Lameter wrote: > > On Tue, 24 Aug 2010, Matt Mackall wrote: > > > > > kmalloc-321113344 1113344 32 1281 : tunables00 > > > 0 : slabdata 8698 8698 0 > > > > > > That's /proc/slabinfo on my laptop with SLUB. It looks like my last > > > reboot popped me back to 2.6.33 so it may also be old news, but I > > > couldn't spot any reports with Google. > > > > Boot with "slub_debug" as a kernel parameter > > > > and then do a > > > > cat /sys/kernel/slab/kmalloc-32/alloc_calls > > > > to find the caller allocating the objets. > > Still present in 2.6.35. Appears to be DRM: > > 845 drm_vm_open_locked+0x72/0x109 age=43/37572/59269 pid=2089 > cpus=0-1 > > That's after about a minute of uptime. Grows to 100k in about a day. > > dmesg bits: > [0.834653] [drm] Initialized drm 1.1.0 20060810 > [0.834986] pci :00:02.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16 > [0.834995] pci :00:02.0: setting latency timer to 64 > [1.002572] mtrr: type mismatch for e000,1000 old: write-back new: > write-combining > [1.002580] [drm] MTRR allocation failed. Graphics performance may suffer. > [1.019880] acpi device:03: registered as cooling_device2 > [1.021520] input: Video Bus as > /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/LNXVIDEO:00/input/input3 > [1.021543] ACPI: Video Device [GFX0] (multi-head: yes rom: no post: no) > [1.021855] [drm] Initialized i915 1.6.0 20080730 for :00:02.0 on > minor 0 > > This is with: > > 00:02.0 VGA compatible controller: Intel Corporation Mobile GM965/GL960 > Integrated Graphics Controller (rev 03) > 00:02.1 Display controller: Intel Corporation Mobile GM965/GL960 > Integrated Graphics Controller (rev 03) > Matt tells me that this (drop-dead box-killing!) bug is still present in current kernels. Could someone take a look please? The code seems very simple, and I couldn't spot a problem. Probably this means that I'm even simpler.
[PATCH] drm: Prune GEM vma entries
That was quick, thanks. On Mon, 27 Sep 2010 21:08:36 +0100 Chris Wilson wrote: > Hook the GEM vm open/close ops into the generic drm vm open/close so > that the vma entries are created and destroy appropriately. Please update the changelog to indicate that it fixes a memory leak. > Reported-by: Matt Mackall > Cc: Dave Airlie > Cc: Jesse Barnes > Signed-off-by: Chris Wilson And here please add Cc: so that earlier kernels get reliably fixed. Thanks.
[Bugme-new] [Bug 16488] New: [i915] Framebuffer ID error after suspend/hibernate leading to X crash
(switched to email. Please respond via emailed reply-to-all, not via the bugzilla web interface). On Sun, 1 Aug 2010 08:55:49 GMT bugzilla-daemon at bugzilla.kernel.org wrote: > https://bugzilla.kernel.org/show_bug.cgi?id=16488 Innocuous-looking one-liner is said to have made Milan's X server even worse than normal. >Summary: [i915] Framebuffer ID error after suspend/hibernate > leading to X crash >Product: Drivers >Version: 2.5 > Platform: All > OS/Version: Linux > Tree: Mainline > Status: NEW > Severity: high > Priority: P1 > Component: Video(DRI - Intel) > AssignedTo: drivers_video-dri-intel at kernel-bugs.osdl.org > ReportedBy: nalimilan at club.fr > CC: chris at chris-wilson.co.uk > Regression: Yes > > > I've been experiencing X freezes and crashes for more than a year, and with > every kernel version the cause of the bug changes. After Linus pushed > 985b823b919273fe1327d56d2196b4f92e5d0fae to 2.6.35rc6 (see below [2]), I'm now > getting an "invalid framebuffer id" error that kills my X server. Before that > commit, I was getting an oops, which was reported in bugs.fd.o as [1]. > > > /var/log/kern.log: > [ 1467.408347] PM: Finishing wakeup. > [ 1467.408350] Restarting tasks ... done. > [ 1467.434616] [drm:drm_mode_getfb] *ERROR* invalid framebuffer id > [ 1467.747233] sky2 :02:00.0: eth0: enabling interface [...] > [ 1512.204160] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... > GPU hung > [ 1512.205452] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns > -5 (awaiting 11072 at 11071) > > > At this point, the X server is killed, and won't restart: > Fatal server error: > Failed to submit batchbuffer: Input/output error > > > Excerpt from lspci -vnn: > 00:02.1 Display controller [0380]: Intel Corporation Mobile 915GM/GMS/910GML > Express Graphics Controller [8086:2792] (rev 03) > Subsystem: Toshiba America Info Systems Device [1179:ff00] > Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- > Stepping- SERR- FastB2B- DisINTx- > Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- > SERR- Region 0: Memory at 6400 (32-bit, non-prefetchable) [disabled] > [size=512K] > Capabilities: [d0] Power Management version 2 > Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA > PME(D0-,D1-,D2-,D3hot-,D3cold-) > Status: D0 PME-Enable- DSel=0 DScale=0 PME- > > > 1: https://bugs.freedesktop.org/show_bug.cgi?id=26974 > 2: > commit 985b823b919273fe1327d56d2196b4f92e5d0fae > Author: Linus Torvalds > Date: Fri Jul 2 10:04:42 2010 +1000 > > drm/i915: fix hibernation since i915 self-reclaim fixes > > Since commit 4bdadb9785696439c6e2b3efe34aa76df1149c83 ("drm/i915: > Selectively enable self-reclaim"), we've been passing GFP_MOVABLE to the > i915 page allocator where we weren't before due to some over-eager > removal of the page mapping gfp_flags games the code used to play. > > This caused hibernate on Intel hardware to result in a lot of memory > corruptions on resume. See for example > > http://bugzilla.kernel.org/show_bug.cgi?id=13811 > > -- > Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email > --- You are receiving this mail because: --- > You are on the CC list for the bug.
[PATCH 1/4] drm: fix headers to include linux/types.h
On Wed, 1 Dec 2010 17:54:18 +0100 Julien Cristau wrote: > On Wed, Dec 1, 2010 at 17:10:42 +0200, Alexander Shishkin wrote: > > > For headers that get exported to userland and make use of u32 style > > type names, it is advised to include linux/types.h. > > > > This fixes 5 headers_check warnings. > > > How many times does this need to be NAKed? Until someone gets a clue and puts comments in there explaining this?
VERY slow scrolling on radeon graphics card: debugging a timing issue?
(cc dri-devel) On Mon, 20 Dec 2010 23:58:21 +0300 Michael Tokarev wrote: > Hello. > > A weird problem here, and I'm looking for help in > an attempt to solve it. > > Ever since KMS went into kernel and I tried turning > it on, the scrolling speed on the resulting "text" > console (with kms it works in one of graphics modes > hence "text" in quotes) become really _awful_. > > For example, running `dmesg' (which has ~2000 lines) > on the console takes about 2.5 _minutes_ (!) to > complete, -- which means the speed is about 10 lines > per second. On an old notebook I have, with some also > nvidia card, the same operation completes in about > 0.8 sec. > > The lines goes up in a slow motion, I can watch every > new line appearing and scrolling. > > It was this way for a long time, and I almost gave up -- > in X everything works ok, and in order to speed up > booting again I just added "quiet" option to the kernel > command line, to avoid scrolling of kernel messages. > > But yesterday I noticed something else entirely, which > make me hope the problem actually _can_ be solved. > > The thing is: that same scrolling becomes much faster > when I "do something" else while it scrolls up. First > I noticed this when I wanted to switch to another vt > while it were scrolling -- I held down Ctrl key on my > keyboard, and out of the sudden the scroll speed up > dramatically. > > It turned out I can speed the thing to about 10 times > by generating some load: hit and hold a key on the > keyboard (generates interrupts?), run kernel compile > in the background (generates disk interrupts?), move > mouse... > > While doing "something", the same scrolling completes > in about 8 seconds instead of 2m30s. Dramatic improvement. > > Now, when I hold a key or move mouse, the scrolling > is "jumpy" - sometimes it slows down back to original > "slow" form for a bit, and sometimes it jumps a few > lines in one go. > > I tried to disable cpufreq (selecting "performance" > governor) - this changes exactly nothing. > > Next someone suggested the "perf" tool. And this one > is even more interesting: while `perf top' is running > (on another console or X), the scrolling is.. fast > again, as if I were moving my mouse! Once I stop > `perf top', it becomes slow again. So the bug > disappears while you watch it. > > And there I'm stuck again. I asked in #radeon, but > there, Alex Deucher told me that he has no clue and > that the behavour is weird (it is weird indeed). > > Any hints on where to go from there are apprecated. > > The hardware is an AMD780g-based motherboard with > and Athlon CPU, I've seen the same behavour from > many other similar boards. Kernels - all up to > the current 2.6.36.2, sine the old days when kms > for radeon first appeared in staging. > > I know kms/fbcon scrolling is slow on radeon because > it uses completely unoptimized bitblt routines (even > when the hw is pretty much capable of doing all that > stuff internally). But what I see here is something > different - the 8 sec to scroll 2000 lines is the > result from the un-optimal bitblt, not the 2m30s. > > Thanks! > > /mjt > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo at vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/
Report for 2.6.35-rc3-00262-g984bc96
On Thu, 1 Jul 2010 09:40:52 +0200 Nico Schottelius wrote: > Good morning! > > A short report on what's broken in 2.6.35-rc3-00262-g984bc96 with > the Lenovo X201: So you see two post-2.6.34 regressions? > == xrandr == > > After using xrandr several times in xorg, the screen gets > "fancy": blue/white/black changing patterns. > Getting even more weired when changing to a conosle. > > The only thing that keeps on working is the mouse cursor > in xorg. This did not happen with 2.6.34-rcsomething, neither > with 2.6.33 iirc. > > This is a Intel Corporation Core Processor Integrated Graphics Controller > (rev 02), > attached xorg.log. (cc dri-devel) > == netlink/carrier messag == > > dhcpcd does not get updated when the link is established, > need to restart it. (cc netdev) > == suspend/resume (to ram) == > > So far no real crash! Only xorg is sometimes not coming up > again, but this may be related to the first issue. > > == wifi / intel 6000 == > > Works! Works! (Besides the netlink issue, which is annoying, > but for all devices).
2.6.35-rc4 Graphics performance issue and freeing invalid memtype messages on boot.
(Rafael, Maciej: two probably-separate post-2.6.34 regressions here) On Tue, 06 Jul 2010 22:22:17 +1000 Andrew Hendry wrote: > > Some extra messages when booting with -rc4. Didn't get them in -rc3. > [1.387013] swapper:1 freeing invalid memtype bf788000-bf789000 > [1.387409] swapper:1 freeing invalid memtype bf789000-bf78a000 > [5.999675] modprobe:548 freeing invalid memtype d000-d004 > [6.068347] modprobe:548 freeing invalid memtype d014-d015 > [6.068647] modprobe:548 freeing invalid memtype d015-d016 > [6.069661] modprobe:548 freeing invalid memtype d017-d01f > [6.085969] modprobe:548 freeing invalid memtype d01f-d020 > [6.087673] modprobe:548 freeing invalid memtype d021-d022 > [6.087900] modprobe:548 freeing invalid memtype d022-d023 > [6.088092] modprobe:548 freeing invalid memtype d023-d024 > [6.088317] modprobe:548 freeing invalid memtype d024-d025 hrmpf, one of those wonderful messages where neither it nor its source code give you any clue regarding what caused it nor how to fix it. please, apply this: --- a/arch/x86/mm/pat.c~a +++ a/arch/x86/mm/pat.c @@ -359,7 +359,7 @@ int free_memtype(u64 start, u64 end) entry = rbt_memtype_erase(start, end); spin_unlock(_lock); - if (!entry) { + if (WARN_ON(!entry)) { printk(KERN_INFO "%s:%d freeing invalid memtype %Lx-%Lx\n", current->comm, current->pid, start, end); return -EINVAL; and let's at least see where it's coming from. > Not sure if its related, but also have a noticeable performance issue with > graphics under rc4. > Dragging and resizing windows, screen updates, and jumpy cursor are all slow. > Playing a full screen video under vlc gets only 1-2 frames per second, but > plays fine under rc3. > Tested a kernel compile, it was roughly the same. > Userspace is ubunutu 10.04 64bit. > > Any hints? Otherwise will start a bisect tomorrow night. > > Config attached, lspci, rc3 and rc4 boot messages below: What graphics hardware are you using?
[Bug 16148] New: page allocation failure. order:1, mode:0x50d0
(switched to email. Please respond via emailed reply-to-all, not via the bugzilla web interface). On Mon, 7 Jun 2010 17:32:04 GMT bugzilla-daemon at bugzilla.kernel.org wrote: > https://bugzilla.kernel.org/show_bug.cgi?id=16148 > >Summary: page allocation failure. order:1, mode:0x50d0 >Product: Memory Management >Version: 2.5 > Kernel Version: 2.6.35-rc1 > Platform: All > OS/Version: Linux > Tree: Mainline > Status: NEW > Severity: normal > Priority: P1 > Component: Page Allocator > AssignedTo: akpm at linux-foundation.org > ReportedBy: devnull at plzk.org > Regression: No > > > Created an attachment (id=26687) > --> (https://bugzilla.kernel.org/attachment.cgi?id=26687) > dmesg > > Never seen this before. > > 2.6.35-rc1 #1 SMP Mon May 31 21:31:02 CEST 2010 x86_64 GNU/Linux > > [48126.787684] Xorg: page allocation failure. order:1, mode:0x50d0 > [48126.787691] Pid: 1895, comm: Xorg Tainted: GW 2.6.35-rc1 #1 > [48126.787694] Call Trace: > [48126.787709] [] __alloc_pages_nodemask+0x5f5/0x6f0 > [48126.787716] [] alloc_pages_current+0x95/0x100 > [48126.787720] [] new_slab+0x2ba/0x2c0 > [48126.787724] [] __slab_alloc+0x14b/0x4e0 > [48126.787730] [] ? > security_vm_enough_memory_kern+0x21/0x30 > [48126.787736] [] ? agp_alloc_page_array+0x5a/0x70 > [48126.787740] [] __kmalloc+0x11f/0x1c0 > [48126.787744] [] agp_alloc_page_array+0x5a/0x70 > [48126.787747] [] agp_generic_alloc_user+0x64/0x140 > [48126.787750] [] agp_allocate_memory+0x9a/0x140 > [48126.787755] [] drm_agp_allocate_memory+0x9/0x10 > [48126.787758] [] drm_agp_bind_pages+0x57/0x100 > [48126.787765] [] i915_gem_object_bind_to_gtt+0x144/0x340 > [48126.787768] [] i915_gem_object_pin+0xb5/0xd0 > [48126.787772] [] i915_gem_do_execbuffer+0x6cc/0x14f0 > [48126.78] [] ? __is_ram+0x0/0x10 > [48126.787783] [] ? lookup_memtype+0xce/0xe0 > [48126.787787] [] ? i915_gem_execbuffer+0x91/0x390 > [48126.787790] [] i915_gem_execbuffer+0x1d5/0x390 > [48126.787794] [] ? i915_gem_sw_finish_ioctl+0x90/0xc0 > [48126.787799] [] drm_ioctl+0x32a/0x4b0 > [48126.787802] [] ? i915_gem_execbuffer+0x0/0x390 > [48126.787807] [] vfs_ioctl+0x38/0xd0 > [48126.787810] [] do_vfs_ioctl+0x8a/0x580 > [48126.787814] [] sys_ioctl+0x81/0xa0 > [48126.787820] [] system_call_fastpath+0x16/0x1b > David, I have a vague feeling that we've been round this loop before.. Why does agp_alloc_page_array() use __GFP_NORETRY? It's pretty unusual and it's what caused this spew. There's nothing in the changelog and the only relevant commentary appears to be "This speeds things up and also saves memory for small AGP regions", which is inscrutable. Can you please add a usable comment there? Presumably this was added in response to some observed behaviour, but what was it?? If the __GFP_NORETRY is indeed useful and legitimate and given that we have a vmalloc fallback, I'd suggest that we add __GFP_NOWARN there as well to keep the bug reports away. btw, agp_memory.vmalloc_flag can be done away with - it's conventional to use is_vmalloc_addr() for this.
[Bug 16148] New: page allocation failure. order:1, mode:0x50d0
On Fri, 11 Jun 2010 10:46:07 +0200 Thomas Hellstrom wrote: > >>> > >>> > >> David, I have a vague feeling that we've been round this loop before.. > >> > >> Why does agp_alloc_page_array() use __GFP_NORETRY? It's pretty unusual > >> and it's what caused this spew. > >> > >> There's nothing in the changelog and the only relevant commentary > >> appears to be "This speeds things up and also saves memory for small > >> AGP regions", which is inscrutable. Can you please add a usable > >> comment there? > >> > > cc'ing Thomas, who added this, I expect we could drop the NORETRY or > > just add NOWARN. Though an order 1 page alloc failure isn't a pretty > > sight, not sure how a vmalloc fallback could save us. > > > > > > Hmm. IIRC that was an untested speed optimization back from the time > when I was > reading ldd3. I think the idea was to avoid slow allocations of (order > > 0) if they weren't > immediately available and fall back to vmalloc single page allocations. > It might be that that functionality is no longer preserved and only the > __GFP_NORETRY remains. > I think it should be safe to remove the NORETRY if it's annoying, but it > should probably be equally safe to add a NOWARN and keep the vmalloc > fallback. An order-1 GFP_KERNEL allocation is a breeze - slub does them often, we use them for kernel stacks all the time. I'd say just remove the __GFP_NORETRY and be happy. In fact if the allocations are always this small I'd say we can remove the vmalloc fallback too. However if under some circumstances the allocations can be "large", say order-4 or higher then allocation failures are still a risk.
[Bug 16148] New: page allocation failure. order:1, mode:0x50d0
On Fri, 11 Jun 2010 22:22:28 +0200 Thomas Hellstrom wrote: > >>> cc'ing Thomas, who added this, I expect we could drop the NORETRY or > >>> just add NOWARN. Though an order 1 page alloc failure isn't a pretty > >>> sight, not sure how a vmalloc fallback could save us. > >>> > >>> > >>> > >> Hmm. IIRC that was an untested speed optimization back from the time > >> when I was > >> reading ldd3. I think the idea was to avoid slow allocations of (order> > >> 0) if they weren't > >> immediately available and fall back to vmalloc single page allocations. > >> It might be that that functionality is no longer preserved and only the > >> __GFP_NORETRY remains. > >> I think it should be safe to remove the NORETRY if it's annoying, but it > >> should probably be equally safe to add a NOWARN and keep the vmalloc > >> fallback. > >> > > An order-1 GFP_KERNEL allocation is a breeze - slub does them often, we > > use them for kernel stacks all the time. I'd say just remove the > > __GFP_NORETRY and be happy. > > > > In fact if the allocations are always this small I'd say we can remove > > the vmalloc fallback too. However if under some circumstances the > > allocations can be "large", say order-4 or higher then allocation > > failures are still a risk. > > > > > > Actually, At that time I was working with a SiS GPU (128MiB system), and > was getting persistent failures for order 1 GFP_KERNEL page allocations > (albeit not in this codepath). So while they are highly unlikely for > modern systems, it might be worthwhile keeping the fallback. 128MB total RAM? Those were the days. Various page reclaim changes have been made in the past year or so which _should_ improve that (eg, lumpy reclaim) but yeah, it's by no means a certainty. The vmalloc fallback hardly hurts anyone. But it does mean that hardly anyone ever executes that codepath, so it won't get tested much. There was a patch recently which added an API formalising the alloc_pages-then-vmalloc fallback approach. It didn't get merged, although there weren't strong feelings either way really. One benefit of that approach is that the alloc/free code itself would get more testing coverage, but callers can still screw things up by failing to handle vmalloc memory correctly for DMA mapping purposes. Oh well, where were we? Remove that __GFP_NORETRY?
[Bug 16148] page allocation failure. order:1, mode:0x50d0
(switched to email. Please respond via emailed reply-to-all, not via the bugzilla web interface). (switching back to email, actually) On Sun, 13 Jun 2010 13:01:57 GMT bugzilla-daemon at bugzilla.kernel.org wrote: > https://bugzilla.kernel.org/show_bug.cgi?id=16148 > > > Mikko C. changed: > >What|Removed |Added > > CC||mikko.cal at gmail.com > > > > > --- Comment #8 from Mikko C. 2010-06-13 13:01:53 --- > I have been getting this with 2.6.35-rc2 and rc3. > Could it be the same problem? > > > X: page allocation failure. order:0, mode:0x4 > Pid: 1514, comm: X Not tainted 2.6.35-rc3 #1 > Call Trace: > [] ? __alloc_pages_nodemask+0x629/0x680 > [] ? __alloc_pages_nodemask+0x100/0x680 > [] ? ttm_get_pages+0x2c3/0x448 [ttm] > [] ? __ttm_tt_get_page+0x98/0xc0 [ttm] > [] ? ttm_tt_populate+0x48/0x90 [ttm] > [] ? ttm_tt_bind+0x56/0xa0 [ttm] > [] ? ttm_bo_handle_move_mem+0x1d0/0x430 [ttm] > [] ? ttm_bo_move_buffer+0x166/0x180 [ttm] > [] ? drm_mm_kmalloc+0x26/0xc0 [drm] > [] ? get_parent_ip+0x9/0x20 > [] ? ttm_bo_validate+0x96/0x130 [ttm] > [] ? ttm_bo_init+0x315/0x390 [ttm] > [] ? radeon_bo_create+0x118/0x210 [radeon] > [] ? radeon_ttm_bo_destroy+0x0/0xb0 [radeon] > [] ? radeon_gem_object_create+0x8c/0x110 [radeon] > [] ? radeon_gem_create_ioctl+0x4f/0xe0 [radeon] > [] ? drm_ioctl+0x3d6/0x470 [drm] > [] ? radeon_gem_create_ioctl+0x0/0xe0 [radeon] > [] ? do_sync_read+0xbf/0x100 > [] ? vfs_ioctl+0x35/0xd0 > [] ? do_vfs_ioctl+0x88/0x530 > [] ? sub_preempt_count+0x87/0xb0 > [] ? sys_ioctl+0x49/0x80 > [] ? sys_read+0x4e/0x90 > [] ? system_call_fastpath+0x16/0x1b That's different. ttm_get_pages() looks pretty busted to me. It's not using __GFP_WAIT and it's not using __GFP_FS. It's using a plain GFP_DMA32 so it's using atomic allocations even though it doesn't need to. IOW, it's shooting itself in the head. Given that it will sometimes use GFP_HIGHUSER which includes __GFP_FS and __GFP_WAIT, I assume it can always include __GFP_FS and __GFP_WAIT. If so, it should very much do so. If not then the function is misdesigned and should be altered to take a gfp_t argument so the caller can tell ttm_get_pages() which is the strongest allocation mode which it may use. > [TTM] Unable to allocate page. > radeon :01:05.0: object_init failed for (7827456, 0x0002) > [drm:radeon_gem_object_create] *ERROR* Failed to allocate GEM object (7827456, > 2, 4096, -12) This bug actually broke stuff for you. Something like this: --- a/drivers/gpu/drm/ttm/ttm_page_alloc.c~a +++ a/drivers/gpu/drm/ttm/ttm_page_alloc.c @@ -677,7 +677,7 @@ int ttm_get_pages(struct list_head *page /* No pool for cached pages */ if (pool == NULL) { if (flags & TTM_PAGE_FLAG_DMA32) - gfp_flags |= GFP_DMA32; + gfp_flags |= GFP_KERNEL|GFP_DMA32; else gfp_flags |= GFP_HIGHUSER; _ although I wonder whether it should be using pool->gfp_flags. It's a shame that this code was developed and merged in secret :( Had we known, we could have looked at enhancing mempools to cover the requirement, or at implementing this in some generic fashion rather than hiding it down in drivers/gpu/drm.
BUG: unable to handle kernel NULL pointer dereference, i915_gem_object_move_to_active
(cc dri-devel) On Wed, 5 May 2010 00:22:16 +0200 Nils Radtke wrote: > Hi, > > It happens quite often that X crashes for unknown reasons but this time > there it's left us a note, this BUG: > > BUG: unable to handle kernel NULL pointer dereference at 01e4 > IP: [] i915_gem_object_move_to_active+0x1c/0xa0 > *pde = > Oops: [#4] PREEMPT > last sysfs file: > /sys/devices/pci:00/:00:1c.1/:04:00.0/net/wlan0/wireless/link > Modules linked in: wlan_scan_sta ath_rate_sample ath_pci ath_hal option > usbserial usb_storage snd_usb_audio snd_usb_lib snd_seq_midi snd_rawmidi > uvcvideo wlan_ccmp video1394 raw1394 dv1394 firewire_ohci firewire_core wlan > ohci1394 tg3 libphy ieee1394 [last unloaded: ath_hal] > > Pid: 3328, comm: Xorg Tainted: G D2.6.33 #1 Columbia > /Extensa 5220 > EIP: 0060:[] EFLAGS: 00213282 CPU: 0 > EIP is at i915_gem_object_move_to_active+0x1c/0xa0 > EAX: EBX: c6679450 ECX: c679fd50 EDX: 00b10a13 > ESI: EDI: EBP: 00b10a13 ESP: f51c1cb0 > DS: 007b ES: 007b FS: GS: 0033 SS: 0068 > Process Xorg (pid: 3328, ti=f51c task=f6be5bf0 task.ti=f51c) > Stack: > c6679450 0004 c679fd50 c134e83f 0002 c17d42ab c16e3b29 > <0> c17d71ee 00b10a13 0004 f7395800 f7318000 f7318e10 c1326450 f7318df8 > <0> f6baff20 00b10a13 f7318de8 f7318e10 f7318000 0004 0001 f7395800 > Call Trace: > [] ? i915_add_request+0x25f/0x370 > [] ? agp_flush_chipset+0x10/0x20 > [] ? i915_gem_do_execbuffer+0x1325/0x13a0 > [] ? check_preempt_wakeup+0x87/0x160 > [] ? i915_gem_execbuffer+0xee/0x4b0 > [] ? i915_gem_execbuffer+0x23b/0x4b0 > [] ? drm_ioctl+0x186/0x380 > [] ? i915_gem_execbuffer+0x0/0x4b0 > [] ? do_sync_read+0xad/0xf0 > [] ? drm_ioctl+0x0/0x380 > [] ? vfs_ioctl+0x2b/0xa0 > [] ? do_vfs_ioctl+0x73/0x5f0 > [] ? do_setitimer+0xd9/0x220 > [] ? ktime_get_ts+0xe3/0x100 > [] ? posix_ktime_get_ts+0x0/0x10 > [] ? sys_ioctl+0x66/0x70 > [] ? sysenter_do_call+0x12/0x22 > Code: 8a 8d b4 26 00 00 00 00 8d bc 27 00 00 00 00 83 ec 10 89 c1 89 6c 24 0c > 89 d5 89 1c 24 89 74 24 04 89 7c 24 08 8b 40 08 8b 71 50 <8b> b8 e4 01 00 00 > 8b 46 20 85 c0 74 5f 89 e1 81 e1 00 e0 ff ff > EIP: [] i915_gem_object_move_to_active+0x1c/0xa0 SS:ESP > 0068:f51c1cb0 > CR2: 01e4 > ---[ end trace 13931f32db9a26c3 ]--- > > > - Is there any particular meaning for "last sysfs file" shown? I mean, could > there > be some link between the crash and "last sysfs file"? > > - X crashing happens actually a lot on this specific notebook. If it's not X > crashing, > then unmotivated kde log-offs or frequent firefox crashes contribute their > share. > It's most probably not faulty RAM, it's new, it's been tested thoroughly. > > > Thanks for some answers, > > Nils > > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo at vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/
BUG: unable to handle kernel NULL ptr Xorg, scheduling while atomic: Xorg/drm
(cc dri-devel) On Wed, 5 May 2010 00:51:38 +0200 Nils Radtke wrote: > Hi, > > Some more kernel NULL ptr deref, a lot of scheduling while atomic > (probably due to first bug occurred) and again this "last sysfs file" > from wireless. ath5k is a real source of trouble here: > > - resume from suspend fails 7 times out of 10 and always seems the > ath driver being the culprit. > - ath5k: when starting a blob transfer eg. ssh summer.jpg to the > notebook with the ath5k driver running, the wNIC freezes almost > immediately! Dead device. This is unfortunately too easily reproducible. > In fact, this is a blocker, wireless blob transfer via ssh (other means > not tested) is not possible w/o locking it up. > > Cheers, > > Nils > > > Four (partly lengthy) logs following, plus one more maybe-related BUG: > > Apr 7 11:43:18 localhost kernel: BUG: unable to handle kernel NULL pointer > dereference at 0001 > Apr 7 11:43:18 localhost kernel: IP: [] remove_vm_area+0x42/0x80 > Apr 7 11:43:18 localhost kernel: *pde = > Apr 7 11:43:18 localhost kernel: Oops: [#1] PREEMPT > Apr 7 11:43:18 localhost kernel: last sysfs file: > /sys/devices/pci:00/:00:1c.1/:04:00.0/net/wlan0/wireless/link > Apr 7 11:43:18 localhost kernel: Modules linked in: tg3 libphy option > usbserial usb_storage wlan_scan_sta ath_rate_sample ath_pci wlan ath_hal > [last unloaded: libphy] > Apr 7 11:43:18 localhost kernel: > Apr 7 11:43:18 localhost kernel: Pid: 3228, comm: Xorg Not tainted 2.6.33.1 > #3 Columbia /Extensa 5220 > Apr 7 11:43:18 localhost kernel: EIP: 0060:[] EFLAGS: 00213292 > CPU: 0 > Apr 7 11:43:18 localhost kernel: EIP is at remove_vm_area+0x42/0x80 > Apr 7 11:43:18 localhost kernel: EAX: f08d98c0 EBX: ef1aa4e0 ECX: 0001 > EDX: 0001 > Apr 7 11:43:18 localhost kernel: ESI: EDI: 0001 EBP: 01d1 > ESP: f55a1d08 > Apr 7 11:43:18 localhost kernel: DS: 007b ES: 007b FS: GS: 0033 SS: 0068 > Apr 7 11:43:18 localhost kernel: Process Xorg (pid: 3228, ti=f55a > task=f5590580 task.ti=f55a) > Apr 7 11:43:18 localhost kernel: Stack: > Apr 7 11:43:18 localhost kernel: 0004 fa30b000 c10c4abe f0ba42c0 > c13487b8 0007d521 001d > Apr 7 11:43:18 localhost kernel: <0> 001d c134cc92 7683 > 0001 f55a1bd8 8b421215 > Apr 7 11:43:18 localhost kernel: <0> 0028 f09f1070 > 0001 01b6b001 01b6b3cc f5110740 > Apr 7 11:43:18 localhost kernel: Call Trace: > Apr 7 11:43:18 localhost kernel: [] ? __vunmap+0x1e/0xe0 > Apr 7 11:43:18 localhost kernel: [] ? > i915_gem_object_move_to_active+0x88/0xb0 > Apr 7 11:43:18 localhost kernel: [] ? > i915_gem_do_execbuffer+0x832/0x12b0 > Apr 7 11:43:18 localhost kernel: [] ? > i915_gem_object_set_to_gtt_domain+0x5c/0x100 > Apr 7 11:43:18 localhost kernel: [] ? > i915_gem_execbuffer+0x61/0x500 > Apr 7 11:43:18 localhost kernel: [] ? > i915_gem_execbuffer+0x225/0x500 > Apr 7 11:43:18 localhost kernel: [] ? > i915_gem_sw_finish_ioctl+0x78/0xa0 > Apr 7 11:43:18 localhost kernel: [] ? drm_ioctl+0x1f0/0x360 > Apr 7 11:43:18 localhost kernel: [] ? i915_gem_execbuffer+0x0/0x500 > Apr 7 11:43:18 localhost kernel: [] ? do_sync_read+0xad/0xf0 > Apr 7 11:43:18 localhost kernel: [] ? drm_ioctl+0x0/0x360 > Apr 7 11:43:18 localhost kernel: [] ? vfs_ioctl+0x2b/0xa0 > Apr 7 11:43:18 localhost kernel: [] ? do_vfs_ioctl+0x79/0x5f0 > Apr 7 11:43:18 localhost kernel: [] ? __remove_hrtimer+0x2d/0x90 > Apr 7 11:43:18 localhost kernel: [] ? do_setitimer+0xd5/0x220 > Apr 7 11:43:18 localhost kernel: [] ? sys_ioctl+0x76/0x90 > Apr 7 11:43:18 localhost kernel: [] ? sysenter_do_call+0x12/0x22 > Apr 7 11:43:18 localhost kernel: Code: 90 f6 40 08 04 74 ef 89 e2 8b 58 28 > 81 e2 00 e0 ff ff 83 42 14 01 8b 15 7c 11 9a c1 b9 7c 11 9a c1 39 d3 74 0c 8d > 74 26 00 89 d1 <8b> 12 39 d3 75 f8 8b 13 89 11 89 e2 81 e2 00 e0 ff ff 83 6a > 14 > Apr 7 11:43:18 localhost kernel: EIP: [] remove_vm_area+0x42/0x80 > SS:ESP 0068:f55a1d08 > Apr 7 11:43:18 localhost kernel: CR2: 0001 > Apr 7 11:43:18 localhost kernel: ---[ end trace 9f3adc4655f00393 ]--- > Apr 7 11:43:18 localhost kernel: note: Xorg[3228] exited with preempt_count 2 > Apr 7 11:43:18 localhost kernel: BUG: scheduling while atomic: > Xorg/3228/0x1002 > Apr 7 11:43:18 localhost kernel: Modules linked in: tg3 libphy option > usbserial usb_storage wlan_scan_sta ath_rate_sample ath_pci wlan ath_hal > [last unloaded: libphy] > Apr 7 11:43:18 localhost kernel: Pid: 3228, comm: Xorg Tainted: G D > 2.6.33.1 #3 > Apr 7 11:43:18 localhost kernel: Call Trace: > Apr 7 11:43:18 localhost kernel: [] ? schedule+0x409/0x4e0 > Apr 7 11:43:18 localhost kernel: [] ? > __rcu_process_callbacks+0x44/0x390 > Apr 7 11:43:18 localhost kernel: [] ? __cond_resched+0x13/0x30 > Apr 7 11:43:18 localhost kernel: [] ?
DRM Error on Acer Aspire One
On Tue, 11 May 2010 17:10:53 +0100 Chris Wilson wrote: > On Tue, 11 May 2010 20:30:07 +0530, Jaswinder Singh Rajput gmail.com> wrote: > > Hello, > > > > With latest git kernel, I am getting following DRM error and not > > getting XWindows : > > [snip] > > Hmm, there are still patches for capturing error state that haven't gone > upstream, shame on me. > > That error is a secondary issue to the GPU hang that is being reported. If > it is a regression caused by a kernel update it would be very useful if > you could bisect to the erroneous commit. It helps if one reads the code and the trace... i915_error_object_create() is using KM_USER0 from softirq context. That's a bug, and a pretty serious one. If some innocent civilian is writing highmem data to disk and this timer interrupt fires and trashes his KM_USER0 slot, the disk contents will be corrupted. Something like this... --- a/drivers/gpu/drm/i915/i915_irq.c~a +++ a/drivers/gpu/drm/i915/i915_irq.c @@ -456,11 +456,15 @@ i915_error_object_create(struct drm_devi for (page = 0; page < page_count; page++) { void *s, *d = kmalloc(PAGE_SIZE, GFP_ATOMIC); + unsigned long flags; + if (d == NULL) goto unwind; - s = kmap_atomic(src_priv->pages[page], KM_USER0); + local_irq_save(flags); + s = kmap_atomic(src_priv->pages[page], KM_IRQ0); memcpy(d, s, PAGE_SIZE); - kunmap_atomic(s, KM_USER0); + kunmap_atomic(s, KM_IRQ0); + local_irq_restore(flags); dst->pages[page] = d; } dst->page_count = page_count; _ Please let's get a tested fix for this into 2.6.34.
DRM Error on Acer Aspire One
On Tue, 11 May 2010 19:19:26 +0100 Chris Wilson wrote: > On Tue, 11 May 2010 10:48:18 -0400, Andrew Morton linux-foundation.org> wrote: > > > > On Tue, 11 May 2010 17:10:53 +0100 Chris Wilson > chris-wilson.co.uk> wrote: > > > > > On Tue, 11 May 2010 20:30:07 +0530, Jaswinder Singh Rajput > > > wrote: > > > > Hello, > > > > > > > > With latest git kernel, I am getting following DRM error and not > > > > getting XWindows : > > > > > > [snip] > > > > > > Hmm, there are still patches for capturing error state that haven't gone > > > upstream, shame on me. > > > > > > That error is a secondary issue to the GPU hang that is being reported. If > > > it is a regression caused by a kernel update it would be very useful if > > > you could bisect to the erroneous commit. > > > > It helps if one reads the code and the trace... > > > > i915_error_object_create() is using KM_USER0 from softirq context. > > That's a bug, and a pretty serious one. If some innocent civilian is > > writing highmem data to disk and this timer interrupt fires and trashes > > his KM_USER0 slot, the disk contents will be corrupted. > > > > Something like this... > > > > --- a/drivers/gpu/drm/i915/i915_irq.c~a > > +++ a/drivers/gpu/drm/i915/i915_irq.c > > @@ -456,11 +456,15 @@ i915_error_object_create(struct drm_devi > > > > for (page = 0; page < page_count; page++) { > > void *s, *d = kmalloc(PAGE_SIZE, GFP_ATOMIC); > > + unsigned long flags; > > + > > if (d == NULL) > > goto unwind; > > - s = kmap_atomic(src_priv->pages[page], KM_USER0); > > + local_irq_save(flags); > > + s = kmap_atomic(src_priv->pages[page], KM_IRQ0); > > memcpy(d, s, PAGE_SIZE); > > - kunmap_atomic(s, KM_USER0); > > + kunmap_atomic(s, KM_IRQ0); > > + local_irq_restore(flags); > > dst->pages[page] = d; > > } > > dst->page_count = page_count; > > _ > > > > Please let's get a tested fix for this into 2.6.34. > > The change that I actually want is to replace the kmap_atomic(cpu_page) with > an > io_mapping_map_atomic_wc(gtt_page), in case there is a incoherency between > the CPU and the GPU, we want to record what the GPU executed. Do you know > how if similar precautions are required with io_mapping_map_atomic_wc()? gack, wtf is io_mapping_map_atomic_wc()? Could do with some interface documentation. Looks too large to be inlined. No, io_mapping_map_atomic_wc() cannot be used from [soft]irq context: it hardwires use of KM_USER0. I suggest that io_mapping_create_wc(), io_mapping_map_atomic_wc() etc be changed so that the caller passes in the KM_foo kmap slot index.
[PATCH] drm/i915: Record error batch buffers using iomem
On Tue, 11 May 2010 19:22:14 +0100 Chris Wilson wrote: > + reloc_offset = src_priv->gtt_offset; > for (page = 0; page < page_count; page++) { > - void *s, *d = kmalloc(PAGE_SIZE, GFP_ATOMIC); > + void __iomem *s; > + void *d; > + > + d = kmalloc(PAGE_SIZE, GFP_ATOMIC); > if (d == NULL) > goto unwind; > - s = kmap_atomic(src_priv->pages[page], KM_USER0); > - memcpy(d, s, PAGE_SIZE); > - kunmap_atomic(s, KM_USER0); > + > + s = io_mapping_map_atomic_wc(dev_priv->mm.gtt_mapping, > + reloc_offset); > + memcpy_fromio(d, s, PAGE_SIZE); > + io_mapping_unmap_atomic(s); As mentioned in the other email, this will still corrupt the KM_USER0 slot, and will generate a debug_kmap_atomic() warning.
DRM Error on Acer Aspire One
On Tue, 11 May 2010 19:52:31 +0100 Chris Wilson wrote: > On Tue, 11 May 2010 11:35:55 -0400, Andrew Morton linux-foundation.org> wrote: > > No, io_mapping_map_atomic_wc() cannot be used from [soft]irq context: > > it hardwires use of KM_USER0. I suggest that io_mapping_create_wc(), > > io_mapping_map_atomic_wc() etc be changed so that the caller passes in the > > KM_foo kmap slot index. > > Argh, sorry for the noise, read the mail in the wrong order. Thanks for > the review. It would be sensible to go with your simpler patch whilst > io_mapping_map_atomic_wc() is improved. OK. I'll be sending a bunch of fixes Linuswards in an hour or two. Should I include this? Subject: drivers/gpu/drm/i915/i915_irq.c:i915_error_object_create(): use correct kmap-atomic slot From: Andrew Morton <a...@linux-foundation.org> i915_error_object_create() is called from the timer interrupt and hence can corrupt the KM_USER0 slot. Use KM_IRQ0 instead. Reported-by: Jaswinder Singh Rajput Tested-by: Jaswinder Singh Rajput Cc: Chris Wilson Cc: Dave Airlie Signed-off-by: Andrew Morton --- drivers/gpu/drm/i915/i915_irq.c |8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff -puN drivers/gpu/drm/i915/i915_irq.c~drivers-gpu-drm-i915-i915_irqc-i915_error_object_create-use-correct-kmap-atomic-slot drivers/gpu/drm/i915/i915_irq.c --- a/drivers/gpu/drm/i915/i915_irq.c~drivers-gpu-drm-i915-i915_irqc-i915_error_object_create-use-correct-kmap-atomic-slot +++ a/drivers/gpu/drm/i915/i915_irq.c @@ -461,11 +461,15 @@ i915_error_object_create(struct drm_devi for (page = 0; page < page_count; page++) { void *s, *d = kmalloc(PAGE_SIZE, GFP_ATOMIC); + unsigned long flags; + if (d == NULL) goto unwind; - s = kmap_atomic(src_priv->pages[page], KM_USER0); + local_irq_save(flags); + s = kmap_atomic(src_priv->pages[page], KM_IRQ0); memcpy(d, s, PAGE_SIZE); - kunmap_atomic(s, KM_USER0); + kunmap_atomic(s, KM_IRQ0); + local_irq_restore(flags); dst->pages[page] = d; } dst->page_count = page_count; _
DRM Error on Acer Aspire One
On Wed, 12 May 2010 08:22:49 +1000 Dave Airlie wrote: > On Wed, May 12, 2010 at 5:57 AM, Chris Wilson > wrote: > > On Tue, 11 May 2010 12:10:01 -0700, Andrew Morton > linux-foundation.org> wrote: > >> On Tue, 11 May 2010 19:52:31 +0100 > >> Chris Wilson wrote: > >> > >> > On Tue, 11 May 2010 11:35:55 -0400, Andrew Morton >> > linux-foundation.org> wrote: > >> > > No, io_mapping_map_atomic_wc() cannot be used from [soft]irq context: > >> > > it hardwires use of KM_USER0. __I suggest that io_mapping_create_wc(), > >> > > io_mapping_map_atomic_wc() etc be changed so that the caller passes in > >> > > the > >> > > KM_foo kmap slot index. > >> > > >> > Argh, sorry for the noise, read the mail in the wrong order. Thanks for > >> > the review. It would be sensible to go with your simpler patch whilst > >> > io_mapping_map_atomic_wc() is improved. > >> > >> OK. __I'll be sending a bunch of fixes Linuswards in an hour or two. > >> Should I include this? > > > > Yes. > > > > Acked-by: Chris Wilson > > > > I'm not sure pushing this in at this point is a good idea, if I'm > reading it correctly we've no idea what KM_IRQ is being used for, It's used for taking kmaps from IRQ contexts. > and > this codepath is called from non-irq contexts just as much as irq > contexts. That's fine. As long as we do a local_irq_disable(), KM_IRQ0 can be used from both irq- and non-irq contexts. All we need to do is to ensure that some interrupt cannot come along on this CPU and corrupt the slot. > I'd rather we just backout the hangcheck stuff touching copies at all > at this point, and try again doing it properly with a slow work or > something for later. > > Dave.
DRM Error on Acer Aspire One
On Wed, 12 May 2010 08:51:05 +1000 Dave Airlie wrote: > On Wed, May 12, 2010 at 8:32 AM, Andrew Morton > wrote: > > On Wed, 12 May 2010 08:22:49 +1000 > > Dave Airlie wrote: > > > >> On Wed, May 12, 2010 at 5:57 AM, Chris Wilson >> chris-wilson.co.uk> wrote: > >> > On Tue, 11 May 2010 12:10:01 -0700, Andrew Morton >> > linux-foundation.org> wrote: > >> >> On Tue, 11 May 2010 19:52:31 +0100 > >> >> Chris Wilson wrote: > >> >> > >> >> > On Tue, 11 May 2010 11:35:55 -0400, Andrew Morton >> >> > linux-foundation.org> wrote: > >> >> > > No, io_mapping_map_atomic_wc() cannot be used from [soft]irq > >> >> > > context: > >> >> > > it hardwires use of KM_USER0. __I suggest that > >> >> > > io_mapping_create_wc(), > >> >> > > io_mapping_map_atomic_wc() etc be changed so that the caller passes > >> >> > > in the > >> >> > > KM_foo kmap slot index. > >> >> > > >> >> > Argh, sorry for the noise, read the mail in the wrong order. Thanks > >> >> > for > >> >> > the review. It would be sensible to go with your simpler patch whilst > >> >> > io_mapping_map_atomic_wc() is improved. > >> >> > >> >> OK. __I'll be sending a bunch of fixes Linuswards in an hour or two. > >> >> Should I include this? > >> > > >> > Yes. > >> > > >> > Acked-by: Chris Wilson > >> > > >> > >> I'm not sure pushing this in at this point is a good idea, if I'm > >> reading it correctly we've no idea what KM_IRQ is being used for, > > > > It's used for taking kmaps from IRQ contexts. > > > >> and > >> this codepath is called from non-irq contexts just as much as irq > >> contexts. > > > > That's fine. __As long as we do a local_irq_disable(), KM_IRQ0 can be > > used from both irq- and non-irq contexts. __All we need to do is to > > ensure that some interrupt cannot come along on this CPU and corrupt > > the slot. > > I don't think we do that in a lot of places, and I'd rather not add > that in to fix this problem at this point in the release cycle, as > we've no idea what it might break/regress. What is "that"? The switch to irq-protected KM_IRQ0? That won't break anything. > Its easier to just disable the hangcheck copy and try again for 2.6.35 > with a workqueue or slow work.
DRM Error on Acer Aspire One
On Wed, 12 May 2010 09:17:09 +1000 Dave Airlie wrote: > >> >> and > >> >> this codepath is called from non-irq contexts just as much as irq > >> >> contexts. > >> > > >> > That's fine. __As long as we do a local_irq_disable(), KM_IRQ0 can be > >> > used from both irq- and non-irq contexts. __All we need to do is to > >> > ensure that some interrupt cannot come along on this CPU and corrupt > >> > the slot. > >> > >> I don't think we do that in a lot of places, and I'd rather not add > >> that in to fix this problem at this point in the release cycle, as > >> we've no idea what it might break/regress. > > > > What is "that"? __The switch to irq-protected KM_IRQ0? __That won't break > > anything. > > > > disabling local cpu irqs around all these kmap mappings. > Ah. Well if there are other uses of KM_USER0 from interrupt context then yes, we have more problems. CONFIG_DEBUG_HIGHMEM && CONFIG_TRACE_IRQFLAGS_SUPPORT will detect that and as long as Jaswinder has hit all code paths in his testing, we're good. Some manual review for this would be good.
[2.6.38-rc6] G965: i915 Hangcheck timer elapsed... GPU hung (not reproducible)
(cc dri-devel) A post-2.6.37 regression. On Sun, 27 Feb 2011 10:10:41 +0100 Paolo Ornati wrote: > Today I got this while starting a video in SMplayer (MPlayer) with > 2.6.38-rc6-00113-g4662db4: > > [ 830.880014] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer > elapsed... GPU hung > [ 830.880736] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request > returns -11 (awaiting 174895 at 174857, next 174896) > [ 830.881093] [drm:init_ring_common] *ERROR* render ring initialization > failed ctl head tail start > [ 831.379079] [drm:i915_do_wait_request] *ERROR* something (likely vbetool) > disabled interrupts, re-enabling > [ 831.399099] [drm:i915_do_wait_request] *ERROR* something (likely vbetool) > disabled interrupts, re-enabling > ... > [ 837.392012] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer > elapsed... GPU hung > [ 837.392038] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request > returns -11 (awaiting 175016 at 174857, next 175022) > [ 837.392491] [drm:init_ring_common] *ERROR* render ring initialization > failed ctl head tail start > [ 837.537479] [drm:i915_do_wait_request] *ERROR* something (likely vbetool) > disabled interrupts, re-enabling > [ 837.543285] [drm:i915_do_wait_request] *ERROR* something (likely vbetool) > disabled interrupts, re-enabling > ... > [ 839.040011] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer > elapsed... GPU hung > [ 839.040034] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request > returns -11 (awaiting 175022 at 174857, next 175122) > [ 839.040364] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring > wedged! > [ 839.040367] [drm:i915_reset] *ERROR* Failed to reset chip. > > Screen was almost freezed, cursor was stuck but the machine was alive > and I was able to use SysRq to kill X and try to restart it (but that > didn't help). > > I don't remember anything similar in recent kernels (<= 2.6.37) and got > this only once with 2.6.38-rcX. > > Environment at the time of the GPU crash: > KDE4 (without "Desktop Effects") > Chromium > Claws Mail > Dolphin > ccached make -j3 on a just pulled linux-tree (so I/O bound) > SMPlayer/Mplayer (just launched) > > Assorted logs attached. >
2.6.37.2 : Font corruption and [drm:i915_gem_object_unbind] *ERROR* Attempting to unbind pinned buffer
(cc dri-devel) On Mon, 28 Feb 2011 12:26:08 +0100 Paul Rolland wrote: > Hello, > > I'm using 2.6.37.2 and I'm getting loads of this error in my messages, > while at the same time I can observe font display corruption in a GTK > application (Claws-Mail) while running X, compiz and this app. > > I was previously using 2.6.36 and never experienced such a problem. > > Video is : > 00:02.0 VGA compatible controller: Intel Corporation Mobile 4 Series Chipset > Integrated Graphics Controller (rev 07) > 00:02.1 Display controller: Intel Corporation Mobile 4 Series Chipset > Integrated Graphics Controller (rev 07) > > I can provide additional information as needed (such a .config or details > on userland env. if asked for precise details). > > If this message warns about a badly acting userland, then why was it ok > with 2.6.36 and the same userland env. ? > > Best, > Paul > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo at vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/
+ drivers-base-platformc-dont-mark-platform_device_register_resndata-as-__init_or_module.patch added to -mm tree
On Thu, 19 May 2011 08:21:36 +0200 Uwe Kleine-K?nig wrote: > I'm not sure that the things that radeon_cp_init does are sane. Some more details here might help someone fix it. > Maybe > add a comment that it is the only known stopper to make > platform_device_register_resndata __init_or_module and a similar comment > to platform_device_register_resndata itself? Send patch :)
[PATCH] drm: avoid switching to text console if there is no panic timeout
On Thu, 10 Nov 2011 13:15:04 -0800 Mandeep Singh Baines wrote: > David Rientjes (rientjes at google.com) wrote: > > On Mon, 17 Oct 2011, David Rientjes wrote: > > > > > On Mon, 17 Oct 2011, Mandeep Singh Baines wrote: > > > > > > > From: Hugh Dickins > > > > > > > > Add a check for panic_timeout in the drm_fb_helper_panic() notifier: if > > > > we're going to reboot immediately, the user will not be able to see the > > > > messages anyway, and messing with the video mode may display artifacts, > > > > and certainly get into several layers of complexity (including mutexes > > > > and > > > > memory allocations) which we shall be much safer to avoid. > > > > > > > > Signed-off-by: Hugh Dickins > > > > [ Edited commit message and modified to short-circuit panic_timeout < 0 > > > > instead of testing panic_timeout >= 0. -Mandeep ] > > > > Signed-off-by: Mandeep Singh Baines > > > > Cc: Dave Airlie > > > > Cc: Andrew Morton > > > > Cc: dri-devel at lists.freedesktop.org > > > > > > Acked-by: David Rientjes > > > > > > > Dave, where do we stand on this? I haven't seen it hit Linus' tree and I > > don't see it in git://people.freedesktop.org/~airlied/linux. > > The last status I have is Andrew pulling it into mmotm on 10/18/11. > > Subject: + > drm-avoid-switching-to-text-console-if-there-is-no-panic-timeout.patch added > to -mm tree > From: akpm at linux-foundation.org > Date: Tue, 18 Oct 2011 15:42:46 -0700 > > > The patch titled > Subject: drm: avoid switching to text console if there is no panic > timeout > has been added to the -mm tree. Its filename is > drm-avoid-switching-to-text-console-if-there-is-no-panic-timeout.patch I need to do another round of sending patches to maintainers. It's a depressing exercise because the great majority of patches are simply ignored. Last time I even added "please don't ignore" to the email Subject: on the more important ones. Sigh. > Where is mmotm hosted these days? On my disk, until kernel.org ftp access returns. But I regularly email tarballs to Stephen, so it's all in linux-next. The mmotm tree is largely unneeded now - use linux-next to get at -mm patches.
3.2-rc2+: Reported regressions from 3.0 and 3.1
On Tue, 22 Nov 2011 11:19:24 +0530 "Srivatsa S. Bhat" wrote: > > Subject: khugepaged blocks suspend2ram (3.2.0-rc1-00252-g8f042aa) > > Submitter : Sergei Trofimovich > > Date : 2011-11-12 10:48 > > Message-ID : 2012104859.7744b282 at sf.home > > References : https://lkml.org/lkml/2011/11/12/11 > > > > Andrea's patch already fixes this issue, which was reported first by > Jiri Slaby. https://lkml.org/lkml/2011/11/11/93 > I remember Andrew Morton taking this patch in his -mm tree. But it is > not in mainline yet. So can we consider this closed or not? grr, nothing in that patch's changelog indicates that it fixed a regression nor that it fixed an infinite blockage of suspend. I moved it to my 3.2 queue, thanks.
Re: [PATCH] fix double ;;s in code
On Tue, 20 Feb 2018 10:03:56 +0200 Jani Nikulawrote: > On Mon, 19 Feb 2018, Pavel Machek wrote: > > On Mon 2018-02-19 16:41:35, Daniel Vetter wrote: > >> Yeah, pls split this into one patch per area, with a suitable patch > >> subject prefix. Look at git log of each file to get a feeling for what's > >> the standard in each area. > > > > Yeah I can spend hour spliting it, and then people will ignore it > > anyway. > > > > If you care about one of the files being modified, please fix the > > bug, ";;" is a clear bug. > > > > If you don't care ... well I don't care either. > > Look, if this causes just one conflict down the line because it touches > the kernel all over the place, then IMO it already wasn't worth > it. Merge conflicts are inevitable, but there's no reason to make life > harder just to cater for a cleanup patch. It's not that important. > > Had it been split up, the drm parts would've been merged already. I just stage patches like this behind linux-next. So if your stuff in in -next, you'll never notice a thing. ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH] mm, oom: distinguish blockable mode for mmu notifiers
On Mon, 16 Jul 2018 13:50:58 +0200 Michal Hocko wrote: > From: Michal Hocko > > There are several blockable mmu notifiers which might sleep in > mmu_notifier_invalidate_range_start and that is a problem for the > oom_reaper because it needs to guarantee a forward progress so it cannot > depend on any sleepable locks. > > ... > > @@ -571,7 +565,12 @@ static bool oom_reap_task_mm(struct task_struct *tsk, > struct mm_struct *mm) > > trace_start_task_reaping(tsk->pid); > > - __oom_reap_task_mm(mm); > + /* failed to reap part of the address space. Try again later */ > + if (!__oom_reap_task_mm(mm)) { > + up_read(>mmap_sem); > + ret = false; > + goto unlock_oom; > + } This function is starting to look a bit screwy. : static bool oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm) : { : if (!down_read_trylock(>mmap_sem)) { : trace_skip_task_reaping(tsk->pid); : return false; : } : : /* :* MMF_OOM_SKIP is set by exit_mmap when the OOM reaper can't :* work on the mm anymore. The check for MMF_OOM_SKIP must run :* under mmap_sem for reading because it serializes against the :* down_write();up_write() cycle in exit_mmap(). :*/ : if (test_bit(MMF_OOM_SKIP, >flags)) { : up_read(>mmap_sem); : trace_skip_task_reaping(tsk->pid); : return true; : } : : trace_start_task_reaping(tsk->pid); : : /* failed to reap part of the address space. Try again later */ : if (!__oom_reap_task_mm(mm)) { : up_read(>mmap_sem); : return true; : } : : pr_info("oom_reaper: reaped process %d (%s), now anon-rss:%lukB, file-rss:%lukB, shmem-rss:%lukB\n", : task_pid_nr(tsk), tsk->comm, : K(get_mm_counter(mm, MM_ANONPAGES)), : K(get_mm_counter(mm, MM_FILEPAGES)), : K(get_mm_counter(mm, MM_SHMEMPAGES))); : up_read(>mmap_sem); : : trace_finish_task_reaping(tsk->pid); : return true; : } - Undocumented return value. - comment "failed to reap part..." is misleading - sounds like it's referring to something which happened in the past, is in fact referring to something which might happen in the future. - fails to call trace_finish_task_reaping() in one case - code duplication. I'm thinking it wants to be something like this? : /* : * Return true if we successfully acquired (then released) mmap_sem : */ : static bool oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm) : { : if (!down_read_trylock(>mmap_sem)) { : trace_skip_task_reaping(tsk->pid); : return false; : } : : /* :* MMF_OOM_SKIP is set by exit_mmap when the OOM reaper can't :* work on the mm anymore. The check for MMF_OOM_SKIP must run :* under mmap_sem for reading because it serializes against the :* down_write();up_write() cycle in exit_mmap(). :*/ : if (test_bit(MMF_OOM_SKIP, >flags)) { : trace_skip_task_reaping(tsk->pid); : goto out; : } : : trace_start_task_reaping(tsk->pid); : : if (!__oom_reap_task_mm(mm)) { : /* Failed to reap part of the address space. Try again later */ : goto finish; : } : : pr_info("oom_reaper: reaped process %d (%s), now anon-rss:%lukB, file-rss:%lukB, shmem-rss:%lukB\n", : task_pid_nr(tsk), tsk->comm, : K(get_mm_counter(mm, MM_ANONPAGES)), : K(get_mm_counter(mm, MM_FILEPAGES)), : K(get_mm_counter(mm, MM_SHMEMPAGES))); : finish: : trace_finish_task_reaping(tsk->pid); : out: : up_read(>mmap_sem); : return true; : } - Increases mmap_sem hold time a little by moving trace_finish_task_reaping() inside the locked region. So sue me ;) - Sharing the finish: path means that the trace event won't distinguish between the two sources of finishing. Please take a look? ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH] mm, oom: distinguish blockable mode for mmu notifiers
On Tue, 17 Jul 2018 10:12:01 +0200 Michal Hocko wrote: > > Any suggestions regarding how the driver developers can test this code > > path? I don't think we presently have a way to fake an oom-killing > > event? Perhaps we should add such a thing, given the problems we're > > having with that feature. > > The simplest way is to wrap an userspace code which uses these notifiers > into a memcg and set the hard limit to hit the oom. This can be done > e.g. after the test faults in all the mmu notifier managed memory and > set the hard limit to something really small. Then we are looking for a > proper process tear down. Chances are, some of the intended audience don't know how to do this and will either have to hunt down a lot of documentation or will just not test it. But we want them to test it, so a little worked step-by-step example would help things along please. ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH] mm, oom: distinguish blockable mode for mmu notifiers
On Mon, 16 Jul 2018 13:50:58 +0200 Michal Hocko wrote: > From: Michal Hocko > > There are several blockable mmu notifiers which might sleep in > mmu_notifier_invalidate_range_start and that is a problem for the > oom_reaper because it needs to guarantee a forward progress so it cannot > depend on any sleepable locks. > > Currently we simply back off and mark an oom victim with blockable mmu > notifiers as done after a short sleep. That can result in selecting a > new oom victim prematurely because the previous one still hasn't torn > its memory down yet. > > We can do much better though. Even if mmu notifiers use sleepable locks > there is no reason to automatically assume those locks are held. > Moreover majority of notifiers only care about a portion of the address > space and there is absolutely zero reason to fail when we are unmapping an > unrelated range. Many notifiers do really block and wait for HW which is > harder to handle and we have to bail out though. > > This patch handles the low hanging fruid. > __mmu_notifier_invalidate_range_start > gets a blockable flag and callbacks are not allowed to sleep if the > flag is set to false. This is achieved by using trylock instead of the > sleepable lock for most callbacks and continue as long as we do not > block down the call chain. I assume device driver developers are wondering "what does this mean for me". As I understand it, the only time they will see blockable==false is when their driver is being called in response to an out-of-memory condition, yes? So it is a very rare thing. Any suggestions regarding how the driver developers can test this code path? I don't think we presently have a way to fake an oom-killing event? Perhaps we should add such a thing, given the problems we're having with that feature. > I think we can improve that even further because there is a common > pattern to do a range lookup first and then do something about that. > The first part can be done without a sleeping lock in most cases AFAICS. > > The oom_reaper end then simply retries if there is at least one notifier > which couldn't make any progress in !blockable mode. A retry loop is > already implemented to wait for the mmap_sem and this is basically the > same thing. > > ... > > +static inline int mmu_notifier_invalidate_range_start_nonblock(struct > mm_struct *mm, > + unsigned long start, unsigned long end) > +{ > + int ret = 0; > + if (mm_has_notifiers(mm)) > + ret = __mmu_notifier_invalidate_range_start(mm, start, end, > false); > + > + return ret; > } nit, { if (mm_has_notifiers(mm)) return __mmu_notifier_invalidate_range_start(mm, start, end, false); return 0; } would suffice. > > ... > > --- a/mm/mmap.c > +++ b/mm/mmap.c > @@ -3074,7 +3074,7 @@ void exit_mmap(struct mm_struct *mm) >* reliably test it. >*/ > mutex_lock(_lock); > - __oom_reap_task_mm(mm); > + (void)__oom_reap_task_mm(mm); > mutex_unlock(_lock); What does this do? > set_bit(MMF_OOM_SKIP, >flags); > > ... > ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH] mm, oom: distinguish blockable mode for mmu notifiers
On Tue, 24 Jul 2018 16:17:47 +0200 Michal Hocko wrote: > On Fri 20-07-18 17:09:02, Andrew Morton wrote: > [...] > > - Undocumented return value. > > > > - comment "failed to reap part..." is misleading - sounds like it's > > referring to something which happened in the past, is in fact > > referring to something which might happen in the future. > > > > - fails to call trace_finish_task_reaping() in one case > > > > - code duplication. > > > > - Increases mmap_sem hold time a little by moving > > trace_finish_task_reaping() inside the locked region. So sue me ;) > > > > - Sharing the finish: path means that the trace event won't > > distinguish between the two sources of finishing. > > > > Please take a look? > > oom_reap_task_mm should return false when __oom_reap_task_mm return > false. This is what my patch did but it seems this changed by > http://www.ozlabs.org/~akpm/mmotm/broken-out/mm-oom-remove-oom_lock-from-oom_reaper.patch > so that one should be fixed. > > diff --git a/mm/oom_kill.c b/mm/oom_kill.c > index 104ef4a01a55..88657e018714 100644 > --- a/mm/oom_kill.c > +++ b/mm/oom_kill.c > @@ -565,7 +565,7 @@ static bool oom_reap_task_mm(struct task_struct *tsk, > struct mm_struct *mm) > /* failed to reap part of the address space. Try again later */ > if (!__oom_reap_task_mm(mm)) { > up_read(>mmap_sem); > - return true; > + return false; > } > > pr_info("oom_reaper: reaped process %d (%s), now anon-rss:%lukB, > file-rss:%lukB, shmem-rss:%lukB\n", OK, thanks, I added that. > > On top of that the proposed cleanup looks as follows: > Looks good to me. Seems a bit strange that we omit the pr_info() output if the mm was partially reaped - people would still want to know this? Not very important though. ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH] kernel.h: Add for_each_if()
On Mon, 9 Jul 2018 18:25:09 +0200 Daniel Vetter wrote: > To avoid compilers complainig about ambigious else blocks when putting > an if condition into a for_each macro one needs to invert the > condition and add a dummy else. We have a nice little convenience > macro for that in drm headers, let's move it out. Subsequent patches > will roll it out to other places. > > The issue the compilers complain about are nested if with an else > block and no {} to disambiguate which if the else belongs to. The C > standard is clear, but in practice people forget: > > if (foo) > if (bar) > /* something */ > else > /* something else um, yeah, don't do that. Kernel coding style is very much to do if (foo) { if (bar) /* something */ else /* something else } And if not doing that generates a warning then, well, do that. > The same can happen in a for_each macro when it also contains an if > condition at the end, except the compiler message is now really > confusing since there's only 1 if: > > for_each_something() > if (bar) > /* something */ > else > /* something else > > The for_each_if() macro, by inverting the condition and adding an > else, avoids the compiler warning. Ditto. > Motivated by a discussion with Andy and Yisheng, who want to add > another for_each_macro which would benefit from for_each_if() instead > of hand-rolling it. Ditto. > v2: Explain a bit better what this is good for, after the discussion > with Peter Z. Presumably the above was discussed in whatever-thread-that-was. ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel