Re: [PATCH 2/2] xfs: disable large folio support in xfile_create
On Thu, 11 Jan 2024 18:22:50 -0800 "Darrick J. Wong" wrote: > On Thu, Jan 11, 2024 at 10:45:53PM +, Matthew Wilcox wrote: > > On Thu, Jan 11, 2024 at 02:00:53PM -0800, Andrew Morton wrote: > > > On Wed, 10 Jan 2024 12:04:51 -0800 "Darrick J. Wong" > > > wrote: > > > > > > > > > Fixing this will require a bit of an API change, and prefeably > > > > > > sorting out > > > > > > the hwpoison story for pages vs folio and where it is placed in the > > > > > > shmem > > > > > > API. For now use this one liner to disable large folios. > > > > > > > > > > > > Reported-by: Darrick J. Wong > > > > > > Signed-off-by: Christoph Hellwig > > > > > > > > > > Can someone who knows more about shmem.c than I do please review > > > > > https://lore.kernel.org/linux-xfs/20240103084126.513354-4-...@lst.de/ > > > > > so that I can feel slightly more confident as hch and I sort through > > > > > the > > > > > xfile.c issues? > > > > > > > > > > For this patch, > > > > > Reviewed-by: Darrick J. Wong > > > > > > > > ...except that I'm still getting 2M THPs even with this enabled, so I > > > > guess either we get to fix it now, or create our own private tmpfs mount > > > > so that we can pass in huge=never, similar to what i915 does. :( > > > > > > What is "this"? Are you saying that $Subject doesn't work, or that the > > > above-linked please-review patch doesn't work? > > > > shmem pays no attention to the mapping_large_folio_support() flag, > > so the proposed fix doesn't work. It ought to, but it has its own way > > of doing it that predates mapping_large_folio_support existing. > > Yep. It turned out to be easier to fix xfile.c to deal with large > folios than I thought it would be. Or so I think. We'll see what > happens on fstestscloud overnight. Where do we stand with this? Should I merge these two patches into 6.8-rcX, cc:stable?
Re: [PATCH] mm/cma: Drop cma_get_name()
On Tue, 6 Feb 2024 09:45:18 +0530 Anshuman Khandual wrote: > cma_get_name() just returns cma->name without any additional transformation > unlike other helpers such as cma_get_base() and cma_get_size(). This helper > is not worth the additional indirection, and can be dropped after replacing > directly with cma->name in the sole caller __add_cma_heap(). drivers/dma-buf/heaps/cma_heap.c: In function '__add_cma_heap': drivers/dma-buf/heaps/cma_heap.c:379:28: error: invalid use of undefined type 'struct cma' 379 | exp_info.name = cma->name; |^~ Fixing this would require moving the `struct cma' definition into cma.h. I don't think that's worthwhile.
Re: [PATCH 2/2] xfs: disable large folio support in xfile_create
On Wed, 10 Jan 2024 12:04:51 -0800 "Darrick J. Wong" wrote: > > > Fixing this will require a bit of an API change, and prefeably sorting out > > > the hwpoison story for pages vs folio and where it is placed in the shmem > > > API. For now use this one liner to disable large folios. > > > > > > Reported-by: Darrick J. Wong > > > Signed-off-by: Christoph Hellwig > > > > Can someone who knows more about shmem.c than I do please review > > https://lore.kernel.org/linux-xfs/20240103084126.513354-4-...@lst.de/ > > so that I can feel slightly more confident as hch and I sort through the > > xfile.c issues? > > > > For this patch, > > Reviewed-by: Darrick J. Wong > > ...except that I'm still getting 2M THPs even with this enabled, so I > guess either we get to fix it now, or create our own private tmpfs mount > so that we can pass in huge=never, similar to what i915 does. :( What is "this"? Are you saying that $Subject doesn't work, or that the above-linked please-review patch doesn't work?
Re: disable large folios for shmem file used by xfs xfile
On Wed, 10 Jan 2024 10:21:07 +0100 Christoph Hellwig wrote: > Hi all, > > Darrick reported that the fairly new XFS xfile code blows up when force > enabling large folio for shmem. This series fixes this quickly by disabling > large folios for this particular shmem file for now until it can be fixed > properly, which will be a lot more invasive. > Patches seems reasonable as a short-term only-affects-xfs thing. I assume that kernels which contain 137db333b29186 ("xfs: teach xfile to pass back direct-map pages to caller") want this, so a Fixes: that and a cc:stable are appropriate?
Re: [PATCH v4 1/1] drm/i915: Move abs_diff() to math.h
On Thu, 3 Aug 2023 16:19:18 +0300 Andy Shevchenko wrote: > abs_diff() belongs to math.h. Move it there. > This will allow others to use it. > > ... > > --- a/include/linux/math.h > +++ b/include/linux/math.h > @@ -155,6 +155,13 @@ __STRUCT_FRACT(u32) > __builtin_types_compatible_p(typeof(x), unsigned type), \ > ({ signed type __x = (x); __x < 0 ? -__x : __x; }), other) > > +#define abs_diff(a, b) ({\ > + typeof(a) __a = (a);\ > + typeof(b) __b = (b);\ > + (void)(&__a == &__b); \ > + __a > __b ? (__a - __b) : (__b - __a); \ > +}) Can we document it please? Also, the open-coded type comparison could be replaced with __typecheck()? And why the heck isn't __typecheck() in typecheck.h, to be included by minmax.h. etcetera. Sigh. I'll grab it, but please at least send along some kerneldoc?
Re: [PATCH] mm: fix hugetlb page unmap count balance issue
On Wed, 7 Jun 2023 13:53:10 -0700 Mike Kravetz wrote: > > > > BUGs aren't good. Can we please find a way to push this along? > > > > Have we heard anything from any udmabuf people? > > > > I have not heard anything. When this issue popped up, it took me by surprise. > > udmabuf maintainer (Gerd Hoffmann), the people who added hugetlb support and > the list where udmabuf was developed (dri-devel@lists.freedesktop.org) have > been on cc. Maybe Greg can suggest a way forward. > My 'gut reaction' would be to remove hugetlb support from udmabuf. From a > quick look, if we really want this support then there will need to be some > API changes. For example UDMABUF_CREATE should be hugetlb page aligned > and a multiple of hugetlb page size if using a hugetlb mapping. > > It would be good to know about users of the driver. So disabling "hugetlb=on" (and adding an explanatory printk) would suffice for now?
Re: [PATCH] mm: fix hugetlb page unmap count balance issue
On Tue, 16 May 2023 15:34:40 -0700 Mike Kravetz wrote: > On 05/15/23 10:04, Mike Kravetz wrote: > > On 05/12/23 16:29, Mike Kravetz wrote: > > > On 05/12/23 14:26, James Houghton wrote: > > > > On Fri, May 12, 2023 at 12:20 AM Junxiao Chang > > > > wrote: > > > > > > > > This alone doesn't fix mapcounting for PTE-mapped HugeTLB pages. You > > > > need something like [1]. I can resend it if that's what we should be > > > > doing, but this mapcounting scheme doesn't work when the page structs > > > > have been freed. > > > > > > > > It seems like it was a mistake to include support for hugetlb memfds in > > > > udmabuf. > > > > > > IIUC, it was added with commit 16c243e99d33 udmabuf: Add support for > > > mapping > > > hugepages (v4). Looks like it was never sent to linux-mm? That is > > > unfortunate > > > as hugetlb vmemmap freeing went in at about the same time. And, as you > > > have > > > noted udmabuf will not work if hugetlb vmemmap freeing is enabled. > > > > > > Sigh! > > > > > > Trying to think of a way forward. > > > -- > > > Mike Kravetz > > > > > > > > > > > [1]: > > > > https://lore.kernel.org/linux-mm/20230306230004.1387007-2-jthough...@google.com/ > > > > > > > > - James > > > > Adding people and list on Cc: involved with commit 16c243e99d33. > > > > There are several issues with trying to map tail pages of hugetllb pages > > not taken into account with udmabuf. James spent quite a bit of time trying > > to understand and address all the issues with the HGM code. While using > > the scheme proposed by James, may be an approach to the mapcount issue there > > are also other issues that need attention. For example, I do not see how > > the fault code checks the state of the hugetlb page (such as poison) as none > > of that state is carried in tail pages. > > > > The more I think about it, the more I think udmabuf should treat hugetlb > > pages as hugetlb pages. They should be mapped at the appropriate level > > in the page table. Of course, this would impose new restrictions on the > > API (mmap and ioctl) that may break existing users. I have no idea how > > extensively udmabuf is being used with hugetlb mappings. > > Verified that using udmabug on a hugetlb mapping with vmemmap optimization > will > BUG as: BUGs aren't good. Can we please find a way to push this along? Have we heard anything from any udmabuf people?
Re: [PATCH 0/4] log2: make is_power_of_2() more generic
On Thu, 30 Mar 2023 21:53:03 + David Laight wrote: > > But wouldn't all these issues be addressed by simply doing > > > > #define is_power_of_2(n) (n != 0 && ((n & (n - 1)) == 0)) > > > > ? > > > > (With suitable tweaks to avoid evaluating `n' more than once) > > I think you need to use the 'horrid tricks' from min() to get > a constant expression from constant inputs. This --- a/include/linux/log2.h~a +++ a/include/linux/log2.h @@ -41,11 +41,11 @@ int __ilog2_u64(u64 n) * *not* considered a power of two. * Return: true if @n is a power of 2, otherwise false. */ -static inline __attribute__((const)) -bool is_power_of_2(unsigned long n) -{ - return (n != 0 && ((n & (n - 1)) == 0)); -} +#define is_power_of_2(_n) \ + ({ \ + typeof(_n) n = (_n);\ + n != 0 && ((n & (n - 1)) == 0); \ + }) /** * __roundup_pow_of_two() - round up to nearest power of two _ worked for me in a simple test. --- a/fs/open.c~b +++ a/fs/open.c @@ -1564,3 +1564,10 @@ int stream_open(struct inode *inode, str } EXPORT_SYMBOL(stream_open); + +#include + +int foo(void) +{ + return is_power_of_2(43); +} _ foo: # fs/open.c:1573: } xorl%eax, %eax # ret Is there some more tricky situation where it breaks?
Re: [PATCH 0/4] log2: make is_power_of_2() more generic
On Thu, 30 Mar 2023 13:42:39 +0300 Jani Nikula wrote: > is_power_of_2() only works for types <= sizeof(unsigned long) and it's > also not a constant expression. There are a number of places in kernel > where is_power_of_2() is called on u64, which fails on 32-bit > builds. Try to remedy that. While at it, make it a constant expression > when possible. Yes, the current `is_power_of_2(unsigned long n)' isn't very general. But wouldn't all these issues be addressed by simply doing #define is_power_of_2(n) (n != 0 && ((n & (n - 1)) == 0)) ? (With suitable tweaks to avoid evaluating `n' more than once)
Re: [PATCH 1/2] docs: process: allow Closes tags with links
On Wed, 15 Mar 2023 18:44:56 +0100 Matthieu Baerts wrote: > +Closes: https://example.com/issues/1234 I (and a few others) have been using "Addresses:" on occasion. I think "Addresses:" is a bit more general. And more humble - "I tried to fix it", not "there, I fixed it". But whatever - both are good.
Re: [PATCH 00/19] Introduce __xchg, non-atomic xchg
On Thu, 22 Dec 2022 12:46:16 +0100 Andrzej Hajda wrote: > Hi all, > > I hope there will be place for such tiny helper in kernel. > Quick cocci analyze shows there is probably few thousands places > where it could be useful. So to clarify, the intent here is a simple readability cleanup for existing open-coded exchange operations. The intent is *not* to identify existing xchg() sites which are unnecessarily atomic and to optimize them by using the non-atomic version. Have you considered the latter? > I am not sure who is good person to review/ack such patches, I can take 'em. > so I've used my intuition to construct to/cc lists, sorry for mistakes. > This is the 2nd approach of the same idea, with comments addressed[0]. > > The helper is tiny and there are advices we can leave without it, so > I want to present few arguments why it would be good to have it: > > 1. Code readability/simplification/number of lines: > > Real example from drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c: > - previous_min_rate = evport->qos.min_rate; > - evport->qos.min_rate = min_rate; > + previous_min_rate = __xchg(evport->qos.min_rate, min_rate); > > For sure the code is more compact, and IMHO more readable. > > 2. Presence of similar helpers in other somehow related languages/libs: > > a) Rust[1]: 'replace' from std::mem module, there is also 'take' > helper (__xchg(, 0)), which is the same as private helper in > i915 - fetch_and_zero, see latest patch. > b) C++ [2]: 'exchange' from utility header. > > If the idea is OK there are still 2 qestions to answer: > > 1. Name of the helper, __xchg follows kernel conventions, > but for me Rust names are also OK. I like replace(), or, shockingly, exchange(). But... Can we simply make swap() return the previous value? previous_min_rate = swap(>qos.min_rate, min_rate);
Re: [PATCH] mm/memremap: Introduce pgmap_request_folio() using pgmap offsets
On Wed, 30 Nov 2022 16:22:50 -0800 Dan Williams wrote: > I think since there is no urgent need for this series to move forward in > v6.2 I can take the time to kill the need for pfn_to_pgmap_offset() and > circle back for this in v6.3. I'll drop v3 of "Fix the DAX-gup mistake" and "mm/memremap: Introduce pgmap_request_folio() using pgmap offsets". a) because Stephen says "no next-next material in next" and b) because its presence in -next might invalidate testing of other things we have queued for the next merge window.
Re: [PATCH mm-unstable v1 16/20] mm/frame-vector: remove FOLL_FORCE usage
On Mon, 28 Nov 2022 09:18:47 +0100 David Hildenbrand wrote: > > Less chances of things going wrong that way. > > > > Just mention in the v2 cover letter that the first patch was added to > > make it easy to backport that fix without being hampered by merge > > conflicts if it was added after your frame_vector.c patch. > > Yes, that's the way I would naturally do, it, however, Andrew prefers > delta updates for minor changes. > > @Andrew, whatever you prefer! I'm inclined to let things sit as they are. Cross-tree conflicts happen, and Linus handles them. I'll flag this (very simple) conflict in the pull request, if MM merges second. If v4l merges second then hopefully they will do the same. But this one is so simple that Linus hardly needs our help. But Linus won't be editing changelogs so that the changelog makes more sense after both trees are joined. I'm inclined to let the changelog sit as it is as well.
Re: [PATCH v2 0/2] Fix a bunch of allmodconfig errors
On Fri, 25 Nov 2022 12:07:48 + Lee Jones wrote: > Since b339ec9c229aa ("kbuild: Only default to -Werror if COMPILE_TEST") > WERROR > now defaults to COMPILE_TEST meaning that it's enabled for allmodconfig > > builds. This leads to some interesting failures, each resolved in this set. > Oh, I get it. Clang. I'll tweak the above para to make that clearer. cc:stable question still applies? How much trouble will these build errors be causing people for the next N years? > With this set applied, I am able to obtain a successful allmodconfig Arm > build.
Re: [PATCH v2 0/2] Fix a bunch of allmodconfig errors
On Fri, 25 Nov 2022 12:07:48 + Lee Jones wrote: > Since b339ec9c229aa ("kbuild: Only default to -Werror if COMPILE_TEST") > WERROR > now defaults to COMPILE_TEST meaning that it's enabled for allmodconfig > > builds. This leads to some interesting failures, each resolved in this set. > I'm not sure who this patchset is aimed at, so I'll take my usual approach of grabbing it and seeing who complains. > With this set applied, I am able to obtain a successful allmodconfig Arm > build. b339ec9c229aa is a year old and I've been doing arm allmodconfig for ever. What am I missing here? A broken arm allmodconfig is pretty irritating - I'm thinking that a fix should be backported into -stable kernels. But I'm clearly missing something here.
Re: [PATCH v2 8/8] hmm-tests: Add test for migrate_device_range()
On Wed, 28 Sep 2022 22:01:22 +1000 Alistair Popple wrote: > @@ -1401,22 +1494,7 @@ static int dmirror_device_init(struct dmirror_device > *mdevice, int id) > > static void dmirror_device_remove(struct dmirror_device *mdevice) > { > - unsigned int i; > - > - if (mdevice->devmem_chunks) { > - for (i = 0; i < mdevice->devmem_count; i++) { > - struct dmirror_chunk *devmem = > - mdevice->devmem_chunks[i]; > - > - memunmap_pages(>pagemap); > - if (devmem->pagemap.type == MEMORY_DEVICE_PRIVATE) > - release_mem_region(devmem->pagemap.range.start, > - > range_len(>pagemap.range)); > - kfree(devmem); > - } > - kfree(mdevice->devmem_chunks); > - } > - > + dmirror_device_remove_chunks(mdevice); > cdev_del(>cdevice); > } Needed a bit or rework due to https://lkml.kernel.org/r/20220826050631.25771-1-mpent...@redhat.com. Please check my resolution. --- a/lib/test_hmm.c~hmm-tests-add-test-for-migrate_device_range +++ a/lib/test_hmm.c @@ -100,6 +100,7 @@ struct dmirror { struct dmirror_chunk { struct dev_pagemap pagemap; struct dmirror_device *mdevice; + bool remove; }; /* @@ -192,11 +193,15 @@ static int dmirror_fops_release(struct i return 0; } +static struct dmirror_chunk *dmirror_page_to_chunk(struct page *page) +{ + return container_of(page->pgmap, struct dmirror_chunk, pagemap); +} + static struct dmirror_device *dmirror_page_to_device(struct page *page) { - return container_of(page->pgmap, struct dmirror_chunk, - pagemap)->mdevice; + return dmirror_page_to_chunk(page)->mdevice; } static int dmirror_do_fault(struct dmirror *dmirror, struct hmm_range *range) @@ -1218,6 +1223,85 @@ static int dmirror_snapshot(struct dmirr return ret; } +static void dmirror_device_evict_chunk(struct dmirror_chunk *chunk) +{ + unsigned long start_pfn = chunk->pagemap.range.start >> PAGE_SHIFT; + unsigned long end_pfn = chunk->pagemap.range.end >> PAGE_SHIFT; + unsigned long npages = end_pfn - start_pfn + 1; + unsigned long i; + unsigned long *src_pfns; + unsigned long *dst_pfns; + + src_pfns = kcalloc(npages, sizeof(*src_pfns), GFP_KERNEL); + dst_pfns = kcalloc(npages, sizeof(*dst_pfns), GFP_KERNEL); + + migrate_device_range(src_pfns, start_pfn, npages); + for (i = 0; i < npages; i++) { + struct page *dpage, *spage; + + spage = migrate_pfn_to_page(src_pfns[i]); + if (!spage || !(src_pfns[i] & MIGRATE_PFN_MIGRATE)) + continue; + + if (WARN_ON(!is_device_private_page(spage) && + !is_device_coherent_page(spage))) + continue; + spage = BACKING_PAGE(spage); + dpage = alloc_page(GFP_HIGHUSER_MOVABLE | __GFP_NOFAIL); + lock_page(dpage); + copy_highpage(dpage, spage); + dst_pfns[i] = migrate_pfn(page_to_pfn(dpage)); + if (src_pfns[i] & MIGRATE_PFN_WRITE) + dst_pfns[i] |= MIGRATE_PFN_WRITE; + } + migrate_device_pages(src_pfns, dst_pfns, npages); + migrate_device_finalize(src_pfns, dst_pfns, npages); + kfree(src_pfns); + kfree(dst_pfns); +} + +/* Removes free pages from the free list so they can't be re-allocated */ +static void dmirror_remove_free_pages(struct dmirror_chunk *devmem) +{ + struct dmirror_device *mdevice = devmem->mdevice; + struct page *page; + + for (page = mdevice->free_pages; page; page = page->zone_device_data) + if (dmirror_page_to_chunk(page) == devmem) + mdevice->free_pages = page->zone_device_data; +} + +static void dmirror_device_remove_chunks(struct dmirror_device *mdevice) +{ + unsigned int i; + + mutex_lock(>devmem_lock); + if (mdevice->devmem_chunks) { + for (i = 0; i < mdevice->devmem_count; i++) { + struct dmirror_chunk *devmem = + mdevice->devmem_chunks[i]; + + spin_lock(>lock); + devmem->remove = true; + dmirror_remove_free_pages(devmem); + spin_unlock(>lock); + + dmirror_device_evict_chunk(devmem); + memunmap_pages(>pagemap); + if (devmem->pagemap.type == MEMORY_DEVICE_PRIVATE) + release_mem_region(devmem->pagemap.range.start, + range_len(>pagemap.range)); + kfree(devmem); + } + mdevice->devmem_count = 0; +
Re: [PATCH v9 06/14] mm/gup: migrate device coherent pages when pinning instead of failing
On Mon, 25 Jul 2022 21:22:06 -0500 "Sierra Guiza, Alejandro (Alex)" wrote: > >> a) add the || to the end of the previous line > >> b) indent such the we have a nice-to-read alignment > >> > >> if (!list_empty(_page_list) || isolation_error_count || > >> coherent_pages) > >> > > I missed that. This series is now in mm-stable so any fix will need to > > be a standalone followup patch, please. > Hi Andrew, > Just wanted to make sure nothing is missing from our side to merge this > patch series. It's queued in mm-stable and all looks good for a 5.20-rc1 merge.
Re: [PATCH v9 06/14] mm/gup: migrate device coherent pages when pinning instead of failing
On Mon, 18 Jul 2022 12:56:29 +0200 David Hildenbrand wrote: > > /* > > * Try to move out any movable page before pinning the range. > > */ > > @@ -1919,7 +1948,8 @@ static long check_and_migrate_movable_pages(unsigned > > long nr_pages, > > folio_nr_pages(folio)); > > } > > > > - if (!list_empty(_page_list) || isolation_error_count) > > + if (!list_empty(_page_list) || isolation_error_count > > + || coherent_pages) > > The common style is to > > a) add the || to the end of the previous line > b) indent such the we have a nice-to-read alignment > > if (!list_empty(_page_list) || isolation_error_count || > coherent_pages) > I missed that. This series is now in mm-stable so any fix will need to be a standalone followup patch, please. > Apart from that lgtm. > > Reviewed-by: David Hildenbrand And your reviewed-by's will be lost. Stupid git.
Re: [linux-next:master] BUILD REGRESSION 4662b7adea50bb62e993a67f611f3be625d3df0d
On Thu, 14 Jul 2022 09:56:19 +0800 kernel test robot wrote: > lib/maple_tree.c:1522:52: warning: Parameter 'gaps' can be declared with > const [constParameter] > lib/maple_tree.c:1871:21: warning: Array index 'split' is used before limits > check. [arrayIndexThenCheck] > lib/maple_tree.c:2033:55: warning: Parameter 'mas' can be declared with const > [constParameter] > lib/maple_tree.c:2426:8: warning: Redundant initialization for 'r_tmp'. The > initialized value is overwritten before it is read. [redundantInitialization] > lib/maple_tree.c:2427:8: warning: Redundant initialization for 'l_tmp'. The > initialized value is overwritten before it is read. [redundantInitialization] > lib/maple_tree.c:3160:22: warning: Found suspicious operator ',' > [constStatement] > lib/maple_tree.c:3208:11: warning: Size of pointer 'pivs' used instead of > size of its data. [pointerSize] > lib/maple_tree.c:326:2: warning: Assignment of function parameter has no > effect outside the function. Did you forget dereferencing it? > [uselessAssignmentPtrArg] > lib/maple_tree.c:4266:15: warning: The if condition is the same as the > previous if condition [duplicateCondition] > lib/maple_tree.c:4302:23: warning: Boolean result is used in bitwise > operation. Clarify expression with parentheses. [clarifyCondition] > lib/maple_tree.c:694:59: warning: Parameter 'pivots' can be declared with > const [constParameter] > lib/test_printf.c:415:11: warning: Local variable 'addr' shadows outer > function [shadowFunction] > mm/highmem.c:737:13: warning: Uninitialized variable: pam->page [uninitvar] > mm/migrate.c:355:53: warning: Parameter 'mapping' can be declared with const > [constParameter] > mm/migrate.c:875:7: warning: Redundant initialization for 'rc'. The > initialized value is overwritten before it is read. [redundantInitialization] > mm/mlock.c:230:20: warning: Using pointer that is a temporary. > [danglingTemporaryLifetime] > mm/slab.c:1635:24: warning: Uninitialized variables: slab.__page_flags, > slab.__unused_1, slab.freelist, slab.units, slab.__unused_2, > slab.__page_refcount [uninitvar] > mm/slab.c:3289:7: warning: Redundant assignment of 'objp' to itself. > [selfAssignment] > mm/slab.c:3509:8: warning: Redundant assignment of 'p[i]' to itself. > [selfAssignment] > mm/slab.c:405:9: warning: Local variable 'slab_size' shadows outer function > [shadowFunction] > mm/vmstat.c:1409:53: warning: Parameter 'pos' can be declared with const > [constParameter] > mm/vmstat.c:1650:68: warning: Parameter 'zone' can be declared with const > [constParameter] > mm/zsmalloc.c:2019:15: warning: Uninitialized variables: zspage.huge, > zspage.fullness, zspage.class, zspage.isolated, zspage.magic, zspage.inuse, > zspage.freeobj, zspage.first_page, zspage.lock [uninitvar] > mm/zsmalloc.c:2060:16: warning: Local variable 'obj_allocated' shadows outer > function [shadowFunction] urgh, thanks, lots of stuff to go through here. Liam, I suggest we worry about the mapletree things at a later time ;)
Re: [PATCH v7 01/14] mm: rename is_pinnable_pages to is_pinnable_longterm_pages
On Thu, 30 Jun 2022 00:15:06 +0200 David Hildenbrand wrote: > On 30.06.22 00:08, Felix Kuehling wrote: > > On 2022-06-29 03:33, David Hildenbrand wrote: > >> On 29.06.22 05:54, Alex Sierra wrote: > >>> is_pinnable_page() and folio_is_pinnable() were renamed to > >>> is_longterm_pinnable_page() and folio_is_longterm_pinnable() > >>> respectively. These functions are used in the FOLL_LONGTERM flag > >>> context. > >> Subject talks about "*_pages" > >> > >> > >> Can you elaborate why the move from mm.h to memremap.h is justified? > > > > Patch 2 adds is_device_coherent_page in memremap.h and updates > > is_longterm_pinnable_page to call is_device_coherent_page. memremap.h > > cannot include mm.h because it is itself included by mm.h. So the choice > > was to move is_longterm_pinnable_page to memremap.h, or move > > is_device_coherent_page and all its dependencies to mm.h. The latter > > would have been a bigger change. > > I really don't think something mm generic that compiles without > ZONE_DEVICE belongs into memremap.h. Please find a cleaner way to get > this done. (without having bothered to look at the code...) The solution is always to create a new purpose-specific header file.
Re: [PATCH v7 03/14] mm: handling Non-LRU pages returned by vm_normal_pages
On Wed, 29 Jun 2022 11:59:26 +0200 David Hildenbrand wrote: > On 29.06.22 05:54, Alex Sierra wrote: > > With DEVICE_COHERENT, we'll soon have vm_normal_pages() return > > device-managed anonymous pages that are not LRU pages. Although they > > behave like normal pages for purposes of mapping in CPU page, and for > > COW. They do not support LRU lists, NUMA migration or THP. > > > > Callers to follow_page that expect LRU pages, are also checked for > > device zone pages due to DEVICE_COHERENT type. > > Can we rephrase that to (because zeropage) > > "Callers to follow_page() currently don't expect ZONE_DEVICE pages, > however, with DEVICE_COHERENT we might now return ZONE_DEVICE. Check for > ZONE_DEVICE pages in applicable users of follow_page() as well." I made that change to my copy. > > --- a/mm/memory.c > > +++ b/mm/memory.c > > @@ -624,6 +624,13 @@ struct page *vm_normal_page(struct vm_area_struct > > *vma, unsigned long addr, > > if (is_zero_pfn(pfn)) > > return NULL; > > if (pte_devmap(pte)) > > +/* > > + * NOTE: New uers of ZONE_DEVICE will not set pte_devmap() and will have > > s/uers/users/ > > > + * refcounts incremented on their struct pages when they are inserted into > > + * PTEs, thus they are safe to return here. Legacy ZONE_DEVICE pages that > > set > > + * pte_devmap() do not have refcounts. Example of legacy ZONE_DEVICE is > > + * MEMORY_DEVICE_FS_DAX type in pmem or virtio_fs drivers. > > + */ And let's regularize that comment placement? --- a/mm/memory.c~mm-handling-non-lru-pages-returned-by-vm_normal_pages-fix +++ a/mm/memory.c @@ -632,16 +632,16 @@ struct page *vm_normal_page(struct vm_ar return NULL; if (is_zero_pfn(pfn)) return NULL; + /* +* NOTE: New users of ZONE_DEVICE will not set pte_devmap() +* and will have refcounts incremented on their struct pages +* when they are inserted into PTEs, thus they are safe to +* return here. Legacy ZONE_DEVICE pages that set pte_devmap() +* do not have refcounts. Example of legacy ZONE_DEVICE is +* MEMORY_DEVICE_FS_DAX type in pmem or virtio_fs drivers. +*/ if (pte_devmap(pte)) -/* - * NOTE: New uers of ZONE_DEVICE will not set pte_devmap() and will have - * refcounts incremented on their struct pages when they are inserted into - * PTEs, thus they are safe to return here. Legacy ZONE_DEVICE pages that set - * pte_devmap() do not have refcounts. Example of legacy ZONE_DEVICE is - * MEMORY_DEVICE_FS_DAX type in pmem or virtio_fs drivers. - */ return NULL; - print_bad_pte(vma, addr, pte, NULL); return NULL; } _
Re: [PATCH v7 01/14] mm: rename is_pinnable_pages to is_pinnable_longterm_pages
On Wed, 29 Jun 2022 18:08:40 -0400 Felix Kuehling wrote: > > > > I'd have called it "is_longterm_pinnable_page", but I am not a native > > speaker, so no strong opinion :) > > I think only the patch title has the name backwards. The code uses > is_longterm_pinnable_page. Patch title was quite buggy ;) I made it "mm: rename is_pinnable_page() to is_longterm_pinnable_page()" in my copy.
Re: [PATCH v5 00/13] Add MEMORY_DEVICE_COHERENT for coherent device memory mapping
On Tue, 31 May 2022 15:00:28 -0500 Alex Sierra wrote: > This is our MEMORY_DEVICE_COHERENT patch series rebased and updated > for current 5.18.0 I plan to move this series into the non-rebasing mm-stable branch in a few days. Unless sternly told not to do so!
Re: [PATCH v4 07/13] lib: test_hmm add ioctl to get zone device type
On Tue, 31 May 2022 10:56:23 -0500 Alex Sierra wrote: > new ioctl cmd added to query zone device type. This will be > used once the test_hmm adds zone device coherent type. > > @@ -1026,6 +1027,15 @@ static int dmirror_snapshot(struct dmirror *dmirror, > return ret; > } > > +static int dmirror_get_device_type(struct dmirror *dmirror, > + struct hmm_dmirror_cmd *cmd) > +{ > + mutex_lock(>mutex); > + cmd->zone_device_type = dmirror->mdevice->zone_device_type; > + mutex_unlock(>mutex); What does the locking here do? Presumably cmd->zone_device_type can become out of date the instant the mutex is released, so what was the point in taking the mutex? And does it make sense to return potentially out-of-date info to userspace? Perhaps this interface simply shouldn't exist?
Re: [linux-next:master] BUILD REGRESSION 8cb8311e95e3bb58bd84d6350365f14a718faa6d
On Wed, 25 May 2022 23:07:35 +0100 Jessica Clarke wrote: > This is i386, so an unsigned long is 32-bit, but i_blocks is a blkcnt_t > i.e. a u64, which makes the shift without a cast of the LHS fishy. Ah, of course, thanks. I remember 32 bits ;) --- a/mm/shmem.c~mm-shmemc-suppress-shift-warning +++ a/mm/shmem.c @@ -1945,7 +1945,7 @@ alloc_nohuge: spin_lock_irq(>lock); info->alloced += folio_nr_pages(folio); - inode->i_blocks += BLOCKS_PER_PAGE << folio_order(folio); + inode->i_blocks += (blkcnt_t)BLOCKS_PER_PAGE << folio_order(folio); shmem_recalc_inode(inode); spin_unlock_irq(>lock); alloced = true; _
Re: [linux-next:master] BUILD REGRESSION 8cb8311e95e3bb58bd84d6350365f14a718faa6d
On Thu, 26 May 2022 05:35:20 +0800 kernel test robot wrote: > tree/branch: > https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master > branch HEAD: 8cb8311e95e3bb58bd84d6350365f14a718faa6d Add linux-next > specific files for 20220525 > > Error/Warning reports: > > ... > > Unverified Error/Warning (likely false positive, please contact us if > interested): Could be so. > mm/shmem.c:1948 shmem_getpage_gfp() warn: should '(((1) << 12) / 512) << > folio_order(folio)' be a 64 bit type? I've been seeing this one for a while. And from this report I can't figure out what tool emitted it. Clang? > > ... > > |-- i386-randconfig-m021 > | `-- > mm-shmem.c-shmem_getpage_gfp()-warn:should-((()-)-)-folio_order(folio)-be-a-bit-type If you're going to use randconfig then shouldn't you make the config available? Or maybe quote the KCONFIG_SEED - presumably there's a way for others to regenerate. Anyway, the warning seems wrong to me. #define PAGE_SIZE (_AC(1,UL) << PAGE_SHIFT) #define BLOCKS_PER_PAGE (PAGE_SIZE/512) inode->i_blocks += BLOCKS_PER_PAGE << folio_order(folio); so the RHS here should have unsigned long type. Being able to generate the cpp output would be helpful. That requires the .config.
Re: [linux-next:master] BUILD REGRESSION 736ee37e2e8eed7fe48d0a37ee5a709514d478b3
On Wed, 18 May 2022 20:41:27 -0700 Guenter Roeck wrote: > On 5/18/22 17:55, kernel test robot wrote: > > tree/branch: > > https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master > > branch HEAD: 736ee37e2e8eed7fe48d0a37ee5a709514d478b3 Add linux-next > > specific files for 20220518 > > > > Error/Warning reports: > > > > https://lore.kernel.org/linux-mm/202204291924.vtgzmeri-...@intel.com > > https://lore.kernel.org/linux-mm/202205041248.wgcwpcev-...@intel.com > > https://lore.kernel.org/linux-mm/202205122113.ulkzd3sz-...@intel.com > > https://lore.kernel.org/linux-mm/202205172344.3gfeaum1-...@intel.com > > https://lore.kernel.org/linux-mm/202205190527.o9wvevhi-...@intel.com > > > > Error/Warning: (recently discovered and may have been fixed) > > > [ .. ] > > drivers/hwmon/nct6775-platform.c:199:9: sparse:unsigned char > > drivers/hwmon/nct6775-platform.c:199:9: sparse:void > > This is getting tiresome. Every driver using outb() on m68k will > experience that "problem". As far as I can see, it is caused by > > #define out_8(addr,b) (void)((*(__force volatile u8 *) (unsigned long)(addr)) > = (b)) > > in arch/m68k/include/asm/raw_io.h. I have no idea what the > "(void)" is for, but removing it "fixes" the problem. > Either case, this is not a problem with the nct6775 driver, > nor is it a new problem. (cc Geert)
Re: [Bug 215807] Bad page state in process systemd-udevd
(switched to email. Please respond via emailed reply-to-all, not via the bugzilla web interface). hm, that's my third `bad page' report in three emails. But I'm not seeing a pattern - this one might be a DRM thing. On Tue, 12 Apr 2022 20:52:18 + bugzilla-dae...@kernel.org wrote: > https://bugzilla.kernel.org/show_bug.cgi?id=215807 > > paragw (parag.l...@gmail.com) changed: > >What|Removed |Added > > CC||parag.l...@gmail.com > > --- Comment #1 from paragw (parag.l...@gmail.com) --- > Same issue on different hardware - multiple occurrences of below on > 5.18.0-rc2+ > boot - > > [8.453363] amdgpu: Topology: Add CPU node > [8.467169] BUG: Bad page state in process systemd-udevd pfn:11b403 > [8.467180] fbcon: Taking over console > [8.467188] page:2cc06944 refcount:0 mapcount:0 > mapping: index:0x3 pfn:0x11b403 > [8.467195] head:aa25e58e order:9 compound_mapcount:0 > compound_pincount:0 > [8.467198] flags: 0x17c001(head|node=0|zone=2|lastcpupid=0x1f) > [8.467205] raw: 0017c000 f2da846d0001 f2da846d00c8 > > [8.467208] raw: > > [8.467210] head: 0017c001 dead0122 > > [8.467212] head: > > [8.467214] page dumped because: corrupted mapping in tail page > [8.467215] Modules linked in: amdgpu(+) rtsx_usb_sdmmc mmc_core > drm_ttm_helper ttm rtsx_usb hid_sensor_hub > iommu_v2 nvme crct10dif_pclmul crc32_pclmul gpu_sched crc32c_intel nvme_core > uas ccp drm_dp_helper usb_storag > e ghash_clmulni_intel amd_sfh ucsi_acpi sp5100_tco typec_ucsi typec > i2c_hid_acpi i2c_hid fuse ipmi_devintf ipm > i_msghandler > [8.467249] CPU: 2 PID: 416 Comm: systemd-udevd Not tainted 5.18.0-rc2+ #1 > [8.467254] Hardware name: ASUSTeK COMPUTER INC. MINIPC PN50/PN50, BIOS > 0623 > 05/13/2021 > [8.467256] Call Trace: > [8.467260] > [8.467265] dump_stack_lvl+0x48/0x5d > [8.467274] bad_page.cold+0x63/0x8f > [8.467280] free_tail_pages_check+0x144/0x180 > [8.467287] free_pcp_prepare+0x2e8/0x410 > [8.467292] free_unref_page+0x20/0x150 > [8.467298] __vunmap+0x217/0x2c0 > [8.467303] drm_fbdev_cleanup+0x5f/0xb0 > [8.467310] drm_fbdev_fb_destroy+0x19/0x30 > [8.467315] unregister_framebuffer+0x30/0x40 > [8.467321] drm_client_dev_unregister+0x6e/0xe0 > [8.467327] drm_dev_unregister+0x32/0x80 > [8.467332] drm_dev_unplug+0x25/0x40 > [8.467335] simpledrm_remove+0x15/0x20 > [8.467339] platform_remove+0x23/0x40 > [8.467344] device_release_driver_internal+0x1b3/0x210 > [8.467349] bus_remove_device+0xdc/0x150 > [8.467353] device_del+0x18f/0x3f0 > [8.467359] platform_device_del.part.0+0x13/0x70 > [8.467364] platform_device_unregister+0x20/0x30 > [8.467369] drm_aperture_detach_drivers+0xa1/0xd0 > [8.467376] drm_aperture_remove_conflicting_pci_framebuffers+0x43/0x80 > [8.467382] amdgpu_pci_probe+0x12a/0x3c0 [amdgpu] > [8.467833] local_pci_probe+0x45/0x80 > [8.467840] pci_device_probe+0xaf/0x200 > [8.467846] really_probe+0x1a1/0x370 > [8.467850] __driver_probe_device+0xfc/0x170 > [8.467854] driver_probe_device+0x1f/0x90 > [8.467858] __driver_attach+0xbf/0x1a0 > [8.467862] ? __device_attach_driver+0xe0/0xe0 > [8.467866] bus_for_each_dev+0x66/0x90 > [8.467870] bus_add_driver+0x152/0x1f0 > [8.467873] driver_register+0x8d/0xe0 > [8.467877] ? 0xc0518000 > [8.467880] do_one_initcall+0x48/0x200 > [8.467887] ? do_init_module+0x22/0x250 > [8.467892] ? kmem_cache_alloc_trace+0x165/0x2c0 > [8.467898] do_init_module+0x4a/0x250 > [8.467902] __do_sys_finit_module+0x93/0xf0 > [8.467911] do_syscall_64+0x3e/0x90 > [8.467917] entry_SYSCALL_64_after_hwframe+0x44/0xae > [8.467923] RIP: 0033:0x7fb9b13cd59d > [8.467928] Code: 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 > f8 48 89 f7 48 89 d6 48 89 ca 4d > 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 4b > 78 0e 00 f7 d8 64 89 01 48 > [8.467931] RSP: 002b:7ffc90f5b558 EFLAGS: 0246 ORIG_RAX: > 0139 > [8.467936] RAX: ffda RBX: 562a33e1bcd0 RCX: > 7fb9b13cd59d > [8.467938] RDX: RSI: 7fb9b152f43c RDI: > 0011 > [8.467940] RBP: 7fb9b152f43c R08: R09: > 562a33e1bf90 > [8.467943] R10: 0011 R11: 0246 R12: > 0002 > [8.467945] R13: 562a33df8f90 R14: R15: > 562a33e1d540 > [8.467950] > > -- > You may reply to this
Re: [PATCH RFC v2] mm: Add f_ops->populate()
On Sun, 6 Mar 2022 05:26:55 +0200 Jarkko Sakkinen wrote: > Sometimes you might want to use MAP_POPULATE to ask a device driver to > initialize the device memory in some specific manner. SGX driver can use > this to request more memory by issuing ENCLS[EAUG] x86 opcode for each > page in the address range. Why is this useful? Please fully describe the benefit to kernel users. Convince us that the benefit justifies the code churn, maintenance cost and larger kernel footprint. Do we know of any other drivers which might use this? > Add f_ops->populate() with the same parameters as f_ops->mmap() and make > it conditionally called inside call_mmap(). Update call sites > accodingly. spello > -static inline int call_mmap(struct file *file, struct vm_area_struct *vma) > +static inline int call_mmap(struct file *file, struct vm_area_struct *vma, > bool do_populate) > { > - return file->f_op->mmap(file, vma); > + int ret = file->f_op->mmap(file, vma); > + > + if (!ret && do_populate && file->f_op->populate) > + ret = file->f_op->populate(file, vma); > + > + return ret; > } Should this still be inlined?
Re: [PATCH v4 00/10] Add MEMORY_DEVICE_COHERENT for coherent device memory mapping
On Thu, 27 Jan 2022 17:20:40 -0600 "Sierra Guiza, Alejandro (Alex)" wrote: > Andrew, > We're somehow new on this procedure. Are you referring to rebase this > patch series to > git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git > <5.17-rc1 tag>? No, against current Linus mainline, please.
Re: [PATCH v4 00/10] Add MEMORY_DEVICE_COHERENT for coherent device memory mapping
On Wed, 26 Jan 2022 21:09:39 -0600 Alex Sierra wrote: > This patch series introduces MEMORY_DEVICE_COHERENT, a type of memory > owned by a device that can be mapped into CPU page tables like > MEMORY_DEVICE_GENERIC and can also be migrated like > MEMORY_DEVICE_PRIVATE. Some more reviewer input appears to be desirable here. I was going to tentatively add it to -mm and -next, but problems. 5.17-rc1's mm/migrate.c:migrate_vma_check_page() is rather different from the tree you patched. Please redo, refresh and resend?
Re: [next] [dragonboard 410c] Unable to handle kernel paging request at virtual address 00000000007c4240
On Thu, 21 Oct 2021 19:51:20 +0200 Vlastimil Babka wrote: > >> Then we have to figure out how to order a fix between DRM and mmotm... > > > > That is the question! The problem exists only in the merge of the > > two. On current DRM side stack_depot_init() exists but it's __init and > > does not look safe to call multiple times. And obviously my changes > > don't exist at all in mmotm. > > > > I guess one (admittedly hackish) option is to first add a patch in > > drm-next (or drm-misc-next) that makes it safe to call > > stack_depot_init() multiple times in non-init context. It would be > > dropped in favour of your changes once the trees get merged together. > > > > Or is there some way for __drm_stack_depot_init() to detect whether it > > should call stack_depot_init() or not, i.e. whether your changes are > > there or not? > > Let's try the easiest approach first. AFAIK mmotm series is now split to > pre-next and post-next part It has been this way for many years! > and moving my patch > lib-stackdepot-allow-optional-init-and-stack_table-allocation-by-kvmalloc.patch > with the following fixup to the post-next part should solve this. Would that > work, Andrew? Thanks. For this reason. No probs, thanks. I merge up the post-linux-next parts late in the merge window. I do need to manually check that the prerequisites are in mainline, because sometimes the patches apply OK but don't make sense.
Re: [PATCH v1 00/12] MEMORY_DEVICE_COHERENT for CPU-accessible coherent device memory
On Tue, 12 Oct 2021 15:56:29 -0300 Jason Gunthorpe wrote: > > To what other uses will this infrastructure be put? > > > > Because I must ask: if this feature is for one single computer which > > presumably has a custom kernel, why add it to mainline Linux? > > Well, it certainly isn't just "one single computer". Overall I know of > about, hmm, ~10 *datacenters* worth of installations that are using > similar technology underpinnings. > > "Frontier" is the code name for a specific installation but as the > technology is proven out there will be many copies made of that same > approach. > > The previous program "Summit" was done with NVIDIA GPUs and PowerPC > CPUs and also included a very similar capability. I think this is a > good sign that this coherently attached accelerator will continue to > be a theme in computing going foward. IIRC this was done using out of > tree kernel patches and NUMA localities. > > Specifically with CXL now being standardized and on a path to ubiquity > I think we will see an explosion in deployments of coherently attached > accelerator memory. This is the high end trickling down to wider > usage. > > I strongly think many CXL accelerators are going to want to manage > their on-accelerator memory in this way as it makes universal sense to > want to carefully manage memory access locality to optimize for > performance. Thanks. Can we please get something like the above into the [0/n] changelog? Along with any other high-level info which is relevant? It's rather important. "why should I review this", "why should we merge this", etc.
Re: [PATCH v1 00/12] MEMORY_DEVICE_COHERENT for CPU-accessible coherent device memory
On Tue, 12 Oct 2021 12:12:35 -0500 Alex Sierra wrote: > This patch series introduces MEMORY_DEVICE_COHERENT, a type of memory > owned by a device that can be mapped into CPU page tables like > MEMORY_DEVICE_GENERIC and can also be migrated like MEMORY_DEVICE_PRIVATE. > With MEMORY_DEVICE_COHERENT, we isolate the new memory type from other > subsystems as far as possible, though there are some small changes to > other subsystems such as filesystem DAX, to handle the new memory type > appropriately. > > We use ZONE_DEVICE for this instead of NUMA so that the amdgpu > allocator can manage it without conflicting with core mm for non-unified > memory use cases. > > How it works: The system BIOS advertises the GPU device memory (aka VRAM) > as SPM (special purpose memory) in the UEFI system address map. > The amdgpu driver registers the memory with devmap as > MEMORY_DEVICE_COHERENT using devm_memremap_pages. > > The initial user for this hardware page migration capability will be > the Frontier supercomputer project. To what other uses will this infrastructure be put? Because I must ask: if this feature is for one single computer which presumably has a custom kernel, why add it to mainline Linux? > Our nodes in the lab have .5 TB of > system memory plus 256 GB of device memory split across 4 GPUs, all in > the same coherent address space. Page migration is expected to improve > application efficiency significantly. We will report empirical results > as they become available. > > This includes patches originally by Ralph Campbell to change ZONE_DEVICE > reference counting as requested in previous reviews of this patch series > (see https://patchwork.freedesktop.org/series/90706/). We extended > hmm_test to cover migration of MEMORY_DEVICE_COHERENT. This patch set > builds on HMM and our SVM memory manager already merged in 5.14. > We would like to complete review and merge this migration patchset for > 5.16.
Re: [PATCH 1/2] mm/vmscan: add sync_shrinkers function v2
On Fri, 20 Aug 2021 14:05:27 +0200 "Christian König" wrote: > While unplugging a device the TTM shrinker implementation > needs a barrier to make sure that all concurrent shrink > operations are done and no other CPU is referring to a > device specific pool any more. > > Taking and releasing the shrinker semaphore on the write > side after unmapping and freeing all pages from the device > pool should make sure that no shrinker is running in > paralell. > > This allows us to avoid the contented mutex in the TTM pool > implementation for every alloc/free operation. > > v2: rework the commit message to make clear why we need this Acked-by: Andrew Morton
Re: [PATCH v11 00/10] Add support for SVM atomics in Nouveau
On Wed, 16 Jun 2021 20:59:27 +1000 Alistair Popple wrote: > This is my series to add support for SVM atomics in Nouveau Can we please have a nice [0/n] overview for this patchset?
Re: [PATCH v9 07/10] mm: Device exclusive memory access
On Mon, 24 May 2021 23:27:22 +1000 Alistair Popple wrote: > Some devices require exclusive write access to shared virtual > memory (SVM) ranges to perform atomic operations on that memory. This > requires CPU page tables to be updated to deny access whilst atomic > operations are occurring. > > In order to do this introduce a new swap entry > type (SWP_DEVICE_EXCLUSIVE). When a SVM range needs to be marked for > exclusive access by a device all page table mappings for the particular > range are replaced with device exclusive swap entries. This causes any > CPU access to the page to result in a fault. > > Faults are resovled by replacing the faulting entry with the original > mapping. This results in MMU notifiers being called which a driver uses > to update access permissions such as revoking atomic access. After > notifiers have been called the device will no longer have exclusive > access to the region. > > Walking of the page tables to find the target pages is handled by > get_user_pages() rather than a direct page table walk. A direct page > table walk similar to what migrate_vma_collect()/unmap() does could also > have been utilised. However this resulted in more code similar in > functionality to what get_user_pages() provides as page faulting is > required to make the PTEs present and to break COW. > > ... > > Documentation/vm/hmm.rst | 17 > include/linux/mmu_notifier.h | 6 ++ > include/linux/rmap.h | 4 + > include/linux/swap.h | 7 +- > include/linux/swapops.h | 44 - > mm/hmm.c | 5 + > mm/memory.c | 128 +++- > mm/mprotect.c| 8 ++ > mm/page_vma_mapped.c | 9 +- > mm/rmap.c| 186 +++ > 10 files changed, 405 insertions(+), 9 deletions(-) > This is quite a lot of code added to core MM for a single driver. Is there any expectation that other drivers will use this code? Is there a way of reducing the impact (code size, at least) for systems which don't need this code? How beneficial is this code to nouveau users? I see that it permits a part of OpenCL to be implemented, but how useful/important is this in the real world? Thanks.
Re: [PATCH 2/2] drm/ttm: optimize the pool shrinker a bit v2
On Thu, 15 Apr 2021 13:56:24 +0200 "Christian König" wrote: > @@ -530,6 +525,11 @@ void ttm_pool_fini(struct ttm_pool *pool) > for (j = 0; j < MAX_ORDER; ++j) > ttm_pool_type_fini(>caching[i].orders[j]); > } > + > + /* We removed the pool types from the LRU, but we need to also make sure > + * that no shrinker is concurrently freeing pages from the pool. > + */ > + sync_shrinkers(); It isn't immediately clear to me how this works. ttm_pool_fini() has already freed all the pages hasn't it? So why would it care if some shrinkers are still playing with the pages? Or is it the case that ttm_pool_fini() is assuming that there will be some further action against these pages, which requires that shrinkers no longer be accessing the pages and which further assumes that future shrinker invocations will not be able to look up these pages? IOW, a bit more explanation about the dynamics here would help! ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Fw: [Bug 211707] New: BUG: unable to handle page fault for address: ffffa147bdf6b91d
Looks like a recent regression? Begin forwarded message: Date: Thu, 11 Feb 2021 14:27:43 + From: bugzilla-dae...@bugzilla.kernel.org To: a...@linux-foundation.org Subject: [Bug 211707] New: BUG: unable to handle page fault for address: a147bdf6b91d https://bugzilla.kernel.org/show_bug.cgi?id=211707 Bug ID: 211707 Summary: BUG: unable to handle page fault for address: a147bdf6b91d Product: Memory Management Version: 2.5 Kernel Version: 5.11-rc7 Hardware: x86-64 OS: Linux Tree: Mainline Status: NEW Severity: normal Priority: P1 Component: Page Allocator Assignee: a...@linux-foundation.org Reporter: erhar...@mailbox.org CC: mic...@daenzer.net Regression: No Created attachment 295231 --> https://bugzilla.kernel.org/attachment.cgi?id=295231=edit kernel dmesg (5.11-rc7, eMachines E620) [...] BUG: unable to handle page fault for address: a147bdf6b91d #PF: supervisor read access in kernel mode #PF: error_code(0x) - not-present page PGD 52001067 P4D 52001067 PUD 0 Oops: [#1] SMP NOPTI CPU: 1 PID: 623 Comm: udevd Not tainted 5.11.0-rc7-E620 #2 Hardware name: eMachineseMachines E620 /Nile , BIOS V1.03 09/30/2008 RIP: 0010:get_freepointer+0x6/0x1a Code: 89 f0 48 89 d6 48 89 ca 48 83 ef 70 49 8b 48 30 48 c7 c0 fb ff ff ff 48 85 c9 74 07 e8 f2 47 87 00 48 98 c3 8b 47 28 48 01 c6 <48> 8b 06 48 89 f2 48 33 87 b0 00 00 00 48 0f ca 48 31 d0 c3 48 8b RSP: 0018:ac5c807cb898 EFLAGS: 00010286 RAX: 0030 RBX: a14707082078 RCX: 022a RDX: a14702180980 RSI: a147bdf6b91d RDI: a14700042a00 RBP: 0dc0 R08: a147bdf6b8ed R09: 0229 R10: R11: a147070822d0 R12: a14700042a00 R13: a14700042a00 R14: 0050 R15: c08b1d88 FS: 7ff6aaf13b68() GS:a1472df0() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: a147bdf6b91d CR3: 000104156000 CR4: 06e0 Call Trace: kmem_cache_alloc_trace+0x97/0x157 radeon_ttm_tt_create+0x21/0x66 [radeon] ttm_tt_create+0x8d/0xa3 [ttm] ttm_bo_handle_move_mem+0x85/0xf9 [ttm] ? radeon_update_memory_usage+0x4b/0x4b [radeon] ttm_bo_validate+0x138/0x1aa [ttm] ttm_bo_init_reserved+0x282/0x2ba [ttm] ttm_bo_init+0x56/0x8f [ttm] ? radeon_update_memory_usage+0x4b/0x4b [radeon] radeon_bo_create+0x17d/0x255 [radeon] ? radeon_update_memory_usage+0x4b/0x4b [radeon] radeon_ring_init+0x3e/0x12d [radeon] r100_cp_init+0x1ea/0x469 [radeon] rs690_startup+0x114/0x196 [radeon] ? radeon_pm_init+0x607/0x626 [radeon] rs690_init+0x29f/0x2eb [radeon] radeon_device_init+0x899/0xa7d [radeon] radeon_driver_load_kms+0x83/0x15c [radeon] drm_dev_register+0xf4/0x1bc [drm] radeon_pci_probe+0x134/0x178 [radeon] pci_device_probe+0x95/0x104 really_probe+0x144/0x326 driver_probe_device+0x63/0x92 device_driver_attach+0x37/0x50 __driver_attach+0x8d/0x93 ? device_driver_attach+0x50/0x50 bus_for_each_dev+0x71/0xa7 bus_add_driver+0x106/0x1b7 driver_register+0x99/0xd2 ? 0xc09da000 do_one_initcall+0xe2/0x24a ? trace_kmalloc+0x9a/0xc7 ? kmem_cache_alloc_trace+0x130/0x157 do_init_module+0x56/0x1f5 __do_sys_finit_module+0x94/0xbb do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7ff6aaea39d8 Code: 4c 89 4c 24 48 4c 8b 4c 24 60 48 89 54 24 10 48 8b 7e 08 48 89 74 24 18 48 8b 51 18 c7 44 24 08 08 00 00 00 48 8b 76 10 0f 05 <48> 89 c7 e8 40 08 fe ff 48 83 c4 58 c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:7fd2e470 EFLAGS: 0202 ORIG_RAX: 0139 RAX: ffda RBX: 7ff6aae1c9d0 RCX: 7ff6aaea39d8 RDX: RSI: 7ff6aae168e7 RDI: 001b RBP: 0002 R08: R09: 7ff6aaebb6dd R10: 001b R11: 0202 R12: R13: 7ff6aa487d30 R14: 7ff6aae168e7 R15: Modules linked in: radeon(+) snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio mac80211 snd_hda_intel evdev snd_intel_dspcfg input_leds snd_hda_codec led_class ath snd_hwdep psmouse snd_hda_core i2c_algo_bit lm83 drm_ttm_helper snd_pcm ohci_pci ttm snd_timer k8temp drm_kms_helper adm1021 hwmon snd ohci_hcd ehci_pci cfbfillrect soundcore syscopyarea cfbimgblt cfg80211 ehci_hcd sysfillrect i2c_piix4 sysimgblt fb_sys_fops sr_mod usbcore cfbcopyarea usb_common cdrom drm video drm_panel_orientation_quirks rfkill fb battery libarc4 ac font fbdev backlight thermal button processor CR2: a147bdf6b91d ---[ end trace 1a80b76ad6066c81 ]--- Some data about the machine: # inxi -b System:Kernel: 5.11.0-rc7-E620 x86_64 bits: 64 Desktop: MATE 1.24.0 Distro: Gentoo Base System release 2.7 Machine: Type: Laptop System: eMachines product: eMachines E620 v: V1.03 serial:
Fw: [Bug 211587] New: X: page allocation failure: order:8, mode:0x190dc2(GFP_HIGHUSER|__GFP_NORETRY|__GFP_ZERO|__GFP_NOMEMALLOC), nodemask=(null),cpuset=/,mems_allowed=0
I'm not sure who this belongs to... Begin forwarded message: Date: Sat, 06 Feb 2021 01:49:51 + From: bugzilla-dae...@bugzilla.kernel.org To: a...@linux-foundation.org Subject: [Bug 211587] New: X: page allocation failure: order:8, mode:0x190dc2(GFP_HIGHUSER|__GFP_NORETRY|__GFP_ZERO|__GFP_NOMEMALLOC), nodemask=(null),cpuset=/,mems_allowed=0 https://bugzilla.kernel.org/show_bug.cgi?id=211587 Bug ID: 211587 Summary: X: page allocation failure: order:8, mode:0x190dc2(GFP_HIGHUSER|__GFP_NORETRY|__GFP_ZERO|__ GFP_NOMEMALLOC), nodemask=(null),cpuset=/,mems_allowed=0 Product: Memory Management Version: 2.5 Kernel Version: v5.11-rc6 Hardware: i386 OS: Linux Tree: Mainline Status: NEW Severity: normal Priority: P1 Component: Page Allocator Assignee: a...@linux-foundation.org Reporter: erhar...@mailbox.org Regression: No Created attachment 295083 --> https://bugzilla.kernel.org/attachment.cgi?id=295083=edit dmesg (kernel 5.11-rc6, Shuttle XPC FS51, Pentium 4) ... 1st occurence Got this while running btrfs testsuite on the machine. It's not 100 % reproducable, but it happened on several occasions. [...] X: page allocation failure: order:8, mode:0x190dc2(GFP_HIGHUSER|__GFP_NORETRY|__GFP_ZERO|__GFP_NOMEMALLOC), nodemask=(null),cpuset=/,mems_allowed=0 CPU: 0 PID: 199 Comm: X Not tainted 5.11.0-rc6-Pentium4 #4 Hardware name: /FS51, BIOS 6.00 PG 12/02/2003 Call Trace: dump_stack+0x6b/0x89 warn_alloc+0x87/0xe0 __alloc_pages_nodemask+0x9ba/0xa6e ? lock_acquire+0x86/0x2b4 ? __kmalloc+0x11a/0x12d ? kvmalloc_node+0x43/0x6b ttm_pool_alloc+0x135/0x3cd [ttm] radeon_ttm_tt_populate+0x72/0x79 [radeon] ? radeon_bo_reserve.constprop.0+0x4c/0x4c [radeon] ttm_tt_populate+0x2e/0x92 [ttm] ttm_bo_handle_move_mem+0x8d/0xd9 [ttm] ttm_bo_validate+0xe7/0x140 [ttm] ttm_bo_init_reserved+0x23e/0x26b [ttm] ttm_bo_init+0x48/0x67 [ttm] ? radeon_bo_gpu_offset+0x55/0x55 [radeon] radeon_bo_create+0x118/0x1ce [radeon] ? radeon_bo_gpu_offset+0x55/0x55 [radeon] radeon_gem_object_create+0x93/0x12e [radeon] radeon_gem_create_ioctl+0x4a/0xa9 [radeon] ? srcu_read_unlock.constprop.0+0x28/0x2b [drm] ? radeon_gem_pwrite_ioctl+0x19/0x19 [radeon] drm_ioctl_kernel+0x73/0xa9 [drm] drm_ioctl+0x225/0x2f7 [drm] ? radeon_gem_pwrite_ioctl+0x19/0x19 [radeon] ? lock_acquire+0x86/0x2b4 ? lock_acquire+0x86/0x2b4 ? lock_release+0x78/0x201 ? __pm_runtime_resume+0x5e/0x66 ? trace_hardirqs_on+0x43/0x45 ? __pm_runtime_resume+0x5e/0x66 radeon_drm_ioctl+0x3d/0x69 [radeon] ? radeon_pmops_runtime_idle+0x83/0x83 [radeon] vfs_ioctl+0x1a/0x24 __ia32_sys_ioctl+0x618/0x632 ? __vm_munmap+0x60/0x88 ? __vm_munmap+0x7e/0x88 ? exit_to_user_mode_prepare+0x16b/0x173 __do_fast_syscall_32+0x66/0x76 do_fast_syscall_32+0x29/0x5b do_SYSENTER_32+0x15/0x17 entry_SYSENTER_32+0x9f/0xf2 EIP: 0xb7f76545 Code: c4 01 10 03 03 74 c0 01 10 05 03 74 b8 01 10 06 03 74 b4 01 10 07 03 74 b0 01 10 08 03 74 d8 01 00 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 8d 76 00 58 b8 77 00 00 00 cd 80 90 8d 76 EAX: ffda EBX: 000d ECX: c01c645d EDX: bff1a860 ESI: 0232da80 EDI: 0002 EBP: bff1a818 ESP: bff1a7e8 DS: 007b ES: 007b FS: GS: 0033 SS: 007b EFLAGS: 0286 Mem-Info: active_anon:229 inactive_anon:66945 isolated_anon:0 active_file:44149 inactive_file:364595 isolated_file:32 unevictable:0 dirty:25771 writeback:2106 slab_reclaimable:4183 slab_unreclaimable:6442 mapped:30206 shmem:1738 pagetables:941 bounce:0 free:23348 free_pcp:72 free_cma:0 Node 0 active_anon:916kB inactive_anon:267780kB active_file:176596kB inactive_file:1458380kB unevictable:0kB isolated(anon):0kB isolated(file):128kB mapped:120824kB dirty:103084kB writeback:8424kB shmem:6952kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB kernel_stack:2456kB pagetables:3764kB all_unreclaimable? no DMA free:8776kB min:788kB low:984kB high:1180kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:7136kB unevictable:0kB writepending:188kB present:16000kB managed:15916kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB lowmem_reserve[]: 0 834 1998 1998 Normal free:79860kB min:42468kB low:53084kB high:63700kB reserved_highatomic:0KB active_anon:0kB inactive_anon:60668kB active_file:58240kB inactive_file:608540kB unevictable:0kB writepending:12988kB present:24kB managed:855428kB mlocked:0kB bounce:0kB free_pcp:36kB local_pcp:36kB free_cma:0kB lowmem_reserve[]: 0 0 9311 9311 HighMem free:4756kB min:4608kB low:19408kB high:34208kB reserved_highatomic:0KB active_anon:916kB inactive_anon:207112kB active_file:118332kB inactive_file:842684kB unevictable:0kB writepending:98340kB present:1191880kB managed:1191880kB mlocked:0kB bounce:0kB free_pcp:252kB local_pcp:252kB free_cma:0kB lowmem_reserve[]: 0 0 0
Re: [PATCH 1/2] mm: mmap: fix fput in error path v2
On Wed, 18 Nov 2020 11:57:44 +0100 Christian König wrote: > Am 06.11.20 um 23:48 schrieb Andrew Morton: > > On Fri, 6 Nov 2020 12:48:05 +0100 "Christian König" > > wrote: > > > >> Patch "495c10cc1c0c CHROMIUM: dma-buf: restore args..." > >> adds a workaround for a bug in mmap_region. > >> > >> As the comment states ->mmap() callback can change > >> vma->vm_file and so we might call fput() on the wrong file. > >> > >> Revert the workaround and proper fix this in mmap_region. > >> > > Seems correct, best I can tell. Presumably all ->mmap() instances will > > correctly fput() to original file* if they're rewriting vma->vm_file. > > Yes, exactly. > > Patch #2 provides a helper to make sure that everybody gets the > get_file()/fput() correctly while updating vma->vm_file. > > Can I add your acked-by to the patches and push them upstream through > drm-misc-next? Please go ahead. Acked-by: Andrew Morton ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH 1/2] mm: mmap: fix fput in error path v2
On Fri, 6 Nov 2020 12:48:05 +0100 "Christian König" wrote: > Patch "495c10cc1c0c CHROMIUM: dma-buf: restore args..." > adds a workaround for a bug in mmap_region. > > As the comment states ->mmap() callback can change > vma->vm_file and so we might call fput() on the wrong file. > > Revert the workaround and proper fix this in mmap_region. > Seems correct, best I can tell. Presumably all ->mmap() instances will correctly fput() to original file* if they're rewriting vma->vm_file. ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH 1/6] mm: mmap: fix fput in error path
On Fri, 9 Oct 2020 17:03:37 +0200 "Christian König" wrote: > Patch "495c10cc1c0c CHROMIUM: dma-buf: restore args..." > adds a workaround for a bug in mmap_region. > > As the comment states ->mmap() callback can change > vma->vm_file and so we might call fput() on the wrong file. > > Revert the workaround and proper fix this in mmap_region. > Doesn't this patch series address the same thing as https://lkml.kernel.org/r/20200916090733.31427-1-linmia...@huawei.com? ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH 2/2] mm/frame-vec: use FOLL_LONGTERM
On Mon, 5 Oct 2020 14:47:47 -0300 Jason Gunthorpe wrote: > Andrew please let me know if you need a resend Andrew is rather confused. Can we please identify the minimal patch(es) which are needed for 5.9 and -stable? ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: remove alloc_vm_area v2
On Thu, 24 Sep 2020 15:58:42 +0200 Christoph Hellwig wrote: > this series removes alloc_vm_area, which was left over from the big > vmalloc interface rework. It is a rather arkane interface, basicaly > the equivalent of get_vm_area + actually faulting in all PTEs in > the allocated area. It was originally addeds for Xen (which isn't > modular to start with), and then grew users in zsmalloc and i915 > which seems to mostly qualify as abuses of the interface, especially > for i915 as a random driver should not set up PTE bits directly. > > Note that the i915 patches apply to the drm-tip branch of the drm-tip > tree, as that tree has recent conflicting commits in the same area. Is the drm-tip material in linux-next yet? I'm still seeing a non-trivial reject in there at present. ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH v4 00/23] device-dax: Support sub-dividing soft-reserved ranges
On Wed, 19 Aug 2020 18:53:57 -0700 Dan Williams wrote: > > I think I am missing some important pieces. Bear with me. > > No worries, also bear with me, I'm going to be offline intermittently > until at least mid-September. Hopefully Joao and/or Vishal can jump in > on this discussion. Ordinarily I'd prefer a refresh for 2+ week-old series such as this. But given that v4 all applies OK and that Dan has pending outages, I'll scoop up this version, even though at least one change has been suggested. Also, this series has killed Zhen Lei's little cleanup (http://lkml.kernel.org/r/20200817065926.2239-1-thunder.leiz...@huawei.com). I don't think the affected code was moved elsewhere, so I'll drop that patch. ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH v4 15/23] device-dax: Add resize support
On Sun, 02 Aug 2020 22:03:46 -0700 Dan Williams wrote: > Make the device-dax 'size' attribute writable to allow capacity to be > split between multiple instances in a region. The intended consumers of > this capability are users that want to split a scarce memory resource > between device-dax and System-RAM access, or users that want to have > multiple security domains for a large region. > > By default the hmem instance provider allocates an entire region to the > first instance. The process of creating a new instance (assuming a > region-id of 0) is find the region and trigger the 'create' attribute > which yields an empty instance to configure. For example: > > cd /sys/bus/dax/devices > echo dax0.0 > dax0.0/driver/unbind > echo $new_size > dax0.0/size > echo 1 > $(readlink -f dax0.0)../dax_region/create > seed=$(cat $(readlink -f dax0.0)../dax_region/seed) > echo $new_size > $seed/size > echo dax0.0 > ../drivers/{device_dax,kmem}/bind > echo dax0.1 > ../drivers/{device_dax,kmem}/bind > > Instances can be destroyed by: > > echo $device > $(readlink -f $device)../dax_region/delete This userspace interface doesn't seem to be documented anywhere, so there's nothing to update for this patch :( ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH v2] io-mapping: Indicate mapping failure
On Tue, 21 Jul 2020 21:02:44 + "Ruhl, Michael J" wrote: > >--- a/include/linux/io-mapping.h~io-mapping-indicate-mapping-failure-fix > >+++ a/include/linux/io-mapping.h > >@@ -107,9 +107,12 @@ io_mapping_init_wc(struct io_mapping *io > >resource_size_t base, > >unsigned long size) > > { > >+iomap->iomem = ioremap_wc(base, size); > >+if (!iomap->iomem) > >+return NULL; > >+ > > This does make more sense. > > I am confused by the two follow up emails I just got. One was your original patch, the other is my suggested alteration. > Shall I resubmit, or is this path (if !iomap->iomem) return NULL) > now in the tree. All is OK. If my alteration is acceptable (and, preferably, tested!) then when the time comes, I'll fold it into the base patch, add a note indicating this change and shall then send it to Linus. ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH v2] io-mapping: Indicate mapping failure
On Tue, 21 Jul 2020 13:19:36 -0400 "Michael J. Ruhl" wrote: > The !ATOMIC_IOMAP version of io_maping_init_wc will always return > success, even when the ioremap fails. > > Since the ATOMIC_IOMAP version returns NULL when the init fails, and > callers check for a NULL return on error this is unexpected. > > During a device probe, where the ioremap failed, a crash can look > like this: > > BUG: unable to handle page fault for address: 0021 > #PF: supervisor write access in kernel mode > #PF: error_code(0x0002) - not-present page > Oops: 0002 [#1] PREEMPT SMP > CPU: 0 PID: 177 Comm: > RIP: 0010:fill_page_dma [i915] > gen8_ppgtt_create [i915] > i915_ppgtt_create [i915] > intel_gt_init [i915] > i915_gem_init [i915] > i915_driver_probe [i915] > pci_device_probe > really_probe > driver_probe_device > > The remap failure occurred much earlier in the probe. If it had > been propagated, the driver would have exited with an error. > > Return NULL on ioremap failure. > > ... > > --- a/include/linux/io-mapping.h > +++ b/include/linux/io-mapping.h > @@ -118,7 +118,7 @@ io_mapping_init_wc(struct io_mapping *iomap, > iomap->prot = pgprot_noncached(PAGE_KERNEL); > #endif > > - return iomap; > + return iomap->iomem ? iomap : NULL; > } > > static inline void LGTM. However I do think it would be stylistically better/typical to detect and handle the error early, rather than to blunder on, pointlessly initializing things? --- a/include/linux/io-mapping.h~io-mapping-indicate-mapping-failure-fix +++ a/include/linux/io-mapping.h @@ -107,9 +107,12 @@ io_mapping_init_wc(struct io_mapping *io resource_size_t base, unsigned long size) { + iomap->iomem = ioremap_wc(base, size); + if (!iomap->iomem) + return NULL; + iomap->base = base; iomap->size = size; - iomap->iomem = ioremap_wc(base, size); #if defined(pgprot_noncached_wc) /* archs can't agree on a name ... */ iomap->prot = pgprot_noncached_wc(PAGE_KERNEL); #elif defined(pgprot_writecombine) @@ -118,7 +121,7 @@ io_mapping_init_wc(struct io_mapping *io iomap->prot = pgprot_noncached(PAGE_KERNEL); #endif - return iomap->iomem ? iomap : NULL; + return iomap; } static inline void _ ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH] arch/{mips,sparc,microblaze,powerpc}: Don't enable pagefault/preempt twice
On Thu, 21 May 2020 10:42:50 -0700 Ira Weiny wrote: > > > > > > Actually it occurs to me that the patch consolidating kmap_prot is odd for > > > sparc 32 bit... > > > > > > Its a long shot but could you try reverting this patch? > > > > > > 4ea7d2419e3f kmap: consolidate kmap_prot definitions > > > > > > > That is not easy to revert, unfortunately, due to several follow-up patches. > > I have gotten your sparc tests to run and they all pass... > > 08:10:34 > ../linux-build-test/rootfs/sparc/run-qemu-sparc.sh > Build reference: v5.7-rc4-17-g852b6f2edc0f > > Building sparc32:SPARCClassic:nosmp:scsi:hd ... running . passed > Building sparc32:SPARCbook:nosmp:scsi:cd ... running . passed > Building sparc32:LX:nosmp:noapc:scsi:hd ... running . passed > Building sparc32:SS-4:nosmp:initrd ... running . passed > Building sparc32:SS-5:nosmp:scsi:hd ... running . passed > Building sparc32:SS-10:nosmp:scsi:cd ... running . passed > Building sparc32:SS-20:nosmp:scsi:hd ... running . passed > Building sparc32:SS-600MP:nosmp:scsi:hd ... running . passed > Building sparc32:Voyager:nosmp:noapc:scsi:hd ... running . passed > Building sparc32:SS-4:smp:scsi:hd ... running . passed > Building sparc32:SS-5:smp:scsi:cd ... running . passed > Building sparc32:SS-10:smp:scsi:hd ... running . passed > Building sparc32:SS-20:smp:scsi:hd ... running . passed > Building sparc32:SS-600MP:smp:scsi:hd ... running . passed > Building sparc32:Voyager:smp:noapc:scsi:hd ... running . passed > > Is there another test I need to run? This all petered out, but as I understand it, this patchset still might have issues on various architectures. Can folks please provide an update on the testing status? ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH hmm v2 1/5] mm/hmm: make CONFIG_DEVICE_PRIVATE into a select
On Fri, 1 May 2020 15:20:44 -0300 Jason Gunthorpe wrote: > From: Jason Gunthorpe > > There is no reason for a user to select this or not directly - it should > be selected by drivers that are going to use the feature, similar to how > CONFIG_HMM_MIRROR works. > > Currently all drivers provide a feature kconfig that will disable use of > DEVICE_PRIVATE in that driver, allowing users to avoid enabling this if > they don't want the overhead. > I'm not too sure what's going on here, but i386 allmodconfig broke. kernel/resource.c: In function '__request_free_mem_region': kernel/resource.c:1653:28: error: 'PA_SECTION_SHIFT' undeclared (first use in this function); did you mean 'SECTIONS_PGSHIFT'? size = ALIGN(size, 1UL << PA_SECTION_SHIFT); because in current mainline, allmodconfig produces CONFIG_DEVICE_PRIVATE=n but in current linux-next, allmodconfig produces CONFIG_DEVICE_PRIVATE=y. But CONFIG_SPARSEMEM=n so the build breaks. Bisection fingers this commit, but reverting it doesn't seem to fix things. Could you take a look please? I'm seeing this from menuconfig: WARNING: unmet direct dependencies detected for DEVICE_PRIVATE Depends on [n]: ZONE_DEVICE [=n] Selected by [m]: - DRM_NOUVEAU_SVM [=y] && HAS_IOMEM [=y] && DRM_NOUVEAU [=m] && MMU [=y] && STAGING [=y] - TEST_HMM [=m] && RUNTIME_TESTING_MENU [=y] && TRANSPARENT_HUGEPAGE [=y] `select' rather sucks this way - easy to break dependencies. Quite a number of years ago the Kconfig gurus were saying "avoid", but I don't recall the details. ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH V3 15/15] kmap: Consolidate kmap_prot definitions
On Thu, 7 May 2020 08:00:03 -0700 ira.we...@intel.com wrote: > From: Ira Weiny > > Most architectures define kmap_prot to be PAGE_KERNEL. > > Let sparc and xtensa define there own and define PAGE_KERNEL as the > default if not overridden. > checkpatch considered useful ;) From: Andrew Morton Subject: kmap-consolidate-kmap_prot-definitions-checkpatch-fixes WARNING: macros should not use a trailing semicolon #134: FILE: arch/sparc/include/asm/highmem.h:33: +#define kmap_prot __pgprot(SRMMU_ET_PTE | SRMMU_PRIV | SRMMU_CACHE); total: 0 errors, 1 warnings, 100 lines checked NOTE: For some of the reported defects, checkpatch may be able to mechanically convert to the typical style using --fix or --fix-inplace. ./patches/kmap-consolidate-kmap_prot-definitions.patch has style problems, please review. NOTE: If any of the errors are false positives, please report them to the maintainer, see CHECKPATCH in MAINTAINERS. Please run checkpatch prior to sending patches Cc: Ira Weiny Signed-off-by: Andrew Morton --- arch/sparc/include/asm/highmem.h |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- a/arch/sparc/include/asm/highmem.h~kmap-consolidate-kmap_prot-definitions-checkpatch-fixes +++ a/arch/sparc/include/asm/highmem.h @@ -30,7 +30,7 @@ /* declarations for highmem.c */ extern unsigned long highstart_pfn, highend_pfn; -#define kmap_prot __pgprot(SRMMU_ET_PTE | SRMMU_PRIV | SRMMU_CACHE); +#define kmap_prot __pgprot(SRMMU_ET_PTE | SRMMU_PRIV | SRMMU_CACHE) extern pte_t *pkmap_page_table; void kmap_init(void) __init; _ ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH V3 13/15] parisc/kmap: Remove duplicate kmap code
On Thu, 7 May 2020 08:00:01 -0700 ira.we...@intel.com wrote: > parisc reimplements the kmap calls except to flush it's dcache. This is > arguably an abuse of kmap but regardless it is messy and confusing. > > Remove the duplicate code and have parisc define > ARCH_HAS_FLUSH_ON_KUNMAP for a kunmap_flush_on_unmap() architecture > specific call to flush the cache. checkpatch says: ERROR: #define of 'ARCH_HAS_FLUSH_ON_KUNMAP' is wrong - use Kconfig variables or standard guards instead #69: FILE: arch/parisc/include/asm/cacheflush.h:103: +#define ARCH_HAS_FLUSH_ON_KUNMAP which is fair enough, I guess. More conventional would be arch/parisc/include/asm/cacheflush.h: static inline void kunmap_flush_on_unmap(void *addr) { ... } #define kunmap_flush_on_unmap kunmap_flush_on_unmap include/linux/highmem.h: #ifndef kunmap_flush_on_unmap static inline void kunmap_flush_on_unmap(void *addr) { } #define kunmap_flush_on_unmap kunmap_flush_on_unmap #endif static inline void kunmap_atomic_high(void *addr) { /* Mostly nothing to do in the CONFIG_HIGHMEM=n case as kunmap_atomic() * handles re-enabling faults + preemption */ kunmap_flush_on_unmap(addr); } but I don't really think it's worth bothering changing it. (Ditto patch 3/15) ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH 21/29] mm: remove the pgprot argument to __vmalloc
On Thu, 30 Apr 2020 22:38:10 -0400 John Dorminy wrote: > the change > description refers to PROT_KERNEL, which is a symbol which does not > appear to exist; perhaps PAGE_KERNEL was meant? Yes, thanks, fixed. ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH] get_maintainer: Add email addresses from .yaml files
On Sun, 26 Apr 2020 23:33:02 -0700 Joe Perches wrote: > On Mon, 2020-04-27 at 07:57 +0200, Sam Ravnborg wrote: > > Hi Joe. > > Hi Sam. > > > On Sun, Apr 26, 2020 at 10:40:52PM -0700, Joe Perches wrote: > > > .yaml files can contain maintainer/author addresses and it seems > > > unlikely or unnecessary that individual MAINTAINER file section > > > entries for each .yaml file will be created. > > > > > > So dd the email addresses found in .yaml files to the default > > ^ > > add > > Andrew, can you add the a to this please? > > > Signed-off-by: Joe Perches > > Acked-by: Sam Ravnborg > > Tested-by: Sam Ravnborg > > > > The patch did not apply on top of -rc3, but it was trivial to fix. > > Tested and works like a charm. > > Thanks for doing this! > > As most of my patches, it was done using -next > The patch assumes that we have - if ($file_emails) { - my @poss_addr = $text =~ m$[A-Za-z_-_\"\' \,\.\+-]*\s*[\,]*\s*[\(\<\{]{0,1}[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z0-9]+[\)\>\}]{0,1}$g; - push(@file_emails, clean_file_emails(@poss_addr)); - } but today's next has if ($file_emails) { my @poss_addr = $text =~ m$[A-Za-zÀ-ÿ\"\' \,\.\+-]*\s*[\,]*\s*[\(\<\{]{0,1}[A-Za-z0-9_\.\+-]+\@[A-Za-z0-9\.-]+\.[A-Za-z0-9]+[\)\>\}]{0,1}$g; push(@file_emails, clean_file_emails(@poss_addr)); } so what do do here? ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [Bug 206697] #PF: supervisor read access in kernel mode, #PF: error_code(0x0000) - not-present page while building a large project
On Mon, 2 Mar 2020 15:03:29 -0800 Andrew Morton wrote: > > (switched to email. Please respond via emailed reply-to-all, not via the > bugzilla web interface). > > On Mon, 02 Mar 2020 21:55:10 + bugzilla-dae...@bugzilla.kernel.org wrote: > > > https://bugzilla.kernel.org/show_bug.cgi?id=206697 > > > > --- Comment #2 from Erhard F. (erhar...@mailbox.org) --- > > Created attachment 287765 > > --> https://bugzilla.kernel.org/attachment.cgi?id=287765=edit > > dmesg (kernel 5.6-rc4, Shuttle XPC FS51, Pentium 4) > > > > Same on kernel 5.6-rc4: > > Thanks. This looks like a regression in the DRM code. I've added > suitable Ccs. Erhard, please let's handle this issue via email, not via the bugzilla interface. This does appear to be a DRM issue, and it has been reproduced in 5.7-rc1. Latest update below: From: bugzilla-dae...@bugzilla.kernel.org To: a...@linux-foundation.org Subject: [Bug 206697] #PF: supervisor read access in kernel mode, #PF: error_code(0x) - not-present page while building a large project Date: Fri, 17 Apr 2020 07:45:23 + https://bugzilla.kernel.org/show_bug.cgi?id=206697 --- Comment #5 from Erhard F. (erhar...@mailbox.org) --- Looks mostly the same on kernel 5.7-rc1. The line after kthread+0xd1/0xd3 is different. It was "? try_to_free_pages+0x3d4/0x3d4" on 5.5.6 and 5.6-rc4, but is "? shrink_node+0x6f2/0x6f2" on 5.7-rc1 now. [...] Apr 17 00:28:40 BUG: kernel NULL pointer dereference, address: Apr 17 00:28:40 #PF: supervisor read access in kernel mode Apr 17 00:28:40 #PF: error_code(0x) - not-present page Apr 17 00:28:40 *pde = Apr 17 00:28:40 Oops: [#1] SMP Apr 17 00:28:40 CPU: 0 PID: 53 Comm: kswapd0 Not tainted 5.7.0-rc1-Pentium4 #1 Apr 17 00:28:40 Hardware name: /FS51, BIOS 6.00 PG 12/02/2003 Apr 17 00:28:40 EIP: __cpa_process_fault+0x205/0x226 Apr 17 00:28:40 Code: 2d 00 00 00 40 39 d0 76 1f 81 fa ff ff ff bf 76 17 c7 47 10 01 00 00 00 81 c3 00 00 00 40 c1 eb 0c 89 5f 18 31 f6 eb 19 8b 07 30 53 68 d5 c7 cb d8 e8 cb 68 00 00 0f 0b> Apr 17 00:28:40 EAX: EBX: ECX: 0001 EDX: Apr 17 00:28:40 ESI: 0001 EDI: f5e5bd4c EBP: f5e5bcc4 ESP: f5e5bca0 Apr 17 00:28:40 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010213 Apr 17 00:28:40 CR0: 80050033 CR2: CR3: 05ab3000 CR4: 06d0 Apr 17 00:28:40 Call Trace: Apr 17 00:28:40 ? _raw_spin_lock+0x22/0x2a Apr 17 00:28:40 ? lookup_address+0x1d/0x20 Apr 17 00:28:40 __change_page_attr_set_clr+0x85/0x551 Apr 17 00:28:40 ? __mutex_unlock_slowpath+0x20/0x1b6 Apr 17 00:28:40 ? mutex_unlock+0xb/0xd Apr 17 00:28:40 ? _vm_unmap_aliases.part.0+0x11f/0x127 Apr 17 00:28:40 change_page_attr_set_clr+0xdc/0x1af Apr 17 00:28:40 set_pages_array_wb+0x20/0x7b Apr 17 00:28:40 ttm_pages_put+0x22/0x71 [ttm] Apr 17 00:28:40 ttm_page_pool_free+0xa1/0x111 [ttm] Apr 17 00:28:40 ttm_pool_shrink_scan+0x9c/0xd1 [ttm] Apr 17 00:28:40 shrink_slab.constprop.0+0x248/0x38f Apr 17 00:28:40 shrink_node+0x533/0x6f2 Apr 17 00:28:40 kswapd+0x4b6/0x628 Apr 17 00:28:40 ? kswapd+0x4b6/0x628 Apr 17 00:28:40 kthread+0xd1/0xd3 Apr 17 00:28:40 ? shrink_node+0x6f2/0x6f2 Apr 17 00:28:40 ? kthread_delayed_work_timer_fn+0x6a/0x6a Apr 17 00:28:40 ret_from_fork+0x2e/0x38 Apr 17 00:28:40 Modules linked in: fuse auth_rpcgss nfsv4 dns_resolver nfs lockd grace sunrpc rt2500pci eeprom_93cx6 rt2x00pci rt2x00mmio rt2x00lib led_class mac80211 radeon hwmon i2c_algo_bit d> Apr 17 00:28:40 CR2: Apr 17 00:28:40 ---[ end trace 49fbdfbb6e459a06 ]--- Apr 17 00:28:40 EIP: __cpa_process_fault+0x205/0x226 Apr 17 00:28:40 Code: 2d 00 00 00 40 39 d0 76 1f 81 fa ff ff ff bf 76 17 c7 47 10 01 00 00 00 81 c3 00 00 00 40 c1 eb 0c 89 5f 18 31 f6 eb 19 8b 07 30 53 68 d5 c7 cb d8 e8 cb 68 00 00 0f 0b> Apr 17 00:28:40 EAX: EBX: ECX: 0001 EDX: Apr 17 00:28:40 ESI: 0001 EDI: f5e5bd4c EBP: f5e5bcc4 ESP: f5e5bca0 Apr 17 00:28:40 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010213 Apr 17 00:28:40 CR0: 80050033 CR2: CR3: 05ab3000 CR4: 06d0 -- You are receiving this mail because: You are the assignee for the bug. ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: Ack to merge through DRM? WAS [PATCH v6 0/9] Huge page-table entries for TTM
On Thu, 19 Mar 2020 11:20:44 +0100 Thomas Hellström (VMware) wrote: > On 3/19/20 12:27 AM, Andrew Morton wrote: > > On Mon, 16 Mar 2020 13:32:08 +0100 Thomas Hellström (VMware) > > wrote: > > > >>> ___ > >>> dri-devel mailing list > >>> dri-devel@lists.freedesktop.org > >>> https://lists.freedesktop.org/mailman/listinfo/dri-devel > >> Andrew, would it be possible to have an ack for merge using a DRM tree > >> for the -mm patches? > > Yes, please do. It's all pretty straightforward addition of new > > functionality which won't affect existing code. > > Thanks Andrew. Can I add your Acked-by: To the mm patches for Linus' > reference? > Please do. ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: Ack to merge through DRM? WAS [PATCH v6 0/9] Huge page-table entries for TTM
On Mon, 16 Mar 2020 13:32:08 +0100 Thomas Hellström (VMware) wrote: > > ___ > > dri-devel mailing list > > dri-devel@lists.freedesktop.org > > https://lists.freedesktop.org/mailman/listinfo/dri-devel > > Andrew, would it be possible to have an ack for merge using a DRM tree > for the -mm patches? Yes, please do. It's all pretty straightforward addition of new functionality which won't affect existing code. ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [Bug 206697] #PF: supervisor read access in kernel mode, #PF: error_code(0x0000) - not-present page while building a large project
(switched to email. Please respond via emailed reply-to-all, not via the bugzilla web interface). On Mon, 02 Mar 2020 21:55:10 + bugzilla-dae...@bugzilla.kernel.org wrote: > https://bugzilla.kernel.org/show_bug.cgi?id=206697 > > --- Comment #2 from Erhard F. (erhar...@mailbox.org) --- > Created attachment 287765 > --> https://bugzilla.kernel.org/attachment.cgi?id=287765=edit > dmesg (kernel 5.6-rc4, Shuttle XPC FS51, Pentium 4) > > Same on kernel 5.6-rc4: Thanks. This looks like a regression in the DRM code. I've added suitable Ccs. > [...] > [ 908.356444] BUG: kernel NULL pointer dereference, address: > [ 908.356670] #PF: supervisor read access in kernel mode > [ 908.356823] #PF: error_code(0x) - not-present page > [ 908.356974] *pde = > [ 908.357064] Oops: [#1] SMP > [ 908.357163] CPU: 0 PID: 53 Comm: kswapd0 Not tainted 5.6.0-rc4-Pentium4 #1 > [ 908.357367] Hardware name: /FS51, BIOS 6.00 PG 12/02/2003 > [ 908.357535] EIP: __cpa_process_fault+0x205/0x226 > [ 908.357674] Code: 2d 00 00 00 40 39 d0 76 1f 81 fa ff ff ff bf 76 17 c7 47 > 10 01 00 00 00 81 c3 00 00 00 40 c1 eb 0c 89 5f 18 31 f6 eb 19 8b 07 30 > 53 > 68 56 ba 89 c9 e8 f8 68 00 00 0f 0b 83 c4 0c be f2 ff ff > [ 908.358228] EAX: EBX: ECX: 0001 EDX: > [ 908.358411] ESI: 0001 EDI: f5e6fd4c EBP: f5e6fcc4 ESP: f5e6fca0 > [ 908.358598] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010213 > [ 908.358798] CR0: 80050033 CR2: CR3: 00333000 CR4: 06d0 > [ 908.358981] Call Trace: > [ 908.359062] ? _raw_spin_lock+0x22/0x2a > [ 908.359176] ? lookup_address+0x1d/0x20 > [ 908.359289] __change_page_attr_set_clr+0x85/0x551 > [ 908.359436] ? __mutex_unlock_slowpath+0x20/0x1b6 > [ 908.368244] ? mutex_unlock+0xb/0xd > [ 908.377037] ? _vm_unmap_aliases.part.0+0x11f/0x127 > [ 908.385944] change_page_attr_set_clr+0xdc/0x1af > [ 908.394889] set_pages_array_wb+0x20/0x7b > [ 908.403630] ttm_pages_put+0x22/0x71 [ttm] > [ 908.412159] ttm_page_pool_free+0xf6/0x111 [ttm] > [ 908.420492] ttm_pool_shrink_scan+0x9c/0xd1 [ttm] > [ 908.428885] shrink_slab.constprop.0+0x248/0x38f > [ 908.437241] shrink_node+0x533/0x6f2 > [ 908.445492] kswapd+0x4b9/0x62d > [ 908.453714] ? kswapd+0x4b9/0x62d > [ 908.461937] kthread+0xd1/0xd3 > [ 908.470055] ? try_to_free_pages+0x3d4/0x3d4 > [ 908.478143] ? kthread_delayed_work_timer_fn+0x6a/0x6a > [ 908.486242] ret_from_fork+0x2e/0x38 > [ 908.494256] Modules linked in: auth_rpcgss nfsv4 dns_resolver nfs lockd > grace sunrpc ctr aes_generic libaes ccm hid_generic usbhid hid rt2500pci > eeprom_93cx6 rt2x00pci rt2x00mmio rt2x00lib led_class mac80211 radeon evdev > hwmon i2c_algo_bit drm_kms_helper cfbfillrect syscopyarea cfg80211 cfbimgblt > sysfillrect sysimgblt fb_sys_fops cfbcopyarea firewire_ohci firewire_core fb > rfkill font crc_itu_t 8139too libarc4 mii fbdev sr_mod cdrom ttm thermal fan > ohci_pci drm 8250 snd_intel8x0 8250_base serial_core snd_ac97_codec ac97_bus > ehci_pci ohci_hcd snd_pcm drm_panel_orientation_quirks ehci_hcd button > backlight snd_timer usbcore sis_agp agpgart snd i2c_sis96x usb_common > processor > soundcore zstd zram zsmalloc > [ 908.522107] CR2: > [ 908.531646] ---[ end trace f8cc5b63e4c76d19 ]--- > [ 908.541190] EIP: __cpa_process_fault+0x205/0x226 > [ 908.550708] Code: 2d 00 00 00 40 39 d0 76 1f 81 fa ff ff ff bf 76 17 c7 47 > 10 01 00 00 00 81 c3 00 00 00 40 c1 eb 0c 89 5f 18 31 f6 eb 19 8b 07 30 > 53 > 68 56 ba 89 c9 e8 f8 68 00 00 0f 0b 83 c4 0c be f2 ff ff > [ 908.560958] EAX: EBX: ECX: 0001 EDX: > [ 908.571156] ESI: 0001 EDI: f5e6fd4c EBP: f5e6fcc4 ESP: f5e6fca0 > [ 908.581412] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010213 > [ 908.591710] CR0: 80050033 CR2: CR3: 00333000 CR4: 06d0 ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH 1/8] mm: Introduce vma_is_special_huge
On Tue, 3 Dec 2019 14:22:32 +0100 Thomas Hellström (VMware) wrote: > From: Thomas Hellstrom > > For VM_PFNMAP and VM_MIXEDMAP vmas that want to support transhuge pages > and -page table entries, introduce vma_is_special_huge() that takes the > same codepaths as vma_is_dax(). > > The use of "special" follows the definition in memory.c, vm_normal_page(): > "Special" mappings do not wish to be associated with a "struct page" > (either it doesn't exist, or it exists but they don't want to touch it) > > For PAGE_SIZE pages, "special" is determined per page table entry to be > able to deal with COW pages. But since we don't have huge COW pages, > we can classify a vma as either "special huge" or "normal huge". > > --- a/include/linux/mm.h > +++ b/include/linux/mm.h > @@ -2822,6 +2822,12 @@ extern long copy_huge_page_from_user(struct page > *dst_page, > const void __user *usr_src, > unsigned int pages_per_huge_page, > bool allow_pagefault); > +static inline bool vma_is_special_huge(struct vm_area_struct *vma) > +{ > + return vma_is_dax(vma) || (vma->vm_file && > +(vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))); > +} Some documetnation would be nice. Not only what it does, but why it does it. ie, what is the *meaning* of vma_is_spacial_huge(vma)==true? ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH 0/8] Huge page-table entries for TTM
On Tue, 3 Dec 2019 14:22:31 +0100 Thomas Hellström (VMware) wrote: > In order to save TLB space and CPU usage this patchset enables huge- and giant > page-table entries for TTM and TTM-enabled graphics drivers. Have these savings been measured? They shouild be, please. And documented in this changelog! ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH 2/8] mm: Split huge pages on write-notify or COW
On Tue, 3 Dec 2019 14:22:33 +0100 Thomas Hellström (VMware) wrote: > We currently only do COW and write-notify on the PTE level, so if the > huge_fault() handler returns VM_FAULT_FALLBACK on wp faults, > split the huge pages and page-table entries. Also do this for huge PUDs > if there is no huge_fault() handler and the vma is not anonymous, similar > to how it's done for PMDs. There is no description here of *why* we are making this change? > Note that fs/dax.c does the splitting in the huge_fault() handler, but as > huge_fault() is implemented by modules we need to consider whether to > export the splitting functions for use in the modules or whether to try > to keep calls in the core. Opt for the latter. A follow-up patch can > remove the dax.c split_huge_pmd() if needed. ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH v4 0/9] Huge page-table entries for TTM
On Fri, 28 Feb 2020 14:08:04 +0100 Thomas Hellström (VMware) wrote: > I'm wondering what's the best way here to get the patches touching mm > reviewed and accepted? > While drm people and VMware internal people have looked at them, I think > the huge_fault() fallback splitting and the introduction of > vma_is_special_huge() needs looking at more thoroughly. > > Apart from that, if possible, I think the best way to merge this series > is also through a DRM tree. Patches 1-3 look OK to me. I just had a few commenting/changelogging niggles. ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH] dma-buf: free dmabuf->name in dma_buf_release()
On Thu, 27 Feb 2020 13:38:03 -0800 Cong Wang wrote: > On Tue, Feb 25, 2020 at 5:54 PM Andrew Morton > wrote: > > > > On Tue, 25 Feb 2020 12:44:46 -0800 Cong Wang > > wrote: > > > > > dma-buff name can be set via DMA_BUF_SET_NAME ioctl, but once set > > > it never gets freed. > > > > > > Free it in dma_buf_release(). > > > > > > ... > > > > > > --- a/drivers/dma-buf/dma-buf.c > > > +++ b/drivers/dma-buf/dma-buf.c > > > @@ -108,6 +108,7 @@ static int dma_buf_release(struct inode *inode, > > > struct file *file) > > > dma_resv_fini(dmabuf->resv); > > > > > > module_put(dmabuf->owner); > > > + kfree(dmabuf->name); > > > kfree(dmabuf); > > > return 0; > > > } > > > > ow. Is that ioctl privileged? > > It looks unprivileged to me, as I don't see capable() called along > the path. > OK, thanks. I added cc:stable to my copy. ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: + dma-buf-free-dmabuf-name-in-dma_buf_release.patch added to -mm tree
On Wed, 26 Feb 2020 10:36:26 +0100 Daniel Vetter wrote: > On Wed, Feb 26, 2020 at 5:29 AM Sumit Semwal wrote: > > > > Hello Andrew, > > > > > > On Wed, 26 Feb 2020 at 07:25, Andrew Morton > > wrote: > > > > > > > > > The patch titled > > > Subject: dma-buf: free dmabuf->name in dma_buf_release() > > > has been added to the -mm tree. Its filename is > > > dma-buf-free-dmabuf-name-in-dma_buf_release.patch > > > > Thanks for taking this patch via -mm during my absence (I'm just > > returning from a bit of an illness). If there are other dma-buf > > patches on your radar that you'd like to take via the mm tree, please > > let me know and I can provide the necessary Acks. > > Else I will take them in via drm-misc as usual. > > I thought at least that for cases like these -mm is the last resort > tree, so proper thing to do here is apply this fix to drm-misc-fixes > and get it out there. -mm rebases, so will fall out again. Yup, go ahead. If a patch pops up in linux-next I'll autodrop by copy. And please do give some thought to whether this should be cc:stable. If it's an unprivileged operation then hellyeah. ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH] dma-buf: free dmabuf->name in dma_buf_release()
On Tue, 25 Feb 2020 12:44:46 -0800 Cong Wang wrote: > dma-buff name can be set via DMA_BUF_SET_NAME ioctl, but once set > it never gets freed. > > Free it in dma_buf_release(). > > ... > > --- a/drivers/dma-buf/dma-buf.c > +++ b/drivers/dma-buf/dma-buf.c > @@ -108,6 +108,7 @@ static int dma_buf_release(struct inode *inode, struct > file *file) > dma_resv_fini(dmabuf->resv); > > module_put(dmabuf->owner); > + kfree(dmabuf->name); > kfree(dmabuf); > return 0; > } ow. Is that ioctl privileged? ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH 01/52] mm/sl[uo]b: export __kmalloc_track(_node)_caller
On Wed, 19 Feb 2020 11:20:31 +0100 Daniel Vetter wrote: > tracker in drm for stuff that's tied to the lifetime of a drm_device, > not the underlying struct device. Kinda like devres, but for drm. > > ... > > Ack for merging through drm trees very much appreciated. Acked-by: Andrew Morton ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH v12 00/22] mm/gup: prereqs to track dma-pinned pages: FOLL_PIN
On Tue, 14 Jan 2020 12:15:08 -0800 John Hubbard wrote: > > > > Hi Andrew and all, > > > > To clarify: I'm hoping that this series can go into 5.6. > > > > Meanwhile, I'm working on tracking down and solving the problem that Leon > > reported, in the "track FOLL_PIN pages" patch, and that patch is not part of > > this series. > > > > Hi Andrew and all, > > Any thoughts on this? 5.6 is late. But it was in -mm before (briefly) and appears to be mature and well-reviewed. I'll toss it in there and shall push it into -next hopefully today. Let's decide 2-3 weeks hence. ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: Ack to merge through DRM tree? WAS [PATCH v4 0/2] mm, drm/ttm: Fix pte insertion with customized protection
On Fri, 20 Dec 2019 09:06:08 +0100 Thomas Hellström (VMware) wrote: > Andrew, > > On 12/12/19 9:47 AM, Thomas Hellström (VMware) wrote: > > From: Thomas Hellstrom > > > > The drm/ttm module is using a modified on-stack copy of the > > struct vm_area_struct to be able to set a page protection with customized > > caching. Fix that by adding a vmf_insert_mixed_prot() function similar > > to the existing vmf_insert_pfn_prot() for use with drm/ttm. > > > > I'd like to merge this through a drm tree. > > > > Changes since v1: > > *) Formatting fixes in patch 1 > > *) Updated commit message of patch 2. > > Changes since v2: > > *) Moved vmf_insert_mixed_prot() export to patch 2 (Michal Hocko) > > *) Documented under which conditions it's safe to use a page protection > > different from struct vm_area_struct::vm_page_prot. (Michal Hocko) > > Changes since v3: > > *) More documentation regarding under which conditions it's safe to use a > > page protection different from struct vm_area_struct::vm_page_prot. This > > time also in core vm. (Michal Hocko) > > > > Cc: Andrew Morton > > Cc: Michal Hocko > > Cc: "Matthew Wilcox (Oracle)" > > Cc: "Kirill A. Shutemov" > > Cc: Ralph Campbell > > Cc: "Jérôme Glisse" > > Cc: "Christian König" > > > Seems all concerns with this series have been addressed. Could I have an > ack to merge this through a DRM tree? > Yes, please do that. Acked-by: Andrew Morton for both. ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH v8 20/26] powerpc: book3s64: convert to pin_user_pages() and put_user_page()
On Mon, 9 Dec 2019 21:53:00 -0800 John Hubbard wrote: > > Correction: this is somehow missing the fixes that resulted from Jan Kara's > > review (he > > noted that we can't take a page lock in this context). I must have picked > > up the > > wrong version of it, when I rebased for -rc1. > > > > Andrew, given that the series is now in -mm, what's the preferred way for me > to fix this? > Send a v9 version of the whole series? Or something else? I think a full resend is warranted at this time - it's only been in there a day and there seem to be quite a number of changes to be made. ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH v8 17/26] media/v4l2-core: set pages dirty upon releasing DMA buffers
On Mon, 9 Dec 2019 14:53:35 -0800 John Hubbard wrote: > After DMA is complete, and the device and CPU caches are synchronized, > it's still required to mark the CPU pages as dirty, if the data was > coming from the device. However, this driver was just issuing a > bare put_page() call, without any set_page_dirty*() call. > > Fix the problem, by calling set_page_dirty_lock() if the CPU pages > were potentially receiving data from the device. > > Reviewed-by: Christoph Hellwig > Acked-by: Hans Verkuil > Cc: Mauro Carvalho Chehab > Cc: What are the user-visible effects of this change? As it's cc:stable I'd normally send this to Linus within 1-2 weeks, or sooner. Please confirm that this is a standalone fix, independent of the rest of this series. ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH] drm: Limit to INT_MAX in create_blob ioctl
On Wed, 6 Nov 2019 09:24:18 -0800 Kees Cook wrote: > > Since this is -mm can I have a stable sha1 or something for > > referencing? Or do you want to include this in the -mm patch bomb for > > the merge window? > > Traditionally these things live in akpm's tree when they are fixes for > patches in there. I have no idea how the Fixes tags work in that case, > though... I queued it immediately ahead of uaccess-disallow-int_max-copy-sizes.patch so all should be good, thanks. ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [Bug 205065] New: workqueue: PF_MEMALLOC task 173(kswapd0) is flushing !WQ_MEM_RECLAIM events:gen6_pm_rps_work [i915]
(switched to email. Please respond via emailed reply-to-all, not via the bugzilla web interface). On Tue, 01 Oct 2019 17:06:35 + bugzilla-dae...@bugzilla.kernel.org wrote: > https://bugzilla.kernel.org/show_bug.cgi?id=205065 > > Bug ID: 205065 >Summary: workqueue: PF_MEMALLOC task 173(kswapd0) is flushing > !WQ_MEM_RECLAIM events:gen6_pm_rps_work [i915] >Product: Memory Management >Version: 2.5 > Kernel Version: Ubuntu 5.3.0-13.14-generic 5.3.0 > Hardware: All > OS: Linux > Tree: Mainline > Status: NEW > Severity: normal > Priority: P1 > Component: Other > Assignee: a...@linux-foundation.org > Reporter: romanescu.2...@gmail.com > Regression: No Maybe Tejun can help interpret this ;) > Open bug in launchpad.net > https://bugs.launchpad.net/bugs/1846241 > > dmesg: > [ 9998.518472] [ cut here ] > [ 9998.518505] workqueue: PF_MEMALLOC task 173(kswapd0) is flushing > !WQ_MEM_RECLAIM events:gen6_pm_rps_work [i915] > [ 9998.518516] WARNING: CPU: 5 PID: 173 at kernel/workqueue.c:2598 > check_flush_dependency+0xa7/0x140 > [ 9998.518517] Modules linked in: usbhid rfcomm ccm cmac bnep > snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio > nls_iso8859_1 snd_soc_skl snd_soc_hdac_hda snd_hda_ext_core snd_soc_skl_ipc > snd_soc_sst_ipc snd_soc_sst_dsp snd_soc_acpi_intel_match snd_soc_acpi > snd_soc_core snd_compress ac97_bus snd_pcm_dmaengine x86_pkg_temp_thermal > intel_powerclamp coretemp snd_hda_intel snd_hda_codec kvm_intel snd_hda_core > kvm irqbypass snd_hwdep snd_pcm snd_seq_midi snd_seq_midi_event intel_rapl_msr > snd_rawmidi ath9k_htc mei_hdcp ath9k_common btusb uvcvideo crct10dif_pclmul > ath9k_hw btrtl videobuf2_vmalloc crc32_pclmul videobuf2_memops btbcm ath > snd_seq btintel videobuf2_v4l2 i915 ghash_clmulni_intel mac80211 > videobuf2_common hid_sensor_incl_3d bluetooth hid_sensor_magn_3d > hid_sensor_gyro_3d hid_sensor_accel_3d hid_sensor_rotation videodev > snd_seq_device hid_sensor_trigger snd_timer cfg80211 drm_kms_helper mc > industrialio_triggered_buffer ecdh_generic kfifo_buf ecc > [ 9998.518560] processor_thermal_device hid_sensor_iio_common libarc4 > aesni_intel spi_pxa2xx_platform drm snd intel_rapl_common industrialio > aes_x86_64 crypto_simd i2c_algo_bit fb_sys_fops input_leds cryptd glue_helper > joydev dw_dmac intel_cstate syscopyarea sysfillrect intel_xhci_usb_role_switch > mei_me intel_rapl_perf serio_raw wmi_bmof intel_wmi_thunderbolt dw_dmac_core > soundcore idma64 8250_dw cros_ec_ishtp mei hid_multitouch roles virt_dma > cros_ec_core sysimgblt intel_pch_thermal intel_soc_dts_iosf hp_wmi > soc_button_array int3403_thermal intel_vbtn int340x_thermal_zone sparse_keymap > hp_accel acpi_pad lis3lv02d hp_wireless input_polldev int3400_thermal > acpi_thermal_rel mac_hid sch_fq_codel parport_pc ppdev sunrpc lp parport > ip_tables x_tables autofs4 btrfs xor zstd_compress raid6_pq libcrc32c > hid_sensor_custom hid_sensor_hub intel_ishtp_loader intel_ishtp_hid > hid_generic > psmouse sdhci_pci cqhci i2c_i801 ahci intel_lpss_pci intel_ish_ipc sdhci > i2c_hid libahci intel_lpss intel_ishtp hid wmi > [ 9998.518603] pinctrl_sunrisepoint video pinctrl_intel > [ 9998.518609] CPU: 5 PID: 173 Comm: kswapd0 Not tainted 5.3.0-13-generic > #14-Ubuntu > [ 9998.518610] Hardware name: HP HP Pavilion x360 Convertible 14-cd0xxx/8486, > BIOS F.34 04/29/2019 > [ 9998.518614] RIP: 0010:check_flush_dependency+0xa7/0x140 > [ 9998.518616] Code: 8d 8a 70 0a 00 00 4d 89 e0 48 8d 8b b0 00 00 00 4c 89 ca > 48 c7 c7 e8 93 d3 a1 48 89 45 e0 c6 05 c4 29 75 01 01 e8 f4 13 fe ff <0f> 0b > 48 > 8b 45 e0 eb 0f 4c 89 ef e8 39 8e 00 00 41 f6 45 25 08 75 > [ 9998.518618] RSP: 0018:ba44c034f7f0 EFLAGS: 00010086 > [ 9998.518620] RAX: RBX: 9d76a400ce00 RCX: > > [ 9998.518621] RDX: 0063 RSI: a2581fe3 RDI: > 0046 > [ 9998.518623] RBP: ba44c034f810 R08: a2581f80 R09: > 0063 > [ 9998.518624] R10: a2582360 R11: a2581fcb R12: > c1092710 > [ 9998.518625] R13: 9d76a30b5e00 R14: 0001 R15: > 9d76a5df0700 > [ 9998.518627] FS: () GS:9d76a5d4() > knlGS: > [ 9998.518628] CS: 0010 DS: ES: CR0: 80050033 > [ 9998.518630] CR2: 7f877e503000 CR3: 00019be0a002 CR4: > 003606e0 > [ 9998.518631] Call Trace: > [ 9998.518636] __flush_work+0x97/0x1d0 > [ 9998.518641] ? enqueue_hrtimer+0x3d/0x90 > [ 9998.518644] __cancel_work_timer+0x10e/0x190 > [ 9998.518648] ? _cond_resched+0x19/0x30 > [ 9998.518651] ? synchronize_irq+0x3e/0xb0 > [ 9998.518654] cancel_work_sync+0x10/0x20 > [ 9998.518692] gen6_disable_rps_interrupts+0x95/0xc0 [i915] > [ 9998.518732] gen6_rps_idle+0x1f/0xf0 [i915] > [ 9998.518772]
Re: [PATCH 3/4] kernel.h: Add non_block_start/end()
On Tue, 20 Aug 2019 22:24:40 +0200 Daniel Vetter wrote: > Hi Peter, > > Iirc you've been involved at least somewhat in discussing this. -mm folks > are a bit undecided whether these new non_block semantics are a good idea. > Michal Hocko still is in support, but Andrew Morton and Jason Gunthorpe > are less enthusiastic. Jason said he's ok with merging the hmm side of > this if scheduler folks ack. If not, then I'll respin with the > preempt_disable/enable instead like in v1. I became mollified once Michel explained the rationale. I think it's OK. It's very specific to the oom reaper and hopefully won't be used more widely(?).
Re: [PATCH 2/5] kernel.h: Add non_block_start/end()
On Thu, 15 Aug 2019 10:44:29 +0200 Michal Hocko wrote: > > I continue to struggle with this. It introduces a new kernel state > > "running preemptibly but must not call schedule()". How does this make > > any sense? > > > > Perhaps a much, much more detailed description of the oom_reaper > > situation would help out. > > The primary point here is that there is a demand of non blockable mmu > notifiers to be called when the oom reaper tears down the address space. > As the oom reaper is the primary guarantee of the oom handling forward > progress it cannot be blocked on anything that might depend on blockable > memory allocations. These are not really easy to track because they > might be indirect - e.g. notifier blocks on a lock which other context > holds while allocating memory or waiting for a flusher that needs memory > to perform its work. If such a blocking state happens that we can end up > in a silent hang with an unusable machine. > Now we hope for reasonable implementations of mmu notifiers (strong > words I know ;) and this should be relatively simple and effective catch > all tool to detect something suspicious is going on. > > Does that make the situation more clear? Yes, thanks, much. Maybe a code comment along the lines of This is on behalf of the oom reaper, specifically when it is calling the mmu notifiers. The problem is that if the notifier were to block on, for example, mutex_lock() and if the process which holds that mutex were to perform a sleeping memory allocation, the oom reaper is now blocked on completion of that memory allocation. btw, do we need task_struct.non_block_count? Perhaps the oom reaper thread could set a new PF_NONBLOCK (which would be more general than PF_OOM_REAPER). If we run out of PF_ flags, use (current == oom_reaper_th).
Re: [PATCH 1/5] mm: Check if mmu notifier callbacks are allowed to fail
On Wed, 14 Aug 2019 22:20:23 +0200 Daniel Vetter wrote: > Just a bit of paranoia, since if we start pushing this deep into > callchains it's hard to spot all places where an mmu notifier > implementation might fail when it's not allowed to. > > Inspired by some confusion we had discussing i915 mmu notifiers and > whether we could use the newly-introduced return value to handle some > corner cases. Until we realized that these are only for when a task > has been killed by the oom reaper. > > An alternative approach would be to split the callback into two > versions, one with the int return value, and the other with void > return value like in older kernels. But that's a lot more churn for > fairly little gain I think. > > Summary from the m-l discussion on why we want something at warning > level: This allows automated tooling in CI to catch bugs without > humans having to look at everything. If we just upgrade the existing > pr_info to a pr_warn, then we'll have false positives. And as-is, no > one will ever spot the problem since it's lost in the massive amounts > of overall dmesg noise. > > ... > > --- a/mm/mmu_notifier.c > +++ b/mm/mmu_notifier.c > @@ -179,6 +179,8 @@ int __mmu_notifier_invalidate_range_start(struct > mmu_notifier_range *range) > pr_info("%pS callback failed with %d in > %sblockable context.\n", > mn->ops->invalidate_range_start, _ret, > !mmu_notifier_range_blockable(range) ? > "non-" : ""); > + WARN_ON(mmu_notifier_range_blockable(range) || > + ret != -EAGAIN); > ret = _ret; > } > } A problem with WARN_ON(a || b) is that if it triggers, we don't know whether it was because of a or because of b. Or both. So I'd suggest WARN_ON(a); WARN_ON(b);
Re: [PATCH 2/5] kernel.h: Add non_block_start/end()
On Wed, 14 Aug 2019 22:20:24 +0200 Daniel Vetter wrote: > In some special cases we must not block, but there's not a > spinlock, preempt-off, irqs-off or similar critical section already > that arms the might_sleep() debug checks. Add a non_block_start/end() > pair to annotate these. > > This will be used in the oom paths of mmu-notifiers, where blocking is > not allowed to make sure there's forward progress. Quoting Michal: > > "The notifier is called from quite a restricted context - oom_reaper - > which shouldn't depend on any locks or sleepable conditionals. The code > should be swift as well but we mostly do care about it to make a forward > progress. Checking for sleepable context is the best thing we could come > up with that would describe these demands at least partially." > > Peter also asked whether we want to catch spinlocks on top, but Michal > said those are less of a problem because spinlocks can't have an > indirect dependency upon the page allocator and hence close the loop > with the oom reaper. I continue to struggle with this. It introduces a new kernel state "running preemptibly but must not call schedule()". How does this make any sense? Perhaps a much, much more detailed description of the oom_reaper situation would help out. ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH v19 00/15] arm64: untag user pointers passed to the kernel
On Thu, 8 Aug 2019 14:12:19 -0700 Kees Cook wrote: > > The ones that are left are the mm ones: 4, 5, 6, 7 and 8. > > > > Andrew, could you take a look and give your Acked-by or pick them up > > directly? > > Given the subsystem Acks, it seems like 3-10 and 12 could all just go > via Andrew? I hope he agrees. :) I'll grab everything that has not yet appeared in linux-next. If more of these patches appear in linux-next I'll drop those as well. The review discussion against " [PATCH v19 02/15] arm64: Introduce prctl() options to control the tagged user addresses ABI" has petered out inconclusively. prctl() vs arch_prctl(). ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [Bug 204407] New: Bad page state in process Xorg
(switched to email. Please respond via emailed reply-to-all, not via the bugzilla web interface). On Thu, 01 Aug 2019 22:34:16 + bugzilla-dae...@bugzilla.kernel.org wrote: > https://bugzilla.kernel.org/show_bug.cgi?id=204407 > > Bug ID: 204407 >Summary: Bad page state in process Xorg >Product: Memory Management >Version: 2.5 > Kernel Version: 5.3.0-rc2-00013 > Hardware: All > OS: Linux > Tree: Mainline > Status: NEW > Severity: normal > Priority: P1 > Component: Page Allocator > Assignee: a...@linux-foundation.org > Reporter: p...@vandrovec.name > Regression: No > > Created attachment 284081 > --> https://bugzilla.kernel.org/attachment.cgi?id=284081=edit > dmesg > > I've upgraded from 5.3-rc1 to 5.3-rc2, and when I started X server, system > became unhappy: > > [259701.387365] BUG: Bad page state in process Xorg pfn:2a300 > [259701.393593] page:eaa8c000 refcount:0 mapcount:-128 > mapping: index:0x0 > [259701.402832] flags: 0x2000() > [259701.407426] raw: 2000 822ab778 eaa8f208 > > [259701.415900] raw: 0003 ff7f > > [259701.424373] page dumped because: nonzero mapcount > [259701.429847] Modules linked in: af_packet xt_REDIRECT nft_compat x_tables > nft_counter nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv4 nf_tables > nfnetlink ppdev parport fuse autofs4 binfmt_misc uinput twofish_generic > twofish_avx_x86_64 twofish_x86_64_3way twofish_x86_64 twofish_common > camellia_generic camellia_aesni_avx_x86_64 camellia_x86_64 serpent_avx_x86_64 > serpent_sse2_x86_64 serpent_generic blowfish_generic blowfish_x86_64 > blowfish_common cast5_avx_x86_64 cast5_generic cast_common des_generic cmac > xcbc rmd160 af_key xfrm_algo rpcsec_gss_krb5 nfsv4 nls_iso8859_2 cifs libarc4 > nfsv3 nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc ipv6 crc_ccitt > nf_defrag_ipv6 snd_hda_codec_hdmi pktcdvd coretemp hwmon intel_rapl_common > iosf_mbi x86_pkg_temp_thermal snd_hda_codec_realtek snd_hda_codec_generic > ledtrig_audio snd_hda_intel snd_hda_codec crct10dif_pclmul crc32_pclmul > snd_hwdep crc32c_intel snd_hda_core ghash_clmulni_intel snd_pcm_oss uas > iTCO_wdt aesni_intel e1000e > [259701.429873] snd_mixer_oss iTCO_vendor_support aes_x86_64 snd_pcm > crypto_simd ptp mei_me dcdbas lpc_ich sr_mod cryptd usb_storage snd_timer > glue_helper mfd_core input_leds pps_core tpm_tis cdrom i2c_i801 snd mei > tpm_tis_core sg tpm [last unloaded: parport_pc] > [259701.539387] CPU: 10 PID: 4860 Comm: Xorg Tainted: GT > 5.3.0-rc2-64-00013-g03f05a670a3d #69 > [259701.549382] Hardware name: Dell Inc. Precision T3610/09M8Y8, BIOS A16 > 02/05/2018 > [259701.549382] Call Trace: > [259701.549382] dump_stack+0x46/0x60 > [259701.549382] bad_page.cold.28+0x81/0xb4 > [259701.549382] __free_pages_ok+0x236/0x240 > [259701.549382] __ttm_dma_free_page+0x2f/0x40 > [259701.549382] ttm_dma_unpopulate+0x29b/0x370 > [259701.549382] ttm_tt_destroy.part.6+0x44/0x50 > [259701.549382] ttm_bo_cleanup_memtype_use+0x29/0x70 > [259701.549382] ttm_bo_put+0x225/0x280 > [259701.549382] ttm_bo_vm_close+0x10/0x20 > [259701.549382] remove_vma+0x20/0x40 > [259701.549382] __do_munmap+0x2da/0x420 > [259701.549382] __vm_munmap+0x66/0xc0 > [259701.549382] __x64_sys_munmap+0x22/0x30 > [259701.549382] do_syscall_64+0x5e/0x1a0 > [259701.549382] ? prepare_exit_to_usermode+0x75/0xa0 > [259701.549382] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > [259701.549382] RIP: 0033:0x7f504d0ec1d7 > [259701.549382] Code: 10 e9 67 ff ff ff 0f 1f 44 00 00 48 8b 15 b1 6c 0c 00 f7 > d8 64 89 02 48 c7 c0 ff ff ff ff e9 6b ff ff ff b8 0b 00 00 00 0f 05 <48> 3d > 01 > f0 ff ff 73 01 c3 48 8b 0d 89 6c 0c 00 f7 d8 64 89 01 48 > [259701.549382] RSP: 002b:7ffe529db138 EFLAGS: 0206 ORIG_RAX: > 000b > [259701.549382] RAX: ffda RBX: 564a5eabce70 RCX: > 7f504d0ec1d7 > [259701.549382] RDX: 7ffe529db140 RSI: 0040 RDI: > 7f5044b65000 > [259701.549382] RBP: 564a5eafe460 R08: 000b R09: > 00010283e000 > [259701.549382] R10: 0001 R11: 0206 R12: > 564a5e475b08 > [259701.549382] R13: 564a5e475c80 R14: 7ffe529db190 R15: > 0c80 > [259701.707238] Disabling lock debugging due to kernel taint I assume the above is misbehaviour in the DRM code? > > Also - maybe related, maybe not - I've got three userspace crashes earlier on > this kernel (but never before): > > [77154.886836] iscons.py[12441]: segfault at 2c ip 080cf0b5 sp > f773fb60 error 4 in python[8048000+11a000] > [77154.898376] Code: 02 0f 84 4a 2e 00 00 8b 4d 08 8b bd 04 ff ff ff 8b 59 38 > 8b 57 20 8b 7b 10 85 ff 0f 84 ee 22 00 00 8b 8d 04 ff ff ff 8b 59 08 <8b> 43 > 2c > 85
Re: mmotm 2019-07-04-15-01 uploaded (gpu/drm/i915/oa/)
On Thu, 04 Jul 2019 22:22:41 -0700 Joe Perches wrote: > > So when comparing a zero-length file with a non-existent file, diff > > produces no output. > > Why use the -N option ? > > $ diff --help > [...] > -N, --new-file treat absent files as empty > > otherwise > > $ cd $(mktemp -d -p .) > $ touch x > $ diff -u x y > diff: y: No such file or directory Without -N diff fails and exits with an error. -N does what's desired as long as the non-missing file isn't empty. ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: mmotm 2019-07-04-15-01 uploaded (gpu/drm/i915/oa/)
On Fri, 5 Jul 2019 13:14:35 +1000 Stephen Rothwell wrote: > > I checked next-20190704 tag. > > > > I see the empty file > > drivers/gpu/drm/i915/oa/Makefile > > > > Did someone delete it? > > Commit > > 5ed7a0cf3394 ("drm/i915: Move OA files to separate folder") > > from the drm-intel tree seems to have created it as an empty file. hrm. diff(1) doesn't seem to know how to handle a zero-length file. y:/home/akpm> mkdir foo y:/home/akpm> cd foo y:/home/akpm/foo> touch x y:/home/akpm/foo> diff -uN x y y:/home/akpm/foo> date > x y:/home/akpm/foo> diff -uN x y --- x 2019-07-04 21:58:37.815028211 -0700 +++ y 1969-12-31 16:00:00.0 -0800 @@ -1 +0,0 @@ -Thu Jul 4 21:58:37 PDT 2019 So when comparing a zero-length file with a non-existent file, diff produces no output. This'll make things happy. And I guess it should be done to protect all the valuable intellectual property in that file. --- /dev/null +++ a/drivers/gpu/drm/i915/oa/Makefile @@ -0,0 +1 @@ +# SPDX-License-Identifier: GPL-2.0 _ ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH 16/25] device-dax: use the dev_pagemap internal refcount
On Fri, 28 Jun 2019 12:14:44 -0700 Dan Williams wrote: > I believe -mm auto drops patches when they appear in the -next > baseline. So it should "just work" to pull it into the series and send > it along for -next inclusion. Yup. Although it isn't very "auto" - I manually check that the patch which turned up in -next was identical to the version which I had. If not, I go find out why... ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: dev_pagemap related cleanups
On Thu, 13 Jun 2019 14:24:20 -0600 Logan Gunthorpe wrote: > > > On 2019-06-13 2:21 p.m., Dan Williams wrote: > > On Thu, Jun 13, 2019 at 1:18 PM Logan Gunthorpe wrote: > >> > >> > >> > >> On 2019-06-13 12:27 p.m., Dan Williams wrote: > >>> On Thu, Jun 13, 2019 at 2:43 AM Christoph Hellwig wrote: > > Hi Dan, Jérôme and Jason, > > below is a series that cleans up the dev_pagemap interface so that > it is more easily usable, which removes the need to wrap it in hmm > and thus allowing to kill a lot of code > > Diffstat: > > 22 files changed, 245 insertions(+), 802 deletions(-) > >>> > >>> Hooray! > >>> > Git tree: > > git://git.infradead.org/users/hch/misc.git hmm-devmem-cleanup > >>> > >>> I just realized this collides with the dev_pagemap release rework in > >>> Andrew's tree (commit ids below are from next.git and are not stable) > >>> > >>> 4422ee8476f0 mm/devm_memremap_pages: fix final page put race > >>> 771f0714d0dc PCI/P2PDMA: track pgmap references per resource, not globally > >>> af37085de906 lib/genalloc: introduce chunk owners > >>> e0047ff8aa77 PCI/P2PDMA: fix the gen_pool_add_virt() failure path > >>> 0315d47d6ae9 mm/devm_memremap_pages: introduce devm_memunmap_pages > >>> 216475c7eaa8 drivers/base/devres: introduce devm_release_action() > >>> > >>> CONFLICT (content): Merge conflict in tools/testing/nvdimm/test/iomap.c > >>> CONFLICT (content): Merge conflict in mm/hmm.c > >>> CONFLICT (content): Merge conflict in kernel/memremap.c > >>> CONFLICT (content): Merge conflict in include/linux/memremap.h > >>> CONFLICT (content): Merge conflict in drivers/pci/p2pdma.c > >>> CONFLICT (content): Merge conflict in drivers/nvdimm/pmem.c > >>> CONFLICT (content): Merge conflict in drivers/dax/device.c > >>> CONFLICT (content): Merge conflict in drivers/dax/dax-private.h > >>> > >>> Perhaps we should pull those out and resend them through hmm.git? > >> > >> Hmm, I've been waiting for those patches to get in for a little while now > >> ;( > > > > Unless Andrew was going to submit as v5.2-rc fixes I think I should > > rebase / submit them on current hmm.git and then throw these cleanups > > from Christoph on top? > > Whatever you feel is best. I'm just hoping they get in sooner rather > than later. They do fix a bug after all. Let me know if you want me to > retest the P2PDMA stuff after the rebase. I had them down for 5.3-rc1. I'll send them along for 5.2-rc5 instead. ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH V2] include: linux: Regularise the use of FIELD_SIZEOF macro
On Tue, 11 Jun 2019 15:00:10 -0600 Andreas Dilger wrote: > >> to FIELD_SIZEOF > > > > As Alexey has pointed out, C structs and unions don't have fields - > > they have members. So this is an opportunity to switch everything to > > a new member_sizeof(). > > > > What do people think of that and how does this impact the patch footprint? > > I did a check, and FIELD_SIZEOF() is used about 350x, while sizeof_field() > is about 30x, and SIZEOF_FIELD() is only about 5x. Erk. Sorry, I should have grepped. > That said, I'm much more in favour of "sizeof_field()" or "sizeof_member()" > than FIELD_SIZEOF(). Not only does that better match "offsetof()", with > which it is closely related, but is also closer to the original "sizeof()". > > Since this is a rather trivial change, it can be split into a number of > patches to get approval/landing via subsystem maintainers, and there is no > huge urgency to remove the original macros until the users are gone. It > would make sense to remove SIZEOF_FIELD() and sizeof_field() quickly so > they don't gain more users, and the remaining FIELD_SIZEOF() users can be > whittled away as the patches come through the maintainer trees. In that case I'd say let's live with FIELD_SIZEOF() and remove sizeof_field() and SIZEOF_FIELD(). I'm a bit surprised that the FIELD_SIZEOF() definition ends up in stddef.h rather than in kernel.h where such things are normally defined. Why is that?
Re: [PATCH V2] include: linux: Regularise the use of FIELD_SIZEOF macro
On Wed, 12 Jun 2019 01:08:36 +0530 Shyam Saini wrote: > Currently, there are 3 different macros, namely sizeof_field, SIZEOF_FIELD > and FIELD_SIZEOF which are used to calculate the size of a member of > structure, so to bring uniformity in entire kernel source tree lets use > FIELD_SIZEOF and replace all occurrences of other two macros with this. > > For this purpose, redefine FIELD_SIZEOF in include/linux/stddef.h and > tools/testing/selftests/bpf/bpf_util.h and remove its defination from > include/linux/kernel.h > > In favour of FIELD_SIZEOF, this patch also deprecates other two similar > macros sizeof_field and SIZEOF_FIELD. > > For code compatibility reason, retain sizeof_field macro as a wrapper macro > to FIELD_SIZEOF As Alexey has pointed out, C structs and unions don't have fields - they have members. So this is an opportunity to switch everything to a new member_sizeof(). What do people think of that and how does this impact the patch footprint? ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: RFC: Run a dedicated hmm.git for 5.3
On Fri, 24 May 2019 09:44:55 -0300 Jason Gunthorpe wrote: > Now that -mm merged the basic hmm API skeleton I think running like > this would get us quickly to the place we all want: comprehensive in tree > users of hmm. > > Andrew, would this be acceptable to you? Sure. Please take care not to permit this to reduce the amount of exposure and review which the core HMM pieces get.
Re: [PATCH 1/2] mm/hmm: support automatic NUMA balancing
On Fri, 10 May 2019 19:53:23 + "Kuehling, Felix" wrote: > From: Philip Yang > > While the page is migrating by NUMA balancing, HMM failed to detect this > condition and still return the old page. Application will use the new > page migrated, but driver pass the old page physical address to GPU, > this crash the application later. > > Use pte_protnone(pte) to return this condition and then hmm_vma_do_fault > will allocate new page. > > Signed-off-by: Philip Yang This should have included your signed-off-by:, since you were on the patch delivery path. I'll make that change to my copy of the patch, OK? ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH v6 0/8] mmu notifier provide context informations
On Tue, 26 Mar 2019 12:47:39 -0400 jgli...@redhat.com wrote: > From: Jérôme Glisse > > (Andrew this apply on top of my HMM patchset as otherwise you will have > conflict with changes to mm/hmm.c) > > Changes since v5: > - drop KVM bits waiting for KVM people to express interest if they > do not then i will post patchset to remove change_pte_notify as > without the changes in v5 change_pte_notify is just useless (it > it is useless today upstream it is just wasting cpu cycles) > - rebase on top of lastest Linus tree > > Previous cover letter with minor update: > > > Here i am not posting users of this, they already have been posted to > appropriate mailing list [6] and will be merge through the appropriate > tree once this patchset is upstream. > > Note that this serie does not change any behavior for any existing > code. It just pass down more information to mmu notifier listener. > > The rational for this patchset: > > CPU page table update can happens for many reasons, not only as a > result of a syscall (munmap(), mprotect(), mremap(), madvise(), ...) > but also as a result of kernel activities (memory compression, reclaim, > migration, ...). > > This patch introduce a set of enums that can be associated with each > of the events triggering a mmu notifier: > > - UNMAP: munmap() or mremap() > - CLEAR: page table is cleared (migration, compaction, reclaim, ...) > - PROTECTION_VMA: change in access protections for the range > - PROTECTION_PAGE: change in access protections for page in the range > - SOFT_DIRTY: soft dirtyness tracking > > Being able to identify munmap() and mremap() from other reasons why the > page table is cleared is important to allow user of mmu notifier to > update their own internal tracking structure accordingly (on munmap or > mremap it is not longer needed to track range of virtual address as it > becomes invalid). Without this serie, driver are force to assume that > every notification is an munmap which triggers useless trashing within > drivers that associate structure with range of virtual address. Each > driver is force to free up its tracking structure and then restore it > on next device page fault. With this serie we can also optimize device > page table update [6]. > > More over this can also be use to optimize out some page table updates > like for KVM where we can update the secondary MMU directly from the > callback instead of clearing it. We seem to be rather short of review input on this patchset. ie: there is none. > ACKS AMD/RADEON https://lkml.org/lkml/2019/2/1/395 OK, kind of ackish, but not a review. > ACKS RDMA https://lkml.org/lkml/2018/12/6/1473 This actually acks the infiniband part of a patch which isn't in this series. So we have some work to do, please. Who would be suitable reviewers? ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH v4 0/9] mmu notifier provide context informations
On Thu, 31 Jan 2019 11:10:06 -0500 Jerome Glisse wrote: > Andrew what is your plan for this ? I had a discussion with Peter Xu > and Andrea about change_pte() and kvm. Today the change_pte() kvm > optimization is effectively disabled because of invalidate_range > calls. With a minimal couple lines patch on top of this patchset > we can bring back the kvm change_pte optimization and we can also > optimize some other cases like for instance when write protecting > after fork (but i am not sure this is something qemu does often so > it might not help for real kvm workload). > > I will be posting a the extra patch as an RFC, but in the meantime > i wanted to know what was the status for this. The various drm patches appear to be headed for collisions with drm tree development so we'll need to figure out how to handle that and in what order things happen. It's quite unclear from the v4 patchset's changelogs that this has anything to do with KVM and "the change_pte() kvm optimization" hasn't been described anywhere(?). So.. I expect the thing to do here is to get everything finished, get the changelogs completed with this new information and do a resend. Can we omit the drm and rdma patches for now? Feed them in via the subsystem maintainers when the dust has settled? ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH 3/3] mm/mmu_notifier: contextual information for event triggering invalidation
On Mon, 3 Dec 2018 15:18:17 -0500 jgli...@redhat.com wrote: > CPU page table update can happens for many reasons, not only as a result > of a syscall (munmap(), mprotect(), mremap(), madvise(), ...) but also > as a result of kernel activities (memory compression, reclaim, migration, > ...). > > Users of mmu notifier API track changes to the CPU page table and take > specific action for them. While current API only provide range of virtual > address affected by the change, not why the changes is happening. > > This patchset adds event information so that users of mmu notifier can > differentiate among broad category: > - UNMAP: munmap() or mremap() > - CLEAR: page table is cleared (migration, compaction, reclaim, ...) > - PROTECTION_VMA: change in access protections for the range > - PROTECTION_PAGE: change in access protections for page in the range > - SOFT_DIRTY: soft dirtyness tracking > > Being able to identify munmap() and mremap() from other reasons why the > page table is cleared is important to allow user of mmu notifier to > update their own internal tracking structure accordingly (on munmap or > mremap it is not longer needed to track range of virtual address as it > becomes invalid). > > ... > > --- a/mm/oom_kill.c > +++ b/mm/oom_kill.c > @@ -519,6 +519,7 @@ bool __oom_reap_task_mm(struct mm_struct *mm) > struct mmu_notifier_range range; > struct mmu_gather tlb; > > + range.event = MMU_NOTIFY_CLEAR; > range.start = vma->vm_start; > range.end = vma->vm_end; > range.mm = mm; mmu_notifier_range and MMU_NOTIFY_CLEAR aren't defined if CONFIG_MMU_NOTIFIER=n. I'll try a temporary bodge: +++ a/include/linux/mmu_notifier.h @@ -10,8 +10,6 @@ struct mmu_notifier; struct mmu_notifier_ops; -#ifdef CONFIG_MMU_NOTIFIER - /* * The mmu notifier_mm structure is allocated and installed in * mm->mmu_notifier_mm inside the mm_take_all_locks() protected @@ -32,6 +30,8 @@ struct mmu_notifier_range { bool blockable; }; +#ifdef CONFIG_MMU_NOTIFIER + struct mmu_notifier_ops { /* * Called either by mmu_notifier_unregister or when the mm is But this new code should vanish altogether if CONFIG_MMU_NOTIFIER=n, please. Or at least, we shouldn't be unnecessarily initializing .mm and .event. Please take a look at debloating this code. ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH] mm: Replace all open encodings for NUMA_NO_NODE
On Fri, 23 Nov 2018 15:24:16 +0530 Anshuman Khandual wrote: > At present there are multiple places where invalid node number is encoded > as -1. Even though implicitly understood it is always better to have macros > in there. Replace these open encodings for an invalid node number with the > global macro NUMA_NO_NODE. This helps remove NUMA related assumptions like > 'invalid node' from various places redirecting them to a common definition. > > ... > > Build tested this with multiple cross compiler options like alpha, sparc, > arm64, x86, powerpc, powerpc64le etc with their default config which might > not have compiled tested all driver related changes. I will appreciate > folks giving this a test in their respective build environment. > > All these places for replacement were found by running the following grep > patterns on the entire kernel code. Please let me know if this might have > missed some instances. This might also have replaced some false positives. > I will appreciate suggestions, inputs and review. > > 1. git grep "nid == -1" > 2. git grep "node == -1" > 3. git grep "nid = -1" > 4. git grep "node = -1" The build testing is good, but I worry that some of the affected files don't clearly have numa.h in their include paths, for the NUMA_NO_NODE definition. The first thing I looked it is arch/powerpc/include/asm/pci-bridge.h. Maybe it somehow manages to include numa.h via some nested include, but if so, is that reliable across all config combinations and as code evolves? So I think that the patch should have added an explicit include of numa.h, especially in cases where the affected file previously had no references to any of the things which numa.h defines. ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH v8 0/7] mm: Merge hmm into devm_memremap_pages, mark GPL-only
On Tue, 20 Nov 2018 15:12:49 -0800 Dan Williams wrote: > Changes since v7 [1]: > At Maintainer Summit, Greg brought up a topic I proposed around > EXPORT_SYMBOL_GPL usage. The motivation was considerations for when > EXPORT_SYMBOL_GPL is warranted and the criteria for taking the > exceptional step of reclassifying an existing export. Specifically, I > wanted to make the case that although the line is fuzzy and hard to > specify in abstract terms, it is nonetheless clear that > devm_memremap_pages() and HMM (Heterogeneous Memory Management) have > crossed it. The devm_memremap_pages() facility should have been > EXPORT_SYMBOL_GPL from the beginning, and HMM as a derivative of that > functionality should have naturally picked up that designation as well. > > Contrary to typical rules, the HMM infrastructure was merged upstream > with zero in-tree consumers. There was a promise at the time that those > users would be merged "soon", but it has been over a year with no drivers > arriving. While the Nouveau driver is about to belatedly make good on > that promise it is clear that HMM was targeted first and foremost at an > out-of-tree consumer. > > HMM is derived from devm_memremap_pages(), a facility Christoph and I > spearheaded to support persistent memory. It combines a device lifetime > model with a dynamically created 'struct page' / memmap array for any > physical address range. It enables coordination and control of the many > code paths in the kernel built to interact with memory via 'struct page' > objects. With HMM the integration goes even deeper by allowing device > drivers to hook and manipulate page fault and page free events. > > One interpretation of when EXPORT_SYMBOL is suitable is when it is > exporting stable and generic leaf functionality. The > devm_memremap_pages() facility continues to see expanding use cases, > peer-to-peer DMA being the most recent, with no clear end date when it > will stop attracting reworks and semantic changes. It is not suitable to > export devm_memremap_pages() as a stable 3rd party driver API due to the > fact that it is still changing and manipulates core behavior. Moreover, > it is not in the best interest of the long term development of the core > memory management subsystem to permit any external driver to effectively > define its own system-wide memory management policies with no > encouragement to engage with upstream. > > I am also concerned that HMM was designed in a way to minimize further > engagement with the core-MM. That, with these hooks in place, > device-drivers are free to implement their own policies without much > consideration for whether and how the core-MM could grow to meet that > need. Going forward not only should HMM be EXPORT_SYMBOL_GPL, but the > core-MM should be allowed the opportunity and stimulus to change and > address these new use cases as first class functionality. > The arguments are compelling. I apologize for not thinking of and/or not being made aware of them at the time. I'll take [7/7] (with all the above added to the changelog) with a view to a 4.21-rc1 merge. That gives us a couple of months for further discussion. Public discussion, please. It should be noted that [7/7] has a cc:stable. ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH] mm, oom: distinguish blockable mode for mmu notifiers
On Tue, 24 Jul 2018 16:17:47 +0200 Michal Hocko wrote: > On Fri 20-07-18 17:09:02, Andrew Morton wrote: > [...] > > - Undocumented return value. > > > > - comment "failed to reap part..." is misleading - sounds like it's > > referring to something which happened in the past, is in fact > > referring to something which might happen in the future. > > > > - fails to call trace_finish_task_reaping() in one case > > > > - code duplication. > > > > - Increases mmap_sem hold time a little by moving > > trace_finish_task_reaping() inside the locked region. So sue me ;) > > > > - Sharing the finish: path means that the trace event won't > > distinguish between the two sources of finishing. > > > > Please take a look? > > oom_reap_task_mm should return false when __oom_reap_task_mm return > false. This is what my patch did but it seems this changed by > http://www.ozlabs.org/~akpm/mmotm/broken-out/mm-oom-remove-oom_lock-from-oom_reaper.patch > so that one should be fixed. > > diff --git a/mm/oom_kill.c b/mm/oom_kill.c > index 104ef4a01a55..88657e018714 100644 > --- a/mm/oom_kill.c > +++ b/mm/oom_kill.c > @@ -565,7 +565,7 @@ static bool oom_reap_task_mm(struct task_struct *tsk, > struct mm_struct *mm) > /* failed to reap part of the address space. Try again later */ > if (!__oom_reap_task_mm(mm)) { > up_read(>mmap_sem); > - return true; > + return false; > } > > pr_info("oom_reaper: reaped process %d (%s), now anon-rss:%lukB, > file-rss:%lukB, shmem-rss:%lukB\n", OK, thanks, I added that. > > On top of that the proposed cleanup looks as follows: > Looks good to me. Seems a bit strange that we omit the pr_info() output if the mm was partially reaped - people would still want to know this? Not very important though. ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH] mm, oom: distinguish blockable mode for mmu notifiers
On Mon, 16 Jul 2018 13:50:58 +0200 Michal Hocko wrote: > From: Michal Hocko > > There are several blockable mmu notifiers which might sleep in > mmu_notifier_invalidate_range_start and that is a problem for the > oom_reaper because it needs to guarantee a forward progress so it cannot > depend on any sleepable locks. > > ... > > @@ -571,7 +565,12 @@ static bool oom_reap_task_mm(struct task_struct *tsk, > struct mm_struct *mm) > > trace_start_task_reaping(tsk->pid); > > - __oom_reap_task_mm(mm); > + /* failed to reap part of the address space. Try again later */ > + if (!__oom_reap_task_mm(mm)) { > + up_read(>mmap_sem); > + ret = false; > + goto unlock_oom; > + } This function is starting to look a bit screwy. : static bool oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm) : { : if (!down_read_trylock(>mmap_sem)) { : trace_skip_task_reaping(tsk->pid); : return false; : } : : /* :* MMF_OOM_SKIP is set by exit_mmap when the OOM reaper can't :* work on the mm anymore. The check for MMF_OOM_SKIP must run :* under mmap_sem for reading because it serializes against the :* down_write();up_write() cycle in exit_mmap(). :*/ : if (test_bit(MMF_OOM_SKIP, >flags)) { : up_read(>mmap_sem); : trace_skip_task_reaping(tsk->pid); : return true; : } : : trace_start_task_reaping(tsk->pid); : : /* failed to reap part of the address space. Try again later */ : if (!__oom_reap_task_mm(mm)) { : up_read(>mmap_sem); : return true; : } : : pr_info("oom_reaper: reaped process %d (%s), now anon-rss:%lukB, file-rss:%lukB, shmem-rss:%lukB\n", : task_pid_nr(tsk), tsk->comm, : K(get_mm_counter(mm, MM_ANONPAGES)), : K(get_mm_counter(mm, MM_FILEPAGES)), : K(get_mm_counter(mm, MM_SHMEMPAGES))); : up_read(>mmap_sem); : : trace_finish_task_reaping(tsk->pid); : return true; : } - Undocumented return value. - comment "failed to reap part..." is misleading - sounds like it's referring to something which happened in the past, is in fact referring to something which might happen in the future. - fails to call trace_finish_task_reaping() in one case - code duplication. I'm thinking it wants to be something like this? : /* : * Return true if we successfully acquired (then released) mmap_sem : */ : static bool oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm) : { : if (!down_read_trylock(>mmap_sem)) { : trace_skip_task_reaping(tsk->pid); : return false; : } : : /* :* MMF_OOM_SKIP is set by exit_mmap when the OOM reaper can't :* work on the mm anymore. The check for MMF_OOM_SKIP must run :* under mmap_sem for reading because it serializes against the :* down_write();up_write() cycle in exit_mmap(). :*/ : if (test_bit(MMF_OOM_SKIP, >flags)) { : trace_skip_task_reaping(tsk->pid); : goto out; : } : : trace_start_task_reaping(tsk->pid); : : if (!__oom_reap_task_mm(mm)) { : /* Failed to reap part of the address space. Try again later */ : goto finish; : } : : pr_info("oom_reaper: reaped process %d (%s), now anon-rss:%lukB, file-rss:%lukB, shmem-rss:%lukB\n", : task_pid_nr(tsk), tsk->comm, : K(get_mm_counter(mm, MM_ANONPAGES)), : K(get_mm_counter(mm, MM_FILEPAGES)), : K(get_mm_counter(mm, MM_SHMEMPAGES))); : finish: : trace_finish_task_reaping(tsk->pid); : out: : up_read(>mmap_sem); : return true; : } - Increases mmap_sem hold time a little by moving trace_finish_task_reaping() inside the locked region. So sue me ;) - Sharing the finish: path means that the trace event won't distinguish between the two sources of finishing. Please take a look? ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH] mm, oom: distinguish blockable mode for mmu notifiers
On Tue, 17 Jul 2018 10:12:01 +0200 Michal Hocko wrote: > > Any suggestions regarding how the driver developers can test this code > > path? I don't think we presently have a way to fake an oom-killing > > event? Perhaps we should add such a thing, given the problems we're > > having with that feature. > > The simplest way is to wrap an userspace code which uses these notifiers > into a memcg and set the hard limit to hit the oom. This can be done > e.g. after the test faults in all the mmu notifier managed memory and > set the hard limit to something really small. Then we are looking for a > proper process tear down. Chances are, some of the intended audience don't know how to do this and will either have to hunt down a lot of documentation or will just not test it. But we want them to test it, so a little worked step-by-step example would help things along please. ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH] mm, oom: distinguish blockable mode for mmu notifiers
On Mon, 16 Jul 2018 13:50:58 +0200 Michal Hocko wrote: > From: Michal Hocko > > There are several blockable mmu notifiers which might sleep in > mmu_notifier_invalidate_range_start and that is a problem for the > oom_reaper because it needs to guarantee a forward progress so it cannot > depend on any sleepable locks. > > Currently we simply back off and mark an oom victim with blockable mmu > notifiers as done after a short sleep. That can result in selecting a > new oom victim prematurely because the previous one still hasn't torn > its memory down yet. > > We can do much better though. Even if mmu notifiers use sleepable locks > there is no reason to automatically assume those locks are held. > Moreover majority of notifiers only care about a portion of the address > space and there is absolutely zero reason to fail when we are unmapping an > unrelated range. Many notifiers do really block and wait for HW which is > harder to handle and we have to bail out though. > > This patch handles the low hanging fruid. > __mmu_notifier_invalidate_range_start > gets a blockable flag and callbacks are not allowed to sleep if the > flag is set to false. This is achieved by using trylock instead of the > sleepable lock for most callbacks and continue as long as we do not > block down the call chain. I assume device driver developers are wondering "what does this mean for me". As I understand it, the only time they will see blockable==false is when their driver is being called in response to an out-of-memory condition, yes? So it is a very rare thing. Any suggestions regarding how the driver developers can test this code path? I don't think we presently have a way to fake an oom-killing event? Perhaps we should add such a thing, given the problems we're having with that feature. > I think we can improve that even further because there is a common > pattern to do a range lookup first and then do something about that. > The first part can be done without a sleeping lock in most cases AFAICS. > > The oom_reaper end then simply retries if there is at least one notifier > which couldn't make any progress in !blockable mode. A retry loop is > already implemented to wait for the mmap_sem and this is basically the > same thing. > > ... > > +static inline int mmu_notifier_invalidate_range_start_nonblock(struct > mm_struct *mm, > + unsigned long start, unsigned long end) > +{ > + int ret = 0; > + if (mm_has_notifiers(mm)) > + ret = __mmu_notifier_invalidate_range_start(mm, start, end, > false); > + > + return ret; > } nit, { if (mm_has_notifiers(mm)) return __mmu_notifier_invalidate_range_start(mm, start, end, false); return 0; } would suffice. > > ... > > --- a/mm/mmap.c > +++ b/mm/mmap.c > @@ -3074,7 +3074,7 @@ void exit_mmap(struct mm_struct *mm) >* reliably test it. >*/ > mutex_lock(_lock); > - __oom_reap_task_mm(mm); > + (void)__oom_reap_task_mm(mm); > mutex_unlock(_lock); What does this do? > set_bit(MMF_OOM_SKIP, >flags); > > ... > ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH] kernel.h: Add for_each_if()
On Wed, 11 Jul 2018 13:51:08 +0200 Daniel Vetter wrote: > But I still have the situation that a bunch of maintainers acked this > and Andrew Morton defacto nacked it, which I guess means I'll keep the > macro in drm? The common way to go about this seems to be to just push > the patch series with the ack in some pull request to Linus and ignore > the people who raised questions, but not really my thing. Heh. But, am I wrong? Code which uses regular kernel style doesn't have these issues. We shouldn't be enabling irregular style - we should be making such sites more regular. The fact that the compiler generates a nice warning in some cases simply helps us with that. ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH] kernel.h: Add for_each_if()
On Mon, 9 Jul 2018 18:25:09 +0200 Daniel Vetter wrote: > To avoid compilers complainig about ambigious else blocks when putting > an if condition into a for_each macro one needs to invert the > condition and add a dummy else. We have a nice little convenience > macro for that in drm headers, let's move it out. Subsequent patches > will roll it out to other places. > > The issue the compilers complain about are nested if with an else > block and no {} to disambiguate which if the else belongs to. The C > standard is clear, but in practice people forget: > > if (foo) > if (bar) > /* something */ > else > /* something else um, yeah, don't do that. Kernel coding style is very much to do if (foo) { if (bar) /* something */ else /* something else } And if not doing that generates a warning then, well, do that. > The same can happen in a for_each macro when it also contains an if > condition at the end, except the compiler message is now really > confusing since there's only 1 if: > > for_each_something() > if (bar) > /* something */ > else > /* something else > > The for_each_if() macro, by inverting the condition and adding an > else, avoids the compiler warning. Ditto. > Motivated by a discussion with Andy and Yisheng, who want to add > another for_each_macro which would benefit from for_each_if() instead > of hand-rolling it. Ditto. > v2: Explain a bit better what this is good for, after the discussion > with Peter Z. Presumably the above was discussed in whatever-thread-that-was. ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [patch 1/1] drivers/gpu/drm/i915/intel_guc_log.c: work around gcc-4.4.4 union initializer issue
On Thu, 08 Mar 2018 12:06:46 +0200 Jani Nikula <jani.nik...@linux.intel.com> wrote: > On Wed, 07 Mar 2018, a...@linux-foundation.org wrote: > > From: Andrew Morton <a...@linux-foundation.org> > > Subject: drivers/gpu/drm/i915/intel_guc_log.c: work around gcc-4.4.4 union > > initializer issue > > > > gcc-4.4.4 has problems with initalizers of anon unions. > > > > drivers/gpu/drm/i915/intel_guc_log.c: In function 'guc_log_control': > > drivers/gpu/drm/i915/intel_guc_log.c:64: error: unknown field > > 'logging_enabled' specified in initializer > > > > Work around this. > > Thanks for the patch, pushed to drm-intel-next-queued. > > That said, how long do we have to care about old compilers? I thought we > were converging on at the very least GCC 4.5 being required. Yes, I've seen some talk about that and it is about time for us to do it. I'm not sure what stage things are at though. ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH] fix double ;;s in code
On Tue, 20 Feb 2018 10:03:56 +0200 Jani Nikulawrote: > On Mon, 19 Feb 2018, Pavel Machek wrote: > > On Mon 2018-02-19 16:41:35, Daniel Vetter wrote: > >> Yeah, pls split this into one patch per area, with a suitable patch > >> subject prefix. Look at git log of each file to get a feeling for what's > >> the standard in each area. > > > > Yeah I can spend hour spliting it, and then people will ignore it > > anyway. > > > > If you care about one of the files being modified, please fix the > > bug, ";;" is a clear bug. > > > > If you don't care ... well I don't care either. > > Look, if this causes just one conflict down the line because it touches > the kernel all over the place, then IMO it already wasn't worth > it. Merge conflicts are inevitable, but there's no reason to make life > harder just to cater for a cleanup patch. It's not that important. > > Had it been split up, the drm parts would've been merged already. I just stage patches like this behind linux-next. So if your stuff in in -next, you'll never notice a thing. ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel