Re: Possible null pointer dereference in rcar-dmac.ko
Hi Laurent

> I don't think this fully fixes the problem, as the rcar_dmac_isr_error() IRQ
> handler is still registered before all this. Furthermore, at least some of the
> initialization at the end of rcar_dmac_chan_probe() has to be moved before the
> rcar_dmac_isr_channel() IRQ handler registration.
>
> Let's not commit a quick hack but fix the problem correctly, we should ensure
> that all the initialization needed by IRQ handlers is performed before they
> get registered.

Yeah, indeed. We need a v2 patch.

Best regards
--
Kuninori Morimoto
Re: Do we really need d_weak_revalidate???
On Fri, Aug 18 2017, Ian Kent wrote:

> On 18/08/17 13:24, NeilBrown wrote:
>> On Thu, Aug 17 2017, Ian Kent wrote:
>>
>>> On 16/08/17 19:34, Jeff Layton wrote:
>>>> On Wed, 2017-08-16 at 12:43 +1000, NeilBrown wrote:
>>>>> On Mon, Aug 14 2017, Jeff Layton wrote:
>>>>>> On Mon, 2017-08-14 at 09:36 +1000, NeilBrown wrote:
>>>>>>> On Fri, Aug 11 2017, Jeff Layton wrote:
>>>>>>>> On Fri, 2017-08-11 at 05:55 +, Trond Myklebust wrote:
>>>>>>>>> On Fri, 2017-08-11 at 14:31 +1000, NeilBrown wrote:
>>>>>>>>>> Funny story. 4.5 years ago we discarded the FS_REVAL_DOT superblock
>>>>>>>>>> flag and introduced the d_weak_revalidate dentry operation instead.
>>>>>>>>>> We duly removed the flag from NFS superblocks and NFSv4 superblocks,
>>>>>>>>>> and added the new dentry operation to NFS dentries but not to NFSv4
>>>>>>>>>> dentries.
>>>>>>>>>>
>>>>>>>>>> And nobody noticed.
>>>>>>>>>>
>>>>>>>>>> Until today.
>>>>>>>>>>
>>>>>>>>>> A customer reports a situation where mount(,MS_REMOUNT,..) on an NFS
>>>>>>>>>> filesystem hangs because the network has been deconfigured. This
>>>>>>>>>> makes perfect sense and I suggested a code change to fix the
>>>>>>>>>> problem. However when a colleague was trying to reproduce the
>>>>>>>>>> problem to validate the fix, he couldn't. Then nor could I.
>>>>>>>>>>
>>>>>>>>>> The problem is trivially reproducible with NFSv3, and not at all
>>>>>>>>>> with NFSv4. The reason is the missing d_weak_revalidate.
>>>>>>>>>>
>>>>>>>>>> We could simply add d_weak_revalidate for NFSv4, but given that it
>>>>>>>>>> has been missing for 4.5 years, and the only time anyone noticed was
>>>>>>>>>> when the omission resulted in a better user experience, I do wonder
>>>>>>>>>> if we need to. Can we just discard d_weak_revalidate? What purpose
>>>>>>>>>> does it serve? I couldn't find one.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> NeilBrown
>>>>>>>>>>
>>>>>>>>>> For reference, see
>>>>>>>>>> Commit: ecf3d1f1aa74 ("vfs: kill FS_REVAL_DOT by adding a
>>>>>>>>>> d_weak_revalidate dentry op")
>>>>>>>>>>
>>>>>>>>>> To reproduce the problem at home, on a system that uses systemd:
>>>>>>>>>> 1/ place (or find) a filesystem image in a file on an NFS filesystem
>>>>>>>>>> 2/ mount the nfs filesystem with "noac" - choose v3 or v4
>>>>>>>>>> 3/ loop-mount the filesystem image read-only somewhere
>>>>>>>>>> 4/ reboot
>>>>>>>>>>
>>>>>>>>>> If you choose v4, the reboot will succeed, possibly after a
>>>>>>>>>> 90 second timeout.
>>>>>>>>>> If you choose v3, the reboot will hang indefinitely in
>>>>>>>>>> systemd-shutdown while remounting the nfs filesystem read-only.
>>>>>>>>>>
>>>>>>>>>> If you don't use "noac" it can still hang, but only if something
>>>>>>>>>> slows down the reboot enough that attributes have timed out by the
>>>>>>>>>> time that systemd-shutdown runs. This happens for our customer.
>>>>>>>>>>
>>>>>>>>>> If the loop-mounted filesystem is not read-only, you get other
>>>>>>>>>> problems.
>>>>>>>>>>
>>>>>>>>>> We really want systemd to figure out that the loop-mount needs to be
>>>>>>>>>> unmounted first. I have ideas concerning that, but it is messy. But
>>>>>>>>>> that isn't the only bug here.
>>>>>>>>>
>>>>>>>>> The main purpose of d_weak_revalidate() was to catch the issues that
>>>>>>>>> arise when someone changes the contents of the current working
>>>>>>>>> directory or its parent on the server. Since '.' and '..' are treated
>>>>>>>>> specially in the lookup code, they would not be revalidated without
>>>>>>>>> special treatment. That leads to issues when looking up files as
>>>>>>>>> ./ or ../, since the client won't detect that its
>>>>>>>>> dcache is stale until it tries to use the cached dentry+inode.
>>>>>>>>>
>>>>>>>>> The one thing that has changed since its introduction is, I believe,
>>>>>>>>> the ESTALE handling in the VFS layer. That might fix a lot of the
>>>>>>>>> dcache lookup bugs that were previously handled by
>>>>>>>>> d_weak_revalidate(). I haven't done an audit to figure out if it
>>>>>>>>> actually can handle all of them.
>>>>>>>>
>>>>>>>> It may also be related to 8033426e6bdb2690d302872ac1e1fadaec1a5581:
>>>>>>>>     vfs: allow umount to handle mountpoints without revalidating them
>>>>>>>
>>>>>>> You say in the comment for that commit:
>>>>>>>
>>>>>>>     but there
>>>>>>>     are cases where we do want to revalidate the root of the fs.
>>>>>>>
>>>>>>> Do you happen to remember what those cases are?
>>>>>>
>>>>>> Not exactly, but I _think_ I might have been assuming that we needed
>>>>>> to ensure that the inode attrs on the root were up to date after the
>>>>>> pathwalk.
>>>>>>
>>>>>> I think that was probably wrong. d_revalidate is really intended to
>>>>>> ensure that the dentry in question still points to the same inode. In
>>>>>> the case of the root of the mount
Re: [PATCH v14 4/5] mm: support reporting free page blocks
On Fri 18-08-17 20:23:05, Michael S. Tsirkin wrote:
> On Thu, Aug 17, 2017 at 11:26:55AM +0800, Wei Wang wrote:
[...]
> > +void walk_free_mem_block(void *opaque1,
> > +			 unsigned int min_order,
> > +			 void (*visit)(void *opaque2,
>
> You can just avoid opaque2 completely I think, then opaque1 can
> be renamed opaque.
>
> > +				       unsigned long pfn,
> > +				       unsigned long nr_pages))
> > +{
> > +	struct zone *zone;
> > +	struct page *page;
> > +	struct list_head *list;
> > +	unsigned int order;
> > +	enum migratetype mt;
> > +	unsigned long pfn, flags;
> > +
> > +	for_each_populated_zone(zone) {
> > +		for (order = MAX_ORDER - 1;
> > +		     order < MAX_ORDER && order >= min_order; order--) {
> > +			for (mt = 0; mt < MIGRATE_TYPES; mt++) {
> > +				spin_lock_irqsave(&zone->lock, flags);
> > +				list = &zone->free_area[order].free_list[mt];
> > +				list_for_each_entry(page, list, lru) {
> > +					pfn = page_to_pfn(page);
> > +					visit(opaque1, pfn, 1 << order);
>
> My only concern here is inability of callback to
> 1. break out of list
> 2. remove page from the list

As I've said before this has to be a read only API. You cannot simply
fiddle with the page allocator internals under its feet.

> So I would make the callback bool, and I would use
> list_for_each_entry_safe.

If a bool would tell to break out of the loop then I agree. This sounds
useful.
--
Michal Hocko
SUSE Labs
Re: [PATCH v2 00/20] Speculative page faults
On 08/18/2017 03:34 AM, Laurent Dufour wrote:
> This is a port on kernel 4.13 of the work done by Peter Zijlstra to
> handle page fault without holding the mm semaphore [1].
>
> The idea is to try to handle user space page faults without holding the
> mmap_sem. This should allow better concurrency for massively threaded
> processes since the page fault handler will not wait for other threads'
> memory layout changes to be done, assuming that the change is done in
> another part of the process's memory space. This type of page fault is
> named speculative page fault. If the speculative page fault fails
> because concurrency is detected or because the underlying PMD or PTE
> tables are not yet allocated, the processing fails and a classic page
> fault is then tried.
>
> The speculative page fault (SPF) has to look for the VMA matching the
> fault address without holding the mmap_sem, so the VMA list is now
> managed using SRCU allowing lockless walking. The only impact would be
> the deferred file dereferencing in the case of a file mapping, since
> the file pointer is released once the SRCU cleaning is done. This patch
> relies on the change done recently by Paul McKenney in SRCU which now
> runs a callback per CPU instead of per SRCU structure [1].
>
> The VMA's attributes checked during the speculative page fault
> processing have to be protected against parallel changes. This is done
> by using a per VMA sequence lock. This sequence lock allows the
> speculative page fault handler to fast check for parallel changes in
> progress and to abort the speculative page fault in that case.
>
> Once the VMA is found, the speculative page fault handler would check
> for the VMA's attributes to verify that the page fault has to be
> handled correctly or not. Thus the VMA is protected through a sequence
> lock which allows fast detection of concurrent VMA changes. If such a
> change is detected, the speculative page fault is aborted and a
> *classic* page fault is tried. VMA sequence locks are added when VMA
> attributes which are checked during the page fault are modified.
>
> When the PTE is fetched, the VMA is checked to see if it has been
> changed, so once the page table is locked, the VMA is valid, so any
> other changes leading to touching this PTE will need to lock the page
> table, so no parallel change is possible at this time.
>
> Compared to Peter's initial work, this series introduces a
> spin_trylock when dealing with the speculative page fault. This is
> required to avoid a deadlock when handling a page fault while a TLB
> invalidate is requested by another CPU holding the PTE. Another change
> is due to a lock dependency issue with mapping->i_mmap_rwsem.
>
> In addition some VMA field values which are used once the PTE is
> unlocked at the end of the page fault path are saved into the vm_fault
> structure to use the values matching the VMA at the time the PTE was
> locked.
>
> This series builds on top of v4.13-rc5 and is functional on x86 and
> PowerPC.
>
> Tests have been made using a large commercial in-memory database on a
> PowerPC system with 752 CPUs using RFC v5. The results are very
> encouraging since the loading of the 2TB database was faster by 14%
> with the speculative page fault.

You specifically mention loading as most of the page faults will happen
at that time and then the working set will settle down with very few
page faults thereafter? That means unless there is another wave of page
faults we won't notice a performance improvement during the runtime.

> Using ebizzy test [3], which spreads a lot of threads, the result are
> good when running on both a large or a small system.

The performance improvements are greater as there is a lot of creation
and destruction of anon mappings which generates a constant flow of
page faults to be handled.

> When using kernbench, the result are quite similar which is expected
> as not so many multi threaded processes are involved. But there is no
> performance degradation either, which is good.

If we compile with 'make -j N' there would be a lot of threads but I
guess the problem is SPF does not support handling file mappings IIUC
which limits the performance improvement for some workloads.

> --
> Benchmarks results
>
> Note these tests have been made on top of 4.13-rc3 with the following
> patch from Paul McKenney applied:
> "srcu: Provide ordering for CPU not involved in grace period" [5]

Is this patch an improvement for the SRCU which we are using for
walking VMAs?

> Ebizzy:
> ---
> The test is counting the number of records per second it can manage,
> the higher the better. I run it like this 'ebizzy -mTRp'. To get
> consistent results I repeated the test 100 times and measured the
> average result, mean deviation, max and min.
>
> - 16 CPUs x86 VM
>                 4.13-rc5    4.13-rc5-spf
> Records/s
> Average         11350.29    21760.36
> Mean deviation  396.56      881.40
> Max             13773
Re: [PATCH 1/2] pwm: tiehrpwm: fix runtime pm imbalance at unbind
On Thu, Jul 20, 2017 at 12:48:16PM +0200, Johan Hovold wrote:
> Remove unbalanced RPM put at driver unbind which resulted in a negative
> usage count.
>
> Fixes: 19891b20e7c2 ("pwm: pwm-tiehrpwm: PWM driver support for EHRPWM")
> Signed-off-by: Johan Hovold
> ---
>  drivers/pwm/pwm-tiehrpwm.c | 1 -
>  1 file changed, 1 deletion(-)

Both patches applied to for-4.14/drivers, thanks.

Thierry
Re: [PATCH v14 4/5] mm: support reporting free page blocks
On 08/18/2017 09:46 PM, Michal Hocko wrote:
> On Thu 17-08-17 11:26:55, Wei Wang wrote:
>> This patch adds support to walk through the free page blocks in the
>> system and report them via a callback function. Some page blocks may
>> leave the free list after zone->lock is released, so it is the
>> caller's responsibility to either detect or prevent the use of such
>> pages.
>
> This could see more details to be honest. Especially the usecase you
> are going to use this for. This will help us to understand the
> motivation in future when the current user might be gone and new ones
> largely diverge into a different usage. This wouldn't be the first time
> I have seen something like that.

OK, I will add more details here about how it's used to accelerate live
migration.

>> Signed-off-by: Wei Wang
>> Signed-off-by: Liang Li
>> Cc: Michal Hocko
>> Cc: Michael S. Tsirkin
>> ---
>>  include/linux/mm.h |  6 ++++++
>>  mm/page_alloc.c    | 44 ++++++++++++++++++++++++++++++++++++++++++++
>>  2 files changed, 50 insertions(+)
>>
>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>> index 46b9ac5..cd29b9f 100644
>> --- a/include/linux/mm.h
>> +++ b/include/linux/mm.h
>> @@ -1835,6 +1835,12 @@ extern void free_area_init_node(int nid, unsigned long *zones_size,
>>  		unsigned long zone_start_pfn, unsigned long *zholes_size);
>>  extern void free_initmem(void);
>>
>> +extern void walk_free_mem_block(void *opaque1,
>> +				unsigned int min_order,
>> +				void (*visit)(void *opaque2,
>> +					      unsigned long pfn,
>> +					      unsigned long nr_pages));
>> +
>>  /*
>>   * Free reserved pages within range [PAGE_ALIGN(start), end & PAGE_MASK)
>>   * into the buddy system. The freed pages will be poisoned with pattern
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 6d00f74..a721a35 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -4762,6 +4762,50 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
>>  	show_swap_cache_info();
>>  }
>>
>> +/**
>> + * walk_free_mem_block - Walk through the free page blocks in the system
>> + * @opaque1: the context passed from the caller
>> + * @min_order: the minimum order of free lists to check
>> + * @visit: the callback function given by the caller
>
> The original suggestion for using visit was motivated by a visit design
> pattern but I can see how this can be confusing. Maybe a more explicit
> name would be better. What about report_free_range.

I'm afraid that name would be too long to fit in nicely.
How about simply naming it "report"?

>> + *
>> + * The function is used to walk through the free page blocks in the
>> + * system, and each free page block is reported to the caller via the
>> + * @visit callback. Please note:
>> + * 1) The function is used to report hints of free pages, so the
>> + * caller should not use those reported pages after the callback
>> + * returns.
>> + * 2) The callback is invoked with the zone->lock being held, so it
>> + * should not block and should finish as soon as possible.
>
> I think that the explicit note about zone->lock is not really needed.
> This can change in future and I would even bet that somebody might rely
> on the lock being held for some purpose and silently get broken with
> the change. Instead I would much rather see something like the
> following:
>
> "
> Please note that there are no locking guarantees for the callback

Just a little confused with this one:

The callback is invoked within zone->lock, why would we claim it "no
locking guarantees for the callback"?

> and that the reported pfn range might be freed or disappear after the
> callback returns so the caller has to be very careful how it is used.
>
> The callback itself must not sleep or perform any operations which
> would require any memory allocations directly (not even
> GFP_NOWAIT/GFP_ATOMIC) or via any lock dependency. It is generally
> advisable to implement the callback as simple as possible and defer
> any heavy lifting to a different context.
>
> There is no guarantee that each free range will be reported only once
> during one walk_free_mem_block invocation.
>
> pfn_to_page on the given range is strongly discouraged and if there is
> an absolute need for that make sure to contact MM people to discuss
> potential problems.
>
> The function itself might sleep so it cannot be called from atomic
> contexts.
>
> In general low orders tend to be very volatile and so it makes more
> sense to query larger ones for various optimizations like ballooning
> etc... This will reduce the overhead as well.
> "

I think it looks quite comprehensive. Thanks.

Best,
Wei
[PATCH] drm: mxsfb: constify drm_simple_display_pipe_funcs
drm_simple_display_pipe_funcs are not supposed to change at runtime.
All functions working with drm_simple_display_pipe_funcs provided by
<drm/drm_simple_kms_helper.h> work with const
drm_simple_display_pipe_funcs. So mark the non-const structs as const.

Signed-off-by: Arvind Yadav
---
 drivers/gpu/drm/mxsfb/mxsfb_drv.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/mxsfb/mxsfb_drv.c b/drivers/gpu/drm/mxsfb/mxsfb_drv.c
index d1b9c34..13e7ad8 100644
--- a/drivers/gpu/drm/mxsfb/mxsfb_drv.c
+++ b/drivers/gpu/drm/mxsfb/mxsfb_drv.c
@@ -130,7 +130,7 @@ static int mxsfb_pipe_prepare_fb(struct drm_simple_display_pipe *pipe,
 	return drm_fb_cma_prepare_fb(&pipe->plane, plane_state);
 }
 
-static struct drm_simple_display_pipe_funcs mxsfb_funcs = {
+static const struct drm_simple_display_pipe_funcs mxsfb_funcs = {
 	.enable		= mxsfb_pipe_enable,
 	.disable	= mxsfb_pipe_disable,
 	.update		= mxsfb_pipe_update,
-- 
1.9.1
Re: [PATCH] pwm: Kconfig: Enable pwm-tiecap to be built for Keystone
On Wed, Aug 02, 2017 at 11:43:44AM +0530, Vignesh R wrote:
> 66AK2G SoC has ECAP subsystem that is used as pwm-backlight provider for
> display. Hence, enable pwm-tiecap driver to be built for Keystone
> architecture.
>
> Signed-off-by: Vignesh R
> ---
>  drivers/pwm/Kconfig | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Applied to for-4.14/drivers, thanks.

Thierry
Re: [PATCH 1/3] soc: qcom: smem: Support global partition
On 8/18/2017 6:45 AM, Chris Lew wrote:
> @@ -782,7 +855,10 @@ static int qcom_smem_probe(struct platform_device *pdev)
>  	}
>
>  	version = qcom_smem_get_sbl_version(smem);
> -	if (version >> 16 != SMEM_EXPECTED_VERSION) {
> +	switch (version >> 16) {
> +	case SMEM_GLOBAL_PART_VERSION:
> +	case SMEM_GLOBAL_HEAP_VERSION:

A break statement is needed for the supported versions.

> +	default:
>  		dev_err(&pdev->dev, "Unsupported SMEM version 0x%x\n", version);
>  		return -EINVAL;
>  	}
Re: [PATCH v1 2/6] fs: use on-stack-bio if backing device has BDI_CAP_SYNC capability
Hi Jens,

On Wed, Aug 16, 2017 at 09:56:12AM -0600, Jens Axboe wrote:
> On 08/15/2017 10:48 PM, Minchan Kim wrote:
>> Hi Jens,
>>
>> On Mon, Aug 14, 2017 at 10:17:09AM -0600, Jens Axboe wrote:
>>> On 08/14/2017 09:38 AM, Jens Axboe wrote:
>>>> On 08/14/2017 09:31 AM, Minchan Kim wrote:
>>>>>> Secondly, generally you don't have slow devices and fast devices
>>>>>> intermingled when running workloads. That's the rare case.
>>>>>
>>>>> Not true. zRam is really popular swap for embedded devices where
>>>>> one of low cost product has a really poor slow nand compared to
>>>>> lz4/lzo [de]compression.
>>>>
>>>> I guess that's true for some cases. But as I said earlier, the
>>>> recycling really doesn't care about this at all. They can happily
>>>> coexist, and not step on each others toes.
>>>
>>> Dusted it off, result is here against -rc5:
>>>
>>> http://git.kernel.dk/cgit/linux-block/log/?h=cpu-alloc-cache
>>>
>>> I'd like to split the amount of units we cache and the amount of
>>> units we free, right now they are both CPU_ALLOC_CACHE_SIZE. This
>>> means that once we hit that count, we free all of them, and then
>>> store the one we were asked to free. That always keeps 1 local, but
>>> maybe it'd make more sense to cache just free CPU_ALLOC_CACHE_SIZE/2
>>> (or something like that) so that we retain more than 1 per cpu in
>>> case an app preempts when sleeping for IO and the new task on that
>>> CPU then issues IO as well. Probably minor.
>>>
>>> Ran a quick test on nullb0 with 32 sync readers. The test was
>>> O_DIRECT on the block device, so I disabled the
>>> __blkdev_direct_IO_simple() bypass. With the above branch, we get
>>> ~18.0M IOPS, and without we get ~14M IOPS. Both ran with iostats
>>> disabled, to avoid any interference from that.
>>
>> Looks promising.
>> If recycling bio works well enough, I think we don't need to
>> introduce a new split in the path for on-stack bio.
>> I will test your version on zram-swap!
>
> Thanks, let me know how it goes. It's quite possible that we'll need
> a few further tweaks, but at least the basis should be there.

Sorry for my late reply. I just finished the swap-in testing with
zram-swap which is critical for the latency.

For the testing, I made a memcg and put $NR_CPU (mine is 12) processes
in there and each process consumes 1G so the total is 12G while my
system has 16GB memory, so there was no global reclaim. Then,
"echo 1 > /mnt/memcg/group/force.empty" to swap all pages out, and then
the programs wait for my signal to swap in; I trigger the signal to
every process to swap in every page and measure the elapsed time for
the swap-in. The value is the average usec time elapsed swapping in 1G
of pages for each process; I repeated it 10 times and the stddev is
very stable.

swapin:
base (with rw_page)	1100806.73 (100.00%)
no-rw_page		1146856.95 (104.18%)
Jens's pcp		1146910.00 (104.19%)
onstack-bio		1114872.18 (101.28%)

In my test, there is no difference between dynamic bio allocation
(i.e., no-rw_page) and the pcp approach, but onstack-bio is much faster
so it's almost the same as rw_page.

The swapout test measures the elapsed time for
"echo 1 > /mnt/memcg/test_group/force.empty" so the unit is seconds.

swapout:
base (with rw_page)	7.72 (100.00%)
no-rw_page		8.36 (108.29%)
Jens's pcp		8.31 (107.64%)
onstack-bio		8.19 (106.09%)

rw_page's swapout is 6% or more faster than the others.

I tried pmbench with no memcg to see the performance under global
reclaim. Also, I executed a background IO job which reads data from
HDD. The value is the average usec time elapsed for a page access so
smaller is better.

base (with rw_page)	14.42 (100.00%)
no-rw_page		15.66 (108.60%)
Jens's pcp		15.81 (109.64%)
onstack-bio		15.42 (106.93%)

It's similar to the swapout test in memcg. 6% or more is not trivial so
I doubt we can remove rw_page at this moment. :(
I will look into the detail with perf. If you have further
optimizations or suggestions, feel free to say so. I am happy to test
them.

Thanks.
Re: [PATCH v14 4/5] mm: support reporting free page blocks
On Mon 21-08-17 14:12:47, Wei Wang wrote:
> On 08/18/2017 09:46 PM, Michal Hocko wrote:
[...]
>>> +/**
>>> + * walk_free_mem_block - Walk through the free page blocks in the system
>>> + * @opaque1: the context passed from the caller
>>> + * @min_order: the minimum order of free lists to check
>>> + * @visit: the callback function given by the caller
>>
>> The original suggestion for using visit was motivated by a visit
>> design pattern but I can see how this can be confusing. Maybe a more
>> explicit name would be better. What about report_free_range.
>
> I'm afraid that name would be too long to fit in nicely.
> How about simply naming it "report"?

I do not have a strong opinion on this. I wouldn't be afraid of using a
slightly longer name here for the clarity sake, though.

>>> + *
>>> + * The function is used to walk through the free page blocks in the
>>> + * system, and each free page block is reported to the caller via
>>> + * the @visit callback. Please note:
>>> + * 1) The function is used to report hints of free pages, so the
>>> + * caller should not use those reported pages after the callback
>>> + * returns.
>>> + * 2) The callback is invoked with the zone->lock being held, so it
>>> + * should not block and should finish as soon as possible.
>>
>> I think that the explicit note about zone->lock is not really needed.
>> This can change in future and I would even bet that somebody might
>> rely on the lock being held for some purpose and silently get broken
>> with the change. Instead I would much rather see something like the
>> following:
>> "
>> Please note that there are no locking guarantees for the callback
>
> Just a little confused with this one:
>
> The callback is invoked within zone->lock, why would we claim it "no
> locking guarantees for the callback"?

Because we definitely do not want anybody to rely on that fact and
(ab)use it. This might change in future and it would be better to be
clear about that.
--
Michal Hocko
SUSE Labs
Re: [PATCH 1/4] pwm: pwm-tiecap: Add TI 66AK2G SoC specific compatible
On Mon, Aug 07, 2017 at 05:19:40PM +0530, Vignesh R wrote:
> Add a new compatible string "ti,k2g-ecap" to support PWM ECAP IP of
> TI 66AK2G SoC.
>
> Signed-off-by: Vignesh R
> ---
>  Documentation/devicetree/bindings/pwm/pwm-tiecap.txt | 1 +
>  1 file changed, 1 insertion(+)

Applied to for-4.14/drivers, thanks.

Thierry
[PATCH v2] rcar-dmac: initialize all data before registering IRQ handler
From: Kuninori Morimoto

Anton Volkov noticed that engine->dev is NULL before
of_dma_controller_register() in probe. Thus there might be a NULL
pointer dereference in rcar_dmac_chan_start_xfer while accessing
chan->chan.device->dev, which is equal to (&dmac->engine)->dev.

For the same reason, similar problems can occur whenever we don't
initialize all necessary data before registering the IRQ handlers.
To make the code safer, this patch initializes all necessary data
before the IRQ handlers are registered.

Reported-by: Anton Volkov
Signed-off-by: Kuninori Morimoto
---
v1 -> v2
 - care devm_request_threaded_irq(xxx) on rcar_dmac_chan_probe()
 - care devm_request_irq(xx) on rcar_dmac_probe()

 drivers/dma/sh/rcar-dmac.c | 85 +++++++++++++++++++++-------------------------
 1 file changed, 43 insertions(+), 42 deletions(-)

diff --git a/drivers/dma/sh/rcar-dmac.c b/drivers/dma/sh/rcar-dmac.c
index ffcadca..2b2c7db 100644
--- a/drivers/dma/sh/rcar-dmac.c
+++ b/drivers/dma/sh/rcar-dmac.c
@@ -1690,6 +1690,15 @@ static int rcar_dmac_chan_probe(struct rcar_dmac *dmac,
 	if (!irqname)
 		return -ENOMEM;
 
+	/*
+	 * Initialize the DMA engine channel and add it to the DMA engine
+	 * channels list.
+	 */
+	chan->device = &dmac->engine;
+	dma_cookie_init(chan);
+
+	list_add_tail(&chan->device_node, &dmac->engine.channels);
+
 	ret = devm_request_threaded_irq(dmac->dev, rchan->irq,
 					rcar_dmac_isr_channel,
 					rcar_dmac_isr_channel_thread, 0,
@@ -1700,15 +1709,6 @@ static int rcar_dmac_chan_probe(struct rcar_dmac *dmac,
 		return ret;
 	}
 
-	/*
-	 * Initialize the DMA engine channel and add it to the DMA engine
-	 * channels list.
-	 */
-	chan->device = &dmac->engine;
-	dma_cookie_init(chan);
-
-	list_add_tail(&chan->device_node, &dmac->engine.channels);
-
 	return 0;
 }
 
@@ -1794,14 +1794,6 @@ static int rcar_dmac_probe(struct platform_device *pdev)
 	if (!irqname)
 		return -ENOMEM;
 
-	ret = devm_request_irq(&pdev->dev, irq, rcar_dmac_isr_error, 0,
-			       irqname, dmac);
-	if (ret) {
-		dev_err(&pdev->dev, "failed to request IRQ %u (%d)\n",
-			irq, ret);
-		return ret;
-	}
-
 	/* Enable runtime PM and initialize the device. */
 	pm_runtime_enable(&pdev->dev);
 	ret = pm_runtime_get_sync(&pdev->dev);
@@ -1818,8 +1810,32 @@ static int rcar_dmac_probe(struct platform_device *pdev)
 		goto error;
 	}
 
-	/* Initialize the channels. */
-	INIT_LIST_HEAD(&dmac->engine.channels);
+	/* Initialize engine */
+	engine = &dmac->engine;
+
+	dma_cap_set(DMA_MEMCPY, engine->cap_mask);
+	dma_cap_set(DMA_SLAVE, engine->cap_mask);
+
+	engine->dev		= &pdev->dev;
+	engine->copy_align	= ilog2(RCAR_DMAC_MEMCPY_XFER_SIZE);
+
+	engine->src_addr_widths	= widths;
+	engine->dst_addr_widths	= widths;
+	engine->directions	= BIT(DMA_MEM_TO_DEV) | BIT(DMA_DEV_TO_MEM);
+	engine->residue_granularity = DMA_RESIDUE_GRANULARITY_BURST;
+
+	engine->device_alloc_chan_resources = rcar_dmac_alloc_chan_resources;
+	engine->device_free_chan_resources = rcar_dmac_free_chan_resources;
+	engine->device_prep_dma_memcpy	= rcar_dmac_prep_dma_memcpy;
+	engine->device_prep_slave_sg	= rcar_dmac_prep_slave_sg;
+	engine->device_prep_dma_cyclic	= rcar_dmac_prep_dma_cyclic;
+	engine->device_config		= rcar_dmac_device_config;
+	engine->device_terminate_all	= rcar_dmac_chan_terminate_all;
+	engine->device_tx_status	= rcar_dmac_tx_status;
+	engine->device_issue_pending	= rcar_dmac_issue_pending;
+	engine->device_synchronize	= rcar_dmac_device_synchronize;
+
+	INIT_LIST_HEAD(&engine->channels);
 
 	for (i = 0; i < dmac->n_channels; ++i) {
 		ret = rcar_dmac_chan_probe(dmac, &dmac->channels[i],
@@ -1828,6 +1844,14 @@ static int rcar_dmac_probe(struct platform_device *pdev)
 			goto error;
 	}
 
+	ret = devm_request_irq(&pdev->dev, irq, rcar_dmac_isr_error, 0,
+			       irqname, dmac);
+	if (ret) {
+		dev_err(&pdev->dev, "failed to request IRQ %u (%d)\n",
+			irq, ret);
+		return ret;
+	}
+
 	/* Register the DMAC as a DMA provider for DT. */
 	ret = of_dma_controller_register(pdev->dev.of_node, rcar_dmac_of_xlate,
 					 NULL);
@@ -1839,29 +1863,6 @@ static int rcar_dmac_probe(struct platform_device *pdev)
 	 *
	 * Default transfer size of 32 bytes requires 32-byte alignment.
	 */
-	engine =
Re: [PATCH v11 2/4] PCI: Factor out pci_bus_wait_crs()
On 8/21/2017 4:23 PM, Bjorn Helgaas wrote:
> On Mon, Aug 21, 2017 at 03:37:06PM -0400, Sinan Kaya wrote:
>> On 8/21/2017 3:18 PM, Bjorn Helgaas wrote:
>> ...
>>	if (pci_bus_crs_pending(id))
>>		return pci_bus_wait_crs(dev->bus, dev->devfn, , 6);
>>
>>> I think that makes sense. We'd want to check for CRS SV being
>>> enabled, e.g., maybe read PCI_EXP_RTCTL_CRSSVE back in
>>> pci_enable_crs() and cache it somewhere. Maybe a crs_sv_enabled bit
>>> in the root port's pci_dev, and check it with something like what
>>> pcie_root_rcb_set() does?
>>
>> You can observe CRS under the following conditions
>>
>> 1. root port <-> endpoint
>> 2. bridge <-> endpoint
>> 3. root port <-> bridge
>>
>> I was relying on the fact that we are reading 0x0001 as an indication
>> that this device detected CRS. Maybe, this is too indirect.
>>
>> If we also want to capture the capability, I think the right thing is
>> to check the parent capability.
>>
>> bool pci_bus_crs_vis_supported(struct pci_dev *bridge)
>> {
>>	if (device_type(bridge) == root port)
>>		return read(root_crs_register_reg);
>>
>>	if (device_type(bridge) == switch)
>>		return read(switch_crs_register);
>
> I don't understand this part. AFAIK, CRS SV is only a feature of root
> ports. The capability and enable bits are in the Root Capabilities
> and Root Control registers.

No question about it.

> It's certainly true that a device below a switch can respond with a
> CRS completion, but the switch is not the requester, and my
> understanding is that it would not take any action on the completion
> other than passing it upstream.

I saw some bridge references in the spec for CRS. I was going to do
some research on it. You answered my question. I was curious how this
would impact the behavior.

"Bridge Configuration Retry Enable – When Set, this bit enables PCI
Express to PCI/PCI-X bridges to return Configuration Request Retry
Status (CRS) in response to Configuration Requests that target devices
below the bridge. Refer to the PCI Express to PCI/PCI-X Bridge
Specification, Revision 1.0 for further details."

--
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a
Linux Foundation Collaborative Project.
[PATCH RFC v3 3/9] KVM: remember position in kvm->vcpus array
Signed-off-by: Radim Krčmář
---
 include/linux/kvm_host.h | 11 +++-------
 virt/kvm/kvm_main.c      |  5 +++-
 2 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 6882538eda32..a8ff956616d2 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -220,7 +220,8 @@ struct kvm_vcpu {
 	struct preempt_notifier preempt_notifier;
 #endif
 	int cpu;
-	int vcpu_id;
+	int vcpu_id;   /* id given by userspace at creation */
+	int vcpus_idx; /* index in kvm->vcpus array */
 	int srcu_idx;
 	int mode;
 	unsigned long requests;
@@ -516,13 +517,7 @@ static inline struct kvm_vcpu *kvm_get_vcpu_by_id(struct kvm *kvm, int id)
 
 static inline int kvm_vcpu_get_idx(struct kvm_vcpu *vcpu)
 {
-	struct kvm_vcpu *tmp;
-	int idx;
-
-	kvm_for_each_vcpu(idx, tmp, vcpu->kvm)
-		if (tmp == vcpu)
-			return idx;
-	BUG();
+	return vcpu->vcpus_idx;
 }
 
 #define kvm_for_each_memslot(memslot, slots) \
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index e17c40d986f3..caf8323f7df7 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2498,7 +2498,10 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
 		goto unlock_vcpu_destroy;
 	}
 
-	BUG_ON(kvm->vcpus[atomic_read(&kvm->online_vcpus)]);
+	vcpu->vcpus_idx = atomic_read(&kvm->online_vcpus);
+
+	BUG_ON(kvm->vcpus[vcpu->vcpus_idx]);
+
 	/* Now it's all set up, let userspace reach it */
 	kvm_get_kvm(kvm);
-- 
2.13.3
[PATCH 4/4] w1-masters: Improve a size determination in four functions
From: Markus Elfring
Date: Mon, 21 Aug 2017 21:53:21 +0200

Replace the specification of data structures by pointer dereferences as
the parameter for the operator "sizeof" to make the corresponding size
determination a bit safer according to the Linux coding style convention.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring
---
 drivers/w1/masters/ds2482.c  | 3 ++-
 drivers/w1/masters/ds2490.c  | 2 +-
 drivers/w1/masters/mxc_w1.c  | 3 +--
 drivers/w1/masters/w1-gpio.c | 3 +--
 4 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/w1/masters/ds2482.c b/drivers/w1/masters/ds2482.c
index d49681cd29af..7c3e25108285 100644
--- a/drivers/w1/masters/ds2482.c
+++ b/drivers/w1/masters/ds2482.c
@@ -451,7 +451,8 @@ static int ds2482_probe(struct i2c_client *client,
 				     I2C_FUNC_SMBUS_BYTE))
 		return -ENODEV;
 
-	if (!(data = kzalloc(sizeof(struct ds2482_data), GFP_KERNEL))) {
+	data = kzalloc(sizeof(*data), GFP_KERNEL);
+	if (!data) {
 		err = -ENOMEM;
 		goto exit;
 	}
diff --git a/drivers/w1/masters/ds2490.c b/drivers/w1/masters/ds2490.c
index c0ee6ca9ce93..1e5b81490ffe 100644
--- a/drivers/w1/masters/ds2490.c
+++ b/drivers/w1/masters/ds2490.c
@@ -994,5 +994,5 @@ static int ds_probe(struct usb_interface *intf,
 	struct ds_device *dev;
 	int i, err, alt;
 
-	dev = kzalloc(sizeof(struct ds_device), GFP_KERNEL);
+	dev = kzalloc(sizeof(*dev), GFP_KERNEL);
 	if (!dev)
diff --git a/drivers/w1/masters/mxc_w1.c b/drivers/w1/masters/mxc_w1.c
index 74f2e6e6202a..40a34942d07f 100644
--- a/drivers/w1/masters/mxc_w1.c
+++ b/drivers/w1/masters/mxc_w1.c
@@ -103,6 +103,5 @@ static int mxc_w1_probe(struct platform_device *pdev)
 	unsigned int clkdiv;
 	int err;
 
-	mdev = devm_kzalloc(&pdev->dev, sizeof(struct mxc_w1_device),
-			    GFP_KERNEL);
+	mdev = devm_kzalloc(&pdev->dev, sizeof(*mdev), GFP_KERNEL);
 	if (!mdev)
diff --git a/drivers/w1/masters/w1-gpio.c b/drivers/w1/masters/w1-gpio.c
index 6e8b18bf9fb1..a92eb1407f0f 100644
--- a/drivers/w1/masters/w1-gpio.c
+++ b/drivers/w1/masters/w1-gpio.c
@@ -128,6 +128,5 @@ static int w1_gpio_probe(struct platform_device *pdev)
 		return -ENXIO;
 	}
 
-	master = devm_kzalloc(&pdev->dev, sizeof(struct w1_bus_master),
-			      GFP_KERNEL);
+	master = devm_kzalloc(&pdev->dev, sizeof(*master), GFP_KERNEL);
 	if (!master)
-- 
2.14.0
Re: [PATCH v3 net-next] bpf/verifier: track liveness for pruning
On 8/21/17 2:00 PM, Daniel Borkmann wrote: On 08/21/2017 10:44 PM, Edward Cree wrote: On 21/08/17 21:27, Daniel Borkmann wrote: On 08/21/2017 08:36 PM, Edward Cree wrote: On 19/08/17 00:37, Alexei Starovoitov wrote: [...] I'm tempted to just rip out env->varlen_map_value_access and always check the whole thing, because honestly I don't know what it was meant to do originally or how it can ever do any useful pruning. While drastic, it does cause your test case to pass. Original intention from 484611357c19 ("bpf: allow access into map value arrays") was that it wouldn't potentially make pruning worse if PTR_TO_MAP_VALUE_ADJ was not used, meaning that we wouldn't need to take reg state's min_value and max_value into account for state checking; this was basically due to min_value / max_value is being adjusted/tracked on every alu/jmp ops for involved regs (e.g. adjust_reg_min_max_vals() and others that mangle them) even if we have the case that no actual dynamic map access is used throughout the program. To give an example on net tree, the bpf_lxc.o prog's section increases from 36,386 to 68,226 when env->varlen_map_value_access is always true, so it does have an effect. Did you do some checks on this on net-next? I tested with the cilium progs and saw no change in insn count. I suspect that for the normal case I already killed this optimisation when I did my unification patch, it was previously about ignoring min/max values on all regs (including scalars), whereas on net-next it only ignores them on map_value pointers; in practice this is useless because we tend to still have the offset scalar sitting in a register somewhere. (Come to think of it, this may have been behind a large chunk of the #insn increase that my patches caused.) Yeah, this would seem plausible. 
Since we use umax_value in find_good_pkt_pointers() now (to check against MAX_PACKET_OFF and ensure our reg->range is really ok), we can't just stop caring about all min/max values just because we haven't done any variable map accesses. I don't see a way around this. Agree, was thinking the same. If there's not really a regression in terms of complexity, then lets kill the flag. +1 diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 2489e67b65f6..908d13b2a2aa 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -3582,7 +3582,7 @@ static int do_check(struct bpf_verifier_env *env) init_reg_state(regs); state->parent = NULL; insn_idx = 0; - env->varlen_map_value_access = false; + env->varlen_map_value_access = true; makes _zero_ difference on cilium*.o tests, so let's just kill that workaround.
Re: [PATCH v2] mm/hugetlb.c: make huge_pte_offset() consistent and document behaviour
On 08/21/2017 11:07 AM, Catalin Marinas wrote: > On Fri, Aug 18, 2017 at 02:29:18PM -0700, Mike Kravetz wrote: >> On 08/18/2017 07:54 AM, Punit Agrawal wrote: >>> When walking the page tables to resolve an address that points to >>> !p*d_present() entry, huge_pte_offset() returns inconsistent values >>> depending on the level of page table (PUD or PMD). >>> >>> It returns NULL in the case of a PUD entry while in the case of a PMD >>> entry, it returns a pointer to the page table entry. >>> >>> A similar inconsitency exists when handling swap entries - returns NULL >>> for a PUD entry while a pointer to the pte_t is retured for the PMD entry. >>> >>> Update huge_pte_offset() to make the behaviour consistent - return a >>> pointer to the pte_t for hugepage or swap entries. Only return NULL in >>> instances where we have a p*d_none() entry and the size parameter >>> doesn't match the hugepage size at this level of the page table. >>> >>> Document the behaviour to clarify the expected behaviour of this function. >>> This is to set clear semantics for architecture specific implementations >>> of huge_pte_offset(). >>> >>> Signed-off-by: Punit Agrawal>>> Cc: Catalin Marinas >>> Cc: Naoya Horiguchi >>> Cc: Steve Capper >>> Cc: Will Deacon >>> Cc: Kirill A. Shutemov >>> Cc: Michal Hocko >>> Cc: Mike Kravetz >>> --- >>> >>> Hi Andrew, >>> >>> From discussions on the arm64 implementation of huge_pte_offset()[0] >>> we realised that there is benefit from returning a pte_t* in the case >>> of p*d_none(). >>> >>> The fault handling code in hugetlb_fault() can handle p*d_none() >>> entries and saves an extra round trip to huge_pte_alloc(). Other >>> callers of huge_pte_offset() should be ok as well. >> >> Yes, this change would eliminate that call to huge_pte_alloc() in >> hugetlb_fault(). However, huge_pte_offset() is now returning a pointer >> to a p*d_none() pte in some instances where it would have previously >> returned NULL. Correct? 
> > Yes (whether it was previously the right thing to return is a different > matter; that's what we are trying to clarify in the generic code so that > we can have similar semantics on arm64). > >> I went through the callers, and like you am fairly confident that they >> can handle this situation. But, returning p*d_none() instead of NULL >> does change the execution path in several routines such as >> copy_hugetlb_page_range, __unmap_hugepage_range hugetlb_change_protection, >> and follow_hugetlb_page. If huge_pte_alloc() returns NULL to these >> routines, they do a quick continue, exit, etc. If they are returned >> a pointer, they typically lock the page table(s) and then check for >> p*d_none() before continuing, exiting, etc. So, it appears that these >> routines could potentially slow down a bit with this change (in the specific >> case of p*d_none). > > Arguably (well, my interpretation), it should return a NULL only if the > entry is a table entry, potentially pointing to a next level (pmd). In > the pud case, this means that sz < PUD_SIZE. > > If the pud is a last level huge page entry (either present or !present), > huge_pte_offset() should return the pointer to it and never NULL. If the > entry is a swap or migration one (pte_present() == false) with the > current code we don't even enter the corresponding checks in > copy_hugetlb_page_range(). > > I also assume that the ptl __unmap_hugepage_range() is taken to avoid > some race when the entry is a huge page (present or not). If such race > doesn't exist, we could as well check the huge_pte_none() outside the > locked region (which is what the current huge_pte_offset() does with > !pud_present()). > > IMHO, while the current generic huge_pte_offset() avoids some code paths > in the functions you mentioned, the results are not always correct > (missing swap/migration entries or potentially racy). Thanks Catalin, The more I look at this code and think about it, the more I like it. 
As Michal previously mentioned, changes in this area can break things in subtle ways. That is why I was cautious and asked for more people to look at it. My primary concerns with these changes in this area were: - Any potential changes in behavior. I think this has been sufficiently explored. While there may be small differences in behavior (for the better), this change should not introduce any bugs/breakage. - Other arch specific implementations are not aligned with the new behavior. Again, this should not cause any issues. Punit (and I) have looked at the arch specific implementations for issues and found none. In addition, since we are not changing any of the 'calling code', no issues should be introduced for arch specific implementations. I like the new semantics and did not find any issues. Reviewed-by: Mike
Re: [PATCH v4 6/9] ASoC: rockchip: Parse dai links from dts
On Fri, Aug 18, 2017 at 11:11:44AM +0800, Jeffy Chen wrote: > Refactor rockchip_sound_probe, parse dai links from dts instead of > hard coding them. > > Signed-off-by: Jeffy Chen > --- > > Changes in v4: None > Changes in v3: > Use compatible to match audio codecs > -- Suggested-by Matthias Kaehlcke > > Changes in v2: > Let rockchip,codec-names be a required property, because we plan to > add more supported codecs to the fixed dai link list in the driver. > > sound/soc/rockchip/rk3399_gru_sound.c | 139 > ++ > 1 file changed, 91 insertions(+), 48 deletions(-) Looks good to me, though I'm by no means an audio expert. On kevin the codecs are enumerated at boot. FWIW: Reviewed-by: Matthias Kaehlcke Tested-by: Matthias Kaehlcke > diff --git a/sound/soc/rockchip/rk3399_gru_sound.c > b/sound/soc/rockchip/rk3399_gru_sound.c > index 9b7e28703bfb..d532336871d7 100644 > --- a/sound/soc/rockchip/rk3399_gru_sound.c > +++ b/sound/soc/rockchip/rk3399_gru_sound.c > @@ -235,14 +235,42 @@ static const struct snd_soc_ops > rockchip_sound_da7219_ops = { > .hw_params = rockchip_sound_da7219_hw_params, > }; > > +static struct snd_soc_card rockchip_sound_card = { > + .name = "rk3399-gru-sound", > + .owner = THIS_MODULE, > + .dapm_widgets = rockchip_dapm_widgets, > + .num_dapm_widgets = ARRAY_SIZE(rockchip_dapm_widgets), > + .dapm_routes = rockchip_dapm_routes, > + .num_dapm_routes = ARRAY_SIZE(rockchip_dapm_routes), > + .controls = rockchip_controls, > + .num_controls = ARRAY_SIZE(rockchip_controls), > +}; > + > enum { > + DAILINK_DA7219, > DAILINK_MAX98357A, > DAILINK_RT5514, > - DAILINK_DA7219, > DAILINK_RT5514_DSP, > }; > > -static struct snd_soc_dai_link rockchip_dailinks[] = { > +static const char * const dailink_compat[] = { > + [DAILINK_DA7219] = "dlg,da7219", > + [DAILINK_MAX98357A] = "maxim,max98357a", > + [DAILINK_RT5514] = "realtek,rt5514-i2c", > + [DAILINK_RT5514_DSP] = "realtek,rt5514-spi", > +}; > + > +static const struct snd_soc_dai_link rockchip_dais[] = { > + 
[DAILINK_DA7219] = { > + .name = "DA7219", > + .stream_name = "DA7219 PCM", > + .codec_dai_name = "da7219-hifi", > + .init = rockchip_sound_da7219_init, > + .ops = &rockchip_sound_da7219_ops, > + /* set da7219 as slave */ > + .dai_fmt = SND_SOC_DAIFMT_I2S | SND_SOC_DAIFMT_NB_NF | > + SND_SOC_DAIFMT_CBS_CFS, > + }, > [DAILINK_MAX98357A] = { > .name = "MAX98357A", > .stream_name = "MAX98357A PCM", > @@ -261,16 +289,6 @@ static struct snd_soc_dai_link rockchip_dailinks[] = { > .dai_fmt = SND_SOC_DAIFMT_I2S | SND_SOC_DAIFMT_NB_NF | > SND_SOC_DAIFMT_CBS_CFS, > }, > - [DAILINK_DA7219] = { > - .name = "DA7219", > - .stream_name = "DA7219 PCM", > - .codec_dai_name = "da7219-hifi", > - .init = rockchip_sound_da7219_init, > - .ops = &rockchip_sound_da7219_ops, > - /* set da7219 as slave */ > - .dai_fmt = SND_SOC_DAIFMT_I2S | SND_SOC_DAIFMT_NB_NF | > - SND_SOC_DAIFMT_CBS_CFS, > - }, > /* RT5514 DSP for voice wakeup via spi bus */ > [DAILINK_RT5514_DSP] = { > .name = "RT5514 DSP", > @@ -279,53 +297,78 @@ static struct snd_soc_dai_link rockchip_dailinks[] = { > }, > }; > > -static struct snd_soc_card rockchip_sound_card = { > - .name = "rk3399-gru-sound", > - .owner = THIS_MODULE, > - .dai_link = rockchip_dailinks, > - .num_links = ARRAY_SIZE(rockchip_dailinks), > - .dapm_widgets = rockchip_dapm_widgets, > - .num_dapm_widgets = ARRAY_SIZE(rockchip_dapm_widgets), > - .dapm_routes = rockchip_dapm_routes, > - .num_dapm_routes = ARRAY_SIZE(rockchip_dapm_routes), > - .controls = rockchip_controls, > - .num_controls = ARRAY_SIZE(rockchip_controls), > -}; > - > -static int rockchip_sound_probe(struct platform_device *pdev) > +static int rockchip_sound_codec_node_match(struct device_node *np_codec) > { > - struct snd_soc_card *card = &rockchip_sound_card; > - struct device_node *cpu_node; > - int i, ret; > + int i; > > - cpu_node = of_parse_phandle(pdev->dev.of_node, "rockchip,cpu", 0); > - if (!cpu_node) { > - dev_err(&pdev->dev, "Property 'rockchip,cpu' missing or > invalid\n"); > - return -EINVAL; > + for (i = 0; i 
< ARRAY_SIZE(dailink_compat); i++) { > + if (of_device_is_compatible(np_codec, dailink_compat[i])) > + return i; > } > + return -1; > +} > > - for (i = 0; i < ARRAY_SIZE(rockchip_dailinks); i++) { > - rockchip_dailinks[i].platform_of_node = cpu_node; > -
[PATCH net-next,2/4] hv_netvsc: Clean up unused parameter from netvsc_get_rss_hash_opts()
From: Haiyang Zhang

The parameter "nvdev" is not in use.

Signed-off-by: Haiyang Zhang
---
 drivers/net/hyperv/netvsc_drv.c | 5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 4677d21..d8612b1 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -1228,8 +1228,7 @@ static void netvsc_get_strings(struct net_device *dev, u32 stringset, u8 *data)
 }
 
 static int
-netvsc_get_rss_hash_opts(struct netvsc_device *nvdev,
-			 struct ethtool_rxnfc *info)
+netvsc_get_rss_hash_opts(struct ethtool_rxnfc *info)
 {
 	info->data = RXH_IP_SRC | RXH_IP_DST;
 
@@ -1267,7 +1266,7 @@ static void netvsc_get_strings(struct net_device *dev, u32 stringset, u8 *data)
 		return 0;
 
 	case ETHTOOL_GRXFH:
-		return netvsc_get_rss_hash_opts(nvdev, info);
+		return netvsc_get_rss_hash_opts(info);
 	}
 	return -EOPNOTSUPP;
 }
-- 
1.7.1
[PATCH net-next,1/4] hv_netvsc: Clean up unused parameter from netvsc_get_hash()
From: Haiyang Zhang

The parameter "sk" is not in use.

Signed-off-by: Haiyang Zhang
---
 drivers/net/hyperv/netvsc_drv.c | 4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index b33f050..4677d21 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -193,7 +193,7 @@ static int netvsc_close(struct net_device *net)
 /* Azure hosts don't support non-TCP port numbers in hashing yet. We compute
  * hash for non-TCP traffic with only IP numbers.
  */
-static inline u32 netvsc_get_hash(struct sk_buff *skb, struct sock *sk)
+static inline u32 netvsc_get_hash(struct sk_buff *skb)
 {
 	struct flow_keys flow;
 	u32 hash;
@@ -227,7 +227,7 @@ static inline int netvsc_get_tx_queue(struct net_device *ndev,
 	struct sock *sk = skb->sk;
 	int q_idx;
 
-	q_idx = ndc->tx_send_table[netvsc_get_hash(skb, sk) &
+	q_idx = ndc->tx_send_table[netvsc_get_hash(skb) &
 				   (VRSS_SEND_TAB_SIZE - 1)];
 
 	/* If queue index changed record the new value */
-- 
1.7.1
Re: [PATCH v2] PM / AVS: rockchip-io: add io selectors and supplies for RV1108
On Monday, 21 August 2017, 18:58:33 CEST, David Wu wrote:
> This adds the necessary data for handling io voltage domains on the RV1108.
>
> Signed-off-by: David Wu

Reviewed-by: Heiko Stuebner
Re: [PATCH v3 1/5] ACPI / blacklist: add acpi_match_platform_list()
On Tue, Aug 22, 2017 at 12:21 AM, Kani, Toshimitsuwrote: > On Mon, 2017-08-21 at 23:49 +0200, Rafael J. Wysocki wrote: >> On Mon, Aug 21, 2017 at 11:06 PM, Kani, Toshimitsu > m> wrote: >> > On Mon, 2017-08-21 at 22:31 +0200, Rafael J. Wysocki wrote: >> > > On Mon, Aug 21, 2017 at 7:36 PM, Borislav Petkov >> > > wrote: >> > > > On Mon, Aug 21, 2017 at 05:23:37PM +, Kani, Toshimitsu >> > > > wrote: >> > > > > > > 'data' here is private to the caller. So, I do not think >> > > > > > > we need to define the bits. Shall I change the name to >> > > > > > > 'driver_data' to make it more explicit? >> > > > > > >> > > > > > You changed it to 'data'. It was a u32-used-as-boolean >> > > > > > is_critical_error before. >> > > > > > >> > > > > > So you can just as well make it into flags and people can >> > > > > > extend those flags if needed. A flag bit should be enough >> > > > > > in most cases anyway. If they really need driver_data, then >> > > > > > they can add a void *member. >> > > > > >> > > > > Hmm.. In patch 2, intel_pstate_platform_pwr_mgmt_exists() >> > > > > uses this field for PSS and PCC, which are enum values. I >> > > > > think we should allow drivers to set any values here. I >> > > > > agree that it may need to be void * if we also allow drivers >> > > > > to set a pointer here. >> > > > >> > > > Let's see what Rafael prefers. >> > > >> > > I would retain the is_critical_error field and use that for >> > > printing the recoverable / non-recoverable message. This is kind >> > > of orthogonal to whether or not any extra data is needed and that >> > > can be an additional field. In that case unsigned long should be >> > > sufficient to accommodate a pointer if need be. >> > >> > Yes, we will retain the field. The question is whether this field >> > should be retained as a driver's private data or ACPI-managed >> > flags. >> >> Thanks for the clarification. >> >> > My patch implements the former, which lets the callers to define >> > the data values. 
For instance, acpi_blacklisted() uses this field >> > as is_critical_error value, and >> > intel_pstate_platform_pwr_mgmt_exists() uses it as oem_pwr_table >> > value. >> > >> > Boris suggested the latter, which lets ACPI to define the flags, >> > which are then used by the callers. For instance, he suggested >> > ACPI to define bit0 as is_critical_error. >> > >> > #define ACPI_PLAT_IS_CRITICAL_ERROR BIT(0) >> >> So my point is that we can have both the ACPI-managed flags and the >> the caller-defined data at the same time as separate items. >> >> That would allow of maximum flexibility IMO. > > I agree in general. Driver private data allows flexibility to drivers > when the values are driver-private. ACPI-managed flags allows ACPI to > control the interfaces based on the flags. > > Since we do not have use-case of the latter case yet, i.e. > acpi_match_platform_list() does not need to check the flags, I'd > suggest that we keep 'data' as driver-private. We can add 'flags' as a > separate member to the structure when we find the latter use-case. OK Thanks, Rafael
Re: [PATCH v1 1/2] clk: rockchip: add rk3228 sclk_sdio_src ID
On Friday, 18 August 2017, 11:49:24 CEST, Elaine Zhang wrote:
> This patch exports sdio src clock for dts reference.
>
> Signed-off-by: Elaine Zhang

applied for 4.14

Thanks
Heiko
Re: [PATCH v2] perf tools: Add ARM Statistical Profiling Extensions (SPE) support
On Fri, 18 Aug 2017 18:36:09 +0100 Mark Rutlandwrote: > Hi Kim, Hi Mark, > On Thu, Aug 17, 2017 at 10:11:50PM -0500, Kim Phillips wrote: > > Hi Mark, I've tried to proceed as much as possible without your > > response, so if you still have comments to my above comments, please > > comment in-line above, otherwise review the v2 patch below? > > Apologies again for the late response, and thanks for the updated patch! Thanks for your prompt response this time around. > > . > > . ... ARM SPE data: size 536432 bytes > > . : 4a 01 B COND > > . 0002: b1 00 00 00 00 00 00 00 80 TGT 0 el0 ns=1 > > . 000b: 42 42 RETIRED > > NOT-TAKEN > > . 000d: b0 20 41 c0 ad ff ff 00 80 PC > > adc04120 el0 ns=1 > > . 0016: 98 00 00LAT 0 TOT > > . 0019: 71 80 3e f7 46 e9 01 00 00 TS > > 2101429616256 > > . 0022: 49 01 ST > > . 0024: b2 50 bd ba 73 00 80 ff ff VA > > 800073babd50 > > . 002d: b3 50 bd ba f3 00 00 00 80 PA f3babd50 > > ns=1 > > . 0036: 9a 00 00LAT 0 XLAT > > . 0039: 42 16 RETIRED > > L1D-ACCESS TLB-ACCESS > > . 003b: b0 8c b4 1e 08 00 00 ff ff PC > > ff081eb48c el3 ns=1 > > . 0044: 98 00 00LAT 0 TOT > > . 0047: 71 cc 44 f7 46 e9 01 00 00 TS > > 2101429617868 > > . 0050: 48 00 INSN-OTHER > > . 0052: 42 02 RETIRED > > . 0054: b0 58 54 1f 08 00 00 ff ff PC > > ff081f5458 el3 ns=1 > > . 005d: 98 00 00LAT 0 TOT > > . 0060: 71 cc 44 f7 46 e9 01 00 00 TS > > 2101429617868 > > So FWIW, I think this is a good example of why that padding I requested > last time round matters. > > For the first PC packet, I had to count the number of characters to see > that it was a TTBR0 address, which is made much clearer with leading > padding, as adc04120. With the addresses padded, the EL and NS > fields would also be aligned, making it *much* easier to scan by eye. See my response in my prior email. > > - multiple SPE clusters/domains support pending potential driver changes? > > As covered in my other reply, I don't believe that the driver is going > to change in this regard. 
Userspace will need to handle multiple SPE > instances. > > I'll ignore that in the code below for now. Please let's continue the discussion in one place, and again in this case, in the last email. > > - CPU mask / new record behaviour bisected to commit e3ba76deef23064 "perf > > tools: Force uncore events to system wide monitoring". Waiting to hear > > back > > on why driver can't do system wide monitoring, even across PPIs, by e.g., > > sharing the SPE interrupts in one handler (SPE's don't differ in this > > record > > regard). > > Could you elaborate on this? I don't follow the interrupt handler > comments. Would it be possible for the driver to request the IRQs with IRQF_SHARED, in order to be able to operate across the multiple PPIs? > > +static u64 arm_spe_reference(struct auxtrace_record *itr __maybe_unused) > > +{ > > + u64 ts; > > + > > + asm volatile ("isb; mrs %0, cntvct_el0" : "=r" (ts)); > > + > > + return ts; > > +} > > As covered in my other reply, please don't use the counter for this. > > It sounds like we need a simple/generic function to get a nonce, that > we could share with the ETM code. I've switched to using clock_gettime(CLOCK_MONOTONIC_RAW, ...). The ETM code uses two rand() calls, which, according to some minor benchmarking on Juno, is almost twice as slow as clock_gettime. It's three lines still, so I'll update the ETM code in-place independently of this patch, and after the gettime implementation is reviewed. > > +int arm_spe_get_packet(const unsigned char *buf, size_t len, > > + struct arm_spe_pkt *packet) > > +{ > > + int ret; > > + > > + ret = arm_spe_do_get_packet(buf, len, packet); > > + if (ret > 0 && packet->type == ARM_SPE_PAD) { > > + while (ret < 16 && len > (size_t)ret && !buf[ret]) > > + ret += 1; > > + } > > + return ret; > > +} > > What's this doing? Skipping padding? What's the significance of 16? 
I'll repeat the relevant part of the v2 changelog here: - do_get_packet fixed to handle excessive, successive PADding from a new source of raw SPE data, so instead of: . 11ae: 00 PAD . 11af: 00
Re: [PATCH V2] spmi: pmic-arb: Enforce the ownership check optionally
On 08/18/2017 08:28 AM, Kiran Gunda wrote:
> The peripheral ownership check is not necessary on single master
> platforms. Hence, enforce the peripheral ownership check optionally.
>
> Signed-off-by: Kiran Gunda
> Tested-by: Shawn Guo
> ---

This sounds like a band-aid. Isn't the gpio driver going to keep probing
all the pins that are not supposed to be accessed due to security
constraints? What exactly is failing in the gpio case?

Also, I thought we were getting rid of the ownership checks? Or at least,
putting them behind some debug kernel feature check or something?

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
Re: [PATCH v5 00/16] Add QCOM QPIC NAND support
On Thu, 17 Aug 2017 17:37:38 +0530, Abhishek Sahu wrote: > * v5: > > 1. Removed the patches already applied to linux-next and rebased the >remaining patches on [3] > 2. Addressed the review comments in v4 and Added Archit Reviewed >by tag. > > [3] http://git.infradead.org/l2-mtd.git/shortlog/refs/heads/nand/next > > * v4: > > 1. Added Acked-by from Rob for DT documentation patches > 2. Removed ipq8074 compatible string from ipq4019 DT example. > 3. Used the BIT macro for NAND_CMD_VLD bits and consistent names >as suggested by Boris in v3. > > * v3: > > 1. Removed the patches already applied to linux-next and >rebased the remaining patches on [1] > 2. Reordered the patches and put the BAM DMA changes [2] >dependent patches and compatible string patches in last > 3. Removed the register offsets array and used the dev_cmd offsets > 4. Changed some macro names to small letters for better code readability > 5. Changed the compatible string to SoC specific > 6. Did minor code changes for adding comment, error handling, structure names > 7. Combined raw write (patch#18) and passing flag parameter (patch#22) patch >into one > 8. Made separate patch for compatible string and NAND properties > 9. Made separate patch for BAM command descriptor and data descriptors > handling > 10. Changed commit message for some of the patches > 11. Addressed review comments given in v2 > 12. Added Reviewed-by of Archit for some of the patches from v2 > 13. All the MTD tests are working fine for IPQ8064 AP148, IPQ4019 DK04 and > IPQ8074 HK01 boards for v3 patches > > [1] http://git.infradead.org/l2-mtd.git/shortlog/refs/heads/nand/next > [2] http://www.spinics.net/lists/dmaengine/msg13662.html > > * v2: > > 1. Addressed the review comments given in v1 > 2. Removed the DMA coherent buffer for register read and used >streaming DMA API’s > 3. Reorganized the NAND read and write functions > 4. Separated patch for driver and documentation changes > 5. 
Changed the compatible string for EBI2 > > * v1: > > http://www.spinics.net/lists/devicetree/msg183706.html > > > Abhishek Sahu (16): > mtd: nand: qcom: DMA mapping support for register read buffer > mtd: nand: qcom: allocate BAM transaction > mtd: nand: qcom: add BAM DMA descriptor handling > mtd: nand: qcom: support for passing flags in DMA helper functions > mtd: nand: qcom: support for read location registers > mtd: nand: qcom: erased codeword detection configuration > mtd: nand: qcom: enable BAM or ADM mode > mtd: nand: qcom: QPIC data descriptors handling > mtd: nand: qcom: support for different DEV_CMD register offsets > mtd: nand: qcom: add command elements in BAM transaction > mtd: nand: qcom: support for command descriptor formation > dt-bindings: qcom_nandc: fix the ipq806x device tree example > dt-bindings: qcom_nandc: IPQ4019 QPIC NAND documentation > dt-bindings: qcom_nandc: IPQ8074 QPIC NAND documentation > mtd: nand: qcom: support for IPQ4019 QPIC NAND controller > mtd: nand: qcom: Support for IPQ8074 QPIC NAND controller Applied everything to nand/next except patch 10 and 11. Let me know when the dmaengine dependency is merged and I'll take the remaining patches. Thanks, Boris > > .../devicetree/bindings/mtd/qcom_nandc.txt | 63 +- > drivers/mtd/nand/qcom_nandc.c | 746 > ++--- > 2 files changed, 722 insertions(+), 87 deletions(-) >
Re: [PATCH] PM / Hibernate: Feed the wathdog when creating snapshot
On Mon, 21 Aug 2017 23:08:18 +0800 Chen Yu wrote:

> There is a problem that when counting the pages for creating
> the hibernation snapshot will take significant amount of
> time, especially on system with large memory. Since the counting
> job is performed with irq disabled, this might lead to NMI lockup.
> The following warning were found on a system with 1.5TB DRAM:
>
> ...
>
> It has taken nearly 20 seconds(2.10GHz CPU) thus the NMI lockup
> was triggered. In case the timeout of the NMI watch dog has been
> set to 1 second, a safe interval should be 6590003/20 = 320k pages
> in theory. However there might also be some platforms running at a
> lower frequency, so feed the watchdog every 100k pages.
>
> ...
>
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2531,9 +2532,12 @@ void drain_all_pages(struct zone *zone)
> 
>  #ifdef CONFIG_HIBERNATION
> 
> +/* Touch watchdog for every WD_INTERVAL_PAGE pages. */
> +#define WD_INTERVAL_PAGE	(100*1024)
> +
>  void mark_free_pages(struct zone *zone)
>  {
> -	unsigned long pfn, max_zone_pfn;
> +	unsigned long pfn, max_zone_pfn, page_num = 0;
>  	unsigned long flags;
>  	unsigned int order, t;
>  	struct page *page;
> @@ -2548,6 +2552,9 @@ void mark_free_pages(struct zone *zone)
>  		if (pfn_valid(pfn)) {
>  			page = pfn_to_page(pfn);
> 
> +			if (!((page_num++) % WD_INTERVAL_PAGE))
> +				touch_nmi_watchdog();
> +
>  			if (page_zone(page) != zone)
>  				continue;
> 
> @@ -2561,8 +2568,11 @@ void mark_free_pages(struct zone *zone)
>  			unsigned long i;
> 
>  			pfn = page_to_pfn(page);
> -			for (i = 0; i < (1UL << order); i++)
> +			for (i = 0; i < (1UL << order); i++) {
> +				if (!((page_num++) % WD_INTERVAL_PAGE))
> +					touch_nmi_watchdog();
>  				swsusp_set_page_free(pfn_to_page(pfn + i));
> +			}
>  		}
>  	}
>  	spin_unlock_irqrestore(&zone->lock, flags);

hm, is it really worth all the WD_INTERVAL_PAGE stuff?
touch_nmi_watchdog() is pretty efficient and calling it once-per-page
may not have a measurable effect.
And if we're really concerned about the performance impact it would be better to make WD_INTERVAL_PAGE a power of 2 (128*1024?) to avoid the modulus operation.
[PATCH RFC v3 0/9] KVM: allow dynamic kvm->vcpus array
The only common part with v2 is [v3 5/9].  The crucial part of this series
is adding a separate mechanism for kvm_for_each_vcpu() [v3 8/9] and with
that change, I think that the dynamic array [v3 9/9] would be nicer if
protected by RCU, like in v2: the protection can be nicely hidden in
kvm_get_vcpu().  I just had the split done before implementing [v3 8/9]
and presented it for consideration.

Smoke tested on x86 only.

Radim Krčmář (9):
  KVM: s390: optimize detection of started vcpus
  KVM: arm/arm64: fix vcpu self-detection in vgic_v3_dispatch_sgi()
  KVM: remember position in kvm->vcpus array
  KVM: arm/arm64: use locking helpers in kvm_vgic_create()
  KVM: remove unused __KVM_HAVE_ARCH_VM_ALLOC
  KVM: rework kvm_vcpu_on_spin loop
  KVM: add kvm_free_vcpus and kvm_arch_free_vcpus
  KVM: implement kvm_for_each_vcpu with a list
  KVM: split kvm->vcpus into chunks

 arch/mips/kvm/mips.c                |  19 ++
 arch/powerpc/kvm/book3s_32_mmu.c    |   3 +-
 arch/powerpc/kvm/book3s_64_mmu.c    |   3 +-
 arch/powerpc/kvm/book3s_hv.c        |   7 +-
 arch/powerpc/kvm/book3s_pr.c        |   5 +-
 arch/powerpc/kvm/book3s_xics.c      |   2 +-
 arch/powerpc/kvm/book3s_xics.h      |   3 +-
 arch/powerpc/kvm/book3s_xive.c      |  18 +++---
 arch/powerpc/kvm/book3s_xive.h      |   3 +-
 arch/powerpc/kvm/e500_emulate.c     |   3 +-
 arch/powerpc/kvm/powerpc.c          |  16 ++---
 arch/s390/include/asm/kvm_host.h    |   1 +
 arch/s390/kvm/interrupt.c           |   3 +-
 arch/s390/kvm/kvm-s390.c            |  77 --
 arch/s390/kvm/kvm-s390.h            |   6 +-
 arch/s390/kvm/sigp.c                |   3 +-
 arch/x86/kvm/hyperv.c               |   3 +-
 arch/x86/kvm/i8254.c                |   3 +-
 arch/x86/kvm/i8259.c                |   7 +-
 arch/x86/kvm/ioapic.c               |   3 +-
 arch/x86/kvm/irq_comm.c             |  10 +--
 arch/x86/kvm/lapic.c                |   5 +-
 arch/x86/kvm/svm.c                  |   3 +-
 arch/x86/kvm/vmx.c                  |   5 +-
 arch/x86/kvm/x86.c                  |  34 --
 include/linux/kvm_host.h            |  81 ---
 virt/kvm/arm/arch_timer.c           |  10 ++-
 virt/kvm/arm/arm.c                  |  25
 virt/kvm/arm/pmu.c                  |   3 +-
 virt/kvm/arm/psci.c                 |   7 +-
 virt/kvm/arm/vgic/vgic-init.c       |  31 -
 virt/kvm/arm/vgic/vgic-kvm-device.c |  30 +
 virt/kvm/arm/vgic/vgic-mmio-v2.c    |   5 +-
 virt/kvm/arm/vgic/vgic-mmio-v3.c    |  22 ---
 virt/kvm/arm/vgic/vgic.c            |   3 +-
 virt/kvm/kvm_main.c                 | 124 +++-
 36 files changed, 278 insertions(+), 308 deletions(-)

-- 
2.13.3
[PATCH RFC v3 2/9] KVM: arm/arm64: fix vcpu self-detection in vgic_v3_dispatch_sgi()
The index in kvm->vcpus array and vcpu->vcpu_id are very different things.
Comparing struct kvm_vcpu pointers is a sure way to know.

Signed-off-by: Radim Krčmář
---
 virt/kvm/arm/vgic/vgic-mmio-v3.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/virt/kvm/arm/vgic/vgic-mmio-v3.c b/virt/kvm/arm/vgic/vgic-mmio-v3.c
index 408ef06638fc..9d4b69b766ec 100644
--- a/virt/kvm/arm/vgic/vgic-mmio-v3.c
+++ b/virt/kvm/arm/vgic/vgic-mmio-v3.c
@@ -797,7 +797,6 @@ void vgic_v3_dispatch_sgi(struct kvm_vcpu *vcpu, u64 reg)
 	u16 target_cpus;
 	u64 mpidr;
 	int sgi, c;
-	int vcpu_id = vcpu->vcpu_id;
 	bool broadcast;
 
 	sgi = (reg & ICC_SGI1R_SGI_ID_MASK) >> ICC_SGI1R_SGI_ID_SHIFT;
@@ -821,7 +820,7 @@ void vgic_v3_dispatch_sgi(struct kvm_vcpu *vcpu, u64 reg)
 			break;
 
 		/* Don't signal the calling VCPU */
-		if (broadcast && c == vcpu_id)
+		if (broadcast && c_vcpu == vcpu)
 			continue;
 
 		if (!broadcast) {
-- 
2.13.3
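The bug class being fixed here — treating an array index as if it were the id stored in the element — is easy to reproduce outside the kernel. A minimal userspace sketch (hypothetical struct and helper names, not the kernel's; ids are deliberately sparse so the index never matches):

```c
#include <stddef.h>

/* Hypothetical stand-in for struct kvm_vcpu: vcpu_id is assigned by
 * userspace and need not equal the slot index in the vcpus[] array. */
struct vcpu {
	int vcpu_id;
};

/* Sparse ids: slot 1 holds id 4, slot 2 holds id 8. */
static struct vcpu sample[3] = { { 0 }, { 4 }, { 8 } };

/* Buggy check: compares an array index against an id field. */
static int is_self_by_index(size_t c, const struct vcpu *self)
{
	return (int)c == self->vcpu_id;
}

/* Fixed check: pointer identity, as in the patch. */
static int is_self_by_pointer(const struct vcpu *c_vcpu,
			      const struct vcpu *self)
{
	return c_vcpu == self;
}
```

With sparse ids the index comparison silently misses the caller (and can even match the wrong slot when an id happens to collide with an index), while pointer identity is exact by construction.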
[GIT] Sparc
Just a couple small fixes, two of which have to do with gcc-7:

1) Don't clobber kernel fixed registers in __multi4 libgcc helper.

2) Fix a new uninitialized variable warning on sparc32 with gcc-7,
   from Thomas Petazzoni.

3) Adjust pmd_t initializer on sparc32 to make gcc happy.

4) If ATU isn't available, don't bark in the logs.  From Tushar Dave.

Please pull, thanks a lot.

The following changes since commit 26273939ace935dd7553b31d279eab30b40f7b9a:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net (2017-08-10 10:30:29 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc

for you to fetch changes up to 2dc77533f1e495788d73ffa4bee4323b2646d2bb:

  sparc: kernel/pcic: silence gcc 7.x warning in pcibios_fixup_bus() (2017-08-21 13:57:22 -0700)

----------------------------------------------------------------
David S. Miller (1):
      sparc64: Don't clibber fixed registers in __multi4.

Thomas Petazzoni (1):
      sparc: kernel/pcic: silence gcc 7.x warning in pcibios_fixup_bus()

Tushar Dave (1):
      sparc64: remove unnecessary log message

Zi Yan (1):
      mm: add pmd_t initializer __pmd() to work around a GCC bug.

 arch/sparc/include/asm/page_32.h |  2 ++
 arch/sparc/kernel/pci_sun4v.c    |  2 --
 arch/sparc/kernel/pcic.c         |  2 +-
 arch/sparc/lib/multi3.S          | 24
 4 files changed, 15 insertions(+), 15 deletions(-)
Re: [PATCH V3 2/4] ARM: dts: rockchip: rk322x add iommu nodes
On Monday, 24 July 2017 at 10:32:08 CEST, Simon Xue wrote:
> Add VPU/VDEC/VOP/IEP iommu nodes
>
> Signed-off-by: Simon Xue

applied for 4.14

Thanks
Heiko
Re: [patch] fs, proc: unconditional cond_resched when reading smaps
On Mon, Aug 21, 2017 at 02:06:45PM -0700, David Rientjes wrote:
> If there are large numbers of hugepages to iterate while reading
> /proc/pid/smaps, the page walk never does cond_resched().  On archs
> without split pmd locks, there can be significant and observable
> contention on mm->page_table_lock which cause lengthy delays without
> rescheduling.
>
> Always reschedule in smaps_pte_range() if necessary since the pagewalk
> iteration can be expensive.
>
> Signed-off-by: David Rientjes
> ---
>  fs/proc/task_mmu.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -599,11 +599,11 @@ static int smaps_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
>  	if (ptl) {
>  		smaps_pmd_entry(pmd, addr, walk);
>  		spin_unlock(ptl);
> -		return 0;
> +		goto out;
>  	}
>
>  	if (pmd_trans_unstable(pmd))
> -		return 0;
> +		goto out;
>  	/*
>  	 * The mmap_sem held all the way back in m_start() is what
>  	 * keeps khugepaged out of here and from collapsing things
> @@ -613,6 +613,7 @@ static int smaps_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
>  	for (; addr != end; pte++, addr += PAGE_SIZE)
>  		smaps_pte_entry(pte, addr, walk);
>  	pte_unmap_unlock(pte - 1, ptl);
> +out:
>  	cond_resched();
>  	return 0;
>  }

Maybe just call cond_resched() at the beginning of the function and
don't bother with gotos?

-- 
 Kirill A. Shutemov
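For readers comparing the two shapes under discussion, here is a minimal userspace sketch (cond_resched() is stubbed with a counter; function names are illustrative, not the kernel's) showing that either structure reschedules exactly once per call, on every return path:

```c
/* The real smaps_pte_range() work is elided; only control flow is
 * modeled.  cond_resched() is replaced with a call counter. */
static int resched_calls;

static void cond_resched_stub(void)
{
	resched_calls++;
}

/* The patch's shape: every return path funnels through "out:". */
static int walk_goto_out(int ptl_held)
{
	if (ptl_held)
		goto out;
	/* ... pte iteration would go here ... */
out:
	cond_resched_stub();
	return 0;
}

/* Kirill's suggestion: reschedule once at entry, keep early returns. */
static int walk_resched_first(int ptl_held)
{
	cond_resched_stub();
	if (ptl_held)
		return 0;
	/* ... pte iteration would go here ... */
	return 0;
}
```

Both variants invoke the reschedule point the same number of times; the second avoids the label and the extra `goto`s entirely.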
Re: [PATCH] mt7601u: check memory allocation failure
On Mon, 21 Aug 2017 14:34:30 -0700, Jakub Kicinski wrote:
> On Mon, 21 Aug 2017 22:59:56 +0200, Christophe JAILLET wrote:
> > Check memory allocation failure and return -ENOMEM in such a case, as
> > already done a few lines below.
> >
> > Signed-off-by: Christophe JAILLET
>
> Acked-by: Jakub Kicinski

Wait, I take that back.  This code is a bit weird.  We would return an
error, then mt7601u_dma_init() will call mt7601u_free_tx_queue() which
doesn't check for the tx_q == NULL condition.

Looks like mt7601u_free_tx() has to check for dev->tx_q == NULL and
return early if that's the case.  Or mt7601u_alloc_tx() should really
clean things up on its own on failure.  Ugh.
Re: [PATCH net-next 03/11] net: dsa: debugfs: add tree
On 08/14/2017 03:22 PM, Vivien Didelot wrote:
> This commit adds the boilerplate to create a DSA related debug
> filesystem entry as well as a "tree" file, containing the tree index.
>
>     # cat switch1/tree
>     0
>
> Signed-off-by: Vivien Didelot

Reviewed-by: Florian Fainelli
-- 
Florian
[PATCH v2] mt7601u: check memory allocation failure
Check memory allocation failure and return -ENOMEM in such a case, as
already done a few lines below.

As 'dev->tx_q' can be NULL, we also need to check for that in
'mt7601u_free_tx()', and return early.

Signed-off-by: Christophe JAILLET
---
v2: avoid another NULL pointer dereference in 'mt7601u_free_tx()' if the
    allocation had failed.
---
 drivers/net/wireless/mediatek/mt7601u/dma.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/wireless/mediatek/mt7601u/dma.c b/drivers/net/wireless/mediatek/mt7601u/dma.c
index 660267b359e4..7f3e3983b781 100644
--- a/drivers/net/wireless/mediatek/mt7601u/dma.c
+++ b/drivers/net/wireless/mediatek/mt7601u/dma.c
@@ -457,6 +457,9 @@ static void mt7601u_free_tx(struct mt7601u_dev *dev)
 {
 	int i;
 
+	if (!dev->tx_q)
+		return;
+
 	for (i = 0; i < __MT_EP_OUT_MAX; i++)
 		mt7601u_free_tx_queue(&dev->tx_q[i]);
 }
@@ -484,6 +487,8 @@ static int mt7601u_alloc_tx(struct mt7601u_dev *dev)
 
 	dev->tx_q = devm_kcalloc(dev->dev, __MT_EP_OUT_MAX,
 				 sizeof(*dev->tx_q), GFP_KERNEL);
+	if (!dev->tx_q)
+		return -ENOMEM;
 
 	for (i = 0; i < __MT_EP_OUT_MAX; i++)
 		if (mt7601u_alloc_tx_queue(dev, &dev->tx_q[i]))
-- 
2.11.0
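The pattern in this fix — an allocation error return paired with a teardown helper that tolerates the failed allocation — can be sketched in plain userspace C. Names loosely mirror the driver for readability; this is not the mt7601u code itself:

```c
#include <stdlib.h>

#define N_QUEUES 4

struct dev {
	int *tx_q;	/* NULL when allocation failed */
};

/* Teardown tolerates a failed allocation, so the caller's error path
 * can free unconditionally (the early return added in v2). */
static void free_tx(struct dev *d)
{
	if (!d->tx_q)
		return;
	free(d->tx_q);
	d->tx_q = NULL;
}

/* Allocation reports failure to the caller (the -ENOMEM-style return
 * added in v2) instead of proceeding with a NULL pointer. */
static int alloc_tx(struct dev *d)
{
	d->tx_q = calloc(N_QUEUES, sizeof(*d->tx_q));
	if (!d->tx_q)
		return -1;
	return 0;
}
```

Either side alone leaves a hole: without the early return in the free path, the caller's unwinding after a failed alloc dereferences NULL — which is exactly the interaction Jakub spotted in v1.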
[PATCH] Staging: greybus: Make string array const
Added const to string array.

Signed-off-by: Eames Trinh
---
 drivers/staging/greybus/audio_manager_module.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/greybus/audio_manager_module.c b/drivers/staging/greybus/audio_manager_module.c
index adc16977452d..73a3e2decb3a 100644
--- a/drivers/staging/greybus/audio_manager_module.c
+++ b/drivers/staging/greybus/audio_manager_module.c
@@ -159,7 +159,7 @@ static void send_add_uevent(struct gb_audio_manager_module *module)
 	char ip_devices_string[64];
 	char op_devices_string[64];
 
-	char *envp[] = {
+	const char *envp[] = {
 		name_string,
 		vid_string,
 		pid_string,
-- 
2.11.0
Re: [PATCH 1/2] scsi: Move scsi_cmd->jiffies_at_alloc initialization to allocation time
Scratch this one... Version 2 on the way with the corresponding changes
in scsi_init_request...

-Brian

-- 
Brian King
Power Linux I/O
IBM Linux Technology Center
Re: [PATCH v2] mt7601u: check memory allocation failure
On Tue, 22 Aug 2017 00:06:17 +0200, Christophe JAILLET wrote:
> Check memory allocation failure and return -ENOMEM in such a case, as
> already done a few lines below.
>
> As 'dev->tx_q' can be NULL, we also need to check for that in
> 'mt7601u_free_tx()', and return early.
>
> Signed-off-by: Christophe JAILLET

Acked-by: Jakub Kicinski
Re: kvm splat in mmu_spte_clear_track_bits
On Mon, Aug 21, 2017 at 09:58:34PM +0200, Radim Krčmář wrote:
> 2017-08-21 21:12+0200, Adam Borowski:
> > On Mon, Aug 21, 2017 at 09:26:57AM +0800, Wanpeng Li wrote:
> > > 2017-08-21 7:13 GMT+08:00 Adam Borowski:
> > > > I'm afraid I keep getting a quite reliable, but random, splat when
> > > > running KVM:
> > >
> > > I reported something similar before. https://lkml.org/lkml/2017/6/29/64
> >
> > Your problem seems to require OOM; I don't have any memory pressure at
> > all: running a single 2GB guest while there's nothing big on the host
> > (bloatfox, xfce, xorg, terminals + some minor junk); 8GB + (untouched)
> > swap.  There's no memory pressure inside the guest either -- none was
> > Linux (I wanted to test something on hurd, kfreebsd) and I doubt they
> > even got to use all of their frames.
>
> I even tried hurd, but couldn't reproduce ...

Also happens with a win10 guest, and with multiple Linuxes.

> what is your qemu command line and the output of host's
> `grep . /sys/module/kvm*/parameters/*`?

qemu-system-x86_64 -enable-kvm -m 2048 -vga qxl -usbdevice tablet \
	-net bridge -net nic \
	-drive file="$DISK",cache=writeback,index=0,media=disk,discard=on

qemu-system-x86_64 -enable-kvm -m 2048 -vga qxl -usbdevice tablet \
	-net bridge -net nic \
	-drive file="$DISK",cache=unsafe,index=0,media=disk,discard=on,if=virtio,format=raw

/sys/module/kvm/parameters/halt_poll_ns:20
/sys/module/kvm/parameters/halt_poll_ns_grow:2
/sys/module/kvm/parameters/halt_poll_ns_shrink:0
/sys/module/kvm/parameters/ignore_msrs:N
/sys/module/kvm/parameters/kvmclock_periodic_sync:Y
/sys/module/kvm/parameters/lapic_timer_advance_ns:0
/sys/module/kvm/parameters/min_timer_period_us:500
/sys/module/kvm/parameters/tsc_tolerance_ppm:250
/sys/module/kvm/parameters/vector_hashing:Y
/sys/module/kvm_amd/parameters/avic:0
/sys/module/kvm_amd/parameters/nested:1
/sys/module/kvm_amd/parameters/npt:1
/sys/module/kvm_amd/parameters/vls:0

> > Also, it doesn't reproduce for me on 4.12.
> Great info ... the most suspicious between v4.12 and v4.13-rc5 is the
> series with dcdca5fed5f6 ("x86: kvm: mmu: make spte mmio mask more
> explicit"), does reverting it help?
>
> `git revert ce00053b1cfca312c22e2a6465451f1862561eab~1..995f00a619584e65e53eff372d9b73b121a7bad5`

Alas, doesn't seem to help.  I've first installed a Debian stretch
guest, the host survived both the installation and subsequent fooling
around.  But then I started a win10 guest which splatted as soon as the
initial screen.

Meow!
-- 
⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢰⠒⠀⣿⡁ Vat kind uf sufficiently advanced technology iz dis!?
⢿⡄⠘⠷⠚⠋⠀ -- Genghis Ht'rok'din
⠈⠳⣄
Re: [PATCH net-next] net: dsa: User per-cpu 64-bit statistics
On 08/04/2017 10:11 AM, Eric Dumazet wrote:
> On Fri, 2017-08-04 at 08:51 -0700, Florian Fainelli wrote:
> > On 08/03/2017 10:36 PM, Eric Dumazet wrote:
> > > On Thu, 2017-08-03 at 21:33 -0700, Florian Fainelli wrote:
> > > > During testing with a background iperf pushing 1Gbit/sec worth of
> > > > traffic and having both ifconfig and ethtool collect statistics, we
> > > > could see quite frequent deadlocks. Convert the often accessed DSA
> > > > slave network devices statistics to per-cpu 64-bit statistics to
> > > > remove these deadlocks and provide fast efficient statistics updates.
> > >
> > > This seems to be a bug fix, it would be nice to get a proper tag like:
> > >
> > > Fixes: f613ed665bb3 ("net: dsa: Add support for 64-bit statistics")
> >
> > Right, should have been added, thanks!
> >
> > > Problem here is that if multiple cpus can call dsa_switch_rcv() at the
> > > same time, then u64_stats_update_begin() contract is not respected.
> >
> > This is really where I struggled understanding what is wrong in the
> > non-per-CPU version, my understanding is that we have:
> >
> > - writers for xmit executes in process context
> > - writers for receive executes from NAPI (from the DSA's master network
> >   device through its own NAPI doing netif_receive_skb -> netdev_uses_dsa
> >   -> netif_receive_skb)
> >
> > readers should all execute in process context. The test scenario that
> > led to a deadlock involved running iperf in the background, having a
> > while loop with both ifconfig and ethtool reading stats, and somehow
> > when iperf exited, either reader would just be locked. So I guess this
> > leaves us with the two writers not being mutually excluded then, right?
> You could add a debug version of u64_stats_update_begin()
>
> doing
>
> int ret = atomic_inc((atomic_t *)syncp);
>
> BUG_ON(ret & 1);
>
> And u64_stats_update_end()
>
> int ret = atomic_inc((atomic_t *)syncp);

so with your revised suggested patch:

static inline void u64_stats_update_begin(struct u64_stats_sync *syncp)
{
#if BITS_PER_LONG==32 && defined(CONFIG_SMP)
	int ret = atomic_inc_return((atomic_t *)syncp);
	BUG_ON(ret & 1);
#endif
#if 0
#if BITS_PER_LONG==32 && defined(CONFIG_SMP)
	write_seqcount_begin(&syncp->seq);
#endif
#endif
}

static inline void u64_stats_update_end(struct u64_stats_sync *syncp)
{
#if BITS_PER_LONG==32 && defined(CONFIG_SMP)
	int ret = atomic_inc_return((atomic_t *)syncp);
	BUG_ON(!(ret & 1));
#endif
#if 0
#if BITS_PER_LONG==32 && defined(CONFIG_SMP)
	write_seqcount_end(&syncp->seq);
#endif
#endif
}

and this makes us choke pretty early in IRQ accounting, did I get your
suggestion right?

[0.015149] ------------[ cut here ]------------
[0.020051] kernel BUG at ./include/linux/u64_stats_sync.h:82!
[0.026221] Internal error: Oops - BUG: 0 [#1] SMP ARM
[0.031661] Modules linked in:
[0.034970] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.13.0-rc5-01297-g7d3f0cd43fee-dirty #33
[0.043990] Hardware name: Broadcom STB (Flattened Device Tree)
[0.050237] task: c180a500 task.stack: c180
[0.055065] PC is at irqtime_account_delta+0xa4/0xa8
[0.060322] LR is at 0x1
[0.063057] pc : []    lr : [<0001>]    psr: 01d3
[0.069652] sp : c1801eec  ip : ee78b458  fp : c0e5ea48
[0.075212] r10: c18b4b40  r9 : f0803000  r8 : ee00a800
[0.080781] r7 : 0001  r6 : c180a500  r5 : c180  r4 :
[0.087680] r3 :   r2 : ec8c  r1 : ee78b3c0  r0 : ee78b440
[0.094546] Flags: nzcv  IRQs off  FIQs off  Mode SVC_32  ISA ARM  Segment user
[0.102314] Control: 30c5387d  Table: 3000  DAC: fffd
[0.108414] Process swapper/0 (pid: 0, stack limit = 0xc1800210)
[0.114791] Stack: (0xc1801eec to 0xc1802000)
[0.119431] 1ee0: ee78b440 c180 c180a500 0001 c02505c8
[0.128079] 1f00: 0004 ee00a800 e000 c0227890 c17e6f20 c0278910
[0.136665] 1f20: c185724c c18079a0 f080200c c1801f58 f0802000 c0201494 c0e00c18 2053
[0.145303] 1f40: c1801f8c c180 c18b4b40 c020d238 001f
[0.153915] 1f60: 00040d00 efffc940 c18b4b40 c1807440
[0.162571] 1f80: c18b4b40 c0e5ea48 0004 c1801fa8 c0322fb0 c0e00c18 2053
[0.171226] 1fa0: c18b4b40 c0e006c0
[0.179890] 1fc0: c1807448 c0e5ea48 c18b4dd4 c180745c c0e5ea44
[0.188546] 1fe0: c180c0d0 7000 420f00f3 8090
[0.197165] [] (irqtime_account_delta) from [] (irqtime_account_irq+0xc0/0xc4)
[0.206664] [] (irqtime_account_irq) from [] (irq_exit+0x28/0x154)
[0.215012] [] (irq_exit) from [] (__handle_domain_irq+0x60/0xb4)
[0.223245] [] (__handle_domain_irq) from [] (gic_handle_irq+0x48/0x8c)
[0.232035] [] (gic_handle_irq) from [] (__irq_svc+0x58/0x74)
[0.239941] Exception stack(0xc1801f58 to 0xc1801fa0)
[
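For reference, the odd/even invariant that this kind of BUG_ON instrumentation checks can be modeled in a few lines of plain C. This is a userspace sketch, not the kernel's u64_stats_sync implementation: the counter must be odd exactly while one writer is inside the critical section, so a second, non-excluded writer flips it back to even mid-update and trips the check.

```c
/* Stand-in for syncp->seq: bumped before and after each stats update.
 * Odd value == a write is in flight. */
static int seq;

/* Returns 1 when the invariant holds (counter became odd). */
static int write_begin(void)
{
	return (++seq & 1) == 1;
}

/* Returns 1 when the invariant holds (counter became even). */
static int write_end(void)
{
	return (++seq & 1) == 0;
}
```

A well-nested begin/end pair keeps both checks true; calling `write_begin()` twice in a row (two overlapping writers, i.e. missing mutual exclusion) makes the second call land on an even value and fail — the userspace analogue of the oops above.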
Re: [PATCH] ASoC: simple-scu-card: Parse off codec widgets
Hi Daniel

> > > @@ -24,6 +24,7 @@ Optional subnode properties:
> > > - simple-audio-card,convert-rate	: platform specified sampling rate convert
> > > - simple-audio-card,convert-channels	: platform specified converted channel size (2 - 8 ch)
> > > - simple-audio-card,prefix		: see routing
> > > +- simple-audio-card,widgets		: Please refer to widgets.txt.
> > > - simple-audio-card,routing		: A list of the connections between audio components.
> > > 					  Each entry is a pair of strings, the first being the
> > > 					  connection's sink, the second being the connection's
> > > 					  source. Valid names for sources.
> >
> > It can be "see simple-audio-card.txt" same as other properties.
> > Not a big deal though
> >
> > Acked-by: Kuninori Morimoto
>
> Thanks for having a look. I don't have a strong preference, but given that
> the patch was already pushed and when you'll go to simple-audio-card.txt it
> will point you to widgets.txt we can leave it like that. Thanks.

No problem for me, it is not a big deal :)

Best regards
---
Kuninori Morimoto
Re: [PATCH 06/15] mtd: make device_type const
On Sat, 19 Aug 2017 13:52:17 +0530, Bhumika Goyal wrote:
> Make this const as it is only stored in the type field of a device
> structure, which is const.
> Done using Coccinelle.

Applied to l2-mtd/master.

Thanks,

Boris

> Signed-off-by: Bhumika Goyal
> ---
>  drivers/mtd/mtdcore.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/mtd/mtdcore.c b/drivers/mtd/mtdcore.c
> index f872a99..e7ea842 100644
> --- a/drivers/mtd/mtdcore.c
> +++ b/drivers/mtd/mtdcore.c
> @@ -340,7 +340,7 @@ static ssize_t mtd_bbtblocks_show(struct device *dev,
>  };
>  ATTRIBUTE_GROUPS(mtd);
>
> -static struct device_type mtd_devtype = {
> +static const struct device_type mtd_devtype = {
>  	.name		= "mtd",
>  	.groups		= mtd_groups,
>  	.release	= mtd_release,
Re: [PATCH] sched/fair: move definitions to fix !CONFIG_SMP
On Mon, Aug 21, 2017 at 04:03:05PM -0400, jo...@toxicpanda.com wrote:
> From: Josef Bacik
>
> The series of patches adding runnable_avg and subsequent supporting
> patches broke on !CONFIG_SMP.  Fix this by moving the definitions under
> the appropriate checks, and moving the !CONFIG_SMP definitions higher
> up.
>
> Signed-off-by: Josef Bacik

Sorry, ignore this, it's still screwed up.  I'll send a new series,
there are multiple broken things here.

Thanks,

Josef
Re: [BUG][bisected 270065e] linux-next fails to boot on powerpc
Brian, >> Thanks for the detailed analysis. This is very helpful. Have you >> considered to change the ipr driver such that it terminates REPORT >> SUPPORTED OPERATION CODES commands with the appropriate check >> condition code instead of DID_ERROR? > > Yes. That data is actually in the sense buffer, but since I'm also > setting DID_ERROR, scsi_decide_disposition isn't using it. I've got a > patch to do just as you suggest, to stop setting DID_ERROR when there > is more detailed error data available, but it will need some > additional testing before I submit, as it will impact much more than > just this case. I agree. In this case where a command is not supported, a check condition would be a better way to signal the failure to the SCSI midlayer. -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH v3 net-next] bpf/verifier: track liveness for pruning
On 08/21/2017 08:36 PM, Edward Cree wrote: On 19/08/17 00:37, Alexei Starovoitov wrote: [...] I'm tempted to just rip out env->varlen_map_value_access and always check the whole thing, because honestly I don't know what it was meant to do originally or how it can ever do any useful pruning. While drastic, it does cause your test case to pass. Original intention from 484611357c19 ("bpf: allow access into map value arrays") was that it wouldn't potentially make pruning worse if PTR_TO_MAP_VALUE_ADJ was not used, meaning that we wouldn't need to take reg state's min_value and max_value into account for state checking; this was basically due to min_value / max_value is being adjusted/tracked on every alu/jmp ops for involved regs (e.g. adjust_reg_min_max_vals() and others that mangle them) even if we have the case that no actual dynamic map access is used throughout the program. To give an example on net tree, the bpf_lxc.o prog's section increases from 36,386 to 68,226 when env->varlen_map_value_access is always true, so it does have an effect. Did you do some checks on this on net-next?
Re: [PATCH] XEN/xen-kbdfront: Enable auto repeat for xen keyboard front driver
On Mon, Aug 21, 2017 at 12:30 PM, Boris Ostrovsky wrote:
>
> Adding maintainer (Dmitry).

I can't seem to find the original in my mailbox nor in patchwork. Can
you please resend?

>
> -boris
>
> On 08/21/2017 11:41 AM, Liang Yan wrote:
> > Long pressed key could not show right in XEN vncviewer after tigervnc
> > client changed the way how to send repeat keys, from "Down Up Down Up
> > ..." to "Down Down Down."  By enabling the EV_REP bit here, the XEN
> > keyboard device will trigger the default auto repeat process from the
> > input subsystem, and make auto repeat keys work correctly.
> >
> > Signed-off-by: Liang Yan
> > ---
> >  drivers/input/misc/xen-kbdfront.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/drivers/input/misc/xen-kbdfront.c b/drivers/input/misc/xen-kbdfront.c
> > index fa130e7b734c..0dce9830e2f4 100644
> > --- a/drivers/input/misc/xen-kbdfront.c
> > +++ b/drivers/input/misc/xen-kbdfront.c
> > @@ -248,6 +248,7 @@ static int xenkbd_probe(struct xenbus_device *dev,
> >  	kbd->id.product = 0x;
> >
> >  	__set_bit(EV_KEY, kbd->evbit);
> > +	__set_bit(EV_REP, kbd->evbit);
> >  	for (i = KEY_ESC; i < KEY_UNKNOWN; i++)
> >  		__set_bit(i, kbd->keybit);
> >  	for (i = KEY_OK; i < KEY_MAX; i++)
> > --
> > 2.14.0

Thanks.

-- 
Dmitry
Re: [PATCH v2 1/2] Input: Add driver for Cypress Generation 5 touchscreen
On Fri, Aug 18, 2017 at 08:20:44AM +0200, Mylène Josserand wrote:
> This is the basic driver for the Cypress TrueTouch Gen5 touchscreen
> controllers. This driver supports only the I2C bus but it uses regmap
> so SPI support could be added later.
>
> The touchscreen can retrieve some defined zones that are handled as
> buttons (according to the hardware). That is why it handles
> button and multitouch events.
>
> Signed-off-by: Mylène Josserand

Reviewed-by: Maxime Ripard

-- 
Maxime Ripard, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com
[PATCH] f2fs: issue discard commands if gc_urgent is set
It's time to issue all the discard commands, if user sets the idle time.

Signed-off-by: Jaegeuk Kim
---
 fs/f2fs/segment.c | 6 +-
 fs/f2fs/sysfs.c   | 5 +
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 1387925a0d83..e3922f902c8c 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -21,6 +21,7 @@
 #include "f2fs.h"
 #include "segment.h"
 #include "node.h"
+#include "gc.h"
 #include "trace.h"
 #include 
@@ -1194,8 +1195,11 @@ static int issue_discard_thread(void *data)
 		if (kthread_should_stop())
 			return 0;
 
-		if (dcc->discard_wake)
+		if (dcc->discard_wake) {
 			dcc->discard_wake = 0;
+			if (sbi->gc_thread && sbi->gc_thread->gc_urgent)
+				mark_discard_range_all(sbi);
+		}
 
 		sb_start_intwrite(sbi->sb);
diff --git a/fs/f2fs/sysfs.c b/fs/f2fs/sysfs.c
index 4bcaa9059026..b9ad9041559f 100644
--- a/fs/f2fs/sysfs.c
+++ b/fs/f2fs/sysfs.c
@@ -178,8 +178,13 @@ static ssize_t f2fs_sbi_store(struct f2fs_attr *a,
 	if (!strcmp(a->attr.name, "iostat_enable") && *ui == 0)
 		f2fs_reset_iostat(sbi);
 	if (!strcmp(a->attr.name, "gc_urgent") && t == 1 && sbi->gc_thread) {
+		struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
+
 		sbi->gc_thread->gc_wake = 1;
 		wake_up_interruptible_all(&sbi->gc_thread->gc_wait_queue_head);
+
+		dcc->discard_wake = 1;
+		wake_up_interruptible_all(&dcc->discard_wait_queue);
 	}
 
 	return count;
-- 
2.14.0.rc1.383.gd1ce394fe2-goog
[PATCH 3/4] w1-masters: Delete an error message for a failed memory allocation in four functions
From: Markus Elfring
Date: Mon, 21 Aug 2017 21:40:29 +0200

Omit an extra message for a memory allocation failure in these
functions.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring
---
 drivers/w1/masters/ds2490.c    | 5 ++---
 drivers/w1/masters/matrox_w1.c | 7 +--
 drivers/w1/masters/omap_hdq.c  | 4 +---
 drivers/w1/masters/w1-gpio.c   | 4 +---
 4 files changed, 5 insertions(+), 15 deletions(-)

diff --git a/drivers/w1/masters/ds2490.c b/drivers/w1/masters/ds2490.c
index 46ccb2fc4f60..c0ee6ca9ce93 100644
--- a/drivers/w1/masters/ds2490.c
+++ b/drivers/w1/masters/ds2490.c
@@ -998,7 +998,6 @@ static int ds_probe(struct usb_interface *intf,
 
-	if (!dev) {
-		pr_info("Failed to allocate new DS9490R structure.\n");
+	if (!dev)
 		return -ENOMEM;
-	}
+
 	dev->udev = usb_get_dev(udev);
 	if (!dev->udev) {
 		err = -ENOMEM;
diff --git a/drivers/w1/masters/matrox_w1.c b/drivers/w1/masters/matrox_w1.c
index d83d7c99d81d..62be2f9cdb4e 100644
--- a/drivers/w1/masters/matrox_w1.c
+++ b/drivers/w1/masters/matrox_w1.c
@@ -140,10 +140,5 @@ static int matrox_w1_probe(struct pci_dev *pdev, const struct pci_device_id *ent
 
-	if (!dev) {
-		dev_err(&pdev->dev,
-			"%s: Failed to create new matrox_device object.\n",
-			__func__);
+	if (!dev)
 		return -ENOMEM;
-	}
-
 	dev->bus_master = (struct w1_bus_master *)(dev + 1);
diff --git a/drivers/w1/masters/omap_hdq.c b/drivers/w1/masters/omap_hdq.c
index 83fc9aab34e8..6349fcd650dc 100644
--- a/drivers/w1/masters/omap_hdq.c
+++ b/drivers/w1/masters/omap_hdq.c
@@ -669,7 +669,5 @@ static int omap_hdq_probe(struct platform_device *pdev)
 
-	if (!hdq_data) {
-		dev_dbg(&pdev->dev, "unable to allocate memory\n");
+	if (!hdq_data)
 		return -ENOMEM;
-	}
 
 	hdq_data->dev = dev;
 	platform_set_drvdata(pdev, hdq_data);
diff --git a/drivers/w1/masters/w1-gpio.c b/drivers/w1/masters/w1-gpio.c
index a90728ceec5a..6e8b18bf9fb1 100644
--- a/drivers/w1/masters/w1-gpio.c
+++ b/drivers/w1/masters/w1-gpio.c
@@ -133,7 +133,5 @@ static int w1_gpio_probe(struct platform_device *pdev)
 
-	if (!master) {
-		dev_err(&pdev->dev, "Out of memory\n");
+	if (!master)
 		return -ENOMEM;
-	}
 
 	err = devm_gpio_request(&pdev->dev, pdata->pin, "w1");
 	if (err) {
-- 
2.14.0
Re: [PATCH v3 1/5] ACPI / blacklist: add acpi_match_platform_list()
On Mon, 2017-08-21 at 22:31 +0200, Rafael J. Wysocki wrote:
> On Mon, Aug 21, 2017 at 7:36 PM, Borislav Petkov wrote:
> > On Mon, Aug 21, 2017 at 05:23:37PM +, Kani, Toshimitsu wrote:
> > > > > 'data' here is private to the caller.  So, I do not think we
> > > > > need to define the bits.  Shall I change the name to
> > > > > 'driver_data' to make it more explicit?
> > > >
> > > > You changed it to 'data'.  It was a u32-used-as-boolean
> > > > is_critical_error before.
> > > >
> > > > So you can just as well make it into flags and people can
> > > > extend those flags if needed.  A flag bit should be enough in
> > > > most cases anyway.  If they really need driver_data, then they
> > > > can add a void *member.
> > >
> > > Hmm..  In patch 2, intel_pstate_platform_pwr_mgmt_exists() uses
> > > this field for PSS and PCC, which are enum values.  I think we
> > > should allow drivers to set any values here.  I agree that it may
> > > need to be void * if we also allow drivers to set a pointer here.
> >
> > Let's see what Rafael prefers.
>
> I would retain the is_critical_error field and use that for printing
> the recoverable / non-recoverable message.  This is kind of
> orthogonal to whether or not any extra data is needed and that can be
> an additional field.  In that case unsigned long should be sufficient
> to accommodate a pointer if need be.

Yes, we will retain the field.  The question is whether this field
should be retained as a driver's private data or as ACPI-managed flags.

My patch implements the former, which lets the callers define the data
values.  For instance, acpi_blacklisted() uses this field as an
is_critical_error value, and intel_pstate_platform_pwr_mgmt_exists()
uses it as an oem_pwr_table value.

Boris suggested the latter, which lets ACPI define the flags, which are
then used by the callers.  For instance, he suggested ACPI define bit 0
as is_critical_error.

  #define ACPI_PLAT_IS_CRITICAL_ERROR	BIT(0)

Thanks,
-Toshi
Re: [PATCH 2/2] sched/fair: Fix use of NULL with find_idlest_group
On Mon, Aug 21, 2017 at 04:21:28PM +0100, Brendan Jackman wrote:
> The current use of returning NULL from find_idlest_group is broken in
> two cases:
>
> a) The local group is not allowed.
>
>    In this case, we currently do not change this_runnable_load or
>    this_avg_load from its initial value of 0, which means we return
>    NULL regardless of the load of the other, allowed groups. This
>    results in pointlessly continuing the find_idlest_group search
>    within the local group and then returning prev_cpu from
>    select_task_rq_fair.
>
> b) smp_processor_id() is the "idlest" and != prev_cpu.
>
>    find_idlest_group also returns NULL when the local group is
>    allowed and is the idlest. The caller then continues the
>    find_idlest_group search at a lower level of the current CPU's
>    sched_domain hierarchy. However new_cpu is not updated. This means
>    the search is pointless and we return prev_cpu from
>    select_task_rq_fair.

I think its much simpler than that.. but its late, so who knows ;-)

Both cases seem predicated on the assumption that we'll return @cpu
when we don't find any idler CPU.  Consider, if the local group is the
idlest, we should stick with @cpu and simply proceed with the child
domain.

The confusion, and the bugs, seem to have snuck in when we started
considering @prev_cpu, whenever that was.

The below is mostly code movement to put that whole while(sd) loop into
its own function.
The effective change is setting @new_cpu = @cpu when we start that loop:

@@ -6023,6 +6023,8 @@ static int wake_cap(struct task_struct *p, int cpu, int prev_cpu)
 		struct sched_group *group;
 		int weight;
 
+		new_cpu = cpu;
+
 		if (!(sd->flags & sd_flag)) {
 			sd = sd->child;
 			continue;

---
 kernel/sched/fair.c | 83 +++++++++++++++++++++++++++--------------------------
 1 file changed, 48 insertions(+), 35 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c77e4b1d51c0..3e77265c480a 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5588,10 +5588,10 @@ static unsigned long capacity_spare_wake(int cpu, struct task_struct *p)
 }
 
 /*
- * find_idlest_cpu - find the idlest cpu among the cpus in group.
+ * find_idlest_group_cpu - find the idlest cpu among the cpus in group.
  */
 static int
-find_idlest_cpu(struct sched_group *group, struct task_struct *p, int this_cpu)
+find_idlest_group_cpu(struct sched_group *group, struct task_struct *p, int this_cpu)
 {
 	unsigned long load, min_load = ULONG_MAX;
 	unsigned int min_exit_latency = UINT_MAX;
@@ -5640,6 +5640,50 @@ static unsigned long capacity_spare_wake(int cpu, struct task_struct *p)
 	return shallowest_idle_cpu != -1 ?
 		shallowest_idle_cpu : least_loaded_cpu;
 }
 
+static int
+find_idlest_cpu(struct sched_domain *sd, struct task_struct *p, int cpu, int sd_flag)
+{
+	struct sched_domain *tmp;
+	int new_cpu = cpu;
+
+	while (sd) {
+		struct sched_group *group;
+		int weight;
+
+		if (!(sd->flags & sd_flag)) {
+			sd = sd->child;
+			continue;
+		}
+
+		group = find_idlest_group(sd, p, cpu, sd_flag);
+		if (!group) {
+			sd = sd->child;
+			continue;
+		}
+
+		new_cpu = find_idlest_group_cpu(group, p, cpu);
+		if (new_cpu == -1 || new_cpu == cpu) {
+			/* Now try balancing at a lower domain level of cpu */
+			sd = sd->child;
+			continue;
+		}
+
+		/* Now try balancing at a lower domain level of new_cpu */
+		cpu = new_cpu;
+		weight = sd->span_weight;
+		sd = NULL;
+		for_each_domain(cpu, tmp) {
+			if (weight <= tmp->span_weight)
+				break;
+			if (tmp->flags & sd_flag)
+				sd = tmp;
+		}
+		/* while loop will break here if sd == NULL */
+	}
+
+	return new_cpu;
+}
+
 /*
  * Implement a for_each_cpu() variant that starts the scan at a given cpu
  * (@start), and wraps around.
@@ -6019,39 +6063,8 @@ static int wake_cap(struct task_struct *p, int cpu, int prev_cpu)
 		if (sd_flag & SD_BALANCE_WAKE) /* XXX always ? */
 			new_cpu = select_idle_sibling(p, prev_cpu, new_cpu);
 
-	} else while (sd) {
-		struct sched_group *group;
-		int weight;
-
-		if (!(sd->flags & sd_flag)) {
-			sd = sd->child;
-			continue;
-		}
-
-		group = find_idlest_group(sd, p, cpu, sd_flag);
-		if (!group) {
-			sd = sd->child;
-
Re: [PATCH V3 4/4] ARM64: dts: rockchip: rk3399 add iommu nodes
On Monday, 24 July 2017 at 10:32:10 CEST, Simon Xue wrote:
> Add VPU/VDEC/IEP/VOPL/VOPB/ISP0/ISP1 iommu nodes
>
> Signed-off-by: Simon Xue

applied for 4.14 (after adapting the subject a bit, dropping the
vop-mmus added via another patch)

Thanks
Heiko
Re: [PATCH] mt7601u: check memory allocation failure
On Mon, 21 Aug 2017 22:59:56 +0200, Christophe JAILLET wrote:
> Check memory allocation failure and return -ENOMEM in such a case, as
> already done a few lines below
>
> Signed-off-by: Christophe JAILLET

Acked-by: Jakub Kicinski

Thanks!
Re: [PATCH] mt7601u: check memory allocation failure
On 21/08/2017 at 23:41, Jakub Kicinski wrote:
> On Mon, 21 Aug 2017 14:34:30 -0700, Jakub Kicinski wrote:
> > On Mon, 21 Aug 2017 22:59:56 +0200, Christophe JAILLET wrote:
> > > Check memory allocation failure and return -ENOMEM in such a case, as
> > > already done a few lines below
> > >
> > > Signed-off-by: Christophe JAILLET
> >
> > Acked-by: Jakub Kicinski
>
> Wait, I take that back.  This code is a bit weird.  We would return an
> error, then mt7601u_dma_init() will call mt7601u_free_tx_queue() which
> doesn't check for the tx_q == NULL condition.
>
> Looks like mt7601u_free_tx() has to check for dev->tx_q == NULL and
> return early if that's the case.  Or mt7601u_alloc_tx() should really
> clean things up on its own on failure.  Ugh.

You are right.  Thanks for the review.

I've sent a v2 which updates 'mt7601u_free_tx()'.  Doing so sounds more
in line with the spirit of this code.

CJ
[PATCH 2/3] signal: simplify compat_sigpending()
Remove the "if it's big-endian..." ifdef in compat_sigpending(), use the
endian-agnostic variant.

Suggested-by: Al Viro
Signed-off-by: Dmitry V. Levin
---
 kernel/signal.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index a1d0426..7d9d82b 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -3292,15 +3292,11 @@ SYSCALL_DEFINE1(sigpending, old_sigset_t __user *, set)
 #ifdef CONFIG_COMPAT
 COMPAT_SYSCALL_DEFINE1(sigpending, compat_old_sigset_t __user *, set32)
 {
-#ifdef __BIG_ENDIAN
 	sigset_t set;
 	int err = do_sigpending(&set, sizeof(set.sig[0]));
 	if (!err)
 		err = put_user(set.sig[0], set32);
 	return err;
-#else
-	return sys_rt_sigpending((sigset_t __user *)set32, sizeof(*set32));
-#endif
 }
 #endif

--
ldv
[PATCH 1/3] signal: replace sigset_to_compat() with put_compat_sigset()
There are 4 callers of sigset_to_compat() in the entire kernel. One is in
sparc compat rt_sigaction(2), the rest are in kernel/signal.c itself. All
are followed by copy_to_user(), and all but the sparc one are under
"if it's big-endian..." ifdefs.

Let's transform sigset_to_compat() into put_compat_sigset() that also
calls copy_to_user().

Suggested-by: Al Viro
Signed-off-by: Dmitry V. Levin
---
 arch/sparc/kernel/sys_sparc32.c |  6 +++---
 include/linux/compat.h          |  3 ++-
 kernel/compat.c                 | 20 ++--
 kernel/signal.c                 | 27 ++-
 4 files changed, 25 insertions(+), 31 deletions(-)

diff --git a/arch/sparc/kernel/sys_sparc32.c b/arch/sparc/kernel/sys_sparc32.c
index bca44f3..5e2bec9 100644
--- a/arch/sparc/kernel/sys_sparc32.c
+++ b/arch/sparc/kernel/sys_sparc32.c
@@ -159,7 +159,6 @@ COMPAT_SYSCALL_DEFINE5(rt_sigaction, int, sig,
 {
 	struct k_sigaction new_ka, old_ka;
 	int ret;
-	compat_sigset_t set32;

 	/* XXX: Don't preclude handling different sized sigset_t's. */
 	if (sigsetsize != sizeof(compat_sigset_t))
@@ -167,6 +166,7 @@ COMPAT_SYSCALL_DEFINE5(rt_sigaction, int, sig,

 	if (act) {
 		u32 u_handler, u_restorer;
+		compat_sigset_t set32;

 		new_ka.ka_restorer = restorer;
 		ret = get_user(u_handler, &act->sa_handler);
@@ -183,9 +183,9 @@ COMPAT_SYSCALL_DEFINE5(rt_sigaction, int, sig,
 	ret = do_sigaction(sig, act ? &new_ka : NULL, oact ? &old_ka : NULL);

 	if (!ret && oact) {
-		sigset_to_compat(&set32, &old_ka.sa.sa_mask);
 		ret = put_user(ptr_to_compat(old_ka.sa.sa_handler), &oact->sa_handler);
-		ret |= copy_to_user(&oact->sa_mask, &set32, sizeof(compat_sigset_t));
+		ret |= put_compat_sigset(&oact->sa_mask, &old_ka.sa.sa_mask,
+					 sizeof(oact->sa_mask));
 		ret |= put_user(old_ka.sa.sa_flags, &oact->sa_flags);
 		ret |= put_user(ptr_to_compat(old_ka.sa.sa_restorer), &oact->sa_restorer);
 		if (ret)
diff --git a/include/linux/compat.h b/include/linux/compat.h
index 5a6a109..17017bb 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -453,7 +453,8 @@ asmlinkage long compat_sys_settimeofday(struct compat_timeval __user *tv,
 asmlinkage long compat_sys_adjtimex(struct compat_timex __user *utp);

 extern void sigset_from_compat(sigset_t *set, const compat_sigset_t *compat);
-extern void sigset_to_compat(compat_sigset_t *compat, const sigset_t *set);
+extern int put_compat_sigset(compat_sigset_t __user *compat,
+			     const sigset_t *set, unsigned int size);

 asmlinkage long compat_sys_migrate_pages(compat_pid_t pid,
 		compat_ulong_t maxnode, const compat_ulong_t __user *old_nodes,
diff --git a/kernel/compat.c b/kernel/compat.c
index 6f0a0e7..d2bd03b 100644
--- a/kernel/compat.c
+++ b/kernel/compat.c
@@ -520,15 +520,23 @@ sigset_from_compat(sigset_t *set, const compat_sigset_t *compat)
 }
 EXPORT_SYMBOL_GPL(sigset_from_compat);

-void
-sigset_to_compat(compat_sigset_t *compat, const sigset_t *set)
+int
+put_compat_sigset(compat_sigset_t __user *compat, const sigset_t *set,
+		  unsigned int size)
 {
+	/* size <= sizeof(compat_sigset_t) <= sizeof(sigset_t) */
+#ifdef __BIG_ENDIAN
+	compat_sigset_t v;
 	switch (_NSIG_WORDS) {
-	case 4: compat->sig[7] = (set->sig[3] >> 32); compat->sig[6] = set->sig[3];
-	case 3: compat->sig[5] = (set->sig[2] >> 32); compat->sig[4] = set->sig[2];
-	case 2: compat->sig[3] = (set->sig[1] >> 32); compat->sig[2] = set->sig[1];
-	case 1: compat->sig[1] = (set->sig[0] >> 32); compat->sig[0] = set->sig[0];
+	case 4: v.sig[7] = (set->sig[3] >> 32); v.sig[6] = set->sig[3];
+	case 3: v.sig[5] = (set->sig[2] >> 32); v.sig[4] = set->sig[2];
+	case 2: v.sig[3] = (set->sig[1] >> 32); v.sig[2] = set->sig[1];
+	case 1: v.sig[1] = (set->sig[0] >> 32); v.sig[0] = set->sig[0];
 	}
+	return copy_to_user(compat, &v, size) ? -EFAULT : 0;
+#else
+	return copy_to_user(compat, set, size) ? -EFAULT : 0;
+#endif
 }

 #ifdef CONFIG_NUMA
diff --git a/kernel/signal.c b/kernel/signal.c
index ed804a4..a1d0426 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2621,13 +2621,7 @@ COMPAT_SYSCALL_DEFINE4(rt_sigprocmask, int, how, compat_sigset_t __user *, nset,
 		if (error)
 			return error;
 	}
-	if (oset) {
-		compat_sigset_t old32;
-		sigset_to_compat(&old32, &old_set);
-		if (copy_to_user(oset, &old32, sizeof(compat_sigset_t)))
-			return -EFAULT;
-	}
-	return 0;
+	return oset ? put_compat_sigset(oset, &old_set, sizeof(*oset)) : 0;
 #else
 	return sys_rt_sigprocmask(how, (sigset_t __user *)nset,
[PATCH 3/3] signal: lift sigset size check out of do_sigpending()
As the sigsetsize argument of do_sigpending() is not used anywhere else in
that function after the check, remove this argument and move the check out
of do_sigpending() into rt_sigpending() and its compat analog.

Suggested-by: Al Viro
Signed-off-by: Dmitry V. Levin
---
 kernel/signal.c | 21 ++---
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 7d9d82b..894418b 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2629,11 +2629,8 @@ COMPAT_SYSCALL_DEFINE4(rt_sigprocmask, int, how, compat_sigset_t __user *, nset,
 }
 #endif

-static int do_sigpending(void *set, unsigned long sigsetsize)
+static int do_sigpending(sigset_t *set)
 {
-	if (sigsetsize > sizeof(sigset_t))
-		return -EINVAL;
-
 	spin_lock_irq(&current->sighand->siglock);
 	sigorsets(set, &current->pending.signal,
 		  &current->signal->shared_pending.signal);
@@ -2653,7 +2650,12 @@ static int do_sigpending(void *set, unsigned long sigsetsize)
 SYSCALL_DEFINE2(rt_sigpending, sigset_t __user *, uset, size_t, sigsetsize)
 {
 	sigset_t set;
-	int err = do_sigpending(&set, sigsetsize);
+	int err;
+
+	if (sigsetsize > sizeof(*uset))
+		return -EINVAL;
+
+	err = do_sigpending(&set);
 	if (!err && copy_to_user(uset, &set, sigsetsize))
 		err = -EFAULT;
 	return err;
@@ -2664,7 +2666,12 @@ COMPAT_SYSCALL_DEFINE2(rt_sigpending, compat_sigset_t __user *, uset,
 		compat_size_t, sigsetsize)
 {
 	sigset_t set;
-	int err = do_sigpending(&set, sigsetsize);
+	int err;
+
+	if (sigsetsize > sizeof(*uset))
+		return -EINVAL;
+
+	err = do_sigpending(&set);
 	if (!err)
 		err = put_compat_sigset(uset, &set, sigsetsize);
 	return err;
@@ -3293,7 +3300,7 @@ SYSCALL_DEFINE1(sigpending, old_sigset_t __user *, set)
 COMPAT_SYSCALL_DEFINE1(sigpending, compat_old_sigset_t __user *, set32)
 {
 	sigset_t set;
-	int err = do_sigpending(&set, sizeof(set.sig[0]));
+	int err = do_sigpending(&set);
 	if (!err)
 		err = put_user(set.sig[0], set32);
 	return err;

--
ldv
linux-next: manual merge of the btrfs-kdave tree with the btrfs tree
Hi All,

(As expected) Today's linux-next merge of the btrfs-kdave tree got a
conflict in:

  fs/btrfs/compression.h

between commit:

  5c1aab1dd544 ("btrfs: Add zstd support")

from the btrfs tree and commit:

  dc2f29212a26 ("btrfs: remove unused BTRFS_COMPRESS_LAST")

from the btrfs-kdave tree.

I fixed it up (as suggested by Chris - see below) and can carry the fix
as necessary. This is now fixed as far as linux-next is concerned, but
any non trivial conflicts should be mentioned to your upstream maintainer
when your tree is submitted for merging. You may also want to consider
cooperating with the maintainer of the conflicting tree to minimise any
particularly complex conflicts.

--
Cheers,
Stephen Rothwell

diff --cc fs/btrfs/compression.h
index 2269e00854d8,3b1b0ac15fdc..
--- a/fs/btrfs/compression.h
+++ b/fs/btrfs/compression.h
@@@ -99,9 -99,7 +99,8 @@@ enum btrfs_compression_type
  	BTRFS_COMPRESS_NONE  = 0,
  	BTRFS_COMPRESS_ZLIB  = 1,
  	BTRFS_COMPRESS_LZO   = 2,
- 	BTRFS_COMPRESS_TYPES = 2,
 +	BTRFS_COMPRESS_ZSTD  = 3,
 +	BTRFS_COMPRESS_TYPES = 3,
- 	BTRFS_COMPRESS_LAST  = 4,
  };

  struct btrfs_compress_op {
@@@ -129,6 -127,7 +128,8 @@@
  extern const struct btrfs_compress_op btrfs_zlib_compress;
  extern const struct btrfs_compress_op btrfs_lzo_compress;
 +extern const struct btrfs_compress_op btrfs_zstd_compress;

+ int btrfs_compress_heuristic(struct inode *inode, u64 start, u64 end);
+
  #endif
[PATCH] once: switch to new jump label API
From: Eric Biggers

Switch the DO_ONCE() macro from the deprecated jump label API to the new
one. The new one is more readable, and for DO_ONCE() it also makes the
generated code more icache-friendly: now the one-time initialization code
is placed out-of-line at the jump target, rather than at the inline
fallthrough case.

Signed-off-by: Eric Biggers
---
 include/linux/once.h | 6 +++---
 lib/once.c           | 8
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/include/linux/once.h b/include/linux/once.h
index 9c98aaa87cbc..724724918e8b 100644
--- a/include/linux/once.h
+++ b/include/linux/once.h
@@ -5,7 +5,7 @@
 #include

 bool __do_once_start(bool *done, unsigned long *flags);
-void __do_once_done(bool *done, struct static_key *once_key,
+void __do_once_done(bool *done, struct static_key_true *once_key,
 		    unsigned long *flags);

 /* Call a function exactly once. The idea of DO_ONCE() is to perform
@@ -38,8 +38,8 @@ void __do_once_done(bool *done, struct static_key *once_key,
 	({								\
 		bool ___ret = false;					\
 		static bool ___done = false;				\
-		static struct static_key ___once_key = STATIC_KEY_INIT_TRUE; \
-		if (static_key_true(&___once_key)) {			\
+		static DEFINE_STATIC_KEY_TRUE(___once_key);		\
+		if (static_branch_unlikely(&___once_key)) {		\
 			unsigned long ___flags;				\
 			___ret = __do_once_start(&___done, &___flags);	\
 			if (unlikely(___ret)) {				\
diff --git a/lib/once.c b/lib/once.c
index 05c8604627eb..831c5a6b0bb2 100644
--- a/lib/once.c
+++ b/lib/once.c
@@ -5,7 +5,7 @@

 struct once_work {
 	struct work_struct work;
-	struct static_key *key;
+	struct static_key_true *key;
 };

 static void once_deferred(struct work_struct *w)
@@ -14,11 +14,11 @@ static void once_deferred(struct work_struct *w)
 	work = container_of(w, struct once_work, work);
 	BUG_ON(!static_key_enabled(work->key));
-	static_key_slow_dec(work->key);
+	static_branch_disable(work->key);
 	kfree(work);
 }

-static void once_disable_jump(struct static_key *key)
+static void once_disable_jump(struct static_key_true *key)
 {
 	struct once_work *w;

@@ -51,7 +51,7 @@ bool __do_once_start(bool *done, unsigned long *flags)
 }
 EXPORT_SYMBOL(__do_once_start);

-void __do_once_done(bool *done, struct static_key *once_key,
+void __do_once_done(bool *done, struct static_key_true *once_key,
 		    unsigned long *flags)
 	__releases(once_lock)
 {
--
2.14.1.480.gb18f417b89-goog
Re: [PATCH v5 0/3] TPS68470 PMIC drivers
On Tue, Aug 22, 2017 at 12:58 AM, Mani, Rajmohanwrote: > Hi Andy, > >> > >> > This is the patch series for TPS68470 PMIC that works as a camera >> > >> > PMIC. >> > >> > >> > >> > The patch series provide the following 3 drivers, to help configure >> > >> > the >> voltage regulators, clocks and GPIOs provided by the TPS68470 PMIC, to be >> able to use the camera sensors connected to this PMIC. >> > >> > >> > >> > TPS68470 MFD driver: >> > >> > This is the multi function driver that initializes the TPS68470 PMIC >> > >> > and >> supports the GPIO and Op Region functions. >> > >> > >> > >> > TPS68470 GPIO driver: >> > >> > This is the PMIC GPIO driver that will be used by the OS GPIO layer, >> when the BIOS / firmware triggered GPIO access is done. >> > >> > >> > >> > TPS68470 Op Region driver: >> > >> > This is the driver that will be invoked, when the BIOS / firmware >> configures the voltage / clock for the sensors / vcm devices connected to the >> PMIC. >> > >> > >> > >> >> > >> All three patches are good to me (we did few rounds of internal >> > >> review before posting v4) >> > >> >> > >> Reviewed-by: Andy Shevchenko >> > > >> > > OK, so how should they be routed? >> > >> > Good question. I don't know how last time PMIC drivers were merged, >> > here I think is just sane to route vi MFD with immutable branch >> > created. >> >> OK >> >> I will assume that the series will go in through MFD then. >> > > Now that the MFD and GPIO patches of v6 of this series have been applied on > respective trees, can you advise the next steps for the ACPI / PMIC Opregion > driver? Well, it would have been better to route the whole series through one tree. Now it's better to wait until the two other trees get merged and then apply the opregion patch. Thanks, Rafael
Re: [PATCH v11 2/4] PCI: Factor out pci_bus_wait_crs()
On Mon, Aug 21, 2017 at 03:37:06PM -0400, Sinan Kaya wrote: > On 8/21/2017 3:18 PM, Bjorn Helgaas wrote: > ... > if (pci_bus_crs_pending(id)) > return pci_bus_wait_crs(dev->bus, dev->devfn, , 6); > > > I think that makes sense. We'd want to check for CRS SV being > > enabled, e.g., maybe read PCI_EXP_RTCTL_CRSSVE back in > > pci_enable_crs() and cache it somewhere. Maybe a crs_sv_enabled bit > > in the root port's pci_dev, and check it with something like what > > pcie_root_rcb_set() does? > > > > You can observe CRS under the following conditions > > 1. root port <-> endpoint > 2. bridge <-> endpoint > 3. root port<->bridge > > I was relying on the fact that we are reading 0x001 as an indication that > this device detected CRS. Maybe, this is too indirect. > > If we also want to capture the capability, I think the right thing is to > check the parent capability. > > bool pci_bus_crs_vis_supported(struct pci_dev *bridge) > { > if (device type(bridge) == root port) > return read(root_crs_register_reg); > > if (device type(bridge) == switch) > return read(switch_crs_register); I don't understand this part. AFAIK, CRS SV is only a feature of root ports. The capability and enable bits are in the Root Capabilities and Root Control registers. It's certainly true that a device below a switch can respond with a CRS completion, but the switch is not the requester, and my understanding is that it would not take any action on the completion other than passing it upstream.
Re: [PATCH] PCI: Fix and amend express capability sizes
Hi Alex,

On 10/08/2017 18:54, Alex Williamson wrote:
> PCI_CAP_EXP_ENDPOINT_SIZEOF_V1 defines the size of the PCIe express
> capability structure for v1 devices with link, but we also have a need
> in the vfio code for sizing the capability for devices without link,
> such as root complex endpoints. Create a separate define for this,
> ending the structure before the link fields.
>
> Additionally, this reveals that PCI_CAP_EXP_ENDPOINT_SIZEOF_V2 is
> currently incorrect, ending the capability length before the v2 link
> fields. Rename this to specify an RC endpoint (no link) capability
> length and move PCI_CAP_EXP_ENDPOINT_SIZEOF_V2 to include the link
> fields as we have for the v1 version.
>
> Signed-off-by: Alex Williamson
> ---
>  include/uapi/linux/pci_regs.h | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
> index c22d3ebaca20..7439821214d1 100644
> --- a/include/uapi/linux/pci_regs.h
> +++ b/include/uapi/linux/pci_regs.h
> @@ -513,6 +513,7 @@
>  #define  PCI_EXP_DEVSTA_URD	0x0008	/* Unsupported Request Detected */
>  #define  PCI_EXP_DEVSTA_AUXPD	0x0010	/* AUX Power Detected */
>  #define  PCI_EXP_DEVSTA_TRPND	0x0020	/* Transactions Pending */
> +#define PCI_CAP_EXP_RC_ENDPOINT_SIZEOF_V1	12	/* v1 endpoints without link end here */

nit: this should have been PCI_EXP_CAP_* from the very beginning, but I
guess you don't want new defines to be named differently from the other
*SIZEOF* ones?

>  #define PCI_EXP_LNKCAP	12	/* Link Capabilities */
>  #define  PCI_EXP_LNKCAP_SLS	0x000f	/* Supported Link Speeds */
>  #define  PCI_EXP_LNKCAP_SLS_2_5GB 0x0001 /* LNKCAP2 SLS Vector bit 0 */
> @@ -556,7 +557,7 @@
>  #define  PCI_EXP_LNKSTA_DLLLA	0x2000	/* Data Link Layer Link Active */
>  #define  PCI_EXP_LNKSTA_LBMS	0x4000	/* Link Bandwidth Management Status */
>  #define  PCI_EXP_LNKSTA_LABS	0x8000	/* Link Autonomous Bandwidth Status */
> -#define PCI_CAP_EXP_ENDPOINT_SIZEOF_V1	20	/* v1 endpoints end here */
> +#define PCI_CAP_EXP_ENDPOINT_SIZEOF_V1	20	/* v1 endpoints with link end here */
>  #define PCI_EXP_SLTCAP	20	/* Slot Capabilities */
>  #define  PCI_EXP_SLTCAP_ABP	0x0001	/* Attention Button Present */
>  #define  PCI_EXP_SLTCAP_PCP	0x0002	/* Power Controller Present */
> @@ -639,7 +640,7 @@
>  #define  PCI_EXP_DEVCTL2_OBFF_MSGB_EN	0x4000	/* Enable OBFF Message type B */
>  #define  PCI_EXP_DEVCTL2_OBFF_WAKE_EN	0x6000	/* OBFF using WAKE# signaling */
>  #define PCI_EXP_DEVSTA2	42	/* Device Status 2 */
> -#define PCI_CAP_EXP_ENDPOINT_SIZEOF_V2	44	/* v2 endpoints end here */
> +#define PCI_CAP_EXP_RC_ENDPOINT_SIZEOF_V2	44	/* v2 endpoints without link end here */
>  #define PCI_EXP_LNKCAP2	44	/* Link Capabilities 2 */
>  #define  PCI_EXP_LNKCAP2_SLS_2_5GB	0x0002	/* Supported Speed 2.5GT/s */
>  #define  PCI_EXP_LNKCAP2_SLS_5_0GB	0x0004	/* Supported Speed 5.0GT/s */
> @@ -647,6 +648,7 @@
>  #define  PCI_EXP_LNKCAP2_CROSSLINK	0x0100	/* Crosslink supported */
>  #define PCI_EXP_LNKCTL2	48	/* Link Control 2 */
>  #define PCI_EXP_LNKSTA2	50	/* Link Status 2 */
> +#define PCI_CAP_EXP_ENDPOINT_SIZEOF_V2	52	/* v2 endpoints with link end here */

Looks good to me.

Reviewed-by: Eric Auger

Eric

>  #define PCI_EXP_SLTCAP2	52	/* Slot Capabilities 2 */
>  #define PCI_EXP_SLTCTL2	56	/* Slot Control 2 */
>  #define PCI_EXP_SLTSTA2	58	/* Slot Status 2 */
>
Re: [PATCH] ipr: Set no_report_opcodes for RAID arrays
Brian,

> Since ipr RAID arrays do not support the MAINTENANCE_IN /
> MI_REPORT_SUPPORTED_OPERATION_CODES, set no_report_opcodes to prevent
> it from being sent.

Applied to 4.13/scsi-fixes. Thank you!

--
Martin K. Petersen	Oracle Linux Engineering
RE: [Patch v2 00/19] CIFS: Implement SMBDirect
> > > Hey Long,
> > >
> > > What testing have you done with this on the various rdma transports?
> > > Does it work over IB, RoCE, and iWARP providers?
> >
> > Hi Steve,
> >
> > Currently all the tests have been done over Infiniband. We haven't
> > tested on RoCE or iWARP, but planned to do it in the following weeks.
> >
> > Long
>
> Ok, good.
>
> Is this series available on github or somewhere so we can clone it and
> review it as it is applied to the kernel src?

Unfortunately they are not on github. I will look into putting them there
for review. Will update soon.

Thanks for helping out!

> Thanks,
>
> Steve.
Re: [PATCH v3 1/5] ACPI / blacklist: add acpi_match_platform_list()
On Mon, Aug 21, 2017 at 7:36 PM, Borislav Petkov wrote:
> On Mon, Aug 21, 2017 at 05:23:37PM +0000, Kani, Toshimitsu wrote:
>> > > 'data' here is private to the caller. So, I do not think we need
>> > > to define the bits. Shall I change the name to 'driver_data' to
>> > > make it more explicit?
>> >
>> > You changed it to 'data'. It was a u32-used-as-boolean
>> > is_critical_error before.
>> >
>> > So you can just as well make it into flags and people can extend
>> > those flags if needed. A flag bit should be enough in most cases
>> > anyway. If they really need driver_data, then they can add a void *
>> > member.
>>
>> Hmm.. In patch 2, intel_pstate_platform_pwr_mgmt_exists() uses this
>> field for PSS and PCC, which are enum values. I think we should allow
>> drivers to set any values here. I agree that it may need to be void *
>> if we also allow drivers to set a pointer here.
>
> Let's see what Rafael prefers.

I would retain the is_critical_error field and use that for printing the
recoverable / non-recoverable message. This is kind of orthogonal to
whether or not any extra data is needed, and that can be an additional
field. In that case unsigned long should be sufficient to accommodate a
pointer if need be.

Thanks,
Rafael
Re: [PATCH v4] f2fs: introduce discard_granularity sysfs entry
On 08/18, Chao Yu wrote:
> Hi Jaegeuk,
>
> Sorry for the delay, the modification looks good to me. ;)

We must avoid waking up the discard thread caused by the number of
pending commands which are never issued.

From a73f8807248c2f42328a2204eab16a3b8d32c83e Mon Sep 17 00:00:00 2001
From: Chao Yu
Date: Mon, 7 Aug 2017 23:09:56 +0800
Subject: [PATCH] f2fs: introduce discard_granularity sysfs entry

Commit d618ebaf0aa8 ("f2fs: enable small discard by default") enables
f2fs to issue 4K size discards in real-time discard mode. However,
issuing smaller discards may cost more device lifetime while releasing
less free space in the flash device. Since f2fs has the ability of
separating hot/cold data and doing garbage collection, we can expect
that small-sized invalid regions will expand soon with OPU, deletion or
garbage collection on valid data, so it's better to delay or skip
issuing smaller-sized discards. That helps reduce excessive consumption
of IO bandwidth and lifetime of the flash storage.

This patch makes f2fs select 64K as its default minimal granularity and
issue discards with a size not smaller than the minimal granularity. It
also exposes the discard granularity as a sysfs entry for configuration
in different scenarios.

Jaegeuk Kim:
We must issue all the accumulated discard commands when fstrim is
called. So, I've added pend_list_tag[] to indicate whether we should
issue the commands or not. If the tag sets P_ACTIVE or P_TRIM, we have
to issue them. P_TRIM is set once at a time, given the fstrim trigger.

In addition, issue_discard_thread is waking up too often due to the
number of discard commands remaining in the pending list. I added a
timer to control it, likewise gc_thread.

Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim
---
 Documentation/ABI/testing/sysfs-fs-f2fs |  9
 fs/f2fs/f2fs.h                          | 12 +
 fs/f2fs/segment.c                       | 91 -
 fs/f2fs/sysfs.c                         | 23 +
 4 files changed, 121 insertions(+), 14 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-fs-f2fs b/Documentation/ABI/testing/sysfs-fs-f2fs
index 621da3fc56c5..11b7f4ebea7c 100644
--- a/Documentation/ABI/testing/sysfs-fs-f2fs
+++ b/Documentation/ABI/testing/sysfs-fs-f2fs
@@ -57,6 +57,15 @@ Contact: "Jaegeuk Kim"
 Description:
 	 Controls the issue rate of small discard commands.

+What:		/sys/fs/f2fs//discard_granularity
+Date:		July 2017
+Contact:	"Chao Yu"
+Description:
+	Controls discard granularity of the inner discard thread; the
+	inner thread will not issue discards with a size that is smaller
+	than the granularity. The unit size is one block; currently only
+	configuring in the range of [1, 512] is supported.
+
 What:		/sys/fs/f2fs//max_victim_search
 Date:		January 2014
 Contact:	"Jaegeuk Kim"
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index e252e5bf9791..4b993961d81d 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -148,6 +148,8 @@ enum {
 	(BATCHED_TRIM_SEGMENTS(sbi) << (sbi)->log_blocks_per_seg)
 #define MAX_DISCARD_BLOCKS(sbi)		BLKS_PER_SEC(sbi)
 #define DISCARD_ISSUE_RATE		8
+#define DEF_MIN_DISCARD_ISSUE_TIME	50	/* 50 ms, if exists */
+#define DEF_MAX_DISCARD_ISSUE_TIME	60000	/* 60 s, if no candidates */
 #define DEF_CP_INTERVAL			60	/* 60 secs */
 #define DEF_IDLE_INTERVAL		5	/* 5 secs */

@@ -196,11 +198,18 @@ struct discard_entry {
 	unsigned char discard_map[SIT_VBLOCK_MAP_SIZE];	/* segment discard bitmap */
 };

+/* default discard granularity of inner discard thread, unit: block count */
+#define DEFAULT_DISCARD_GRANULARITY		16
+
 /* max discard pend list number */
 #define MAX_PLIST_NUM		512
 #define plist_idx(blk_num)	((blk_num) >= MAX_PLIST_NUM ?		\
					(MAX_PLIST_NUM - 1) : (blk_num - 1))

+#define P_ACTIVE	0x01
+#define P_TRIM		0x02
+#define plist_issue(tag)	(((tag) & P_ACTIVE) || ((tag) & P_TRIM))
+
 enum {
 	D_PREP,
 	D_SUBMIT,
@@ -236,11 +245,14 @@ struct discard_cmd_control {
 	struct task_struct *f2fs_issue_discard;	/* discard thread */
 	struct list_head entry_list;	/* 4KB discard entry list */
 	struct list_head pend_list[MAX_PLIST_NUM];/* store pending entries */
+	unsigned char pend_list_tag[MAX_PLIST_NUM];/* tag for pending entries */
 	struct list_head wait_list;	/* store on-flushing entries */
 	wait_queue_head_t discard_wait_queue;	/* waiting queue for wake-up */
+	unsigned int discard_wake;	/* to wake up discard thread */
 	struct mutex cmd_lock;
 	unsigned int nr_discards;	/* # of
Re: [PATCH 0/8] constify parisc parisc_device_id
Hi Arvind, On 19.08.2017 19:42, Arvind Yadav wrote: > parisc_device_id are not supposed to change at runtime. All functions > working with parisc_device_id provided by work with > const parisc_device_id. So mark the non-const structs as const. Basically your patches are correct, but those structs aren't used after bootup any longer. So, they are much better placed in the __initconst or __initdata sections so that they get dropped before the kernel enters userspace. Changing it to __initconst includes more changes to those files than just changing one line. So, I won't apply your patches. Instead I've hacked up new versions in my tree which move those to __init* sections. Anyway, thanks for your patches! Helge > Arvind Yadav (8): > [PATCH 1/8] parisc: asp: constify parisc_device_id > [PATCH 2/8] parisc: ccio: constify parisc_device_id > [PATCH 3/8] parisc: dino: constify parisc_device_id > [PATCH 4/8] parisc: hppb: constify parisc_device_id > [PATCH 5/8] parisc: lasi: constify parisc_device_id > [PATCH 6/8] parisc: lba_pci: constify parisc_device_id > [PATCH 7/8] parisc: sba_iommu: constify parisc_device_id > [PATCH 8/8] parisc: wax: constify parisc_device_id > > drivers/parisc/asp.c | 2 +- > drivers/parisc/ccio-rm-dma.c | 2 +- > drivers/parisc/dino.c| 2 +- > drivers/parisc/hppb.c| 2 +- > drivers/parisc/lasi.c| 2 +- > drivers/parisc/lba_pci.c | 2 +- > drivers/parisc/sba_iommu.c | 2 +- > drivers/parisc/wax.c | 2 +- > 8 files changed, 8 insertions(+), 8 deletions(-) >
[PATCH V9 0/2] powerpc/dlpar: Correct display of hot-add/hot-remove CPUs and memory
On Power systems with shared configurations of CPUs and memory, there are
some issues with the association of additional CPUs and memory to nodes
when hot-adding resources. These patches address some of those problems.

powerpc/numa: Correct the currently broken capability to set the topology
for shared CPUs in LPARs. At boot time for shared-CPU LPARs, the topology
for each shared CPU is set to node zero; this is now updated correctly
using the Virtual Processor Home Node (VPHN) capabilities information
provided by the pHyp. The VPHN handling in Linux is disabled if PRRN
handling is present.

powerpc/nodes: On systems like PowerPC which allow 'hot-add' of CPU or
memory resources, it may occur that the new resources are to be inserted
into nodes that were not used for these resources at bootup. In the
kernel, any node that is used must be defined and initialized at boot.
This patch extracts the value of the lowest domain level (number of
allocable resources) from the "rtas" device tree property
"ibm,max-associativity-domains" to use as the maximum number of nodes to
set up as possibly available in the system. This new setting overrides
the instruction

    nodes_and(node_possible_map, node_possible_map, node_online_map);

presently seen in the function arch/powerpc/mm/numa.c:initmem_init(). If
the property is not present at boot, no operation will be performed to
define or enable additional nodes.

Signed-off-by: Michael Bringmann

Michael Bringmann (2):
  powerpc/numa: Update CPU topology when VPHN enabled
  powerpc/nodes: Ensure enough nodes avail for operations

---
Changes in V9:
  -- Calculate number of nodes via property "ibm,max-associativity-domains"
linux-next: Signed-off-by missing for commit in the arc tree
Hi Vineet, Commit 62611ac87d44 ("ARC: [plat-eznps] handle extra aux regs #2: kernel/entry exit") is missing a Signed-off-by from its author. -- Cheers, Stephen Rothwell
[PATCH 2/2] scsi: Preserve retry counter through scsi_prep_fn
Save / restore the retry counter in scsi_cmd in scsi_init_command. This
allows us to go back through scsi_init_command for retries and not forget
we are doing a retry.

Signed-off-by: Brian King
---
Index: linux-2.6.git/drivers/scsi/scsi_lib.c
===================================================================
--- linux-2.6.git.orig/drivers/scsi/scsi_lib.c
+++ linux-2.6.git/drivers/scsi/scsi_lib.c
@@ -1155,6 +1155,7 @@ void scsi_init_command(struct scsi_devic
 	void *prot = cmd->prot_sdb;
 	unsigned int unchecked_isa_dma = cmd->flags & SCMD_UNCHECKED_ISA_DMA;
 	unsigned long jiffies_at_alloc = cmd->jiffies_at_alloc;
+	int retries = cmd->retries;

 	/* zero out the cmd, except for the embedded scsi_request */
 	memset((char *)cmd + sizeof(cmd->req), 0,
@@ -1166,6 +1167,7 @@ void scsi_init_command(struct scsi_devic
 	cmd->flags = unchecked_isa_dma;
 	INIT_DELAYED_WORK(&cmd->abort_work, scmd_eh_abort_handler);
 	cmd->jiffies_at_alloc = jiffies_at_alloc;
+	cmd->retries = retries;

 	scsi_add_cmd_to_list(cmd);
 }
[PATCH 1/2] scsi: Move scsi_cmd->jiffies_at_alloc initialization to allocation time
Move the initialization of scsi_cmd->jiffies_at_alloc to allocation time
rather than prep time. Also ensure that jiffies_at_alloc is preserved
when we go through prep. This lets us send retries through prep again and
not break the overall retry timer logic in scsi_softirq_done.

Suggested-by: Bart Van Assche
Signed-off-by: Brian King
---
Index: linux-2.6.git/drivers/scsi/scsi_lib.c
===================================================================
--- linux-2.6.git.orig/drivers/scsi/scsi_lib.c
+++ linux-2.6.git/drivers/scsi/scsi_lib.c
@@ -1154,6 +1154,7 @@ void scsi_init_command(struct scsi_devic
 	void *buf = cmd->sense_buffer;
 	void *prot = cmd->prot_sdb;
 	unsigned int unchecked_isa_dma = cmd->flags & SCMD_UNCHECKED_ISA_DMA;
+	unsigned long jiffies_at_alloc = cmd->jiffies_at_alloc;

 	/* zero out the cmd, except for the embedded scsi_request */
 	memset((char *)cmd + sizeof(cmd->req), 0,
@@ -1164,7 +1165,7 @@ void scsi_init_command(struct scsi_devic
 	cmd->prot_sdb = prot;
 	cmd->flags = unchecked_isa_dma;
 	INIT_DELAYED_WORK(&cmd->abort_work, scmd_eh_abort_handler);
-	cmd->jiffies_at_alloc = jiffies;
+	cmd->jiffies_at_alloc = jiffies_at_alloc;

 	scsi_add_cmd_to_list(cmd);
 }
@@ -2119,6 +2120,7 @@ static int scsi_init_rq(struct request_q
 		if (!cmd->sense_buffer)
 			goto fail;
 		cmd->req.sense = cmd->sense_buffer;
+		cmd->jiffies_at_alloc = jiffies;

 		if (scsi_host_get_prot(shost) >= SHOST_DIX_TYPE0_PROTECTION) {
 			cmd->prot_sdb = kmem_cache_zalloc(scsi_sdb_cache, gfp);
Re: [PATCH v2 1/4] clk: rockchip: add rv1108 ACLK_GAMC and PCLK_GMAC ID
Am Montag, 21. August 2017, 16:16:04 CEST schrieb Elaine Zhang: > This patch exports gmac aclk and pclk for dts reference. > > Signed-off-by: Elaine Zhangapplied for 4.14 Thanks Heiko
[PATCHv2 1/2] scsi: Move scsi_cmd->jiffies_at_alloc initialization to allocation time
This second version also sets up jiffies_at_alloc in scsi_init_request. This has been tested without the second patch in the series and I've confirmed I now see the following in the logs after booting:

[  121.718088] sd 1:2:0:0: timing out command, waited 120s
[  121.798081] sd 1:2:1:0: timing out command, waited 120s

Without this patch I was never seeing these messages, indicating the retry timer code wasn't working. Also, after seeing these messages, I've confirmed there are no longer any hung tasks in the kernel with sysrq-w, while before, without this patch, I would see hung tasks for the scsi_report_opcodes calls which were getting retried forever.

8<

Move the initialization of scsi_cmd->jiffies_at_alloc to allocation time rather than prep time. Also ensure that jiffies_at_alloc is preserved when we go through prep. This lets us send retries through prep again and not break the overall retry timer logic in scsi_softirq_done.

Suggested-by: Bart Van Assche
Signed-off-by: Brian King
---
Index: linux-2.6.git/drivers/scsi/scsi_lib.c
===================================================================
--- linux-2.6.git.orig/drivers/scsi/scsi_lib.c
+++ linux-2.6.git/drivers/scsi/scsi_lib.c
@@ -1154,6 +1154,7 @@ void scsi_init_command(struct scsi_devic
 	void *buf = cmd->sense_buffer;
 	void *prot = cmd->prot_sdb;
 	unsigned int unchecked_isa_dma = cmd->flags & SCMD_UNCHECKED_ISA_DMA;
+	unsigned long jiffies_at_alloc = cmd->jiffies_at_alloc;

 	/* zero out the cmd, except for the embedded scsi_request */
 	memset((char *)cmd + sizeof(cmd->req), 0,
@@ -1164,7 +1165,7 @@ void scsi_init_command(struct scsi_devic
 	cmd->prot_sdb = prot;
 	cmd->flags = unchecked_isa_dma;
 	INIT_DELAYED_WORK(&cmd->abort_work, scmd_eh_abort_handler);
-	cmd->jiffies_at_alloc = jiffies;
+	cmd->jiffies_at_alloc = jiffies_at_alloc;
 	scsi_add_cmd_to_list(cmd);
 }
@@ -2016,6 +2017,7 @@ static int scsi_init_request(struct blk_
 	if (!cmd->sense_buffer)
 		return -ENOMEM;
 	cmd->req.sense = cmd->sense_buffer;
+	cmd->jiffies_at_alloc = jiffies;
 	if (scsi_host_get_prot(shost)) {
 		sg = (void *)cmd +
			sizeof(struct scsi_cmnd) +
@@ -2119,6 +2121,7 @@ static int scsi_init_rq(struct request_q
 	if (!cmd->sense_buffer)
 		goto fail;
 	cmd->req.sense = cmd->sense_buffer;
+	cmd->jiffies_at_alloc = jiffies;
 	if (scsi_host_get_prot(shost) >= SHOST_DIX_TYPE0_PROTECTION) {
 		cmd->prot_sdb = kmem_cache_zalloc(scsi_sdb_cache, gfp);
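The save/zero/restore pattern that the first hunk relies on can be sketched in plain C. The struct and field names below are illustrative stand-ins for struct scsi_cmnd, not the kernel definitions:

```c
#include <assert.h>
#include <string.h>

/* Illustrative stand-in for struct scsi_cmnd: only the two fields
 * needed to show the pattern. */
struct cmd {
    unsigned long jiffies_at_alloc; /* set once, at allocation time */
    int state;                      /* re-initialized on every prep */
};

/* Mirrors the shape of the fixed scsi_init_command(): capture the
 * allocation-time timestamp before zeroing the command, then write it
 * back, so a retry that goes through prep again keeps the original
 * timestamp. */
static void init_command(struct cmd *c)
{
    unsigned long jiffies_at_alloc = c->jiffies_at_alloc;

    memset(c, 0, sizeof(*c));
    c->jiffies_at_alloc = jiffies_at_alloc;
}
```

With the old code, the zeroing step reset the timestamp on every prep, so the retry deadline checked in scsi_softirq_done could never expire.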
Re: [PATCH v2] KVM: nVMX: Fix trying to cancel vmlauch/vmresume
2017-08-22 0:20 GMT+08:00 Radim Krčmář: > 2017-08-18 07:11-0700, Wanpeng Li: >> From: Wanpeng Li >> >> [ cut here ] >> WARNING: CPU: 7 PID: 3861 at /home/kernel/ssd/kvm/arch/x86/kvm//vmx.c:11299 >> nested_vmx_vmexit+0x176e/0x1980 [kvm_intel] >> CPU: 7 PID: 3861 Comm: qemu-system-x86 Tainted: GW OE 4.13.0-rc4+ >> #11 >> RIP: 0010:nested_vmx_vmexit+0x176e/0x1980 [kvm_intel] >> Call Trace: >> ? kvm_multiple_exception+0x149/0x170 [kvm] >> ? handle_emulation_failure+0x79/0x230 [kvm] >> ? load_vmcs12_host_state+0xa80/0xa80 [kvm_intel] >> ? check_chain_key+0x137/0x1e0 >> ? reexecute_instruction.part.168+0x130/0x130 [kvm] >> nested_vmx_inject_exception_vmexit+0xb7/0x100 [kvm_intel] >> ? nested_vmx_inject_exception_vmexit+0xb7/0x100 [kvm_intel] >> vmx_queue_exception+0x197/0x300 [kvm_intel] >> kvm_arch_vcpu_ioctl_run+0x1b0c/0x2c90 [kvm] >> ? kvm_arch_vcpu_runnable+0x220/0x220 [kvm] >> ? preempt_count_sub+0x18/0xc0 >> ? restart_apic_timer+0x17d/0x300 [kvm] >> ? kvm_lapic_restart_hv_timer+0x37/0x50 [kvm] >> ? kvm_arch_vcpu_load+0x1d8/0x350 [kvm] >> kvm_vcpu_ioctl+0x4e4/0x910 [kvm] >> ? kvm_vcpu_ioctl+0x4e4/0x910 [kvm] >> ? kvm_dev_ioctl+0xbe0/0xbe0 [kvm] >> >> The flag "nested_run_pending", which can override the decision of which >> should run >> next, L1 or L2. nested_run_pending=1 means that we *must* run L2 next, not >> L1. This >> is necessary in particular when L1 did a VMLAUNCH of L2 and therefore >> expects L2 to >> be run (and perhaps be injected with an event it specified, etc.). >> Nested_run_pending >> is especially intended to avoid switching to L1 in the injection >> decision-point. >> >> I catch this in the queue exception path, this patch fixes it by requesting >> an immediate VM exit from L2 and keeping the exception for L1 pending for a >> subsequent nested VM exit. 
>>
>> Cc: Paolo Bonzini
>> Cc: Radim Krčmář
>> Signed-off-by: Wanpeng Li
>> ---
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> @@ -6356,8 +6356,8 @@ static int inject_pending_event(struct kvm_vcpu *vcpu, bool req_int_win)
>> 		kvm_update_dr7(vcpu);
>> 	}
>
> Hm, we shouldn't execute the code above if exception won't be injected.
>
>>
>> -	kvm_x86_ops->queue_exception(vcpu);
>> -	return 0;
>
> vmx_complete_interrupts() assumes that the exception is always injected,
> so it would be dropped by kvm_clear_exception_queue().
>
> I'm starting to wonder whether getting rid of nested_run_pending
> wouldn't be nicer.

Yeah, I rethought your concern about nested_run_pending with a return value of 0. Actually, the path in the calltrace is the else branch in nested_vmx_check_exception(): an exception will be injected into L2 by L1 if L1 owns this exception, otherwise it is injected by L0 directly. For the nested_run_pending case with return value 0, we can treat it as L0 injecting the exception into L2 directly, so no exception is injected into the wrong guest.

Regards,
Wanpeng Li
Re: [PATCH v2] KVM: nVMX: Fix trying to cancel vmlauch/vmresume
2017-08-22 6:55 GMT+08:00 Wanpeng Li: > 2017-08-22 0:20 GMT+08:00 Radim Krčmář : >> 2017-08-18 07:11-0700, Wanpeng Li: >>> From: Wanpeng Li >>> >>> [ cut here ] >>> WARNING: CPU: 7 PID: 3861 at /home/kernel/ssd/kvm/arch/x86/kvm//vmx.c:11299 >>> nested_vmx_vmexit+0x176e/0x1980 [kvm_intel] >>> CPU: 7 PID: 3861 Comm: qemu-system-x86 Tainted: GW OE >>> 4.13.0-rc4+ #11 >>> RIP: 0010:nested_vmx_vmexit+0x176e/0x1980 [kvm_intel] >>> Call Trace: >>> ? kvm_multiple_exception+0x149/0x170 [kvm] >>> ? handle_emulation_failure+0x79/0x230 [kvm] >>> ? load_vmcs12_host_state+0xa80/0xa80 [kvm_intel] >>> ? check_chain_key+0x137/0x1e0 >>> ? reexecute_instruction.part.168+0x130/0x130 [kvm] >>> nested_vmx_inject_exception_vmexit+0xb7/0x100 [kvm_intel] >>> ? nested_vmx_inject_exception_vmexit+0xb7/0x100 [kvm_intel] >>> vmx_queue_exception+0x197/0x300 [kvm_intel] >>> kvm_arch_vcpu_ioctl_run+0x1b0c/0x2c90 [kvm] >>> ? kvm_arch_vcpu_runnable+0x220/0x220 [kvm] >>> ? preempt_count_sub+0x18/0xc0 >>> ? restart_apic_timer+0x17d/0x300 [kvm] >>> ? kvm_lapic_restart_hv_timer+0x37/0x50 [kvm] >>> ? kvm_arch_vcpu_load+0x1d8/0x350 [kvm] >>> kvm_vcpu_ioctl+0x4e4/0x910 [kvm] >>> ? kvm_vcpu_ioctl+0x4e4/0x910 [kvm] >>> ? kvm_dev_ioctl+0xbe0/0xbe0 [kvm] >>> >>> The flag "nested_run_pending", which can override the decision of which >>> should run >>> next, L1 or L2. nested_run_pending=1 means that we *must* run L2 next, not >>> L1. This >>> is necessary in particular when L1 did a VMLAUNCH of L2 and therefore >>> expects L2 to >>> be run (and perhaps be injected with an event it specified, etc.). >>> Nested_run_pending >>> is especially intended to avoid switching to L1 in the injection >>> decision-point. >>> >>> I catch this in the queue exception path, this patch fixes it by requesting >>> an immediate VM exit from L2 and keeping the exception for L1 pending for a >>> subsequent nested VM exit. 
>>> >>> Cc: Paolo Bonzini >>> Cc: Radim Krčmář >>> Signed-off-by: Wanpeng Li >>> --- >>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c >>> @@ -6356,8 +6356,8 @@ static int inject_pending_event(struct kvm_vcpu >>> *vcpu, bool req_int_win) >>> kvm_update_dr7(vcpu); >>> } >> >> Hm, we shouldn't execute the code above if exception won't be injected. >> >>> >>> - kvm_x86_ops->queue_exception(vcpu); >>> - return 0; >> >> vmx_complete_interrupts() assumes that the exception is always injected, >> so it would be dropped by kvm_clear_exception_queue(). >> >> I'm starting to wonder whether getting rid of nested_run_pending >> wouldn't be nicer. > > Yeah, I rethink of your concern for nested_run_pending w/ return value > is 0, actually the path in the calltrace is the else branch in > nested_vmx_check_exception(), an exception will be injected to L2 by > L1 if L1 owns this exception, otherwise injected by L0 directly. For > the nested_run_pending w/ return value is 0 stuff, we can treat it as > L0 injects the exception to L2 directly. So there is no exception is > injected to wrong guest. I just sent out v3 to move the nested_run_pending stuff to the else branch. Regards, Wanpeng Li
[PATCH v3] KVM: nVMX: Fix trying to cancel vmlauch/vmresume
From: Wanpeng Li

[ cut here ]
WARNING: CPU: 7 PID: 3861 at /home/kernel/ssd/kvm/arch/x86/kvm//vmx.c:11299 nested_vmx_vmexit+0x176e/0x1980 [kvm_intel]
CPU: 7 PID: 3861 Comm: qemu-system-x86 Tainted: G W OE 4.13.0-rc4+ #11
RIP: 0010:nested_vmx_vmexit+0x176e/0x1980 [kvm_intel]
Call Trace:
 ? kvm_multiple_exception+0x149/0x170 [kvm]
 ? handle_emulation_failure+0x79/0x230 [kvm]
 ? load_vmcs12_host_state+0xa80/0xa80 [kvm_intel]
 ? check_chain_key+0x137/0x1e0
 ? reexecute_instruction.part.168+0x130/0x130 [kvm]
 nested_vmx_inject_exception_vmexit+0xb7/0x100 [kvm_intel]
 ? nested_vmx_inject_exception_vmexit+0xb7/0x100 [kvm_intel]
 vmx_queue_exception+0x197/0x300 [kvm_intel]
 kvm_arch_vcpu_ioctl_run+0x1b0c/0x2c90 [kvm]
 ? kvm_arch_vcpu_runnable+0x220/0x220 [kvm]
 ? preempt_count_sub+0x18/0xc0
 ? restart_apic_timer+0x17d/0x300 [kvm]
 ? kvm_lapic_restart_hv_timer+0x37/0x50 [kvm]
 ? kvm_arch_vcpu_load+0x1d8/0x350 [kvm]
 kvm_vcpu_ioctl+0x4e4/0x910 [kvm]
 ? kvm_vcpu_ioctl+0x4e4/0x910 [kvm]
 ? kvm_dev_ioctl+0xbe0/0xbe0 [kvm]

The flag "nested_run_pending" can override the decision of which should run next, L1 or L2. nested_run_pending=1 means that we *must* run L2 next, not L1. This is necessary in particular when L1 did a VMLAUNCH of L2 and therefore expects L2 to be run (and perhaps be injected with an event it specified, etc.). nested_run_pending is especially intended to avoid switching to L1 in the injection decision-point.

I caught this in the queue exception path; this patch fixes it by running L2 next instead of L1 in the queue exception path and injecting the pending exception into L2 directly.
Cc: Paolo Bonzini
Cc: Radim Krčmář
Signed-off-by: Wanpeng Li
---
v2 -> v3:
 * move the nested_run_pending check to the else branch
v1 -> v2:
 * request an immediate VM exit from L2 and keep the exception for L1 pending for a subsequent nested VM exit

 arch/x86/kvm/vmx.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index e398946..685f51e 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2488,6 +2488,10 @@ static int nested_vmx_check_exception(struct kvm_vcpu *vcpu)
 		}
 	} else {
 		unsigned long exit_qual = 0;
+
+		if (to_vmx(vcpu)->nested.nested_run_pending)
+			return 0;
+
 		if (nr == DB_VECTOR)
 			exit_qual = vcpu->arch.dr6;
-- 
2.7.4
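The control flow the v3 hunk adds can be modeled in a few lines of plain C. This is a toy illustration, not the kernel code: the struct, function, and field names below are hypothetical stand-ins, and returning 1 means "reflect the exception to L1 as a nested vmexit" while returning 0 means "let L0 inject it into L2 directly".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the relevant vcpu state. */
struct vcpu_model {
    bool nested_run_pending;  /* L1 just issued VMLAUNCH/VMRESUME */
    bool l1_owns_exception;   /* L1 intercepts this exception vector */
};

static int check_exception(const struct vcpu_model *v, unsigned long dr6,
                           unsigned long *exit_qual)
{
    if (v->l1_owns_exception) {
        *exit_qual = 0;
        return 1;            /* reflect to L1 as a nested vmexit */
    }
    /* The v3 fix: while a VMLAUNCH/VMRESUME of L2 is still pending we
     * must not start preparing nested-exit state; bail out early and
     * let L0 inject the exception into L2. */
    if (v->nested_run_pending)
        return 0;
    *exit_qual = dr6;        /* record the qualification (e.g. for #DB) */
    return 0;
}
```

The point of the early return is exactly what the commit message argues: with nested_run_pending set, L2 must run next, so no nested vmexit to L1 may be initiated from this path.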
Re: [PATCH] Fix compat_sys_sigpending breakage introduced by v4.13-rc1~6^2~12
On Sun, Aug 06, 2017 at 07:22:03PM +0100, Al Viro wrote:
> On Sat, Aug 05, 2017 at 11:00:50PM +0300, Dmitry V. Levin wrote:
> > The latest change of compat_sys_sigpending has broken it in two ways.
> >
> > First, it tries to write 4 bytes more than userspace expects:
> > sizeof(old_sigset_t) == sizeof(long) == 8 instead of
> > sizeof(compat_old_sigset_t) == sizeof(u32) == 4.
> >
> > Second, on big endian architectures these bytes are being written
> > in the wrong order.
>
> > @@ -3303,12 +3303,15 @@ SYSCALL_DEFINE1(sigpending, old_sigset_t __user *, set)
> >  #ifdef CONFIG_COMPAT
> >  COMPAT_SYSCALL_DEFINE1(sigpending, compat_old_sigset_t __user *, set32)
> >  {
> > +#ifdef __BIG_ENDIAN
> >  	sigset_t set;
> > -	int err = do_sigpending(&set, sizeof(old_sigset_t));
> > -	if (err == 0)
> > -		if (copy_to_user(set32, &set, sizeof(old_sigset_t)))
> > -			err = -EFAULT;
> > +	int err = do_sigpending(&set, sizeof(set.sig[0]));
> > +	if (!err)
> > +		err = put_user(set.sig[0], set32);
> >  	return err;
> > +#else
> > +	return sys_rt_sigpending((sigset_t __user *)set32, sizeof(*set32));
> > +#endif
>
> Interesting... Basically, your fix makes it parallel to compat rt_sigpending(2);
> I agree that the bug is real and gets fixed by that, but... rt_sigpending()
> itself looks a bit fishy. There we have
> 	compat_sigset_t set32;
> 	sigset_to_compat(&set32, &set);
> 	/* we can get here only if sigsetsize <= sizeof(set) */
> 	if (copy_to_user(uset, &set32, sigsetsize))
> 		err = -EFAULT;
> in big-endian case; now, there are 4 callers of sigset_to_compat() in the
> entire kernel. One in sparc compat rt_sigaction(2), the rest in kernel/signal.c
> itself. All are followed by copy_to_user(), and all but the sparc one are
> under that kind of "if it's big-endian..." ifdefs.
>
> Looks like it might make sense to do this:
> put_compat_sigset(compat_sigset_t __user *compat, const sigset_t *set, int size)
> {
> #ifdef __BIG_ENDIAN
> 	compat_sigset_t v;
> 	switch (_NSIG_WORDS) {
> 	case 4: v.sig[7] = (set->sig[3] >> 32); v.sig[6] = set->sig[3];
> 	case 3: v.sig[5] = (set->sig[2] >> 32); v.sig[4] = set->sig[2];
> 	case 2: v.sig[3] = (set->sig[1] >> 32); v.sig[2] = set->sig[1];
> 	case 1: v.sig[1] = (set->sig[0] >> 32); v.sig[0] = set->sig[0];
> 	}
> 	return copy_to_user(compat, &v, size) ? -EFAULT : 0;
> #else
> 	return copy_to_user(compat, set, size) ? -EFAULT : 0;
> #endif
> }
>
> int put_compat_old_sigset(compat_old_sigset_t __user *compat, const sigset_t *set)
> {
> 	/* we want bits 0--31 of the bitmap */
> 	return put_user(set->sig[0], compat);
> }
[...]
> COMPAT_SYSCALL_DEFINE1(sigpending, compat_old_sigset_t __user *, uset)
> {
> 	sigset_t set;
> 	int err = do_sigpending(&set, sizeof(set));
> 	if (!err)
> 		err = put_compat_old_sigset(uset, &set);
> 	return err;
> }

I don't think a separate function for put_user(set->sig[0], compat) is needed given that its only user is going to be compat_sigpending(). Introducing put_compat_sigset() and moving the sigset size check out of do_sigpending() definitely makes sense; patches will follow shortly.

-- 
ldv
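The big-endian branch of the proposed put_compat_sigset() boils down to one operation per sigset word, which can be sketched in isolation. The function name below is illustrative, not the final kernel API:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the per-word work in the big-endian branch of the proposed
 * put_compat_sigset(): each 64-bit sigset word is split into two 32-bit
 * compat words, low half first, so a 32-bit userspace sees the signal
 * bitmap in the order it expects regardless of host endianness. */
static void split_sigset_word(uint64_t word, uint32_t out[2])
{
    out[0] = (uint32_t)word;          /* bits 0..31  */
    out[1] = (uint32_t)(word >> 32);  /* bits 32..63 */
}
```

A plain copy_to_user() of the 64-bit words would hand a big-endian userspace the high half first, which is exactly the "written in the wrong order" bug Dmitry reports.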
Re: [PATCH v2 1/8] dt-bindings: mediatek: Add binding for mt2712 IOMMU and SMI
On Mon, Aug 21, 2017 at 07:00:14PM +0800, Yong Wu wrote:
> This patch adds descriptions for mt2712 IOMMU and SMI.
>
> In order to balance the bandwidth, mt2712 has two M4Us, two
> smi-commons and 10 smi-larbs, and mt2712 is also MTK IOMMU gen2, which
> uses the ARM Short-Descriptor translation table format.
>
> The mt2712 M4U-SMI HW diagram is as below:
>
>                    EMI
>                     |
>          ----------------------
>          |                    |
>         M4U0                 M4U1
>          |                    |
>     smi-common0          smi-common1
>          |                    |
>   -------------------   -----------------------
>   |    |    |    |   |   |     |     |    |   |
> larb0 larb1 larb2 larb3 larb6   larb4 larb5 larb7 larb8 larb9
> disp0 vdec  cam   venc  jpg  mdp1/disp1 mdp2/disp2 mdp3 vdo/nr tvd
>
> All the connections are HW fixed, SW can NOT adjust it.
>
> Signed-off-by: Yong Wu
> ---
> Hi Rob,
>     Comparing with the v1, I add larb8 and larb9 in this version,
> so I don't add your ACK here.

Thanks for the explanation. That's minor enough you could have kept it.

Acked-by: Rob Herring

> ---
>  .../devicetree/bindings/iommu/mediatek,iommu.txt   |   6 +-
>  .../memory-controllers/mediatek,smi-common.txt     |   6 +-
>  .../memory-controllers/mediatek,smi-larb.txt       |   5 +-
>  include/dt-bindings/memory/mt2712-larb-port.h      | 102 +++++++++++++
>  4 files changed, 113 insertions(+), 6 deletions(-)
>  create mode 100644 include/dt-bindings/memory/mt2712-larb-port.h
Re: [PATCH net-next] net: dsa: User per-cpu 64-bit statistics
On 08/21/2017 04:23 PM, Florian Fainelli wrote: > On 08/04/2017 10:11 AM, Eric Dumazet wrote: >> On Fri, 2017-08-04 at 08:51 -0700, Florian Fainelli wrote: >>> On 08/03/2017 10:36 PM, Eric Dumazet wrote: On Thu, 2017-08-03 at 21:33 -0700, Florian Fainelli wrote: > During testing with a background iperf pushing 1Gbit/sec worth of > traffic and having both ifconfig and ethtool collect statistics, we > could see quite frequent deadlocks. Convert the often accessed DSA slave > network devices statistics to per-cpu 64-bit statistics to remove these > deadlocks and provide fast efficient statistics updates. > This seems to be a bug fix, it would be nice to get a proper tag like : Fixes: f613ed665bb3 ("net: dsa: Add support for 64-bit statistics") >>> >>> Right, should have been added, thanks! >>> Problem here is that if multiple cpus can call dsa_switch_rcv() at the same time, then u64_stats_update_begin() contract is not respected. >>> >>> This is really where I struggled understanding what is wrong in the >>> non-per CPU version, my understanding is that we have: >>> >>> - writers for xmit executes in process context >>> - writers for receive executes from NAPI (from the DSA's master network >>> device through it's own NAPI doing netif_receive_skb -> netdev_uses_dsa >>> -> netif_receive_skb) >>> >>> readers should all execute in process context. The test scenario that >>> led to a deadlock involved running iperf in the background, having a >>> while loop with both ifconfig and ethtool reading stats, and somehow >>> when iperf exited, either reader would just be locked. So I guess this >>> leaves us with the two writers not being mutually excluded then, right? 
>>
>> You could add a debug version of u64_stats_update_begin()
>>
>> doing
>>
>> 	int ret = atomic_inc((atomic_t *)syncp);
>>
>> 	BUG_ON(ret & 1);
>>
>> And u64_stats_update_end()
>>
>> 	int ret = atomic_inc((atomic_t *)syncp);
>
> so with your revised suggested patch:
>
> static inline void u64_stats_update_begin(struct u64_stats_sync *syncp)
> {
> #if BITS_PER_LONG==32 && defined(CONFIG_SMP)
> 	int ret = atomic_inc_return((atomic_t *)syncp);
> 	BUG_ON(ret & 1);
> #endif
> #if 0
> #if BITS_PER_LONG==32 && defined(CONFIG_SMP)
> 	write_seqcount_begin(&syncp->seq);
> #endif
> #endif
> }
>
> static inline void u64_stats_update_end(struct u64_stats_sync *syncp)
> {
> #if BITS_PER_LONG==32 && defined(CONFIG_SMP)
> 	int ret = atomic_inc_return((atomic_t *)syncp);
> 	BUG_ON(!(ret & 1));
> #endif
> #if 0
> #if BITS_PER_LONG==32 && defined(CONFIG_SMP)
> 	write_seqcount_end(&syncp->seq);
> #endif
> #endif
> }
>
> and this makes us choke pretty early in IRQ accounting, did I get your
> suggestion right?

Well, if we return 1 from atomic_inc_return() and the previous value was zero, of course we are going to be bugging here. The idea behind the patch, I suppose, is to make sure that we always get an odd number upon u64_stats_update_begin()/entry, and an even number upon u64_stats_update_end()/exit, right?

>
> [    0.015149] [ cut here ]
> [    0.020051] kernel BUG at ./include/linux/u64_stats_sync.h:82!
> [0.026221] Internal error: Oops - BUG: 0 [#1] SMP ARM > [0.031661] Modules linked in: > [0.034970] CPU: 0 PID: 0 Comm: swapper/0 Not tainted > 4.13.0-rc5-01297-g7d3f0cd43fee-dirty #33 > [0.043990] Hardware name: Broadcom STB (Flattened Device Tree) > [0.050237] task: c180a500 task.stack: c180 > [0.055065] PC is at irqtime_account_delta+0xa4/0xa8 > [0.060322] LR is at 0x1 > [0.063057] pc : []lr : [<0001>]psr: 01d3 > [0.069652] sp : c1801eec ip : ee78b458 fp : c0e5ea48 > [0.075212] r10: c18b4b40 r9 : f0803000 r8 : ee00a800 > [0.080781] r7 : 0001 r6 : c180a500 r5 : c180 r4 : > [0.087680] r3 : r2 : ec8c r1 : ee78b3c0 r0 : ee78b440 > [0.094546] Flags: nzcv IRQs off FIQs off Mode SVC_32 ISA ARM > Segment user > [0.102314] Control: 30c5387d Table: 3000 DAC: fffd > [0.108414] Process swapper/0 (pid: 0, stack limit = 0xc1800210) > [0.114791] Stack: (0xc1801eec to 0xc1802000) > [0.119431] 1ee0:ee78b440 c180 > c180a500 0001 c02505c8 > [0.128079] 1f00: 0004 ee00a800 e000 > c0227890 c17e6f20 c0278910 > [0.136665] 1f20: c185724c c18079a0 f080200c c1801f58 f0802000 > c0201494 c0e00c18 2053 > [0.145303] 1f40: c1801f8c c180 c18b4b40 > c020d238 001f > [0.153915] 1f60: 00040d00 efffc940 c18b4b40 > c1807440 > [0.162571] 1f80: c18b4b40 c0e5ea48 0004 c1801fa8 c0322fb0 > c0e00c18 2053 > [0.171226] 1fa0: c18b4b40 > c0e006c0 > [0.179890] 1fc0: c1807448 c0e5ea48 > c18b4dd4
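The odd/even invariant Eric spells out is easy to demonstrate outside the kernel. This is a plain C stand-in, not the kernel's atomic_inc_return(), and it shows the check orientation that would NOT fire on a well-nested writer (the version quoted above has the two assertions swapped, which is why it fired immediately in IRQ accounting):

```c
#include <assert.h>

/* A writer's update_begin must leave the sequence count odd; its
 * matching update_end must leave it even. A nested or concurrent
 * writer on the same counter would violate the parity and trip the
 * corresponding check. */
static unsigned int seq;

static void dbg_update_begin(void)
{
    unsigned int ret = ++seq;
    assert(ret & 1);      /* entry: count must now be odd */
}

static void dbg_update_end(void)
{
    unsigned int ret = ++seq;
    assert(!(ret & 1));   /* exit: count must now be even */
}
```

Two well-nested begin/end pairs keep both assertions quiet; a second begin before the first end would make the count even on entry and fail.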
Re: [PATCH v5 0/9] mtd: sharpslpart partition parser
On Mon, 14 Aug 2017 22:48:31 +0200, Andrea Adami wrote:
> This patchset introduces a simple partition parser for the Sharp SL
> Series PXA handhelds. More details in the commit text.
>
> I have set in cc the ARM PXA maintainers because this is the MTD part of
> a planned wider patchset cleaning the Zaurus board files. The MFD maintainers
> are also in cc (tmio.h change).
>
> Changelog:
> v1 first version, initial import of 2.4 sources
> v2 refactor applying many suggested fixes
> v3 put the partition parser types in the platform data
> v4 refactor after ML review
> v5 fix commit messages and subject texts, remove global, fixes after v4 review
>
> GPL sources: http://support.ezaurus.com/developer/source/source_dl.asp
>
> Andrea Adami (9):
>   mtd: sharpslpart: Add sharpslpart partition parser
>   mtd: nand: sharpsl: Add partition parsers platform data
>   mfd: tmio: Add partition parsers platform data
>   mtd: nand: sharpsl: Register partitions using the parsers
>   mtd: nand: tmio: Register partitions using the parsers

Applied patches 2 to 5 to nand/next.

Thanks,

Boris

>   ARM: pxa/corgi: Remove hardcoded partitioning, use sharpslpart parser
>   ARM: pxa/tosa: Remove hardcoded partitioning, use sharpslpart parser
>   ARM: pxa/spitz: Remove hardcoded partitioning, use sharpslpart parser
>   ARM: pxa/poodle: Remove hardcoded partitioning, use sharpslpart parser
>
>  arch/arm/mach-pxa/corgi.c         |  31 +---
>  arch/arm/mach-pxa/poodle.c        |  28 +--
>  arch/arm/mach-pxa/spitz.c         |  34 +---
>  arch/arm/mach-pxa/tosa.c          |  28 +--
>  drivers/mtd/nand/sharpsl.c        |   2 +-
>  drivers/mtd/nand/tmio_nand.c      |   4 +-
>  drivers/mtd/parsers/Kconfig       |   8 +
>  drivers/mtd/parsers/Makefile      |   1 +
>  drivers/mtd/parsers/sharpslpart.c | 376 ++++++++++++++++++++++++++++++++++
>  include/linux/mfd/tmio.h          |   1 +
>  include/linux/mtd/sharpsl.h       |   1 +
>  11 files changed, 424 insertions(+), 90 deletions(-)
>  create mode 100644 drivers/mtd/parsers/sharpslpart.c
>
[PATCH 0/4] w1: Adjustments for some function implementations
From: Markus Elfring
Date: Mon, 21 Aug 2017 22:04:56 +0200

A few update suggestions were taken into account from static source code analysis.

Markus Elfring (4):
  Delete an error message for a failed memory allocation in two functions
  Improve a size determination in two functions
  masters: Delete an error message for a failed memory allocation in four functions
  masters: Improve a size determination in four functions

 drivers/w1/masters/ds2482.c    | 3 ++-
 drivers/w1/masters/ds2490.c    | 7 +++
 drivers/w1/masters/matrox_w1.c | 7 +--
 drivers/w1/masters/mxc_w1.c    | 3 +--
 drivers/w1/masters/omap_hdq.c  | 4 +---
 drivers/w1/masters/w1-gpio.c   | 7 ++-
 drivers/w1/slaves/w1_ds28e04.c | 2 +-
 drivers/w1/w1.c                | 9 ++---
 drivers/w1/w1_int.c            | 6 +-
 9 files changed, 14 insertions(+), 34 deletions(-)

-- 
2.14.0
[PATCH 2/4] w1: Improve a size determination in two functions
From: Markus Elfring
Date: Mon, 21 Aug 2017 21:17:01 +0200

Replace the specification of data structures by pointer dereferences as the parameter for the operator "sizeof" to make the corresponding size determination a bit safer according to the Linux coding style convention.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring
---
 drivers/w1/slaves/w1_ds28e04.c | 2 +-
 drivers/w1/w1.c                | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/w1/slaves/w1_ds28e04.c b/drivers/w1/slaves/w1_ds28e04.c
index ec234b846eb3..794db5e8f46f 100644
--- a/drivers/w1/slaves/w1_ds28e04.c
+++ b/drivers/w1/slaves/w1_ds28e04.c
@@ -397,5 +397,5 @@ static int w1_f1C_add_slave(struct w1_slave *sl)
 	struct w1_f1C_data *data = NULL;

 	if (w1_enable_crccheck) {
-		data = kzalloc(sizeof(struct w1_f1C_data), GFP_KERNEL);
+		data = kzalloc(sizeof(*data), GFP_KERNEL);
 		if (!data)
diff --git a/drivers/w1/w1.c b/drivers/w1/w1.c
index f26c1ea280dd..9f71dc7aca3a 100644
--- a/drivers/w1/w1.c
+++ b/drivers/w1/w1.c
@@ -711,5 +711,5 @@ int w1_attach_slave_device(struct w1_master *dev, struct w1_reg_num *rn)
 	int err;
 	struct w1_netlink_msg msg;

-	sl = kzalloc(sizeof(struct w1_slave), GFP_KERNEL);
+	sl = kzalloc(sizeof(*sl), GFP_KERNEL);
 	if (!sl)
-- 
2.14.0
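The idiom this patch converts to is worth a standalone illustration. The struct name below is a hypothetical stand-in for w1_f1C_data / w1_slave:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical struct standing in for the w1 data structures. */
struct sample_data {
    int a;
    char buf[16];
};

/* sizeof(*data) is derived from the pointer itself, so the allocation
 * stays correct even if the pointer's type is later changed, whereas
 * sizeof(struct sample_data) would silently go stale if the variable
 * were retyped without updating the allocation site. */
static struct sample_data *alloc_sample(void)
{
    struct sample_data *data = calloc(1, sizeof(*data));
    return data;
}
```

Both spellings compile to the same allocation today; the difference only shows up under future maintenance, which is exactly the coding-style argument the commit message makes.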
[PATCH 1/4] w1: Delete an error message for a failed memory allocation in two functions
From: Markus Elfring
Date: Mon, 21 Aug 2017 21:05:42 +0200

Omit an extra message for a memory allocation failure in these functions.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring
---
 drivers/w1/w1.c     | 7 +--
 drivers/w1/w1_int.c | 6 +-
 2 files changed, 2 insertions(+), 11 deletions(-)

diff --git a/drivers/w1/w1.c b/drivers/w1/w1.c
index 74471e7aa5cc..f26c1ea280dd 100644
--- a/drivers/w1/w1.c
+++ b/drivers/w1/w1.c
@@ -715,10 +715,5 @@ int w1_attach_slave_device(struct w1_master *dev, struct w1_reg_num *rn)
-	if (!sl) {
-		dev_err(&dev->dev,
-			"%s: failed to allocate new slave device.\n",
-			__func__);
+	if (!sl)
 		return -ENOMEM;
-	}
-
 	sl->owner = THIS_MODULE;
 	sl->master = dev;
diff --git a/drivers/w1/w1_int.c b/drivers/w1/w1_int.c
index 1c776178f598..9e37463960ed 100644
--- a/drivers/w1/w1_int.c
+++ b/drivers/w1/w1_int.c
@@ -44,9 +44,5 @@ static struct w1_master *w1_alloc_dev(u32 id, int slave_count, int slave_ttl,
-	if (!dev) {
-		pr_err("Failed to allocate %zd bytes for new w1 device.\n",
-		       sizeof(struct w1_master));
+	if (!dev)
 		return NULL;
-	}
-
 	dev->bus_master = (struct w1_bus_master *)(dev + 1);
-- 
2.14.0
[patch] fs, proc: unconditional cond_resched when reading smaps
If there are large numbers of hugepages to iterate while reading /proc/pid/smaps, the page walk never does cond_resched(). On archs without split pmd locks, there can be significant and observable contention on mm->page_table_lock which causes lengthy delays without rescheduling.

Always reschedule in smaps_pte_range() if necessary since the pagewalk iteration can be expensive.

Signed-off-by: David Rientjes
---
 fs/proc/task_mmu.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -599,11 +599,11 @@ static int smaps_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 	if (ptl) {
 		smaps_pmd_entry(pmd, addr, walk);
 		spin_unlock(ptl);
-		return 0;
+		goto out;
 	}

 	if (pmd_trans_unstable(pmd))
-		return 0;
+		goto out;
 	/*
 	 * The mmap_sem held all the way back in m_start() is what
 	 * keeps khugepaged out of here and from collapsing things
@@ -613,6 +613,7 @@ static int smaps_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 	for (; addr != end; pte++, addr += PAGE_SIZE)
 		smaps_pte_entry(pte, addr, walk);
 	pte_unmap_unlock(pte - 1, ptl);
+out:
 	cond_resched();
 	return 0;
 }
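The shape of the fix is a standard goto-out transformation: redirect every early return through a single exit label so the reschedule point runs on all paths. A toy version, with illustrative names rather than the kernel's:

```c
#include <assert.h>

static int resched_calls;
static void fake_cond_resched(void) { resched_calls++; }

/* Toy shape of smaps_pte_range() after the patch: the early exits for
 * the huge-page and unstable-pmd cases jump to "out" instead of
 * returning directly, so the reschedule point is never skipped even
 * when the per-pte loop does not run. */
static int walk_range(int early_exit)
{
    if (early_exit)
        goto out;   /* previously "return 0", skipping cond_resched() */

    /* ... per-pte work would happen here ... */
out:
    fake_cond_resched();
    return 0;
}
```

Before the patch, a mapping consisting mostly of hugepages took the early-exit path on every pmd and therefore never rescheduled, which is the contention the commit message describes.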
Re: [PATCH v3 net-next] bpf/verifier: track liveness for pruning
On 8/21/17 1:24 PM, Edward Cree wrote: On 18/08/17 15:16, Edward Cree wrote: On 18/08/17 04:21, Alexei Starovoitov wrote: It seems you're trying to sort-of do per-fake-basic block liveness analysis, but our state_list_marks are not correct if we go with canonical basic block definition, since we mark the jump insn and not insn after the branch and not every basic block boundary is properly detected. I think the reason this works is that jump insns can't do writes. [snip] the sl->state will never have any write marks and it'll all just work. But I should really test that! I tested this, and found that, no, sl->state can have write marks, and the algorithm will get the wrong answer in that case. So I've got a patch to make the first iteration ignore write marks, as part of a series which I will post shortly. When I do so, please re-do your tests with adding state_list_marks in strange and exciting places; it should work wherever you put them. Like you say, it "magically doesn't depend on proper basic block boundaries", and that's because really pruning is just a kind of checkpointing that just happens to be most effective when done just after a jump (pop_stack). Can I have a SOB for your "grr" test program, so I can include it in the series? yes. of course. just give the test some reasonable name :)
Re: [PATCH v3] livepatch: add (un)patch callbacks
On Fri, Aug 18, 2017 at 03:58:16PM +0200, Petr Mladek wrote: > On Wed 2017-08-16 15:17:04, Joe Lawrence wrote: > > Provide livepatch modules a klp_object (un)patching notification > > mechanism. Pre and post-(un)patch callbacks allow livepatch modules to > > setup or synchronize changes that would be difficult to support in only > > patched-or-unpatched code contexts. > > > > diff --git a/include/linux/livepatch.h b/include/linux/livepatch.h > > index 194991ef9347..500dc9b2b361 100644 > > --- a/include/linux/livepatch.h > > +++ b/include/linux/livepatch.h > > @@ -138,6 +154,71 @@ struct klp_patch { > > func->old_name || func->new_func || func->old_sympos; \ > > func++) > > > > +/** > > + * klp_is_object_loaded() - is klp_object currently loaded? > > + * @obj: klp_object pointer > > + * > > + * Return: true if klp_object is loaded (always true for vmlinux) > > + */ > > +static inline bool klp_is_object_loaded(struct klp_object *obj) > > +{ > > + return !obj->name || obj->mod; > > +} > > + > > +/** > > + * klp_pre_patch_callback - execute before klp_object is patched > > + * @obj: invoke callback for this klp_object > > + * > > + * Return: status from callback > > + * > > + * Callers should ensure obj->patched is *not* set. > > + */ > > +static inline int klp_pre_patch_callback(struct klp_object *obj) > > +{ > > + if (obj->callbacks.pre_patch) > > + return (*obj->callbacks.pre_patch)(obj); > > + return 0; > > +} > > + > > +/** > > + * klp_post_patch_callback() - execute after klp_object is patched > > + * @obj: invoke callback for this klp_object > > + * > > + * Callers should ensure obj->patched is set. 
> > + */
> > +static inline void klp_post_patch_callback(struct klp_object *obj)
> > +{
> > +	if (obj->callbacks.post_patch)
> > +		(*obj->callbacks.post_patch)(obj);
> > +}
> > +
> > +/**
> > + * klp_pre_unpatch_callback() - execute before klp_object is unpatched
> > + *				and is active across all tasks
> > + * @obj: invoke callback for this klp_object
> > + *
> > + * Callers should ensure obj->patched is set.
> > + */
> > +static inline void klp_pre_unpatch_callback(struct klp_object *obj)
> > +{
> > +	if (obj->callbacks.pre_unpatch)
> > +		(*obj->callbacks.pre_unpatch)(obj);
> > +}
> > +
> > +/**
> > + * klp_post_unpatch_callback() - execute after klp_object is unpatched,
> > + *				 all code has been restored and no tasks
> > + *				 are running patched code
> > + * @obj: invoke callback for this klp_object
> > + *
> > + * Callers should ensure obj->patched is *not* set.
> > + */
> > +static inline void klp_post_unpatch_callback(struct klp_object *obj)
> > +{
> > +	if (obj->callbacks.post_unpatch)
> > +		(*obj->callbacks.post_unpatch)(obj);
> > +}
>
> I guess that we do not want to make these functions usable
> outside livepatch code. Therefore these inliners should go
> to kernel/livepatch/core.h or so.

Okay, I can stash them away in an internal header file like core.h.

> > +
> >  int klp_register_patch(struct klp_patch *);
> >  int klp_unregister_patch(struct klp_patch *);
> >  int klp_enable_patch(struct klp_patch *);
> > diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c
> > index b9628e43c78f..ddb23e18a357 100644
> > --- a/kernel/livepatch/core.c
> > +++ b/kernel/livepatch/core.c
> > @@ -878,6 +890,8 @@ int klp_module_coming(struct module *mod)
> >  				goto err;
> >  			}
> >
> > +			klp_post_patch_callback(obj);
>
> This should be called only if (patch != klp_transition_patch).
> Otherwise, it would be called too early.

Can you elaborate a bit on this scenario?
When would the transition patch (as I understand it, a livepatch not quite fully (un)patched) hit the module coming/going notifier? Is it possible to load or unload a module like this? I'd like to add this scenario to my test script if possible. > > + > > break; > > } > > } > > @@ -929,7 +943,10 @@ void klp_module_going(struct module *mod) > > if (patch->enabled || patch == klp_transition_patch) { > > pr_notice("reverting patch '%s' on unloading > > module '%s'\n", > > patch->mod->name, obj->mod->name); > > + > > + klp_pre_unpatch_callback(obj); > > Also the pre_unpatch() callback should be called only > if (patch != klp_transition_patch). Otherwise, it should have > already been called. It is not the current case but see below. Ditto. > > klp_unpatch_object(obj); > > + klp_post_unpatch_callback(obj); > > } > > > > klp_free_object_loaded(obj); > > diff --git a/kernel/livepatch/patch.c b/kernel/livepatch/patch.c > > index 52c4e907c14b..0eed0df6e6d9 100644 > > ---
Re: [PATCH 2/2] sched/fair: Fix use of NULL with find_idlest_group
On Mon, Aug 21, 2017 at 11:14:00PM +0200, Peter Zijlstra wrote: > +static int > +find_idlest_cpu(struct sched_domain *sd, struct task_struct *p, int cpu, int > sd_flag) > +{ > + struct sched_domain *tmp; > + int new_cpu = cpu; > + > + while (sd) { > + struct sched_group *group; > + int weight; > + > + if (!(sd->flags & sd_flag)) { > + sd = sd->child; > + continue; > + } > + > + group = find_idlest_group(sd, p, cpu, sd_flag); > + if (!group) { > + sd = sd->child; > + continue; > + } > + > + new_cpu = find_idlest_group_cpu(group, p, cpu); > + if (new_cpu == -1 || new_cpu == cpu) { > + /* Now try balancing at a lower domain level of cpu */ > + sd = sd->child; > + continue; > + } > + > + /* Now try balancing at a lower domain level of new_cpu */ > + cpu = new_cpu; > + weight = sd->span_weight; > + sd = NULL; > + for_each_domain(cpu, tmp) { > + if (weight <= tmp->span_weight) > + break; > + if (tmp->flags & sd_flag) > + sd = tmp; > + } This find-the-sd-for-another-cpu thing is horrific. And it has always bugged me that the whole thing is O(n^2) to find a CPU. I understand why it has this form, but scanning each CPU more than once is just offensive. > + /* while loop will break here if sd == NULL */ > + } > + > + return new_cpu; > +}
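The quadratic behavior Peter objects to can be made concrete with a toy model. This is an illustration of the scan pattern only, with made-up names; it does not model the real sched-domain topology:

```c
#include <assert.h>

/* Toy model of the cost: after picking a new CPU at each level, the
 * code re-walks that CPU's domain hierarchy from the bottom (the
 * for_each_domain(cpu, tmp) loop) to find the level matching the
 * current span. For a hierarchy of N levels the inner rescan runs
 * once per descent step, visiting N + (N-1) + ... + 1 domains. */
#define LEVELS 4

static int find_level_visits(void)
{
    int visits = 0;

    /* outer loop: descend one level per iteration, like sd = sd->child */
    for (int top = LEVELS; top > 0; top--) {
        /* inner rescan from the bottom, like for_each_domain(cpu, tmp) */
        for (int tmp = 1; tmp <= top; tmp++)
            visits++;
    }
    return visits;
}
```

For 4 levels this visits 10 domains instead of 4, which is the "scanning each CPU more than once" overhead that makes the overall CPU search O(n^2).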
[PATCH V9 1/2] powerpc/numa: Update CPU topology when VPHN enabled
powerpc/numa: Correct the currently broken capability to set the topology for shared CPUs in LPARs. At boot time for shared CPU lpars, the topology for each shared CPU is set to node zero; however, this is now updated correctly using the Virtual Processor Home Node (VPHN) capabilities information provided by the pHyp. Also, update initialization checks for device-tree attributes to independently recognize PRRN or VPHN usage.

Signed-off-by: Michael Bringmann
---
 arch/powerpc/include/asm/topology.h          | 14 ++
 arch/powerpc/mm/numa.c                       | 64 +++---
 arch/powerpc/platforms/pseries/dlpar.c       |  2 +
 arch/powerpc/platforms/pseries/hotplug-cpu.c |  2 +
 4 files changed, 75 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/include/asm/topology.h b/arch/powerpc/include/asm/topology.h
index dc4e159..85d6428 100644
--- a/arch/powerpc/include/asm/topology.h
+++ b/arch/powerpc/include/asm/topology.h
@@ -98,6 +98,20 @@ static inline int prrn_is_enabled(void)
 }
 #endif /* CONFIG_NUMA && CONFIG_PPC_SPLPAR */

+#if defined(CONFIG_HOTPLUG_CPU) || defined(CONFIG_NEED_MULTIPLE_NODES)
+#if defined(CONFIG_PPC_SPLPAR)
+extern int timed_topology_update(int nsecs);
+#else
+#define	timed_topology_update(nsecs)	0
+#endif /* CONFIG_PPC_SPLPAR */
+#endif /* CONFIG_HOTPLUG_CPU || CONFIG_NEED_MULTIPLE_NODES */
+
+#if defined(CONFIG_PPC_SPLPAR)
+extern void shared_topology_update(void);
+#else
+#define	shared_topology_update()	0
+#endif /* CONFIG_PPC_SPLPAR */
+
 #include

 #ifdef CONFIG_SMP
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index b95c584..3fd4536 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -29,6 +29,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -906,7 +907,7 @@ void __init initmem_init(void)
	/*
	 * Reduce the possible NUMA nodes to the online NUMA nodes,
-	 * since we do not support node hotplug. This ensures that we
+	 * lower the maximum NUMA node ID to what is actually present.
	 */
	nodes_and(node_possible_map, node_possible_map, node_online_map);
@@ -1148,11 +1149,32 @@ struct topology_update_data {
	int new_nid;
 };

+#define	TOPOLOGY_DEF_TIMER_SECS	60
+
 static u8 vphn_cpu_change_counts[NR_CPUS][MAX_DISTANCE_REF_POINTS];
 static cpumask_t cpu_associativity_changes_mask;
 static int vphn_enabled;
 static int prrn_enabled;
 static void reset_topology_timer(void);
+static int topology_timer_secs = TOPOLOGY_DEF_TIMER_SECS;
+static int topology_inited;
+static int topology_update_needed;
+
+/*
+ * Change polling interval for associativity changes.
+ */
+int timed_topology_update(int nsecs)
+{
+	if (nsecs > 0)
+		topology_timer_secs = nsecs;
+	else
+		topology_timer_secs = TOPOLOGY_DEF_TIMER_SECS;
+
+	if (vphn_enabled)
+		reset_topology_timer();
+
+	return 0;
+}

 /*
  * Store the current values of the associativity change counters in the
@@ -1246,6 +1268,12 @@ static long vphn_get_associativity(unsigned long cpu,
			"hcall_vphn() experienced a hardware fault "
			"preventing VPHN. Disabling polling...\n");
		stop_topology_update();
+		break;
+	case H_SUCCESS:
+		printk(KERN_INFO
+			"VPHN hcall succeeded. Reset polling...\n");
+		timed_topology_update(0);
+		break;
	}

	return rc;
@@ -1323,8 +1351,11 @@ int numa_update_cpu_topology(bool cpus_locked)
	struct device *dev;
	int weight, new_nid, i = 0;

-	if (!prrn_enabled && !vphn_enabled)
+	if (!prrn_enabled && !vphn_enabled) {
+		if (!topology_inited)
+			topology_update_needed = 1;
		return 0;
+	}

	weight = cpumask_weight(&cpu_associativity_changes_mask);
	if (!weight)
@@ -1363,6 +1394,8 @@ int numa_update_cpu_topology(bool cpus_locked)
			cpumask_andnot(&cpu_associativity_changes_mask,
					&cpu_associativity_changes_mask,
					cpu_sibling_mask(cpu));
+			pr_info("Assoc chg gives same node %d for cpu%d\n",
+				new_nid, cpu);
			cpu = cpu_last_thread_sibling(cpu);
			continue;
		}
@@ -1379,6 +1412,9 @@ int numa_update_cpu_topology(bool cpus_locked)
		cpu = cpu_last_thread_sibling(cpu);
	}

+	if (i)
+		updates[i-1].next = NULL;
+
	pr_debug("Topology update for the following CPUs:\n");
	if (cpumask_weight(&updated_cpus)) {
		for (ud = &updates[0]; ud; ud = ud->next) {
@@ -1433,6 +1469,7 @@ int
[PATCH V9 2/2] powerpc/nodes: Ensure enough nodes avail for operations
To: linuxppc-...@lists.ozlabs.org
From: Michael Bringmann
To: linux-kernel@vger.kernel.org
Cc: Michael Ellerman
Cc: Michael Bringmann
Cc: John Allen
Cc: Nathan Fontenot
Subject: [PATCH V9 2/2] powerpc/nodes: Ensure enough nodes avail for operations

powerpc/nodes: On systems like PowerPC which allow 'hot-add' of CPU or memory resources, it may occur that the new resources are to be inserted into nodes that were not used for these resources at bootup. In the kernel, any node that is used must be defined and initialized at boot. This patch extracts the value of the lowest domain level (number of allocable resources) from the "rtas" device tree property "ibm,max-associativity-domains" to use as the maximum number of nodes to set up as possibly available in the system. This new setting will override the instruction

    nodes_and(node_possible_map, node_possible_map, node_online_map);

presently seen in the function arch/powerpc/mm/numa.c:initmem_init(). If the property is not present at boot, no operation will be performed to define or enable additional nodes.
Signed-off-by: Michael Bringmann
---
 arch/powerpc/mm/numa.c | 44
 1 file changed, 44 insertions(+)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 3fd4536..3ae6510 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -893,6 +893,48 @@ static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn)
	NODE_DATA(nid)->node_spanned_pages = spanned_pages;
 }

+static void __init node_associativity_setup(void)
+{
+	struct device_node *rtas;
+	printk(KERN_INFO "%s:%d\n", __FUNCTION__, __LINE__);
+
+	rtas = of_find_node_by_path("/rtas");
+	if (rtas) {
+		const __be32 *prop;
+		u32 len, entries, levelval, i;
+		printk(KERN_INFO "%s:%d\n", __FUNCTION__, __LINE__);
+
+		prop = of_get_property(rtas, "ibm,max-associativity-domains", &len);
+		if (!prop || len < sizeof(unsigned int)) {
+			printk(KERN_INFO "%s:%d\n", __FUNCTION__, __LINE__);
+			goto endit;
+		}
+
+		entries = of_read_number(prop++, 1);
+
+		if (len < (entries * sizeof(unsigned int))) {
+			printk(KERN_INFO "%s:%d\n", __FUNCTION__, __LINE__);
+			goto endit;
+		}
+
+		for (i = 0; i < entries; i++)
+			levelval = of_read_number(prop++, 1);
+
+		printk(KERN_INFO "Numa nodes avail: %d (%d)\n", (int) levelval, (int) entries);
+
+		for (i = 0; i < levelval; i++) {
+			if (!node_possible(i)) {
+				setup_node_data(i, 0, 0);
+				node_set(i, node_possible_map);
+			}
+		}
+	}
+
+endit:
+	if (rtas)
+		of_node_put(rtas);
+}
+
 void __init initmem_init(void)
 {
	int nid, cpu;
@@ -912,6 +954,8 @@ void __init initmem_init(void)
	 */
	nodes_and(node_possible_map, node_possible_map, node_online_map);

+	node_associativity_setup();
+
	for_each_online_node(nid) {
		unsigned long start_pfn, end_pfn;
[PATCH] Staging: greybus: Fix spelling error in comment
Fixed a spelling error.

Signed-off-by: Eames Trinh
---
 drivers/staging/greybus/arche-platform.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/greybus/arche-platform.c b/drivers/staging/greybus/arche-platform.c
index 4837aca41389..21ac92d0f533 100644
--- a/drivers/staging/greybus/arche-platform.c
+++ b/drivers/staging/greybus/arche-platform.c
@@ -196,7 +196,7 @@ static irqreturn_t arche_platform_wd_irq(int irq, void *devid)
	if (arche_pdata->wake_detect_state == WD_STATE_IDLE) {
		arche_pdata->wake_detect_start = jiffies;
		/*
-		 * In the begining, when wake/detect goes low
+		 * In the beginning, when wake/detect goes low
		 * (first time), we assume it is meant for coldboot
		 * and set the flag. If wake/detect line stays low
		 * beyond 30msec, then it is coldboot else fallback
--
2.11.0
[PATCH 0/2] Allow scsi_prep_fn to occur for retried commands
The following two patches address the hang issue being observed with Bart's patch on powerpc. The first patch moves the initialization of jiffies_at_alloc from scsi_init_command to scsi_init_rq, and ensures we don't zero jiffies_at_alloc in scsi_init_command. The second patch saves / restores the retry counter in scsi_init_command which lets us go through scsi_init_command for retries and not forget why we were there. These patches have only been boot tested on my Power machine with ipr to ensure they fix the issue I was seeing. -Brian -- Brian King Power Linux I/O IBM Linux Technology Center
Re: events: possible deadlock in __perf_event_task_sched_out
On Mon, Aug 21, 2017 at 01:58:13PM +0530, Shubham Bansal wrote:
> > This is a WARN, printk is a pig.
>
> So, its not a bug?

No, triggering the WARN is the problem; this is just fallout after that.
RE: [PATCH v5 0/3] TPS68470 PMIC drivers
Hi Andy,

> > >> > This is the patch series for TPS68470 PMIC that works as a camera PMIC.
> > >> >
> > >> > The patch series provide the following 3 drivers, to help configure the voltage regulators, clocks and GPIOs provided by the TPS68470 PMIC, to be able to use the camera sensors connected to this PMIC.
> > >> >
> > >> > TPS68470 MFD driver:
> > >> > This is the multi function driver that initializes the TPS68470 PMIC and supports the GPIO and Op Region functions.
> > >> >
> > >> > TPS68470 GPIO driver:
> > >> > This is the PMIC GPIO driver that will be used by the OS GPIO layer, when the BIOS / firmware triggered GPIO access is done.
> > >> >
> > >> > TPS68470 Op Region driver:
> > >> > This is the driver that will be invoked, when the BIOS / firmware configures the voltage / clock for the sensors / vcm devices connected to the PMIC.
> > >>
> > >> All three patches are good to me (we did few rounds of internal review before posting v4)
> > >>
> > >> Reviewed-by: Andy Shevchenko
> > >
> > > OK, so how should they be routed?
> >
> > Good question. I don't know how last time PMIC drivers were merged; here I think it is just sane to route via MFD with an immutable branch created.
>
> OK
>
> I will assume that the series will go in through MFD then.

Now that the MFD and GPIO patches of v6 of this series have been applied on respective trees, can you advise the next steps for the ACPI / PMIC Opregion driver?

Thanks
Raj
Re: [PATCH net-next,1/4] hv_netvsc: Clean up unused parameter from netvsc_get_hash()
All proper patch series must have a header "[PATCH xxx 0/N]" posting which explains at a high level what the patch series does, how it does it, and why it is doing it that way. Therefore, please resubmit this patch series with a proper header posting. Thank you.
Re: [PATCH] perf record: enable multiplexing scaling via -R
On Mon, Aug 21, 2017 at 4:02 PM, Andi Kleen wrote:
>
> Stephane Eranian writes:
> >
> > To activate, the user must use:
> > $ perf record -a -R
>
> I don't know why you're overloading the existing raw mode?
> It has nothing to do with that.
>
I explained this in the changelog: so that it does not change any of the processing in perf report, i.e., it is not faced with data it does not know how to handle. Also trying to avoid adding yet another option.

> -Andi
RE: [PATCH v5 0/3] TPS68470 PMIC drivers
Hi Rafael,

> >> > >> > This is the patch series for TPS68470 PMIC that works as a camera PMIC.
> >> > >> >
> >> > >> > The patch series provide the following 3 drivers, to help configure the voltage regulators, clocks and GPIOs provided by the TPS68470 PMIC, to be able to use the camera sensors connected to this PMIC.
> >> > >> >
> >> > >> > TPS68470 MFD driver:
> >> > >> > This is the multi function driver that initializes the TPS68470 PMIC and supports the GPIO and Op Region functions.
> >> > >> >
> >> > >> > TPS68470 GPIO driver:
> >> > >> > This is the PMIC GPIO driver that will be used by the OS GPIO layer, when the BIOS / firmware triggered GPIO access is done.
> >> > >> >
> >> > >> > TPS68470 Op Region driver:
> >> > >> > This is the driver that will be invoked, when the BIOS / firmware configures the voltage / clock for the sensors / vcm devices connected to the PMIC.
> >> > >>
> >> > >> All three patches are good to me (we did few rounds of internal review before posting v4)
> >> > >>
> >> > >> Reviewed-by: Andy Shevchenko
> >> > >
> >> > > OK, so how should they be routed?
> >> >
> >> > Good question. I don't know how last time PMIC drivers were merged; here I think it is just sane to route via MFD with an immutable branch created.
> >>
> >> OK
> >>
> >> I will assume that the series will go in through MFD then.
> >
> > Now that the MFD and GPIO patches of v6 of this series have been applied on respective trees, can you advise the next steps for the ACPI / PMIC Opregion driver?
>
> Well, it would have been better to route the whole series through one tree. Now it's better to wait until the two other trees get merged and then apply the opregion patch.

Ack. Let me get back once the other 2 trees are merged.

Thanks
Raj
Re: [PATCH RESEND 1/2] net: enable high resolution timer mode to timeout datagram sockets
On Fri, Aug 18, 2017 at 11:44 AM, Vallish Vaidyeshwara wrote:
> -	*timeo_p = schedule_timeout(*timeo_p);
> +	/* Wait using highres timer */
> +	expires = ktime_add_ns(ktime_get(), jiffies_to_nsecs(*timeo_p));
> +	pre_sched_time = jiffies;
> +	if (schedule_hrtimeout(&expires, HRTIMER_MODE_ABS))

Does this work with MAX_SCHEDULE_TIMEOUT too??
Re: [PATCH v3 net-next] bpf/verifier: track liveness for pruning
On 18/08/17 15:16, Edward Cree wrote: > On 18/08/17 04:21, Alexei Starovoitov wrote: >> It seems you're trying to sort-of do per-fake-basic block liveness >> analysis, but our state_list_marks are not correct if we go with >> canonical basic block definition, since we mark the jump insn and >> not insn after the branch and not every basic block boundary is >> properly detected. > I think the reason this works is that jump insns can't do writes. > [snip] > the sl->state will never have any write marks and it'll all just work. > But I should really test that! I tested this, and found that, no, sl->state can have write marks, and the algorithm will get the wrong answer in that case. So I've got a patch to make the first iteration ignore write marks, as part of a series which I will post shortly. When I do so, please re-do your tests with adding state_list_marks in strange and exciting places; it should work wherever you put them. Like you say, it "magically doesn't depend on proper basic block boundaries", and that's because really pruning is just a kind of checkpointing that just happens to be most effective when done just after a jump (pop_stack). Can I have a SOB for your "grr" test program, so I can include it in the series? -Ed