Re: [PATCH 04/23] scsi: initialize scsi midlayer limits before allocating the queue

2024-05-31 Thread Christoph Hellwig
On Fri, May 31, 2024 at 12:28:21AM +1000, Michael Ellerman wrote:
> No that's wrong. The actual hardware page size is 4K, but
> CONFIG_PAGE_SIZE and PAGE_SHIFT etc. are 64K.
> 
> So at least for this user the driver used to work with 64K pages, and
> now doesn't.

Which suggests that the communicated max_hw_sectors is wrong, and that
previously we were saved by the block layer increasing it to
PAGE_SIZE after a warning.  Should we just increase it to 64k?



Re: [PATCH 04/23] scsi: initialize scsi midlayer limits before allocating the queue

2024-05-20 Thread Christoph Hellwig
Adding ben and the linuxppc list.

Context: pata_macio initialization now fails as we enforce that the
segment size is set properly.

On Wed, May 15, 2024 at 04:52:29PM -0700, Guenter Roeck wrote:
> pata_macio_common_init() Calling ata_host_activate() with limit 65280
> ...
> max_segment_size is 65280; PAGE_SIZE is 65536; BLK_MAX_SEGMENT_SIZE is 65536
> WARNING: CPU: 0 PID: 12 at block/blk-settings.c:202 blk_validate_limits+0x2d4/0x364
> ...
>
> This is with PPC_BOOK3S_64 which selects a default page size of 64k.

Yeah.  Did you actually manage to use pata macio previously?  Or is
it just used because it's part of the pmac default config?

> Looking at the old code, I think it did what you suggested above,
> but assuming that the driver requested a lower limit on purpose that
> may not be the best solution.
> 
> Never mind, though - I updated my test configuration to explicitly
> configure the page size to 4k to work around the problem. With that,
> please consider this report a note in case someone hits the problem
> on a real system (and sorry for the noise).

Yes, the idea behind this change was to catch such errors.  So far
most errors have been drivers setting lower limits than what the
hardware can actually handle, but I'd love to track this down.

If the hardware can't actually handle the lower limit we should
probably just fail the probe gracefully with a well-commented if
statement instead.
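Something like this sketch (MAX_DBDMA_SEG is the driver's existing
0xff00 limit; the pdev name and message are made up):

	/*
	 * The DBDMA engine can't describe a segment that covers a full
	 * 64k page, so refuse to probe instead of tripping the limit
	 * validation in the block layer.
	 */
	if (MAX_DBDMA_SEG < PAGE_SIZE) {
		dev_err(&pdev->dev, "segment size too small for PAGE_SIZE\n");
		return -EINVAL;
	}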


Re: [PATCH 1/4] mm/vmalloc: allow arch-specific vmalloc_node overrides

2024-02-20 Thread Christoph Hellwig
On Tue, Feb 20, 2024 at 02:32:53PM -0600, Maxwell Bland wrote:
> Present non-uniform use of __vmalloc_node and __vmalloc_node_range makes
> enforcing appropriate code and data separation untenable on certain
> microarchitectures, as VMALLOC_START and VMALLOC_END are monolithic
> while the use of the vmalloc interface is non-monolithic: in particular,
> appropriate randomness in ASLR makes it such that code regions must fall
> in some region between VMALLOC_START and VMALLOC_END, but this
> necessitates that code pages are intermingled with data pages, meaning
> code-specific protections, such as arm64's PXNTable, cannot be
> performantly runtime enforced.

That's not actually true.  We have MODULE_START/END to separate them,
which is used by mips only for now.

> 
> The solution to this problem allows architectures to override the
> vmalloc wrapper functions by enforcing that the rest of the kernel does
> not reimplement __vmalloc_node by using __vmalloc_node_range with the
> same parameters as __vmalloc_node or provides a __weak tag to those
> functions using __vmalloc_node_range with parameters repeating those of
> __vmalloc_node.

I'm really not too happy about overriding the functions.  Especially
as the separation is a generally good idea and it would be good to
move everyone (or at least all modern architectures) over to a scheme
like this.


Re: [RFC PATCH] mm: z3fold: rename CONFIG_Z3FOLD to CONFIG_Z3FOLD_DEPRECATED

2024-01-21 Thread Christoph Hellwig
On Tue, Jan 16, 2024 at 12:19:39PM -0800, Yosry Ahmed wrote:
> Well, better compression ratios for one :)
> 
> I think a long time ago there were complaints that zsmalloc had higher
> latency than zbud/z3fold, but since then a lot of things have changed
> (including nice compaction optimization from Sergey, and compaction
> was one of the main factors AFAICT). Also, recent experiments that
> Chris Li conducted showed that (at least in our setup), the
> decompression is only a small part of the fault latency with zswap
> (i.e. not the main factor) -- so I am not sure if it actually matters
> in practice.
> 
> That said, I have not conducted any experiments personally with z3fold
> or zbud, which is why I proposed the conservative approach of marking
> as deprecated first. However, if others believe this is unnecessary I
> am fine with removal as well. Whatever we agree on is fine by me.

In general deprecated is for code that has active (intentional) users
and/or would break setups.  It does sound to me like that is not the
case here, but others might understand this better.


Re: [RFC PATCH] mm: z3fold: rename CONFIG_Z3FOLD to CONFIG_Z3FOLD_DEPRECATED

2024-01-16 Thread Christoph Hellwig
On Fri, Jan 12, 2024 at 04:38:30PM -0800, Nhat Pham wrote:
> >
> > I thought deprecating z3fold is the low hanging fruit. Then, once we
> > can sort out the MMU dependency in zsmalloc, we can go after zbud as
> > well.
> 
> Makes sense to me. Should we do the same thing to zbud? We probably
> have even less of a case for it, no?

Is there any user visible effect of switching the allocator?  If not it
seems a bit pointless to deprecate them vs just removing them (or maybe
making z3fold depend on !MMU for now).
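For reference, the !MMU variant would be a one-line Kconfig change
along these lines (a sketch; the prompt text is from memory):

 config Z3FOLD
 	tristate "Up to 3x density storage for compressed pages"
 	depends on ZPOOL
+	depends on !MMU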


Re: [PATCH v2 07/14] LoongArch: Implement ARCH_HAS_KERNEL_FPU_SUPPORT

2024-01-08 Thread Christoph Hellwig
On Sun, Jan 07, 2024 at 10:39:07AM +0800, Huacai Chen wrote:
> > Do you mean that LoongArch32 does not support double-precision FP in 
> > hardware?
> > At least both of the consumers in this series use double-precision, so my 
> > first
> > thought is that LoongArch32 could not select ARCH_HAS_KERNEL_FPU_SUPPORT.
> Then is it possible to introduce CC_FLAGS_SP_FPU and CC_FLAGS_DP_FPU?
> I think there may be some place where SP FP is enough.

Let's defer that until it is actually needed.


Re: [PATCH v2 05/14] arm64: crypto: Use CC_FLAGS_FPU for NEON CFLAGS

2023-12-27 Thread Christoph Hellwig
Looks good:

Reviewed-by: Christoph Hellwig 



Re: [PATCH v2 10/14] riscv: Add support for kernel-mode FPU

2023-12-27 Thread Christoph Hellwig
On Wed, Dec 27, 2023 at 05:42:00PM -0800, Samuel Holland wrote:
> This is motivated by the amdgpu DRM driver, which needs floating-point
> code to support recent hardware. That code is not performance-critical,
> so only provide a minimal non-preemptible implementation for now.
> 
> Signed-off-by: Samuel Holland 

Looks good:

Reviewed-by: Christoph Hellwig 


Re: [PATCH v2 01/14] arch: Add ARCH_HAS_KERNEL_FPU_SUPPORT

2023-12-27 Thread Christoph Hellwig
Thanks for all the great documentation!

Looks good:

Reviewed-by: Christoph Hellwig 



Re: [PATCH v2 13/14] selftests/fpu: Move FP code to a separate translation unit

2023-12-27 Thread Christoph Hellwig
Looks good:

Reviewed-by: Christoph Hellwig 



Re: [PATCH v2 12/14] drm/amd/display: Use ARCH_HAS_KERNEL_FPU_SUPPORT

2023-12-27 Thread Christoph Hellwig
Looks good:

Reviewed-by: Christoph Hellwig 


Re: [PATCH 3/3] drm/amd/display: Support DRM_AMD_DC_FP on RISC-V

2023-12-11 Thread Christoph Hellwig
On Thu, Dec 07, 2023 at 10:49:53PM -0600, Samuel Holland wrote:
> Actually tracking all possibly-FPU-tainted functions and their call sites is
> probably possible, but a much larger task.

I think objtool should be able to do that reasonably easily; it already
does it for checking the sections where userspace address access is
enabled or not, which is very similar.


Re: [RFC PATCH 05/12] lib/raid6: Use CC_FLAGS_FPU for NEON CFLAGS

2023-12-11 Thread Christoph Hellwig
On Mon, Dec 11, 2023 at 10:12:27AM -0600, Samuel Holland wrote:
> On 2023-12-11 10:07 AM, Christoph Hellwig wrote:
> 
> Unfortunately, not all of the relevant options can be no-prefixed:

Ok.  That is another good argument for having the obj-fpu += syntax
I proposed.  You might need help from the kbuild maintainers for that,
as trying to understand the kbuild magic isn't something I'd expect
from a normal contributor (including myself..).



Re: [RFC PATCH 12/12] selftests/fpu: Allow building on other architectures

2023-12-11 Thread Christoph Hellwig
Looks good:

Reviewed-by: Christoph Hellwig 


Re: [RFC PATCH 11/12] selftests/fpu: Move FP code to a separate translation unit

2023-12-11 Thread Christoph Hellwig
>  obj-$(CONFIG_TEST_FPU) += test_fpu.o
> -CFLAGS_test_fpu.o += $(FPU_CFLAGS)
> +test_fpu-y := test_fpu_glue.o test_fpu_impl.o
> +CFLAGS_test_fpu_impl.o += $(FPU_CFLAGS)

Btw, I really wonder if having a

modname-fpu += foo.o

syntax in kbuild wouldn't be preferable to this.  Of course that requires
someone who understands kbuild inside out.

> +int test_fpu(void);

This needs to go into a header.

And I think I understand your way to enforce the use of a separate
compilation unit in the riscv patch now.

Can we just make that generic, e.g. have a <linux/fpu.h> that wraps
<asm/fpu.h> and does the guard based on a
-D_LINUX_FPU_COMPILATION_UNIT=1 on the command line so that all the
code becomes fully portable?  Any legacy arch-specific fpu users not
using <linux/fpu.h> would not be affected by it, although it would be
great to eventually migrate them to the common scheme.
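Roughly like this (just a sketch of the idea, assuming FPU_CFLAGS
also passes -D_LINUX_FPU_COMPILATION_UNIT=1):

/* include/linux/fpu.h, sketch */
#ifndef _LINUX_FPU_H
#define _LINUX_FPU_H

/*
 * Translation units built with the FP flags must not use the
 * kernel_fpu_begin()/kernel_fpu_end() section helpers; everything
 * else gets them from the arch header.
 */
#ifdef _LINUX_FPU_COMPILATION_UNIT
#error "kernel FP code must be called through a non-FP translation unit"
#endif

#include <asm/fpu.h>

#endif /* _LINUX_FPU_H */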



Re: [RFC PATCH 09/12] riscv: Add support for kernel-mode FPU

2023-12-11 Thread Christoph Hellwig
> +#ifdef __riscv_f
> +
> +#define kernel_fpu_begin() \
> + static_assert(false, "floating-point code must use a separate 
> translation unit")
> +#define kernel_fpu_end() kernel_fpu_begin()
> +
> +#else
> +
> +void kernel_fpu_begin(void);
> +void kernel_fpu_end(void);
> +
> +#endif

I'll assume this is related to the trick that places code in a separate
translation unit, but I fail to understand it.  Can you add a comment
explaining it?
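For what it's worth, my reading of it (as a sketch of such a comment,
to be confirmed by the author):

/*
 * If this translation unit is compiled with the F extension enabled
 * (__riscv_f defined), the compiler may emit FP instructions anywhere
 * in it, including outside a kernel_fpu_begin()/kernel_fpu_end()
 * section.  Poisoning the helpers with static_assert() forces all
 * callers of begin()/end() into soft-float translation units, while
 * the actual FP code lives in a separate unit built with the FP
 * flags and never touches the helpers itself.
 */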



Re: [RFC PATCH 08/12] x86: Implement ARCH_HAS_KERNEL_FPU_SUPPORT

2023-12-11 Thread Christoph Hellwig
Looks good:

Reviewed-by: Christoph Hellwig 


Re: [RFC PATCH 07/12] powerpc: Implement ARCH_HAS_KERNEL_FPU_SUPPORT

2023-12-11 Thread Christoph Hellwig
On Thu, Dec 07, 2023 at 09:54:37PM -0800, Samuel Holland wrote:
> PowerPC provides an equivalent to the common kernel-mode FPU API, but in
> a different header and using different function names. The PowerPC API
> also requires a non-preemptible context. Add a wrapper header, and
> export the CFLAGS adjustments.

Looks good:

Reviewed-by: Christoph Hellwig 


Re: [RFC PATCH 06/12] LoongArch: Implement ARCH_HAS_KERNEL_FPU_SUPPORT

2023-12-11 Thread Christoph Hellwig
On Thu, Dec 07, 2023 at 09:54:36PM -0800, Samuel Holland wrote:
> LoongArch already provides kernel_fpu_begin() and kernel_fpu_end() in
> asm/fpu.h, so it only needs to add kernel_fpu_available() and export
> the CFLAGS adjustments.

Looks good:

Reviewed-by: Christoph Hellwig 


Re: [RFC PATCH 05/12] lib/raid6: Use CC_FLAGS_FPU for NEON CFLAGS

2023-12-11 Thread Christoph Hellwig
> +CFLAGS_REMOVE_neon1.o += $(CC_FLAGS_NO_FPU)
> +CFLAGS_REMOVE_neon2.o += $(CC_FLAGS_NO_FPU)
> +CFLAGS_REMOVE_neon4.o += $(CC_FLAGS_NO_FPU)
> +CFLAGS_REMOVE_neon8.o += $(CC_FLAGS_NO_FPU)

Btw, do we even really need the extra variables for compiler flags
to remove?  Don't gcc/clang options work so that if you add a
no-prefixed version of the option later it transparently gets removed?

Except for that:

Reviewed-by: Christoph Hellwig 


Re: [RFC PATCH 04/12] arm64: Implement ARCH_HAS_KERNEL_FPU_SUPPORT

2023-12-11 Thread Christoph Hellwig
> + * linux/arch/arm64/include/asm/fpu.h

Same comment as for arm here.  Except for that:

Reviewed-by: Christoph Hellwig 


Re: [RFC PATCH 03/12] ARM: crypto: Use CC_FLAGS_FPU for NEON CFLAGS

2023-12-11 Thread Christoph Hellwig
On Thu, Dec 07, 2023 at 09:54:33PM -0800, Samuel Holland wrote:
> Now that CC_FLAGS_FPU is exported and can be used anywhere in the source
> tree, use it instead of duplicating the flags here.

Looks good:

Reviewed-by: Christoph Hellwig 


Re: [RFC PATCH 02/12] ARM: Implement ARCH_HAS_KERNEL_FPU_SUPPORT

2023-12-11 Thread Christoph Hellwig
> --- /dev/null
> +++ b/arch/arm/include/asm/fpu.h
> @@ -0,0 +1,17 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * linux/arch/arm/include/asm/fpu.h

Please don't add the file name to top of the file comments.  It serves
no purpose and easily gets out of date.

Except for that:

Reviewed-by: Christoph Hellwig 


Re: [RFC PATCH 01/12] arch: Add ARCH_HAS_KERNEL_FPU_SUPPORT

2023-12-11 Thread Christoph Hellwig
Looks good:

Reviewed-by: Christoph Hellwig 


Re: [PATCH RFC 06/12] mm/gup: Drop folio_fast_pin_allowed() in hugepd processing

2023-11-22 Thread Christoph Hellwig
On Wed, Nov 22, 2023 at 10:22:11AM -0500, Peter Xu wrote:
> The other reason I feel like hugepd may or may not be further developed for
> new features like large folio is that I saw Power9 started to shift to
> radix pgtables, and afaics hugepd is only supported in hash tables
> (hugepd_ok()).  But again, I confess I know nothing about Power at all.

That alone sounds like a good reason to not bother.  So unless more
qualified people have a different opinion I'm fine with this patch
as long as you leave a comment in place, and amend the commit message
with some of the very useful information from your mail.



Re: [PATCH 3/3] drm/amd/display: Support DRM_AMD_DC_FP on RISC-V

2023-11-22 Thread Christoph Hellwig
> - select DRM_AMD_DC_FP if (X86 || LOONGARCH || (PPC64 && ALTIVEC) || (ARM64 && KERNEL_MODE_NEON && !CC_IS_CLANG))
> + select DRM_AMD_DC_FP if ARM64 && KERNEL_MODE_NEON && !CC_IS_CLANG
> + select DRM_AMD_DC_FP if PPC64 && ALTIVEC
> + select DRM_AMD_DC_FP if RISCV && FPU
> + select DRM_AMD_DC_FP if LOONGARCH || X86

This really is a mess.  Can you add an ARCH_HAS_KERNEL_FPU_SUPPORT
symbol that all architectures that have it select instead, and then
make DRM_AMD_DC_FP depend on it?
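I.e. something like this sketch:

config ARCH_HAS_KERNEL_FPU_SUPPORT
	bool
	# selected by every architecture that can build and run
	# kernel-mode FP code

config DRM_AMD_DC_FP
	bool
	depends on ARCH_HAS_KERNEL_FPU_SUPPORT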

> -#if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH)
> +#if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH) || defined(CONFIG_RISCV)
>   kernel_fpu_begin();
>  #elif defined(CONFIG_PPC64)
>   if (cpu_has_feature(CPU_FTR_VSX_COMP))
> @@ -122,7 +124,7 @@ void dc_fpu_end(const char *function_name, const int line)
>  
>   depth = __this_cpu_dec_return(fpu_recursion_depth);
>   if (depth == 0) {
> -#if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH)
> +#if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH) || defined(CONFIG_RISCV)
>   kernel_fpu_end();
>  #elif defined(CONFIG_PPC64)
>   if (cpu_has_feature(CPU_FTR_VSX_COMP))

And then this mess can go away.  We'll need to decide if we want to
cover all the in-kernel vector support as part of it, which would
seem reasonable to me, or have a separate generic kernel_vector_begin
with its own option.

> diff --git a/drivers/gpu/drm/amd/display/dc/dml/Makefile b/drivers/gpu/drm/amd/display/dc/dml/Makefile
> index ea7d60f9a9b4..5c8f840ef323 100644
> --- a/drivers/gpu/drm/amd/display/dc/dml/Makefile
> +++ b/drivers/gpu/drm/amd/display/dc/dml/Makefile
> @@ -43,6 +43,12 @@ dml_ccflags := -mfpu=64
>  dml_rcflags := -msoft-float
>  endif
>  
> +ifdef CONFIG_RISCV
> +include $(srctree)/arch/riscv/Makefile.isa
> +# Remove V from the ISA string, like in arch/riscv/Makefile, but keep F and D.
> +dml_ccflags := -march=$(shell echo $(riscv-march-y) | sed -E 's/(rv32ima|rv64ima)([^v_]*)v?/\1\2/')
> +endif
> +
>  ifdef CONFIG_CC_IS_GCC
>  ifneq ($(call gcc-min-version, 70100),y)
>  IS_OLD_GCC = 1

And this is again not really something we should be doing.
Instead we need a generic way in Kconfig to enable FPU support
for an object file or set of, that the arch support can hook
into.

Btw, I'm also really worried about folks using the FPU instructions
outside the kernel_fpu_begin/end windows in general (not directly
related to the RISC-V support).  Can we have objtool checks
for that similar to only allowing the unsafe uaccess in the
uaccess begin/end pairs?


Re: [PATCH RFC 06/12] mm/gup: Drop folio_fast_pin_allowed() in hugepd processing

2023-11-22 Thread Christoph Hellwig
On Tue, Nov 21, 2023 at 10:59:35AM -0500, Peter Xu wrote:
> > What prevents us from ever using hugepd with file mappings?  I think
> > it would naturally fit in with how large folios for the pagecache work.
> > 
> > So keeping this check and generalizing it seems like the better idea to
> > me.
> 
> But then it means we're still keeping that dead code for fast-gup even if
> we know that fact..  Or do we have a plan to add that support very soon, so
> this code will be destined to add back?

The question wasn't meant to be rhetorical - we support arbitrary power
of two sized folios for the pagecache; what prevents us from using
hugepd with them right now?

> The other option is I can always add a comment above gup_huge_pd()
> explaining this special bit, so that when someone is adding hugepd support
> to file large folios we'll hopefully not forget it?  But then that
> generalization work will only happen when the code will be needed.

If dropping the check is the right thing for now (and I think the ppc
maintainers and willy as the large folio guy might have a more useful
opinions than I do), leaving a comment in would be very useful.



Re: [PATCH RFC 06/12] mm/gup: Drop folio_fast_pin_allowed() in hugepd processing

2023-11-20 Thread Christoph Hellwig
On Wed, Nov 15, 2023 at 08:29:02PM -0500, Peter Xu wrote:
> Hugepd format is only used in PowerPC with hugetlbfs.  In commit
> a6e79df92e4a ("mm/gup: disallow FOLL_LONGTERM GUP-fast writing to
> file-backed mappings"), we added a check to fail gup-fast if there's
> potential risk of violating GUP over writeback file systems.  That should
> never apply to hugepd.
>
> Drop that check, not only because it'll never be true for hugepd, but also
> it paves way for reusing the function outside fast-gup.

What prevents us from ever using hugepd with file mappings?  I think
it would naturally fit in with how large folios for the pagecache work.

So keeping this check and generalizing it seems like the better idea to
me.


Re: [PATCH v3 1/1] PCI: layerscape-ep: set 64-bit DMA mask

2023-09-27 Thread Christoph Hellwig
Looks good:

Reviewed-by: Christoph Hellwig 



Re: [PATCH v2 1/1] PCI: layerscape-ep: set 64-bit DMA mask

2023-09-26 Thread Christoph Hellwig
> + /* set 64-bit DMA mask and coherent DMA mask */
> + ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64));

The comment is a bit silly :)

> + if (ret)
> + return ret;

Also no need to check the return value when setting a 64-bit mask,
but I guess it doesn't hurt here.
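I.e. a plain call would do (sketch):

	/* Setting a 64-bit mask can't fail, so ignore the result. */
	dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64));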



Re: [PATCH V3 01/14] blk-mq: add blk_mq_max_nr_hw_queues()

2023-08-11 Thread Christoph Hellwig
On Thu, Aug 10, 2023 at 08:09:27AM +0800, Ming Lei wrote:
> 1) some archs support 'nr_cpus=1' for kdump kernel, which is fine, since
> num_possible_cpus becomes 1.
> 
> 2) some archs do not support 'nr_cpus=1', and have to rely on
> 'max_cpus=1', so num_possible_cpus isn't changed, and kernel just boots
> with single online cpu. That causes trouble because blk-mq limits single
> queue.

And we need to fix case 2.  We need to drop the is_kdump support, and
if they want to force less cpus they need to make nr_cpus=1 work.



Re: [PATCH] dma-mapping: move arch_dma_set_mask() declaration to header

2023-07-31 Thread Christoph Hellwig
Thanks,

applied to the dma-mapping tree for 6.6.


Re: [PATCH 2/2] PCI/AER: Unexport pci_enable_pcie_error_reporting()

2023-07-10 Thread Christoph Hellwig
Looks good:

Reviewed-by: Christoph Hellwig 


Re: [PATCH 1/2] PCI/AER: Drop unused pci_disable_pcie_error_reporting()

2023-07-10 Thread Christoph Hellwig
Looks good:

Reviewed-by: Christoph Hellwig 


Re: [PATCH 20/21] ARM: dma-mapping: split out arch_dma_mark_clean() helper

2023-07-06 Thread Christoph Hellwig
> Thanks for your patch, which is now commit 322dbe898f82fd8a
> ("ARM: dma-mapping: split out arch_dma_mark_clean() helper") in
> esmil/jh7100-dmapool.

Well, something is wrong with that branch then, and this series still
needs more work, and should eventually be merged through the dma-mapping
tree.


Re: WARNING at fs/btrfs/disk-io.c:275 during xfstests run(4k block)

2023-05-29 Thread Christoph Hellwig
This should be fixed by

"btrfs: fix the uptodate assert in btree_csum_one_bio"

on the list.


Re: [PATCH v5 RESEND 15/17] powerpc: mm: Convert to GENERIC_IOREMAP

2023-05-17 Thread Christoph Hellwig
Looks good:

Reviewed-by: Christoph Hellwig 


Re: [PATCH v5 RESEND 01/17] asm-generic/iomap.h: remove ARCH_HAS_IOREMAP_xx macros

2023-05-17 Thread Christoph Hellwig
Looks good:

Reviewed-by: Christoph Hellwig 


Re: [PATCH v5 5/6] mm/gup: remove vmas parameter from pin_user_pages()

2023-05-15 Thread Christoph Hellwig
On Sun, May 14, 2023 at 10:26:58PM +0100, Lorenzo Stoakes wrote:
> We are now in a position where no caller of pin_user_pages() requires the
> vmas parameter at all, so eliminate this parameter from the function and
> all callers.
> 
> This clears the way to removing the vmas parameter from GUP altogether.
> 
> Acked-by: David Hildenbrand 
> Acked-by: Dennis Dalessandro  (for qib)
> Signed-off-by: Lorenzo Stoakes 

Looks good:

Reviewed-by: Christoph Hellwig 


Re: [PATCH v4 0/3] Use dma_default_coherent for devicetree default coherency

2023-04-06 Thread Christoph Hellwig
Thanks,

applied to the dma-mapping tree for 6.4.


Re: [PATCH v3 0/4] Use dma_default_coherent for devicetree default coherency

2023-03-27 Thread Christoph Hellwig
On Fri, Mar 24, 2023 at 09:17:38AM +, Jiaxun Yang wrote:
> > 
> > Is the patch a 6.3 candidate or should all of it go into 6.4?
> 
> Please leave it for 6.4, as corresponding MIPS arch part will be a part of 
> 6.4.

Ok.  I'll really need review from the MIPS and drivers/of/ maintainers,
though.


Re: [PATCH 21/21] dma-mapping: replace custom code with generic implementation

2023-03-27 Thread Christoph Hellwig
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
>  {
> + dma_cache_wback(paddr, size);
> +}
>  
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> +{
> + dma_cache_inv(paddr, size);
>  }

> +static inline void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size)
>  {
> + dma_cache_wback_inv(paddr, size);
> +}

These are the only callers of each of the three functions involved.
So I'd rather rename the low-level symbols (and drop the pointless
exports for two of them) than add these wrappers.

The same is probably true for many other architectures.

> +static inline bool arch_sync_dma_clean_before_fromdevice(void)
> +{
> + return false;
> +}
>  
> +static inline bool arch_sync_dma_cpu_needs_post_dma_flush(void)
> +{
> + return true;
>  }

Is there a way to cut down on this boilerplate code by just having
sane defaults, and Kconfig options to override them if they are not
runtime decisions?

> +#include 

I can't really say I like the #include version here despite your
rationale in the commit log.  I can probably live with it if you
think it is absolutely worth it, but I'm really not in favor of it.

> +config ARCH_DMA_MARK_DCACHE_CLEAN
> + def_bool y

What do we need this symbol for?  Unless I'm missing something it is
always enabled for arm32, and only used in arm32 code.


Re: [PATCH v3 0/4] Use dma_default_coherent for devicetree default coherency

2023-03-23 Thread Christoph Hellwig
On Thu, Mar 23, 2023 at 09:07:31PM +, Jiaxun Yang wrote:
> 
> 
> > On 23 Mar 2023, at 07:29, Christoph Hellwig  wrote:
> > 
> > The series looks fine to me.  How should we merge it?
> 
> Perhaps go through dma-mapping tree?

Is the patch a 6.3 candidate or should all of it go into 6.4?


Re: [PATCH v3 0/4] Use dma_default_coherent for devicetree default coherency

2023-03-23 Thread Christoph Hellwig
The series looks fine to me.  How should we merge it?


Re: [PATCH v2 5/5] of: address: Always use dma_default_coherent for default coherency

2023-03-01 Thread Christoph Hellwig
> - select OF_DMA_DEFAULT_COHERENT  if !NOT_COHERENT_CACHE

Doesn't powerpc need to select CONFIG_ARCH_DMA_DEFAULT_COHERENT now,
or even better should be doing that in the patch adding that
symbol?

In fact I wonder if adding CONFIG_ARCH_DMA_DEFAULT_COHERENT and removing
OF_DMA_DEFAULT_COHERENT should be one patch, as that seems to bring
over the intent a little better I'd say.


Re: [PATCH v2 3/5] dma-mapping: Provide CONFIG_ARCH_DMA_DEFAULT_COHERENT

2023-03-01 Thread Christoph Hellwig
> +config ARCH_DMA_DEFAULT_COHERENT
> + bool

Please add a comment here explaining when an architecture should
select this symbol.


Re: [PATCH v2 2/5] dma-mapping: Provide a fallback dma_default_coherent

2023-03-01 Thread Christoph Hellwig
On Thu, Feb 23, 2023 at 11:36:41AM +, Jiaxun Yang wrote:
> dma_default_coherent was declared unconditionally at kernel/dma/mapping.c
> but only declared when any of the non-coherent options is enabled in
> dma-map-ops.h.

Overly long line here.

>  }
>  #else
> +#define dma_default_coherent true
>  static inline bool dev_is_dma_coherent(struct device *dev)

Please keep an empty line before the function declaration.


Re: [PATCH 0/7] MIPS DMA coherence fixes

2023-02-21 Thread Christoph Hellwig
Can you explain the motivation here?  Also why riscv patches are at
the end of a mips fixes series?


Re: [PATCH 5/7] dma-mapping: Provide CONFIG_ARCH_DMA_DEFAULT_COHERENT

2023-02-21 Thread Christoph Hellwig
On Tue, Feb 21, 2023 at 12:46:11PM +, Jiaxun Yang wrote:
> Provide a kconfig option to allow arches to manipulate default
> value of dma_default_coherent in Kconfig.

I don't see the win over just doing this by setting dma_default_coherent
in the early boot code.


Re: [PATCH 4/7] dma-mapping: Always provide dma_default_coherent

2023-02-21 Thread Christoph Hellwig
On Tue, Feb 21, 2023 at 12:46:10PM +, Jiaxun Yang wrote:
> dma_default_coherent can be useful for determining default coherency
> even on arches without noncoherent support.

How?


Re: [PATCH 1/7] MIPS: Remove DMA_PERDEV_COHERENT

2023-02-21 Thread Christoph Hellwig
On Tue, Feb 21, 2023 at 12:46:07PM +, Jiaxun Yang wrote:
> As now we are always managing DMA coherence on a per-device basis,
> there is no need to have such option. And it's not selected
> by any platform.

I think the real point here is that this is dead code, so it can
obviously go away, but:

>  config MIPS_GENERIC_KERNEL
>   bool "Generic board-agnostic MIPS kernel"
> - select ARCH_HAS_SETUP_DMA_OPS
>   select MIPS_GENERIC
>   select BOOT_RAW
>   select BUILTIN_DTB
> @@ -1079,11 +1078,6 @@ config FW_CFE
>  config ARCH_SUPPORTS_UPROBES
>   bool

> @@ -1097,6 +1091,7 @@ config DMA_NONCOHERENT
>   select ARCH_HAS_DMA_PREP_COHERENT
>   select ARCH_HAS_SYNC_DMA_FOR_DEVICE
>   select ARCH_HAS_DMA_SET_UNCACHED
> + select ARCH_HAS_SETUP_DMA_OPS

This is an unrelated and undocumented change.  If you want to do it,
please do that as a separate patch with a commit log documenting
the rationale.


Re: [RFC PATCH 01/19] mm: Introduce vm_account

2023-01-23 Thread Christoph Hellwig
> +/**
> + * vm_account_init - Initialise a new struct vm_account.
> + * @vm_account: pointer to uninitialised vm_account.
> + * @task: task to charge against.
> + * @user: user to charge against. Must be non-NULL for VM_ACCOUNT_USER.
> + * @flags: flags to use when charging to vm_account.
> + *
> + * Initialise a new uninitialised struct vm_account. Takes references
> + * on the task/mm/user/cgroup as required although callers must ensure
> + * any references passed in remain valid for the duration of this
> + * call.
> + */
> +void vm_account_init(struct vm_account *vm_account, struct task_struct *task,
> + struct user_struct *user, enum vm_account_flags flags);


kerneldoc comments are supposed to be next to the implementation, and
not the declaration in the header.



[PATCH] ps3vram: remove bio splitting

2023-01-22 Thread Christoph Hellwig
ps3vram iterates over the bio one segment, that is one page-aligned and
at most page-sized chunk, at a time.  Because of that there is no point
in calling bio_split_to_limits, or explicitly setting the default limits
that are only used by bio_split_to_limits.

Signed-off-by: Christoph Hellwig 
---
 drivers/block/ps3vram.c | 7 ---
 1 file changed, 7 deletions(-)

diff --git a/drivers/block/ps3vram.c b/drivers/block/ps3vram.c
index 574e470b220b09..38d42af01b2535 100644
--- a/drivers/block/ps3vram.c
+++ b/drivers/block/ps3vram.c
@@ -586,10 +586,6 @@ static void ps3vram_submit_bio(struct bio *bio)
 
 	dev_dbg(&dev->core, "%s\n", __func__);
 
-	bio = bio_split_to_limits(bio);
-	if (!bio)
-		return;
-
 	spin_lock_irq(&priv->lock);
 	busy = !bio_list_empty(&priv->list);
 	bio_list_add(&priv->list, bio);
@@ -749,9 +745,6 @@ static int ps3vram_probe(struct ps3_system_bus_device *dev)
gendisk->private_data = dev;
strscpy(gendisk->disk_name, DEVICE_NAME, sizeof(gendisk->disk_name));
set_capacity(gendisk, priv->size >> 9);
-   blk_queue_max_segments(gendisk->queue, BLK_MAX_SEGMENTS);
-   blk_queue_max_segment_size(gendisk->queue, BLK_MAX_SEGMENT_SIZE);
-   blk_queue_max_hw_sectors(gendisk->queue, BLK_SAFE_MAX_SECTORS);
 
	dev_info(&dev->core, "%s: Using %llu MiB of GPU memory\n",
 gendisk->disk_name, get_capacity(gendisk) >> 11);
-- 
2.39.0



Re: [PATCH] mm: remove zap_page_range and create zap_vma_pages

2023-01-08 Thread Christoph Hellwig
I would have split this into one patch that adds the new
zap_vma_pages helper, and one to remove zap_page_range, to keep the
two changes separate.

But the overall result looks fine, so feel free to add:

Reviewed-by: Christoph Hellwig 

to this or the split patches.


Re: [RFC PATCH] mm: remove zap_page_range and change callers to use zap_vma_page_range

2022-12-23 Thread Christoph Hellwig
>   unsigned long size = vma->vm_end - vma->vm_start;
>  
>   if (vma_is_special_mapping(vma, vdso_info[VDSO_ABI_AA64].dm))
> - zap_page_range(vma, vma->vm_start, size);
> + zap_vma_page_range(vma, vma->vm_start, size);
>  #ifdef CONFIG_COMPAT_VDSO
>   if (vma_is_special_mapping(vma, vdso_info[VDSO_ABI_AA32].dm))
> - zap_page_range(vma, vma->vm_start, size);
> + zap_vma_page_range(vma, vma->vm_start, size);
>  #endif

So for something called zap_vma_page_range I'd expect to just pass
the vma and zap all of it, which this and many other callers want
anyway.

> +++ b/arch/s390/mm/gmap.c
> @@ -722,7 +722,7 @@ void gmap_discard(struct gmap *gmap, unsigned long from, unsigned long to)
>   if (is_vm_hugetlb_page(vma))
>   continue;
>   size = min(to - gaddr, PMD_SIZE - (gaddr & ~PMD_MASK));
> - zap_page_range(vma, vmaddr, size);
> + zap_vma_page_range(vma, vmaddr, size);

And then just call zap_page_range_single directly for those that
don't want to zap the entire vma.
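Sketched out (zap_page_range_single already has this shape):

static inline void zap_vma_pages(struct vm_area_struct *vma)
{
	zap_page_range_single(vma, vma->vm_start,
			      vma->vm_end - vma->vm_start, NULL);
}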


[PATCH] powerpc/ps3: mark ps3_system_bus_type static

2022-11-21 Thread Christoph Hellwig
ps3_system_bus_type is only used inside of system-bus.c, so remove
the external declaration and the very outdated comment next to it.

Signed-off-by: Christoph Hellwig 
---
 arch/powerpc/include/asm/ps3.h  | 4 
 arch/powerpc/platforms/ps3/system-bus.c | 2 +-
 2 files changed, 1 insertion(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/ps3.h b/arch/powerpc/include/asm/ps3.h
index 8a0d8fb3532863..d503dbd7856cb8 100644
--- a/arch/powerpc/include/asm/ps3.h
+++ b/arch/powerpc/include/asm/ps3.h
@@ -425,10 +425,6 @@ static inline void *ps3_system_bus_get_drvdata(
 	return dev_get_drvdata(&dev->core);
 }
 
-/* These two need global scope for get_arch_dma_ops(). */
-
-extern struct bus_type ps3_system_bus_type;
-
 /* system manager */
 
 struct ps3_sys_manager_ops {
diff --git a/arch/powerpc/platforms/ps3/system-bus.c 
b/arch/powerpc/platforms/ps3/system-bus.c
index 2502e9b17df4ab..38a7e02295c8f2 100644
--- a/arch/powerpc/platforms/ps3/system-bus.c
+++ b/arch/powerpc/platforms/ps3/system-bus.c
@@ -466,7 +466,7 @@ static struct attribute *ps3_system_bus_dev_attrs[] = {
 };
 ATTRIBUTE_GROUPS(ps3_system_bus_dev);
 
-struct bus_type ps3_system_bus_type = {
+static struct bus_type ps3_system_bus_type = {
.name = "ps3_system_bus",
.match = ps3_system_bus_match,
.uevent = ps3_system_bus_uevent,
-- 
2.30.2



Re: [PATCH] mm: remove kern_addr_valid() completely

2022-10-18 Thread Christoph Hellwig
Thanks, this is long overdue!

Acked-by: Christoph Hellwig 


Re: [PATCH v3 4/5] vfio: Remove CONFIG_VFIO_SPAPR_EEH

2022-10-18 Thread Christoph Hellwig
> +#if IS_ENABLED(CONFIG_EEH) && IS_ENABLED(CONFIG_VFIO_IOMMU_SPAPR_TCE)
>  #include 
>  #endif
>  
> @@ -689,7 +689,7 @@ void vfio_pci_core_close_device(struct vfio_device *core_vdev)
>   vdev->sriov_pf_core_dev->vf_token->users--;
>   mutex_unlock(&vdev->sriov_pf_core_dev->vf_token->lock);
>   }
> -#if IS_ENABLED(CONFIG_VFIO_SPAPR_EEH)
> +#if IS_ENABLED(CONFIG_EEH) && IS_ENABLED(CONFIG_VFIO_IOMMU_SPAPR_TCE)

So while this preserves the existing behavior, I wonder if checking
CONFIG_EEH only would make more sense here.



Re: [PATCH] powerpc: export cpu_smallcore_map for modules

2022-08-22 Thread Christoph Hellwig
On Mon, Aug 22, 2022 at 01:40:23PM +1000, Michael Ellerman wrote:
> Randy Dunlap  writes:
> > drivers/gpu/drm/amd/amdkfd/kfd_device.c calls cpu_smt_mask().
> > This is an inline function on powerpc which references
> > cpu_smallcore_map.
> >
> > Fixes: 425752c63b6f ("powerpc: Detect the presence of big-cores via "ibm, 
> > thread-groups"")
> > Fixes: 7bc913085765 ("drm/amdkfd: Try to schedule bottom half on same core")
> 
> That 2nd commit is not in mainline, only linux-next.
> 
> I don't mind merging this fix preemptively, but is that SHA stable?

I really do not think this has any business being exported at all.

kfd_queue_work is not something that should be done in a driver.
Something like this belongs into the workqueue core, not in an
underdocumented helper in a random driver.

Drm guys:  once again, please please work with the maintainers instead
of just making up random stuff in the drivers.


Re: [PATCH v8 00/31] Rust support

2022-08-02 Thread Christoph Hellwig
Miguel,

handwaving and pointing to git trees is not how Linux development
works.  Please make sure all the patches go to the relevant lists
and maintainers first, and actually do have ACKs.


Re: [PATCH v4 1/3] PCI: Remove pci_get_legacy_ide_irq and asm-generic/pci.h

2022-07-20 Thread Christoph Hellwig
Looks good:

Reviewed-by: Christoph Hellwig 


Re: [PATCH v4 2/3] PCI: Move isa_dma_bridge_buggy out of dma.h

2022-07-20 Thread Christoph Hellwig
Looks good:

Reviewed-by: Christoph Hellwig 


Re: [PATCH v2 1/2] asm-generic: Remove pci.h copying remaining code to x86

2022-07-17 Thread Christoph Hellwig
On Sun, Jul 17, 2022 at 12:34:52PM +0900, Stafford Horne wrote:
> The generic pci.h header now only provides a definition of
> pci_get_legacy_ide_irq which is used by architectures that support PNP.
> Of the architectures that use asm-generic/pci.h this is only x86.

Please move this into a separate header, like legacy-ide.h.  It doesn't
have anything to do with actual PCI support.


Re: [PATCH v2 3/3] arch/*/: remove CONFIG_VIRT_TO_BUS

2022-06-29 Thread Christoph Hellwig
On Wed, Jun 29, 2022 at 09:38:00AM +1200, Michael Schmitz wrote:
> That's one of the 'liberties' I alluded to. The reason I left these in is
> that I'm none too certain what device feature the DMA API uses to decide a
> device isn't cache-coherent.

The DMA API does not look at device features at all.  It needs to be
told so by the platform code.  Once an architecture implements the
hooks to support non-coherent DMA all devices are treated as
non-coherent by default unless overridden by the architecture either
globally (using the global dma_default_coherent variable) or per-device
(using the dev->dma_coherent field, usually set by arch_setup_dma_ops).

> If it's dev->coherent_dma_mask, the way I set
> up the device in the a3000 driver should leave the coherent mask unchanged.
> For the Zorro drivers, devices are set up to use the same storage to store
> normal and coherent masks - something we most likely want to change. I need
> to think about the ramifications of that.

No, the coherent mask is slightly misnamed and not actually related.


Re: [PATCH v2 3/3] arch/*/: remove CONFIG_VIRT_TO_BUS

2022-06-29 Thread Christoph Hellwig
On Wed, Jun 29, 2022 at 11:09:00AM +1200, Michael Schmitz wrote:
> And all SCSI buffers are allocated using kmalloc? No way at all for user
> space to pass unaligned data?

Most of what you will see actually comes from the page allocator.  But
the block layer has a dma_alignment limit, and when userspace sends
I/O that is not properly aligned it will be bounce buffered before
it is sent to the driver.
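For example (sketch), a driver that can only DMA to 512-byte aligned
buffers declares that and the block layer does the rest:

	/* misaligned user buffers are bounced before reaching the driver */
	blk_queue_dma_alignment(q, 511);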


Re: [PATCH v2 2/4] watchdog: export watchdog_mutex and lockup_detector_reconfigure

2022-06-24 Thread Christoph Hellwig
On Tue, Jun 14, 2022 at 03:54:12PM +0200, Laurent Dufour wrote:
> The watchdog_mutex is exported to allow some variable to be changed under
> its protection and prevent any conflict.
> The lockup_detector_reconfigure() function is exported and is expected to
> be called under the protection of watchdog_mutex.

Please provide an actual function accessor instead of directly touching
a global lock.
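E.g. something like this sketch (the function name is made up):

/* in kernel/watchdog.c, keeping watchdog_mutex private */
void lockup_detector_reconfigure_sync(void)
{
	mutex_lock(&watchdog_mutex);
	lockup_detector_reconfigure();
	mutex_unlock(&watchdog_mutex);
}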


Re: [PATCH V4 00/26] mm/mmap: Drop __SXXX/__PXXX macros from across platforms

2022-06-23 Thread Christoph Hellwig
On Fri, Jun 24, 2022 at 10:50:33AM +0530, Anshuman Khandual wrote:
> > On most architectures this should be const now, only very few ever
> > modify it.
> 
> Will make it a 'static const pgprot_t protection_map[16] __ro_after_init'
> on platforms that do not change the protection_map[] even during boot.

No need for __ro_after_init when it is already declared const.


Re: [PATCH V4 00/26] mm/mmap: Drop __SXXX/__PXXX macros from across platforms

2022-06-23 Thread Christoph Hellwig
On Fri, Jun 24, 2022 at 10:13:13AM +0530, Anshuman Khandual wrote:
> vm_get_page_prot(), in order for it to be reused on platforms that do not
> require custom implementation. Finally, ARCH_HAS_VM_GET_PAGE_PROT can just
> be dropped, as all platforms now define and export vm_get_page_prot(), via
> looking up a private and static protection_map[] array. protection_map[]
> data type is the following for all platforms without deviation (except the
> powerpc one which is shared between 32 and 64 bit platforms), keeping it
> unchanged for now.
> 
> static pgprot_t protection_map[16] __ro_after_init

On most architectures this should be const now, only very few ever
modify it.


Re: [PATCH V4 26/26] mm/mmap: Drop ARCH_HAS_VM_GET_PAGE_PROT

2022-06-23 Thread Christoph Hellwig
Looks good:

Reviewed-by: Christoph Hellwig 


Re: [PATCH V4 16/26] riscv/mm: Enable ARCH_HAS_VM_GET_PAGE_PROT

2022-06-23 Thread Christoph Hellwig
On Fri, Jun 24, 2022 at 10:13:29AM +0530, Anshuman Khandual wrote:
> index d466ec670e1f..f976580500b1 100644
> --- a/arch/riscv/mm/init.c
> +++ b/arch/riscv/mm/init.c
> @@ -288,6 +288,26 @@ static pmd_t __maybe_unused early_dtb_pmd[PTRS_PER_PMD] __initdata __aligned(PAG
>  #define early_pg_dir   ((pgd_t *)XIP_FIXUP(early_pg_dir))
>  #endif /* CONFIG_XIP_KERNEL */
>  
> +static pgprot_t protection_map[16] __ro_after_init = {

Can't this be marked const now?



Re: [PATCH V4 06/26] x86/mm: Move protection_map[] inside the platform

2022-06-23 Thread Christoph Hellwig
Looks good:

Reviewed-by: Christoph Hellwig 


Re: [PATCH V4 02/26] mm/mmap: Define DECLARE_VM_GET_PAGE_PROT

2022-06-23 Thread Christoph Hellwig
On Fri, Jun 24, 2022 at 10:13:15AM +0530, Anshuman Khandual wrote:
> This just converts the generic vm_get_page_prot() implementation into a new
> macro i.e DECLARE_VM_GET_PAGE_PROT which later can be used across platforms
> when enabling them with ARCH_HAS_VM_GET_PAGE_PROT. This does not create any
> functional change.

mm.h is a huge header included by almost everything in the kernel.
I'd rather have it in something only included in a few files.  If we
can't find anything suitable it might be worth adding a header just
for this even.


Re: [PATCH V4 01/26] mm/mmap: Build protect protection_map[] with __P000

2022-06-23 Thread Christoph Hellwig
Looks good:

Reviewed-by: Christoph Hellwig 


Re: [PATCH] arch/*: Disable softirq stacks on PREEMPT_RT.

2022-06-15 Thread Christoph Hellwig
On Tue, Jun 14, 2022 at 08:18:14PM +0200, Sebastian Andrzej Siewior wrote:
> Disable the unused softirqs stacks on PREEMPT_RT to safe some memory and

s/safe/save/


Re: [PATCHv2 1/2] mm: eliminate ifdef of HAVE_IOREMAP_PROT in .c files

2022-06-15 Thread Christoph Hellwig
Did you verify that all architectures actually provide an ioremap_prot
prototype?  The header situation for ioremap* is a mess unfortunately.


Re: [PATCHv2 2/2] uio:powerpc:mpc85xx: l2-cache-sram uio driver implementation

2022-06-15 Thread Christoph Hellwig
As pointed out last time:  uio is the wrong interface to expose sram,
and any kind of ioremap is the wrong way to map it.


Re: [PATCH 2/2] uio:powerpc:mpc85xx: l2-cache-sram uio driver implementation

2022-06-14 Thread Christoph Hellwig
UIO seems like the wrong kind of interface for this.  Why isn't this
a simple character device?


Re: [PATCH 1/2] powerpc:mm: export symbol ioremap_coherent

2022-06-14 Thread Christoph Hellwig
On Tue, Jun 14, 2022 at 08:45:25PM +1000, Michael Ellerman wrote:
> Wang Wenhu  writes:
> > The function ioremap_coherent may be called by modules such as
> > fsl_85xx_cache_sram. So export it for access in other modules.
> 
> ioremap_coherent() is powerpc specific, and only has one other caller,
> I'd like to remove it.
> 
> Does ioremap_cache() work for you?

Chances are that both are the wrong thing and this really wants
memremap, as SRAM tends to have memory and not MMIO semantics.
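I.e. (sketch, assuming a struct resource *res describing the SRAM):

	/* SRAM has memory semantics, so map it as cached memory */
	void *sram = memremap(res->start, resource_size(res), MEMREMAP_WB);
	if (!sram)
		return -ENOMEM;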


Re: [PATCH RFC v1 4/7] swiotlb: to implement io_tlb_high_mem

2022-06-13 Thread Christoph Hellwig
On Fri, Jun 10, 2022 at 02:56:08PM -0700, Dongli Zhang wrote:
> Since this patch file has 200+ lines, would you please help clarify what does
> 'this' indicate?

This indicates that any choice of a different swiotlb pools needs to
be hidden inside of swiotlb.  The dma mapping API already provides
swiotlb the addressability requirement for the device.  Similarly we
already have a SWIOTLB_ANY flag that switches to a 64-bit buffer
by default, which we can change to, or replace with a flag that
allocates an additional buffer that is not addressing limited.


Re: [PATCH] kprobes: Enable tracing for mololithic kernel images

2022-06-08 Thread Christoph Hellwig

On Wed, Jun 08, 2022 at 01:26:19PM -0700, Luis Chamberlain wrote:
> No, that was removed because it has only one user.

That is only part of the story.  The other part is that the overall
kernel simply does not have any business allocating executable memory.
Executable memory is a very special concept for modules or module-like
code like kprobes, and should not be exposed as a general concept.

Especially as executable memory really should not also be writable
for security reasons.  In other words, we should actually never
allocate executable memory, ever.  We might seal memory and then
mark it executable after having written to it, which is how modules
and kprobes are implemented on all modern Linux ports anyway.
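That is, the usual pattern looks like this sketch (error handling and
cache maintenance omitted):

	void *text = vmalloc(size);	/* writable, not executable */

	memcpy(text, insns, size);	/* write while still writable */
	/* seal first, then mark executable */
	set_memory_ro((unsigned long)text, DIV_ROUND_UP(size, PAGE_SIZE));
	set_memory_x((unsigned long)text, DIV_ROUND_UP(size, PAGE_SIZE));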


Re: [PATCH RFC v1 3/7] swiotlb-xen: support highmem for xen specific code

2022-06-08 Thread Christoph Hellwig
On Wed, Jun 08, 2022 at 05:55:49PM -0700, Dongli Zhang wrote:
> @@ -109,19 +110,25 @@ int xen_swiotlb_fixup(void *buf, unsigned long nslabs, bool high)
>   int rc;
>   unsigned int order = get_order(IO_TLB_SEGSIZE << IO_TLB_SHIFT);
>   unsigned int i, dma_bits = order + PAGE_SHIFT;
> + unsigned int max_dma_bits = MAX_DMA32_BITS;
>   dma_addr_t dma_handle;
>   phys_addr_t p = virt_to_phys(buf);
>  
>   BUILD_BUG_ON(IO_TLB_SEGSIZE & (IO_TLB_SEGSIZE - 1));
>   BUG_ON(nslabs % IO_TLB_SEGSIZE);
>  
> + if (high) {
> + dma_bits = MAX_DMA64_BITS;
> + max_dma_bits = MAX_DMA64_BITS;
> + }
> +

I think you really want to pass the addressing bits or mask to the
remap callback and not do magic with a 'high' flag here.


Re: [PATCH RFC v1 7/7] swiotlb: fix the slot_addr() overflow

2022-06-08 Thread Christoph Hellwig
On Wed, Jun 08, 2022 at 05:55:53PM -0700, Dongli Zhang wrote:
> +#define slot_addr(start, idx)	((start) + (((unsigned long)idx) << IO_TLB_SHIFT))

Please just convert it to an inline function.
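I.e. (sketch):

static inline phys_addr_t slot_addr(phys_addr_t start, phys_addr_t idx)
{
	return start + (idx << IO_TLB_SHIFT);
}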


Re: [PATCH RFC v1 6/7] virtio: use io_tlb_high_mem if it is active

2022-06-08 Thread Christoph Hellwig
On Wed, Jun 08, 2022 at 05:55:52PM -0700, Dongli Zhang wrote:
>  /* Unique numbering for virtio devices. */
> @@ -241,6 +243,12 @@ static int virtio_dev_probe(struct device *_d)
>   u64 device_features;
>   u64 driver_features;
>   u64 driver_features_legacy;
> + struct device *parent = dev->dev.parent;
> + u64 dma_mask = min_not_zero(*parent->dma_mask,
> + parent->bus_dma_limit);
> +
> + if (dma_mask == DMA_BIT_MASK(64))
> + swiotlb_use_high(parent);

The driver already very clearly communicated its addressing
requirements.  The underlying swiotlb code needs to transparently
pick the right pool.



Re: [PATCH RFC v1 5/7] swiotlb: add interface to set dev->dma_io_tlb_mem

2022-06-08 Thread Christoph Hellwig
This should be handled under the hood without the driver even knowing.


Re: [PATCH RFC v1 4/7] swiotlb: to implement io_tlb_high_mem

2022-06-08 Thread Christoph Hellwig
All this really needs to be hidden under the hood.


Re: [PATCH RFC v1 1/7] swiotlb: introduce the highmem swiotlb buffer

2022-06-08 Thread Christoph Hellwig
On Wed, Jun 08, 2022 at 05:55:47PM -0700, Dongli Zhang wrote:
> @@ -109,6 +109,7 @@ struct io_tlb_mem {
>   } *slots;
>  };
>  extern struct io_tlb_mem io_tlb_default_mem;
> +extern struct io_tlb_mem io_tlb_high_mem;

This should not be exposed.

> +extern bool swiotlb_high_active(void);

And this should not even exist.

> +static unsigned long high_nslabs;

And I don't think "high" is a good name here to start with.  That
suggests highmem, which we are not using here.


Re: [PATCH 09/15] swiotlb: make the swiotlb_init interface more useful

2022-06-01 Thread Christoph Hellwig
On Wed, Jun 01, 2022 at 11:11:57AM -0700, Nathan Chancellor wrote:
> On Wed, Jun 01, 2022 at 07:57:43PM +0200, Christoph Hellwig wrote:
> > On Wed, Jun 01, 2022 at 10:46:54AM -0700, Nathan Chancellor wrote:
> > > On Wed, Jun 01, 2022 at 07:34:41PM +0200, Christoph Hellwig wrote:
> > > > Can you send me the full dmesg and the content of
> > > > /sys/kernel/debug/swiotlb/io_tlb_nslabs for a good and a bad boot?
> > > 
> > > Sure thing, they are attached! If there is anything else I can provide
> > > or test, I am more than happy to do so.
> > 
> > Nothing interesting.  But the performance numbers almost look like
> > swiotlb=force got ignored before (even if I can't explain why).
> 
> I was able to get my performance back with this diff but I don't know if
> this is a hack or a proper fix in the context of the series.

This looks good, but needs a little tweak.  I'd go for this variant of
it:


diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index dfa1de89dc944..cb50f8d383606 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -192,7 +192,7 @@ void __init swiotlb_update_mem_attributes(void)
 }
 
 static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
-   unsigned long nslabs, bool late_alloc)
+   unsigned long nslabs, unsigned int flags, bool late_alloc)
 {
void *vaddr = phys_to_virt(start);
unsigned long bytes = nslabs << IO_TLB_SHIFT, i;
@@ -203,8 +203,7 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
mem->index = 0;
mem->late_alloc = late_alloc;
 
-   if (swiotlb_force_bounce)
-   mem->force_bounce = true;
+   mem->force_bounce = swiotlb_force_bounce || (flags & SWIOTLB_FORCE);
 
	spin_lock_init(&mem->lock);
for (i = 0; i < mem->nslabs; i++) {
@@ -275,8 +274,7 @@ void __init swiotlb_init_remap(bool addressing_limit, unsigned int flags,
panic("%s: Failed to allocate %zu bytes align=0x%lx\n",
  __func__, alloc_size, PAGE_SIZE);
 
-   swiotlb_init_io_tlb_mem(mem, __pa(tlb), nslabs, false);
-   mem->force_bounce = flags & SWIOTLB_FORCE;
+   swiotlb_init_io_tlb_mem(mem, __pa(tlb), nslabs, flags, false);
 
if (flags & SWIOTLB_VERBOSE)
swiotlb_print_info();
@@ -348,7 +346,7 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
 
set_memory_decrypted((unsigned long)vstart,
 (nslabs << IO_TLB_SHIFT) >> PAGE_SHIFT);
-   swiotlb_init_io_tlb_mem(mem, virt_to_phys(vstart), nslabs, true);
+   swiotlb_init_io_tlb_mem(mem, virt_to_phys(vstart), nslabs, 0, true);
 
swiotlb_print_info();
return 0;
@@ -835,8 +833,8 @@ static int rmem_swiotlb_device_init(struct reserved_mem *rmem,
 
set_memory_decrypted((unsigned long)phys_to_virt(rmem->base),
 rmem->size >> PAGE_SHIFT);
-   swiotlb_init_io_tlb_mem(mem, rmem->base, nslabs, false);
-   mem->force_bounce = true;
+   swiotlb_init_io_tlb_mem(mem, rmem->base, nslabs, SWIOTLB_FORCE,
+   false);
mem->for_alloc = true;
 
rmem->priv = mem;



Re: [PATCH 09/15] swiotlb: make the swiotlb_init interface more useful

2022-06-01 Thread Christoph Hellwig
On Wed, Jun 01, 2022 at 10:46:54AM -0700, Nathan Chancellor wrote:
> On Wed, Jun 01, 2022 at 07:34:41PM +0200, Christoph Hellwig wrote:
> > Can you send me the full dmesg and the content of
> > /sys/kernel/debug/swiotlb/io_tlb_nslabs for a good and a bad boot?
> 
> Sure thing, they are attached! If there is anything else I can provide
> or test, I am more than happy to do so.

Nothing interesting.  But the performance numbers almost look like
swiotlb=force got ignored before (even if I can't explain why).
Do you get a similar performance with the new kernel without
swiotlb=force as the old one with that argument by any chance?



Re: [PATCH 09/15] swiotlb: make the swiotlb_init interface more useful

2022-06-01 Thread Christoph Hellwig
Can you send me the full dmesg and the content of
/sys/kernel/debug/swiotlb/io_tlb_nslabs for a good and a bad boot?

Thanks!


[PATCH] PCI/ERR: handle disconnected devices in report_error_detected

2022-06-01 Thread Christoph Hellwig
When a device is already unplugged by pciehp by the time that the AER
handler is invoked, the PCIe device will already be in the
pci_channel_io_perm_failure state.  In that case we should simply
return PCI_ERS_RESULT_DISCONNECT instead of trying to do a state
transition that will fail.

Also untangle the state transition failure from the lack of methods to
improve the debugging output in case it will happen ever again.

Signed-off-by: Christoph Hellwig 
---
 drivers/pci/pcie/err.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index 0c5a143025af4..59c90d04a609a 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -55,10 +55,14 @@ static int report_error_detected(struct pci_dev *dev,
 
	device_lock(&dev->dev);
pdrv = dev->driver;
-   if (!pci_dev_set_io_state(dev, state) ||
-   !pdrv ||
-   !pdrv->err_handler ||
-   !pdrv->err_handler->error_detected) {
+   if (pci_dev_is_disconnected(dev)) {
+   vote = PCI_ERS_RESULT_DISCONNECT;
+   } else if (!pci_dev_set_io_state(dev, state)) {
+   pci_info(dev, "can't recover (state transition %u -> %u invalid)\n",
+   dev->error_state, state);
+   vote = PCI_ERS_RESULT_NONE;
+   } else if (!pdrv || !pdrv->err_handler ||
+  !pdrv->err_handler->error_detected) {
/*
 * If any device in the subtree does not have an error_detected
 * callback, PCI_ERS_RESULT_NO_AER_DRIVER prevents subsequent
-- 
2.30.2



Re: [PATCH] powerpc/Kconfig: Force THREAD_SHIFT to at least 14 with KASAN

2022-05-31 Thread Christoph Hellwig
On Tue, May 31, 2022 at 04:16:19PM +1000, Michael Ellerman wrote:
> I was thinking of doing it in C, similar to the way arm64 handles it.
> 
> Something like below. It means we always double the stack size when
> KASAN is enabled. I think it's preferable, as it will always work
> regardless of whether the user has an old .config (or is bisecting)?

Is there any reason to even offer the Kconfig?  It is super cryptic and
just picking the right value directly in the header would seem much
more sensible:

#if defined(CONFIG_PPC_256K_PAGES)
#define MIN_THREAD_SHIFT	15
#elif defined(CONFIG_PPC64)
#define MIN_THREAD_SHIFT	14
#else
#define MIN_THREAD_SHIFT	13
#endif

#ifdef CONFIG_KASAN
#define THREAD_SHIFT	(MIN_THREAD_SHIFT + 1)
#else
#define THREAD_SHIFT	MIN_THREAD_SHIFT
#endif

#if defined(CONFIG_VMAP_STACK) && THREAD_SHIFT < PAGE_SHIFT
#undef THREAD_SHIFT
#define THREAD_SHIFT	PAGE_SHIFT
#endif


Re: [PATCH 0/5] kallsyms: make kallsym APIs more safe with scnprintf

2022-05-22 Thread Christoph Hellwig
On Fri, May 20, 2022 at 02:06:56PM +0530, Maninder Singh wrote:
> kallsyms functionality depends on KSYM_NAME_LEN directly,
> but if the user passes an array length smaller than it, sprintf
> can cause buffer overflow issues.
> 
> So this patch set changes the *sprint* and *lookup* APIs
> to take the buffer size as an argument and replaces sprintf with
> scnprintf.

This is still a pretty horrible API.  Passing something like
a struct seq_buf seems like the much better API here.  Also, with
the number of arguments and the by-reference passing, it might be
worth passing them as a structure while you're at it.



[PATCH] net: unexport csum_and_copy_{from,to}_user

2022-04-21 Thread Christoph Hellwig
csum_and_copy_from_user and csum_and_copy_to_user are exported by
a few architectures, but not actually used in modular code.  Drop
the exports.

Signed-off-by: Christoph Hellwig 
---
 arch/alpha/lib/csum_partial_copy.c   | 1 -
 arch/m68k/lib/checksum.c | 2 --
 arch/powerpc/lib/checksum_wrappers.c | 2 --
 arch/x86/lib/csum-wrappers_64.c  | 2 --
 4 files changed, 7 deletions(-)

diff --git a/arch/alpha/lib/csum_partial_copy.c b/arch/alpha/lib/csum_partial_copy.c
index 1931a04af85a2..4d180d96f09e4 100644
--- a/arch/alpha/lib/csum_partial_copy.c
+++ b/arch/alpha/lib/csum_partial_copy.c
@@ -353,7 +353,6 @@ csum_and_copy_from_user(const void __user *src, void *dst, int len)
return 0;
return __csum_and_copy(src, dst, len);
 }
-EXPORT_SYMBOL(csum_and_copy_from_user);
 
 __wsum
 csum_partial_copy_nocheck(const void *src, void *dst, int len)
diff --git a/arch/m68k/lib/checksum.c b/arch/m68k/lib/checksum.c
index 7e6afeae62177..5acb821849d30 100644
--- a/arch/m68k/lib/checksum.c
+++ b/arch/m68k/lib/checksum.c
@@ -265,8 +265,6 @@ csum_and_copy_from_user(const void __user *src, void *dst, int len)
return sum;
 }
 
-EXPORT_SYMBOL(csum_and_copy_from_user);
-
 
 /*
  * copy from kernel space while checksumming, otherwise like csum_partial
diff --git a/arch/powerpc/lib/checksum_wrappers.c b/arch/powerpc/lib/checksum_wrappers.c
index f3999cbb2fcc4..1a14c8780278c 100644
--- a/arch/powerpc/lib/checksum_wrappers.c
+++ b/arch/powerpc/lib/checksum_wrappers.c
@@ -24,7 +24,6 @@ __wsum csum_and_copy_from_user(const void __user *src, void *dst,
user_read_access_end();
return csum;
 }
-EXPORT_SYMBOL(csum_and_copy_from_user);
 
 __wsum csum_and_copy_to_user(const void *src, void __user *dst, int len)
 {
@@ -38,4 +37,3 @@ __wsum csum_and_copy_to_user(const void *src, void __user *dst, int len)
user_write_access_end();
return csum;
 }
-EXPORT_SYMBOL(csum_and_copy_to_user);
diff --git a/arch/x86/lib/csum-wrappers_64.c b/arch/x86/lib/csum-wrappers_64.c
index 189344924a2be..145f9a0bde29a 100644
--- a/arch/x86/lib/csum-wrappers_64.c
+++ b/arch/x86/lib/csum-wrappers_64.c
@@ -32,7 +32,6 @@ csum_and_copy_from_user(const void __user *src, void *dst, int len)
user_access_end();
return sum;
 }
-EXPORT_SYMBOL(csum_and_copy_from_user);
 
 /**
  * csum_and_copy_to_user - Copy and checksum to user space.
@@ -57,7 +56,6 @@ csum_and_copy_to_user(const void *src, void __user *dst, int len)
user_access_end();
return sum;
 }
-EXPORT_SYMBOL(csum_and_copy_to_user);
 
 /**
  * csum_partial_copy_nocheck - Copy and checksum.
-- 
2.30.2



Re: cleanup swiotlb initialization v8

2022-04-13 Thread Christoph Hellwig
On Mon, Apr 04, 2022 at 07:05:44AM +0200, Christoph Hellwig wrote:
> Hi all,
> 
> this series tries to clean up the swiotlb initialization, including
> that of swiotlb-xen.  To get there is also removes the x86 iommu table
> infrastructure that massively obsfucates the initialization path.
> 
> Git tree:
> 
> git://git.infradead.org/users/hch/misc.git swiotlb-init-cleanup
> 
> Gitweb:
> 
> 
> http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/swiotlb-init-cleanup

I've updated the git tree above with the commit message nitpicks and
received reviews.  I plan to pull the patches into the dma-mapping
tree after -rc3 is released, so if any involved maintainer is not happy
with the result, please speak up now.


Re: [PATCH 10/15] swiotlb: add a SWIOTLB_ANY flag to lift the low memory restriction

2022-04-12 Thread Christoph Hellwig
On Wed, Apr 06, 2022 at 08:25:32PM -0400, Konrad Rzeszutek Wilk wrote:
> > diff --git a/arch/powerpc/platforms/pseries/svm.c b/arch/powerpc/platforms/pseries/svm.c
> > index c5228f4969eb2..3b4045d508ec8 100644
> > --- a/arch/powerpc/platforms/pseries/svm.c
> > +++ b/arch/powerpc/platforms/pseries/svm.c
> > @@ -28,7 +28,7 @@ static int __init init_svm(void)
> >  * need to use the SWIOTLB buffer for DMA even if dma_capable() says
> >  * otherwise.
> >  */
> > -   swiotlb_force = SWIOTLB_FORCE;
> > +   ppc_swiotlb_flags |= SWIOTLB_ANY | SWIOTLB_FORCE;
> 
> This is the only place you set the ppc_swiotlb_flags.. so I wonder why
> the '|=' instead of just '=' ?

Preparing for setting others and not clobbering the value.
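
Purely as an illustration (the second call site is hypothetical),
accumulating the flags keeps independent callers from stepping on
each other:

    /* init_svm() sets its bits... */
    ppc_swiotlb_flags |= SWIOTLB_ANY | SWIOTLB_FORCE;

    /* ...and a later hook can add more without losing them */
    ppc_swiotlb_flags |= SWIOTLB_VERBOSE;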


[PATCH 15/15] x86: remove cruft from <asm/dma-mapping.h>

2022-04-03 Thread Christoph Hellwig
<asm/dma-mapping.h> gets pulled in by all drivers using the DMA API.
Remove x86 internal variables and unnecessary includes from it.

Signed-off-by: Christoph Hellwig 
---
 arch/x86/include/asm/dma-mapping.h | 11 ---
 arch/x86/include/asm/iommu.h   |  2 ++
 2 files changed, 2 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/dma-mapping.h b/arch/x86/include/asm/dma-mapping.h
index 256fd8115223d..1c66708e30623 100644
--- a/arch/x86/include/asm/dma-mapping.h
+++ b/arch/x86/include/asm/dma-mapping.h
@@ -2,17 +2,6 @@
 #ifndef _ASM_X86_DMA_MAPPING_H
 #define _ASM_X86_DMA_MAPPING_H
 
-/*
- * IOMMU interface. See Documentation/core-api/dma-api-howto.rst and
- * Documentation/core-api/dma-api.rst for documentation.
- */
-
-#include 
-#include 
-
-extern int iommu_merge;
-extern int panic_on_overflow;
-
 extern const struct dma_map_ops *dma_ops;
 
 static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
diff --git a/arch/x86/include/asm/iommu.h b/arch/x86/include/asm/iommu.h
index dba89ed40d38d..0bef44d30a278 100644
--- a/arch/x86/include/asm/iommu.h
+++ b/arch/x86/include/asm/iommu.h
@@ -8,6 +8,8 @@
 
 extern int force_iommu, no_iommu;
 extern int iommu_detected;
+extern int iommu_merge;
+extern int panic_on_overflow;
 
 #ifdef CONFIG_SWIOTLB
 extern bool x86_swiotlb_enable;
-- 
2.30.2



[PATCH 14/15] swiotlb: remove swiotlb_init_with_tbl and swiotlb_init_late_with_tbl

2022-04-03 Thread Christoph Hellwig
No users left.

Signed-off-by: Christoph Hellwig 
---
 include/linux/swiotlb.h |  2 --
 kernel/dma/swiotlb.c| 77 +++--
 2 files changed, 20 insertions(+), 59 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 7b50c82f84ce9..7ed35dd3de6e7 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -34,13 +34,11 @@ struct scatterlist;
 /* default to 64MB */
 #define IO_TLB_DEFAULT_SIZE (64UL<<20)
 
-int swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, unsigned int flags);
 unsigned long swiotlb_size_or_default(void);
 void __init swiotlb_init_remap(bool addressing_limit, unsigned int flags,
int (*remap)(void *tlb, unsigned long nslabs));
 int swiotlb_init_late(size_t size, gfp_t gfp_mask,
int (*remap)(void *tlb, unsigned long nslabs));
-extern int swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs);
 extern void __init swiotlb_update_mem_attributes(void);
 
 phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t phys,
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index d5fe8f5e08300..c54fc40ebb493 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -225,33 +225,6 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
return;
 }
 
-int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs,
-   unsigned int flags)
-{
-   struct io_tlb_mem *mem = &io_tlb_default_mem;
-   size_t alloc_size;
-
-   if (swiotlb_force_disable)
-   return 0;
-
-   /* protect against double initialization */
-   if (WARN_ON_ONCE(mem->nslabs))
-   return -ENOMEM;
-
-   alloc_size = PAGE_ALIGN(array_size(sizeof(*mem->slots), nslabs));
-   mem->slots = memblock_alloc(alloc_size, PAGE_SIZE);
-   if (!mem->slots)
-   panic("%s: Failed to allocate %zu bytes align=0x%lx\n",
- __func__, alloc_size, PAGE_SIZE);
-
-   swiotlb_init_io_tlb_mem(mem, __pa(tlb), nslabs, false);
-   mem->force_bounce = flags & SWIOTLB_FORCE;
-
-   if (flags & SWIOTLB_VERBOSE)
-   swiotlb_print_info();
-   return 0;
-}
-
 /*
  * Statically reserve bounce buffer space and initialize bounce buffer data
  * structures for the software IO TLB used to implement the DMA API.
@@ -259,7 +232,9 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs,
 void __init swiotlb_init_remap(bool addressing_limit, unsigned int flags,
int (*remap)(void *tlb, unsigned long nslabs))
 {
+   struct io_tlb_mem *mem = &io_tlb_default_mem;
unsigned long nslabs = default_nslabs;
+   size_t alloc_size = PAGE_ALIGN(array_size(sizeof(*mem->slots), nslabs));
size_t bytes;
void *tlb;
 
@@ -280,7 +255,8 @@ void __init swiotlb_init_remap(bool addressing_limit, unsigned int flags,
else
tlb = memblock_alloc_low(bytes, PAGE_SIZE);
if (!tlb)
-   goto fail;
+   panic("%s: failed to allocate tlb structure\n", __func__);
+
if (remap && remap(tlb, nslabs) < 0) {
memblock_free(tlb, PAGE_ALIGN(bytes));
 
@@ -290,14 +266,17 @@ void __init swiotlb_init_remap(bool addressing_limit, unsigned int flags,
  __func__, bytes);
goto retry;
}
-   if (swiotlb_init_with_tbl(tlb, default_nslabs, flags))
-   goto fail_free_mem;
-   return;
 
-fail_free_mem:
-   memblock_free(tlb, bytes);
-fail:
-   pr_warn("Cannot allocate buffer");
+   mem->slots = memblock_alloc(alloc_size, PAGE_SIZE);
+   if (!mem->slots)
+   panic("%s: Failed to allocate %zu bytes align=0x%lx\n",
+ __func__, alloc_size, PAGE_SIZE);
+
+   swiotlb_init_io_tlb_mem(mem, __pa(tlb), default_nslabs, false);
+   mem->force_bounce = flags & SWIOTLB_FORCE;
+
+   if (flags & SWIOTLB_VERBOSE)
+   swiotlb_print_info();
 }
 
 void __init swiotlb_init(bool addressing_limit, unsigned int flags)
@@ -313,6 +292,7 @@ void __init swiotlb_init(bool addressing_limit, unsigned int flags)
 int swiotlb_init_late(size_t size, gfp_t gfp_mask,
int (*remap)(void *tlb, unsigned long nslabs))
 {
+   struct io_tlb_mem *mem = &io_tlb_default_mem;
unsigned long nslabs = ALIGN(size >> IO_TLB_SHIFT, IO_TLB_SEGSIZE);
unsigned long bytes;
unsigned char *vstart = NULL;
@@ -353,33 +333,16 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
return rc;
goto retry;
}
-   rc = swiotlb_late_init_with_tbl(vstart, nslabs);
-   if (rc)
-   free_pages((unsigned long)vstart, order);
-
-   return rc;
-}
-
-int
-swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)

[PATCH 13/15] swiotlb: merge swiotlb-xen initialization into swiotlb

2022-04-03 Thread Christoph Hellwig
Reuse the generic swiotlb initialization for xen-swiotlb.  For ARM/ARM64
this works trivially, while for x86 xen_swiotlb_fixup needs to be passed
as the remap argument to swiotlb_init_remap/swiotlb_init_late.

Note that the lower bound of the swiotlb size is changed to the smaller
IO_TLB_MIN_SLABS based value with this patch, but that is fine as the
2MB value used in Xen before was just an optimization and is not the
hard lower bound.
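
For reference, a back-of-the-envelope check assuming the constants in
the series (IO_TLB_SHIFT = 11, i.e. 2 KiB slots):

    IO_TLB_MIN_SLABS = (1 << 20) >> IO_TLB_SHIFT = 512 slots
    512 slots * 2 KiB = 1 MiB minimum buffer, below Xen's old 2 MiB floor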

Signed-off-by: Christoph Hellwig 
Reviewed-by: Stefano Stabellini 
---
 arch/arm/xen/mm.c   |  21 +++---
 arch/x86/include/asm/xen/page.h |   5 --
 arch/x86/kernel/pci-dma.c   |  20 ++---
 drivers/xen/swiotlb-xen.c   | 128 +---
 include/xen/arm/page.h  |   1 -
 include/xen/swiotlb-xen.h   |   8 +-
 6 files changed, 28 insertions(+), 155 deletions(-)

diff --git a/arch/arm/xen/mm.c b/arch/arm/xen/mm.c
index 28c2070602535..ff05a7899cb86 100644
--- a/arch/arm/xen/mm.c
+++ b/arch/arm/xen/mm.c
@@ -23,22 +23,20 @@
 #include 
 #include 
 
-unsigned long xen_get_swiotlb_free_pages(unsigned int order)
+static gfp_t xen_swiotlb_gfp(void)
 {
phys_addr_t base;
-   gfp_t flags = __GFP_NOWARN|__GFP_KSWAPD_RECLAIM;
u64 i;
 
for_each_mem_range(i, &base, NULL) {
if (base < (phys_addr_t)0xffffffff) {
if (IS_ENABLED(CONFIG_ZONE_DMA32))
-   flags |= __GFP_DMA32;
-   else
-   flags |= __GFP_DMA;
-   break;
+   return __GFP_DMA32;
+   return __GFP_DMA;
}
}
-   return __get_free_pages(flags, order);
+
+   return GFP_KERNEL;
 }
 
 static bool hypercall_cflush = false;
@@ -140,10 +138,13 @@ static int __init xen_mm_init(void)
if (!xen_swiotlb_detect())
return 0;
 
-   rc = xen_swiotlb_init();
/* we can work with the default swiotlb */
-   if (rc < 0 && rc != -EEXIST)
-   return rc;
+   if (!io_tlb_default_mem.nslabs) {
+   rc = swiotlb_init_late(swiotlb_size_or_default(),
+  xen_swiotlb_gfp(), NULL);
+   if (rc < 0)
+   return rc;
+   }
 
cflush.op = 0;
cflush.a.dev_bus_addr = 0;
diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h
index e989bc2269f54..1fc67df500145 100644
--- a/arch/x86/include/asm/xen/page.h
+++ b/arch/x86/include/asm/xen/page.h
@@ -357,9 +357,4 @@ static inline bool xen_arch_need_swiotlb(struct device *dev,
return false;
 }
 
-static inline unsigned long xen_get_swiotlb_free_pages(unsigned int order)
-{
-   return __get_free_pages(__GFP_NOWARN, order);
-}
-
 #endif /* _ASM_X86_XEN_PAGE_H */
diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index a705a199bf8a3..30bbe4abb5d61 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -72,15 +72,13 @@ static inline void __init pci_swiotlb_detect(void)
 #endif /* CONFIG_SWIOTLB */
 
 #ifdef CONFIG_SWIOTLB_XEN
-static bool xen_swiotlb;
-
 static void __init pci_xen_swiotlb_init(void)
 {
if (!xen_initial_domain() && !x86_swiotlb_enable)
return;
x86_swiotlb_enable = true;
-   xen_swiotlb = true;
-   xen_swiotlb_init_early();
+   x86_swiotlb_flags |= SWIOTLB_ANY;
+   swiotlb_init_remap(true, x86_swiotlb_flags, xen_swiotlb_fixup);
dma_ops = &xen_swiotlb_dma_ops;
if (IS_ENABLED(CONFIG_PCI))
pci_request_acs();
@@ -88,14 +86,16 @@ static void __init pci_xen_swiotlb_init(void)
 
 int pci_xen_swiotlb_init_late(void)
 {
-   int rc;
-
-   if (xen_swiotlb)
+   if (dma_ops == &xen_swiotlb_dma_ops)
return 0;
 
-   rc = xen_swiotlb_init();
-   if (rc)
-   return rc;
+   /* we can work with the default swiotlb */
+   if (!io_tlb_default_mem.nslabs) {
+   int rc = swiotlb_init_late(swiotlb_size_or_default(),
+  GFP_KERNEL, xen_swiotlb_fixup);
+   if (rc < 0)
+   return rc;
+   }
 
/* XXX: this switches the dma ops under live devices! */
dma_ops = &xen_swiotlb_dma_ops;
diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index c2da3eb4826e8..df8085b50df10 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -104,7 +104,7 @@ static int is_xen_swiotlb_buffer(struct device *dev, dma_addr_t dma_addr)
return 0;
 }
 
-static int xen_swiotlb_fixup(void *buf, unsigned long nslabs)
+int xen_swiotlb_fixup(void *buf, unsigned long nslabs)
 {
int rc;
unsigned int order = get_order(IO_TLB_SEGSIZE << IO_TLB_SHIFT);
@@ -130,132 +130,6 @@ static int xen_swiotlb_fixup(void *buf, unsigned long nslabs)
re

[PATCH 12/15] swiotlb: provide swiotlb_init variants that remap the buffer

2022-04-03 Thread Christoph Hellwig
To share more code between swiotlb and xen-swiotlb, offer a
swiotlb_init_remap interface and add a remap callback to
swiotlb_init_late that will allow Xen to remap the buffer
without duplicating much of the logic.
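
As an illustration of the new hook (the callback below is hypothetical,
not from this patch): a remap callback receives the freshly allocated
buffer and can veto it by returning a negative value, which makes the
init code retry with a smaller buffer.

    static int example_remap(void *tlb, unsigned long nslabs)
    {
            /* e.g. hand the pages to a hypervisor here */
            return 0;
    }

    static void __init example_init(void)
    {
            swiotlb_init_remap(true, SWIOTLB_ANY, example_remap);
    }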

Signed-off-by: Christoph Hellwig 
---
 arch/x86/pci/sta2x11-fixup.c |  2 +-
 include/linux/swiotlb.h  |  5 -
 kernel/dma/swiotlb.c | 36 +---
 3 files changed, 38 insertions(+), 5 deletions(-)

diff --git a/arch/x86/pci/sta2x11-fixup.c b/arch/x86/pci/sta2x11-fixup.c
index c7e6faf59a861..7368afc039987 100644
--- a/arch/x86/pci/sta2x11-fixup.c
+++ b/arch/x86/pci/sta2x11-fixup.c
@@ -57,7 +57,7 @@ static void sta2x11_new_instance(struct pci_dev *pdev)
int size = STA2X11_SWIOTLB_SIZE;
/* First instance: register your own swiotlb area */
dev_info(&pdev->dev, "Using SWIOTLB (size %i)\n", size);
-   if (swiotlb_init_late(size, GFP_DMA))
+   if (swiotlb_init_late(size, GFP_DMA, NULL))
dev_emerg(&pdev->dev, "init swiotlb failed\n");
}
list_add(&instance->list, &sta2x11_instance_list);
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index ee655f2e4d28b..7b50c82f84ce9 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -36,8 +36,11 @@ struct scatterlist;
 
 int swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, unsigned int flags);
 unsigned long swiotlb_size_or_default(void);
+void __init swiotlb_init_remap(bool addressing_limit, unsigned int flags,
+   int (*remap)(void *tlb, unsigned long nslabs));
+int swiotlb_init_late(size_t size, gfp_t gfp_mask,
+   int (*remap)(void *tlb, unsigned long nslabs));
 extern int swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs);
-int swiotlb_init_late(size_t size, gfp_t gfp_mask);
 extern void __init swiotlb_update_mem_attributes(void);
 
 phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t phys,
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 119187afc65ec..d5fe8f5e08300 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -256,9 +256,11 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs,
  * Statically reserve bounce buffer space and initialize bounce buffer data
  * structures for the software IO TLB used to implement the DMA API.
  */
-void __init swiotlb_init(bool addressing_limit, unsigned int flags)
+void __init swiotlb_init_remap(bool addressing_limit, unsigned int flags,
+   int (*remap)(void *tlb, unsigned long nslabs))
 {
-   size_t bytes = PAGE_ALIGN(default_nslabs << IO_TLB_SHIFT);
+   unsigned long nslabs = default_nslabs;
+   size_t bytes;
void *tlb;
 
if (!addressing_limit && !swiotlb_force_bounce)
@@ -271,12 +273,23 @@ void __init swiotlb_init(bool addressing_limit, unsigned int flags)
 * allow to pick a location everywhere for hypervisors with guest
 * memory encryption.
 */
+retry:
+   bytes = PAGE_ALIGN(default_nslabs << IO_TLB_SHIFT);
if (flags & SWIOTLB_ANY)
tlb = memblock_alloc(bytes, PAGE_SIZE);
else
tlb = memblock_alloc_low(bytes, PAGE_SIZE);
if (!tlb)
goto fail;
+   if (remap && remap(tlb, nslabs) < 0) {
+   memblock_free(tlb, PAGE_ALIGN(bytes));
+
+   nslabs = ALIGN(nslabs >> 1, IO_TLB_SEGSIZE);
+   if (nslabs < IO_TLB_MIN_SLABS)
+   panic("%s: Failed to remap %zu bytes\n",
+ __func__, bytes);
+   goto retry;
+   }
if (swiotlb_init_with_tbl(tlb, default_nslabs, flags))
goto fail_free_mem;
return;
@@ -287,12 +300,18 @@ void __init swiotlb_init(bool addressing_limit, unsigned int flags)
pr_warn("Cannot allocate buffer");
 }
 
+void __init swiotlb_init(bool addressing_limit, unsigned int flags)
+{
+   return swiotlb_init_remap(addressing_limit, flags, NULL);
+}
+
 /*
  * Systems with larger DMA zones (those that don't support ISA) can
  * initialize the swiotlb later using the slab allocator if needed.
  * This should be just like above, but with some error catching.
  */
-int swiotlb_init_late(size_t size, gfp_t gfp_mask)
+int swiotlb_init_late(size_t size, gfp_t gfp_mask,
+   int (*remap)(void *tlb, unsigned long nslabs))
 {
unsigned long nslabs = ALIGN(size >> IO_TLB_SHIFT, IO_TLB_SEGSIZE);
unsigned long bytes;
@@ -303,6 +322,7 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask)
if (swiotlb_force_disable)
return 0;
 
+retry:
order = get_order(nslabs << IO_TLB_SHIFT);
nslabs = SLABS_PER_PAGE << order;
bytes = nslabs << IO_TLB_SHIFT;
@@ -323,6 +343,16 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask)
 

[PATCH 11/15] swiotlb: pass a gfp_mask argument to swiotlb_init_late

2022-04-03 Thread Christoph Hellwig
Let the caller choose a zone to allocate from.  This will be used
later on by the xen-swiotlb initialization on arm.
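
A hypothetical caller, for illustration: a device limited to 32-bit DMA
can now ask for the bounce buffer from ZONE_DMA32 instead of ZONE_DMA:

    if (swiotlb_init_late(64UL << 20, GFP_DMA32))
            pr_warn("swiotlb init failed\n");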

Signed-off-by: Christoph Hellwig 
Reviewed-by: Anshuman Khandual 
---
 arch/x86/pci/sta2x11-fixup.c | 2 +-
 include/linux/swiotlb.h  | 2 +-
 kernel/dma/swiotlb.c | 7 ++-
 3 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/arch/x86/pci/sta2x11-fixup.c b/arch/x86/pci/sta2x11-fixup.c
index e0c039a75b2db..c7e6faf59a861 100644
--- a/arch/x86/pci/sta2x11-fixup.c
+++ b/arch/x86/pci/sta2x11-fixup.c
@@ -57,7 +57,7 @@ static void sta2x11_new_instance(struct pci_dev *pdev)
int size = STA2X11_SWIOTLB_SIZE;
/* First instance: register your own swiotlb area */
dev_info(&pdev->dev, "Using SWIOTLB (size %i)\n", size);
-   if (swiotlb_init_late(size))
+   if (swiotlb_init_late(size, GFP_DMA))
dev_emerg(&pdev->dev, "init swiotlb failed\n");
}
list_add(&instance->list, &sta2x11_instance_list);
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index eabdd89987027..ee655f2e4d28b 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -37,7 +37,7 @@ struct scatterlist;
 int swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, unsigned int flags);
 unsigned long swiotlb_size_or_default(void);
 extern int swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs);
-int swiotlb_init_late(size_t size);
+int swiotlb_init_late(size_t size, gfp_t gfp_mask);
 extern void __init swiotlb_update_mem_attributes(void);
 
 phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t phys,
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index f6e091424af3d..119187afc65ec 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -292,7 +292,7 @@ void __init swiotlb_init(bool addressing_limit, unsigned int flags)
  * initialize the swiotlb later using the slab allocator if needed.
  * This should be just like above, but with some error catching.
  */
-int swiotlb_init_late(size_t size)
+int swiotlb_init_late(size_t size, gfp_t gfp_mask)
 {
unsigned long nslabs = ALIGN(size >> IO_TLB_SHIFT, IO_TLB_SEGSIZE);
unsigned long bytes;
@@ -303,15 +303,12 @@ int swiotlb_init_late(size_t size)
if (swiotlb_force_disable)
return 0;
 
-   /*
-* Get IO TLB memory from the low pages
-*/
order = get_order(nslabs << IO_TLB_SHIFT);
nslabs = SLABS_PER_PAGE << order;
bytes = nslabs << IO_TLB_SHIFT;
 
while ((SLABS_PER_PAGE << order) > IO_TLB_MIN_SLABS) {
-   vstart = (void *)__get_free_pages(GFP_DMA | __GFP_NOWARN,
+   vstart = (void *)__get_free_pages(gfp_mask | __GFP_NOWARN,
  order);
if (vstart)
break;
-- 
2.30.2



  1   2   3   4   5   6   7   8   9   10   >