[+cc Krzysztof, Pali, Oliver]

On Thu, Feb 04, 2021 at 05:58:31PM +0100, Daniel Vetter wrote:
> Since 3234ac664a87 ("/dev/mem: Revoke mappings when a driver claims
> the region") /dev/kmem zaps ptes when the kernel requests exclusive
> acccess to an iomem region. And with CONFIG_IO_STRICT_DEVMEM, this is
> the default for all driver uses.
> 
> Except there's two more ways to access PCI BARs: sysfs and proc mmap
> support. Let's plug that hole.

IIUC, the idea is that if a driver calls request_mem_region() on a PCI
BAR, we prevent access to the BAR via sysfs.  I guess I'm OK with that
if it's a real security improvement or something.

But the downside of this implementation is that it depends on
iomem_get_mapping(), which doesn't work until after fs_initcalls,
which means the sysfs files cannot be static attributes of devices
added before that.  PCI devices are typically enumerated in
subsys_initcall.

Krzysztof is converting PCI sysfs files (config, rom, reset, vpd, etc)
to static attributes.  This is a major improvement that could get rid
of pci_create_sysfs_dev_files(), the late_initcall pci_sysfs_init(),
and the "sysfs_initialized" hack.  This would fix a race reported by
Pali [1] (thanks to Oliver for the idea [2]).

EXCEPT that this revoke change means the "resource%d", "legacy_io",
and "legacy_mem" files cannot be static attributes because of
iomem_get_mapping().

Any ideas on how to deal with this?  Having to keep the
pci_sysfs_init() initcall just for these few files seems like the tail
wagging the dog.

[1] https://lore.kernel.org/r/20200716110423.xtfyb3n6tn5ixedh@pali
[2] 
https://lore.kernel.org/r/caosf1chss03dbsdo4pmttmp0tceu5kscn704zewlkgxqzbf...@mail.gmail.com

> For revoke_devmem() to work we need to link our vma into the same
> address_space, with consistent vma->vm_pgoff. ->pgoff is already
> adjusted, because that's how (io_)remap_pfn_range works, but for the
> mapping we need to adjust vma->vm_file->f_mapping. The cleanest way is
> to adjust this at at ->open time:
> 
> - for sysfs this is easy, now that binary attributes support this. We
>   just set bin_attr->mapping when mmap is supported
> - for procfs it's a bit more tricky, since procfs pci access has only
>   one file per device, and access to a specific resources first needs
>   to be set up with some ioctl calls. But mmap is only supported for
>   the same resources as sysfs exposes with mmap support, and otherwise
>   rejected, so we can set the mapping unconditionally at open time
>   without harm.
> 
> A special consideration is for arch_can_pci_mmap_io() - we need to
> make sure that the ->f_mapping doesn't alias between ioport and iomem
> space. There's only 2 ways in-tree to support mmap of ioports: generic
> pci mmap (ARCH_GENERIC_PCI_MMAP_RESOURCE), and sparc as the single
> architecture hand-rolling. Both approach support ioport mmap through a
> special pfn range and not through magic pte attributes. Aliasing is
> therefore not a problem.
> 
> The only difference in access checks left is that sysfs PCI mmap does
> not check for CAP_RAWIO. I'm not really sure whether that should be
> added or not.
> 
> Acked-by: Bjorn Helgaas <bhelg...@google.com>
> Reviewed-by: Dan Williams <dan.j.willi...@intel.com>
> Signed-off-by: Daniel Vetter <daniel.vet...@intel.com>
> Cc: Stephen Rothwell <s...@canb.auug.org.au>
> Cc: Jason Gunthorpe <j...@ziepe.ca>
> Cc: Kees Cook <keesc...@chromium.org>
> Cc: Dan Williams <dan.j.willi...@intel.com>
> Cc: Andrew Morton <a...@linux-foundation.org>
> Cc: John Hubbard <jhubb...@nvidia.com>
> Cc: Jérôme Glisse <jgli...@redhat.com>
> Cc: Jan Kara <j...@suse.cz>
> Cc: Dan Williams <dan.j.willi...@intel.com>
> Cc: Greg Kroah-Hartman <gre...@linuxfoundation.org>
> Cc: linux...@kvack.org
> Cc: linux-arm-ker...@lists.infradead.org
> Cc: linux-samsung-...@vger.kernel.org
> Cc: linux-me...@vger.kernel.org
> Cc: Bjorn Helgaas <bhelg...@google.com>
> Cc: linux-...@vger.kernel.org
> ---
>  drivers/pci/pci-sysfs.c | 4 ++++
>  drivers/pci/proc.c      | 1 +
>  2 files changed, 5 insertions(+)
> 
> diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
> index 0c45b4f7b214..f8afd54ca3e1 100644
> --- a/drivers/pci/pci-sysfs.c
> +++ b/drivers/pci/pci-sysfs.c
> @@ -942,6 +942,7 @@ void pci_create_legacy_files(struct pci_bus *b)
>       b->legacy_io->read = pci_read_legacy_io;
>       b->legacy_io->write = pci_write_legacy_io;
>       b->legacy_io->mmap = pci_mmap_legacy_io;
> +     b->legacy_io->mapping = iomem_get_mapping();
>       pci_adjust_legacy_attr(b, pci_mmap_io);
>       error = device_create_bin_file(&b->dev, b->legacy_io);
>       if (error)
> @@ -954,6 +955,7 @@ void pci_create_legacy_files(struct pci_bus *b)
>       b->legacy_mem->size = 1024*1024;
>       b->legacy_mem->attr.mode = 0600;
>       b->legacy_mem->mmap = pci_mmap_legacy_mem;
> +     b->legacy_io->mapping = iomem_get_mapping();
>       pci_adjust_legacy_attr(b, pci_mmap_mem);
>       error = device_create_bin_file(&b->dev, b->legacy_mem);
>       if (error)
> @@ -1169,6 +1171,8 @@ static int pci_create_attr(struct pci_dev *pdev, int 
> num, int write_combine)
>                       res_attr->mmap = pci_mmap_resource_uc;
>               }
>       }
> +     if (res_attr->mmap)
> +             res_attr->mapping = iomem_get_mapping();
>       res_attr->attr.name = res_attr_name;
>       res_attr->attr.mode = 0600;
>       res_attr->size = pci_resource_len(pdev, num);
> diff --git a/drivers/pci/proc.c b/drivers/pci/proc.c
> index 3a2f90beb4cb..9bab07302bbf 100644
> --- a/drivers/pci/proc.c
> +++ b/drivers/pci/proc.c
> @@ -298,6 +298,7 @@ static int proc_bus_pci_open(struct inode *inode, struct 
> file *file)
>       fpriv->write_combine = 0;
>  
>       file->private_data = fpriv;
> +     file->f_mapping = iomem_get_mapping();
>  
>       return 0;
>  }
> -- 
> 2.30.0
> 
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-ker...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

Reply via email to