On Thu, 18 Apr 2024 22:19:33 +0200, Jesper Dangaard Brouer <h...@kernel.org> 
wrote:
>
>
> On 17/04/2024 10.20, Xuan Zhuo wrote:
> > On Wed, 17 Apr 2024 12:08:10 +0800, Jason Wang <jasow...@redhat.com> wrote:
> >> On Wed, Apr 17, 2024 at 9:38 AM Xuan Zhuo <xuanz...@linux.alibaba.com> 
> >> wrote:
> >>>
> >>> On Tue, 16 Apr 2024 11:24:53 +0800, Jason Wang <jasow...@redhat.com> 
> >>> wrote:
> >>>> On Mon, Apr 15, 2024 at 5:04 PM Xuan Zhuo <xuanz...@linux.alibaba.com> 
> >>>> wrote:
> >>>>>
> >>>>> On Mon, 15 Apr 2024 16:56:45 +0800, Jason Wang <jasow...@redhat.com> 
> >>>>> wrote:
> >>>>>> On Mon, Apr 15, 2024 at 4:50 PM Xuan Zhuo <xuanz...@linux.alibaba.com> 
> >>>>>> wrote:
> >>>>>>>
> >>>>>>> On Mon, 15 Apr 2024 14:43:24 +0800, Jason Wang <jasow...@redhat.com> 
> >>>>>>> wrote:
> >>>>>>>> On Mon, Apr 15, 2024 at 10:35 AM Xuan Zhuo 
> >>>>>>>> <xuanz...@linux.alibaba.com> wrote:
> >>>>>>>>>
> >>>>>>>>> On Fri, 12 Apr 2024 13:49:12 +0800, Jason Wang 
> >>>>>>>>> <jasow...@redhat.com> wrote:
> >>>>>>>>>> On Fri, Apr 12, 2024 at 1:39 PM Xuan Zhuo 
> >>>>>>>>>> <xuanz...@linux.alibaba.com> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> On Fri, 12 Apr 2024 12:47:55 +0800, Jason Wang 
> >>>>>>>>>>> <jasow...@redhat.com> wrote:
> >>>>>>>>>>>> On Thu, Apr 11, 2024 at 10:51 AM Xuan Zhuo 
> >>>>>>>>>>>> <xuanz...@linux.alibaba.com> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Now, we chain the pages of big mode via the page's private
> >>>>>>>>>>>>> field. But a subsequent patch aims to make big mode support
> >>>>>>>>>>>>> premapped mode, which requires additional space to store the
> >>>>>>>>>>>>> DMA address.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Within the sub-struct that contains 'private', there is no
> >>>>>>>>>>>>> suitable variable for storing the DMA address.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>                 struct {        /* Page cache and anonymous pages */
> >>>>>>>>>>>>>                         /**
> >>>>>>>>>>>>>                          * @lru: Pageout list, eg. active_list protected by
> >>>>>>>>>>>>>                          * lruvec->lru_lock.  Sometimes used as a generic list
> >>>>>>>>>>>>>                          * by the page owner.
> >>>>>>>>>>>>>                          */
> >>>>>>>>>>>>>                         union {
> >>>>>>>>>>>>>                                 struct list_head lru;
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>                                 /* Or, for the Unevictable "LRU list" slot */
> >>>>>>>>>>>>>                                 struct {
> >>>>>>>>>>>>>                                         /* Always even, to negate PageTail */
> >>>>>>>>>>>>>                                         void *__filler;
> >>>>>>>>>>>>>                                         /* Count page's or folio's mlocks */
> >>>>>>>>>>>>>                                         unsigned int mlock_count;
> >>>>>>>>>>>>>                                 };
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>                                 /* Or, free page */
> >>>>>>>>>>>>>                                 struct list_head buddy_list;
> >>>>>>>>>>>>>                                 struct list_head pcp_list;
> >>>>>>>>>>>>>                         };
> >>>>>>>>>>>>>                         /* See page-flags.h for PAGE_MAPPING_FLAGS */
> >>>>>>>>>>>>>                         struct address_space *mapping;
> >>>>>>>>>>>>>                         union {
> >>>>>>>>>>>>>                                 pgoff_t index;          /* Our offset within mapping. */
> >>>>>>>>>>>>>                                 unsigned long share;    /* share count for fsdax */
> >>>>>>>>>>>>>                         };
> >>>>>>>>>>>>>                         /**
> >>>>>>>>>>>>>                          * @private: Mapping-private opaque data.
> >>>>>>>>>>>>>                          * Usually used for buffer_heads if PagePrivate.
> >>>>>>>>>>>>>                          * Used for swp_entry_t if PageSwapCache.
> >>>>>>>>>>>>>                          * Indicates order in the buddy system if PageBuddy.
> >>>>>>>>>>>>>                          */
> >>>>>>>>>>>>>                         unsigned long private;
> >>>>>>>>>>>>>                 };
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> But within the page_pool sub-struct there is a variable called
> >>>>>>>>>>>>> dma_addr that is appropriate for storing a DMA address. That
> >>>>>>>>>>>>> sub-struct is used by the netstack, which works to our advantage.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>                 struct {        /* page_pool used by netstack */
> >>>>>>>>>>>>>                         /**
> >>>>>>>>>>>>>                          * @pp_magic: magic value to avoid recycling non
> >>>>>>>>>>>>>                          * page_pool allocated pages.
> >>>>>>>>>>>>>                          */
> >>>>>>>>>>>>>                         unsigned long pp_magic;
> >>>>>>>>>>>>>                         struct page_pool *pp;
> >>>>>>>>>>>>>                         unsigned long _pp_mapping_pad;
> >>>>>>>>>>>>>                         unsigned long dma_addr;
> >>>>>>>>>>>>>                         atomic_long_t pp_ref_count;
> >>>>>>>>>>>>>                 };
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On the other hand, we should use variables from the same
> >>>>>>>>>>>>> sub-struct, so this patch replaces "private" with "pp".
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Signed-off-by: Xuan Zhuo <xuanz...@linux.alibaba.com>
> >>>>>>>>>>>>> ---
> >>>>>>>>>>>>
> >>>>>>>>>>>> Instead of doing a customized version of page pool, can we simply
> >>>>>>>>>>>> switch to using page pool for big mode? Then we don't need to
> >>>>>>>>>>>> bother with the DMA stuff.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> The page pool does its DMA via the generic DMA APIs,
> >>>>>>>>>>> so we cannot use the page pool directly.
> >>>>>>>>>>
> >>>>>>>>>> I found this:
> >>>>>>>>>>
> >>>>>>>>>> #define PP_FLAG_DMA_MAP         BIT(0) /* Should page_pool do the DMA
> >>>>>>>>>>                                          * map/unmap
> >>>>>>>>>>                                          */
> >>>>>>>>>> It seems to work here?
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> I have studied the page pool mechanism and believe that we cannot
> >>>>>>>>> use it directly. We can make the page pool bypass the DMA
> >>>>>>>>> operations. This allows us to handle DMA within virtio-net for
> >>>>>>>>> pages allocated from the page pool. Furthermore, we can utilize
> >>>>>>>>> page pool helpers to associate the DMA address with the page.
> >>>>>>>>>
> >>>>>>>>> However, the critical issue pertains to unmapping. Ideally, we
> >>>>>>>>> want to return the mapped pages to the page pool and reuse them,
> >>>>>>>>> so that we can omit the unmapping and remapping steps.
> >>>>>>>>>
> >>>>>>>>> Currently, there is a caveat: when the page pool cache is full, it
> >>>>>>>>> disconnects and releases the pages; once the pool hits its
> >>>>>>>>> capacity, pages are relinquished without a chance to unmap them.
>
> Could Jakub's memory provider for PP help your use-case?
>
> See: [1]
> https://lore.kernel.org/all/20240403002053.2376017-3-almasrym...@google.com/
> Or: [2]
> https://lore.kernel.org/netdev/f8270765-a27b-6ccf-33ea-cda097168...@redhat.com/T/


It cannot. That only lets the pp get pages via the callbacks.

What we are discussing here is map/unmap.

virtio-net has its own, different DMA APIs:

        dma_addr_t virtqueue_dma_map_single_attrs(struct virtqueue *_vq, void *ptr,
                                                  size_t size,
                                                  enum dma_data_direction dir,
                                                  unsigned long attrs);
        void virtqueue_dma_unmap_single_attrs(struct virtqueue *_vq, dma_addr_t addr,
                                              size_t size, enum dma_data_direction dir,
                                              unsigned long attrs);
        dma_addr_t virtqueue_dma_map_page_attrs(struct virtqueue *_vq, struct page *page,
                                                size_t offset, size_t size,
                                                enum dma_data_direction dir,
                                                unsigned long attrs);
        void virtqueue_dma_unmap_page_attrs(struct virtqueue *_vq, dma_addr_t addr,
                                            size_t size, enum dma_data_direction dir,
                                            unsigned long attrs);
        int virtqueue_dma_mapping_error(struct virtqueue *_vq, dma_addr_t addr);

        bool virtqueue_dma_need_sync(struct virtqueue *_vq, dma_addr_t addr);
        void virtqueue_dma_sync_single_range_for_cpu(struct virtqueue *_vq,
                                                     dma_addr_t addr,
                                                     unsigned long offset, size_t size,
                                                     enum dma_data_direction dir);
        void virtqueue_dma_sync_single_range_for_device(struct virtqueue *_vq,
                                                        dma_addr_t addr,
                                                        unsigned long offset, size_t size,
                                                        enum dma_data_direction dir);


Thanks.

>
>
> [...]
> >>>>>>
> >>>>>> Adding Jesper for some comments.
> >>>>>>
> >>>>>>>
> >>>>>>> Back to this patch set, I think we should keep the virtio-net to 
> >>>>>>> manage
> >>>>>>> the pages.
> >>>>>>>
>
> For context the patch:
>   [3]
> https://lore.kernel.org/all/20240411025127.51945-4-xuanz...@linux.alibaba.com/
>
> >>>>>>> What do you think?
> >>>>>>
> >>>>>> I might be wrong, but I think we need to either
> >>>>>>
> >>>>>> 1) seek a way to manage the pages yourselves without touching page
> >>>>>> pool metadata (or Jesper is fine with this)
> >>>>>
> >>>>> Do you mean working with page pool or not?
> >>>>>
> >>>>
> >>>> I meant if Jesper is fine with reusing page pool metadata like this 
> >>>> patch.
> >>>>
> >>>>> If we manage the pages ourselves (no page pool), we do not care
> >>>>> whether the metadata is for page pool or not. We just use the space
> >>>>> in the pages, like "private".
> >>>>
> >>>> That's also fine.
> >>>>
>
> I'm not sure it is "fine" to explicitly choose not to use page pool and
> then (ab)use the `struct page` member (pp) that is intended for page_pool
> for other stuff (in this case, creating a linked list of pages).
>
>   +#define page_chain_next(p) ((struct page *)((p)->pp))
>   +#define page_chain_add(p, n)       ((p)->pp = (void *)n)
>
> I'm not sure that I (as PP maintainer) can actually make this call, as I
> think this area belongs to the MM "page" maintainers (Cc MM-list +
> people) to judge.
>
> Just inventing new ways to use struct page fields, without adding your
> use-case to struct page, will make it harder for MM people to maintain
> (e.g. to make future changes).
>
> --Jesper
>
>
