On 20-Mar-18 11:35 AM, Shreyansh Jain wrote:
Hello Anatoly,

On Wed, Mar 7, 2018 at 10:26 PM, Anatoly Burakov
<anatoly.bura...@intel.com> wrote:
If a user has specified that the zone should have contiguous memory,
use the new _contig allocation APIs instead of the normal ones.
Otherwise, account for the fact that unless we're in IOVA_AS_VA
mode, we cannot guarantee that the pages would be physically
contiguous, so we calculate the memzone size and alignments as if
we were getting the smallest page size available.

Signed-off-by: Anatoly Burakov <anatoly.bura...@intel.com>
---

[...]

  static void
  mempool_add_elem(struct rte_mempool *mp, void *obj, rte_iova_t iova)
  {
@@ -549,6 +570,7 @@ rte_mempool_populate_default(struct rte_mempool *mp)
         unsigned mz_id, n;
         unsigned int mp_flags;
         int ret;
+       bool force_contig, no_contig;

         /* mempool must not be populated */
         if (mp->nb_mem_chunks != 0)
@@ -563,10 +585,46 @@ rte_mempool_populate_default(struct rte_mempool *mp)
         /* update mempool capabilities */
         mp->flags |= mp_flags;

-       if (rte_eal_has_hugepages()) {
-               pg_shift = 0; /* not needed, zone is physically contiguous */
+       no_contig = mp->flags & MEMPOOL_F_NO_PHYS_CONTIG;
+       force_contig = mp->flags & MEMPOOL_F_CAPA_PHYS_CONTIG;
+
+       /*
+        * there are several considerations for page size and page shift here.
+        *
+        * if we don't need our mempools to have physically contiguous objects,
+        * then just set page shift and page size to 0, because the user has
+        * indicated that there's no need to care about anything.

I think the above case is not handled properly here.
Reason below...

+        *
+        * if we do need contiguous objects, there is also an option to reserve
+        * the entire mempool memory as one contiguous block of memory, in
+        * which case the page shift and alignment wouldn't matter as well.
+        *
+        * if we require contiguous objects, but not necessarily the entire
+        * mempool reserved space to be contiguous, then there are two options.
+        *
+        * if our IO addresses are virtual, not actual physical (IOVA as VA
+        * case), then no page shift needed - our memory allocation will give us
+        * contiguous physical memory as far as the hardware is concerned, so
+        * act as if we're getting contiguous memory.
+        *
+        * if our IO addresses are physical, we may get memory from bigger
+        * pages, or we might get memory from smaller pages, and how much of it
+        * we require depends on whether we want bigger or smaller pages.
+        * However, requesting each and every memory size is too much work, so
+        * what we'll do instead is walk through the page sizes available, pick
+        * the smallest one and set up page shift to match that one. We will be
+        * wasting some space this way, but it's much nicer than looping around
+        * trying to reserve each and every page size.
+        */
+
+       if (no_contig || force_contig || rte_eal_iova_mode() == RTE_IOVA_VA) {
                 pg_sz = 0;
+               pg_shift = 0;
                 align = RTE_CACHE_LINE_SIZE;

So, taking dpaa2 as an example, I ran testpmd. IOVA=VA is the mode, so
pg_sz = 0 is set.
This is the same as before applying the hotplug patchset, except that
earlier this decision was based purely on the availability of hugepages
(rte_eal_has_hugepages()).
Moving on...

+       } else if (rte_eal_has_hugepages()) {
+               pg_sz = get_min_page_size();
+               pg_shift = rte_bsf32(pg_sz);
+               align = pg_sz;
         } else {
                 pg_sz = getpagesize();
                 pg_shift = rte_bsf32(pg_sz);
@@ -585,23 +643,34 @@ rte_mempool_populate_default(struct rte_mempool *mp)
                         goto fail;
                 }

-               mz = rte_memzone_reserve_aligned(mz_name, size,
-                       mp->socket_id, mz_flags, align);
-               /* not enough memory, retry with the biggest zone we have */
-               if (mz == NULL)
-                       mz = rte_memzone_reserve_aligned(mz_name, 0,
+               if (force_contig) {
+                       /*
+                        * if contiguous memory for entire mempool memory was
+                        * requested, don't try reserving again if we fail.
+                        */
+                       mz = rte_memzone_reserve_aligned_contig(mz_name, size,
+                               mp->socket_id, mz_flags, align);
+               } else {
+                       mz = rte_memzone_reserve_aligned(mz_name, size,
                                 mp->socket_id, mz_flags, align);
+                       /* not enough memory, retry with the biggest zone we
+                        * have
+                        */
+                       if (mz == NULL)
+                               mz = rte_memzone_reserve_aligned(mz_name, 0,
+                                       mp->socket_id, mz_flags, align);
+               }
                 if (mz == NULL) {
                         ret = -rte_errno;
                         goto fail;
                 }

-               if (mp->flags & MEMPOOL_F_NO_PHYS_CONTIG)
+               if (no_contig)
                         iova = RTE_BAD_IOVA;
                 else
                         iova = mz->iova;

-               if (rte_eal_has_hugepages())
+               if (rte_eal_has_hugepages() && force_contig)

So, before the hotplug patchset, the call used to enter
rte_mempool_populate_iova. But with 'force_contig' not set when
rte_pktmbuf_pool_create is called (app/test-pmd/testpmd.c:521),
rte_mempool_populate_virt is called instead.

                         ret = rte_mempool_populate_iova(mp, mz->addr,
                                 iova, mz->len,
                                 rte_mempool_memchunk_mz_free,
--
2.7.4

rte_mempool_populate_virt is then called with pg_sz = 0:

678                 else
679                         ret = rte_mempool_populate_virt(mp, mz->addr,
680                                 mz->len, pg_sz,
681                                 rte_mempool_memchunk_mz_free,
682                                 (void *)(uintptr_t)mz);

In this function,

512         /* address and len must be page-aligned */
513         if (RTE_PTR_ALIGN_CEIL(addr, pg_sz) != addr)
514                 return -EINVAL;

This is where the error is returned.

I don't think RTE_PTR_ALIGN_CEIL is designed to handle pg_sz = 0.

It is roughly equivalent to:
RTE_PTR_ALIGN_FLOOR(((uintptr_t)addr - 1), pg_sz), which returns NULL
when pg_sz is 0, because the mask ~(pg_sz - 1) collapses to 0. The
result can never equal addr, so the check fails.
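
For reference, a minimal standalone sketch of why the result collapses
to NULL for a zero page size. The ALIGN_* macros below are simplified
stand-ins for the rte_common.h ones (the real macros are typeof-based),
so this is illustrative only:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* simplified stand-ins for the RTE_ALIGN_* macros in rte_common.h,
 * for illustration only (the real ones are typeof-based) */
#define ALIGN_FLOOR(val, align) ((val) & ~((uintptr_t)(align) - 1))
#define ALIGN_CEIL(val, align)  ALIGN_FLOOR((val) + (align) - 1, (align))

int main(void)
{
        uintptr_t addr = 0x10001000;

        /* align == 0: (align - 1) wraps to UINTPTR_MAX, so the mask
         * ~(align - 1) is 0 and the result is 0 regardless of addr */
        printf("ceil(addr, 0):    0x%" PRIxPTR "\n", ALIGN_CEIL(addr, 0));

        /* align == 4096: behaves as expected */
        printf("ceil(addr, 4096): 0x%" PRIxPTR "\n", ALIGN_CEIL(addr, 4096));

        return 0;
}

Since the NULL result can never equal addr, the page-alignment check at
line 513 fails and -EINVAL is returned.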

Basically, this ends up failing rte_mempool_populate_default.

I think the underlying assumption - that rte_mempool_populate_virt can
handle a page size of 0 - is wrong (there would be issues besides the
above RTE_PTR_ALIGN_CEIL as well, such as the for-loop stepping the
offset by pg_sz). It needs a valid, non-zero page-size value to work
with.
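
To illustrate the loop problem, here is a minimal sketch of the
"off += pg_sz" stepping pattern (walk_pages is hypothetical, not the
actual rte_mempool_populate_virt code, just the idiom it relies on);
with pg_sz = 0 the offset never advances:

#include <stddef.h>
#include <stdio.h>

/* hypothetical walk over a memory area in pg_sz-sized steps */
static unsigned int walk_pages(size_t len, size_t pg_sz)
{
        unsigned int chunks = 0;
        size_t off;

        /* with pg_sz == 0, "off + pg_sz <= len" is always true and
         * "off += pg_sz" never advances, so cap the iterations here
         * purely for demonstration purposes */
        for (off = 0; off + pg_sz <= len && chunks < 8; off += pg_sz)
                chunks++;

        return chunks;
}

int main(void)
{
        printf("pg_sz=4096: %u chunks\n", walk_pages(16384, 4096)); /* 4 */
        printf("pg_sz=0:    %u chunks (hits the safety cap)\n",
                walk_pages(16384, 0));
        return 0;
}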

So, basically, DPAA2 is broken by this patch because of the above
issue, if I am comprehending it correctly.

Regards,
Shreyansh


Thanks for testing this. I'll look into fixing it.

--
Thanks,
Anatoly
