On 08/03/2016 01:42, Jianfeng Tan wrote:
> This patch adds an option, --huge-trybest, that adds a recovery mechanism
> for the case where fewer hugepages than declared in sysfs can actually be
> used. It relies on a memory access to fault in each hugepage and, if that
> fails with SIGBUS, recovers to the previously saved stack environment
> with siglongjmp().
>
> Test example:
>    a. cgcreate -g hugetlb:/test-subgroup
>    b. cgset -r hugetlb.1GB.limit_in_bytes=2147483648 test-subgroup
>    c. cgexec -g hugetlb:test-subgroup \
>         ./examples/helloworld/build/helloworld -c 0x2 -n 4 --huge-trybest

I think you should mention in the commit message that this option also
covers the case of a hugetlbfs mount with a quota, since that case also
shows up as a SIGBUS at fault-in time.

>
> +static sigjmp_buf jmpenv;
> +
> +static void sigbus_handler(int signo __rte_unused)
> +{
> +     siglongjmp(jmpenv, 1);
> +}
> +
> +/* Put setjmp into a wrap method to avoid compiling error. Any non-volatile,
> + * non-static local variable in the stack frame calling setjmp might be
> + * clobbered by a call to longjmp.
> + */
> +static int wrap_setjmp(void)
> +{
> +     return setjmp(jmpenv);
> +}

Use sigsetjmp() instead of setjmp() so that siglongjmp() also restores the
signal mask.
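
Something like this minimal sketch (wrap_sigsetjmp is just an illustrative
name, the rest follows the patch):

  #include <setjmp.h>
  #include <signal.h>

  static sigjmp_buf jmpenv;

  static void sigbus_handler(int signo)
  {
          (void)signo;
          siglongjmp(jmpenv, 1);
  }

  /* savesigs = 1 makes sigsetjmp() save the current signal mask so that
   * siglongjmp() restores it; otherwise SIGBUS stays blocked after
   * jumping out of the handler and later faults would not be delivered.
   */
  static int wrap_sigsetjmp(void)
  {
          return sigsetjmp(jmpenv, 1);
  }

That way the handler can fire again for the next hugepage that fails to
fault in.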

>   /*
>    * Mmap all hugepages of hugepage table: it first open a file in
>    * hugetlbfs, then mmap() hugepage_sz data in it. If orig is set, the
> @@ -396,7 +413,7 @@ map_all_hugepages(struct hugepage_file *hugepg_tbl,
>               if (fd < 0) {
>                       RTE_LOG(ERR, EAL, "%s(): open failed: %s\n", __func__,
>                                       strerror(errno));
> -                     return -1;
> +                     return i;

When using --huge-trybest, we can hit one of these errors and still work as
expected. It can be confusing for users to see an error logged when it is
expected behavior.

Any thoughts?
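
One possibility (just a sketch, reusing internal_config.huge_trybest from
this patch) is to log at DEBUG instead of ERR when the best-effort path is
active, e.g. for the open failure above:

  if (internal_config.huge_trybest)
          RTE_LOG(DEBUG, EAL, "%s(): open failed: %s\n",
                          __func__, strerror(errno));
  else
          RTE_LOG(ERR, EAL, "%s(): open failed: %s\n",
                          __func__, strerror(errno));
  return i;

Or leave the logging to the caller, which knows how many pages it actually
got and whether that is acceptable.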

>               }
>
>               /* map the segment, and populate page tables,
> @@ -407,7 +424,7 @@ map_all_hugepages(struct hugepage_file *hugepg_tbl,
>                       RTE_LOG(ERR, EAL, "%s(): mmap failed: %s\n", __func__,
>                                       strerror(errno));
>                       close(fd);
> -                     return -1;
> +                     return i;
>               }
>

Same comment as above

>               /* set shared flock on the file. */
>               if (flock(fd, LOCK_SH | LOCK_NB) == -1) {
>                       RTE_LOG(ERR, EAL, "%s(): Locking file failed:%s \n",
>                               __func__, strerror(errno));
>                       close(fd);
> -                     return -1;
> +                     return i;

Same comment as above

> @@ -1137,10 +1206,24 @@ rte_eal_hugepage_init(void)
>                       continue;
>
>               /* map all hugepages available */
> -             if (map_all_hugepages(&tmp_hp[hp_offset], hpi, 1) < 0){
> -                     RTE_LOG(DEBUG, EAL, "Failed to mmap %u MB hugepages\n",
> -                                     (unsigned)(hpi->hugepage_sz / 0x100000));
> -                     goto fail;
> +             pages_old = hpi->num_pages[0];
> +             pages_new = map_all_hugepages(&tmp_hp[hp_offset], hpi, 1);
> +             if (pages_new < pages_old) {
> +                     RTE_LOG(DEBUG, EAL,
> +                             "%d not %d hugepages of size %u MB allocated\n",
> +                             pages_new, pages_old,
> +                             (unsigned)(hpi->hugepage_sz / 0x100000));
> +                     if (internal_config.huge_trybest) {
> +                             int pages = pages_old - pages_new;
> +
> +                             internal_config.memory -=
> +                                     hpi->hugepage_sz * pages;
> +                             nr_hugepages -= pages;
> +                             hpi->num_pages[0] = pages_new;
> +                             if (pages_new == 0)
> +                                     continue;
> +                     } else
> +                             goto fail;
>               }

There is another call to map_all_hugepages() whose return value check you
are not updating.
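
For completeness, the second invocation (orig == 0) would then need a check
along these lines (only a sketch of the new return-a-count semantics, the
log text is illustrative):

  /* remap all hugepages, this time with orig == 0 */
  if (map_all_hugepages(&tmp_hp[hp_offset], hpi, 0) !=
                  (int)hpi->num_pages[0]) {
          RTE_LOG(ERR, EAL, "Failed to remap %u MB pages\n",
                          (unsigned)(hpi->hugepage_sz / 0x100000));
          goto fail;
  }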

Sergio
