On 25-Mar-21 8:21 AM, xiangxia.m....@gmail.com wrote:
From: Tonghao Zhang <xiangxia.m....@gmail.com>

Hugepages of different sizes (e.g. 2MB and 1GB) may be mounted
on the same directory (e.g. /dev/hugepages). In that case the
DPDK primary process will block. To address this issue, add
the LOCK_NB flag to flock().

$ cat /proc/mounts
...
none /dev/hugepages hugetlbfs rw,seclabel,relatime,pagesize=1024M 0 0
none /dev/hugepages hugetlbfs rw,seclabel,relatime,pagesize=2M 0 0

Also add more details to the error log.

Signed-off-by: Tonghao Zhang <xiangxia.m....@gmail.com>
---
  lib/librte_eal/linux/eal_hugepage_info.c | 7 +++++--
  1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/linux/eal_hugepage_info.c b/lib/librte_eal/linux/eal_hugepage_info.c
index d97792cadeb6..1ff76e539053 100644
--- a/lib/librte_eal/linux/eal_hugepage_info.c
+++ b/lib/librte_eal/linux/eal_hugepage_info.c
@@ -451,9 +451,12 @@ hugepage_info_init(void)
                hpi->lock_descriptor = open(hpi->hugedir, O_RDONLY);
                /* if blocking lock failed */
-               if (flock(hpi->lock_descriptor, LOCK_EX) == -1) {
+               if (flock(hpi->lock_descriptor, LOCK_EX | LOCK_NB) == -1) {
                        RTE_LOG(CRIT, EAL,
-                               "Failed to lock hugepage directory!\n");
+                               "Failed to lock hugepage directory! "
+                               "The hugepage dir (%s) was locked by "
+                               "other processes or self twice.\n",
+                               hpi->hugedir);
                        break;
                }
                /* clear out the hugepages dir from unused pages */


Use cases such as "having two hugetlbfs page sizes on the same hugetlbfs mountpoint" are user error, but I agree that deadlocking is probably not the way we want to handle it.
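
For reference, the self-deadlock can be reproduced outside of DPDK. The following is a minimal sketch, not EAL code: second_lock_would_block() is a hypothetical helper that opens the same directory twice and takes a non-blocking exclusive flock() on each descriptor. Because flock() locks belong to the open file description, the second open() conflicts with the first even within one process; without LOCK_NB the second call would simply never return.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of DPDK): open `dir` twice and try to take
 * an exclusive, non-blocking flock() on each descriptor.
 *
 * Returns 1 if the second lock is refused with EWOULDBLOCK,
 *         0 if both locks succeed,
 *        -1 on any other error (open failure, first lock refused, ...).
 */
int second_lock_would_block(const char *dir)
{
	int ret = -1;
	int fd1 = open(dir, O_RDONLY);
	int fd2 = open(dir, O_RDONLY);

	if (fd1 < 0 || fd2 < 0)
		goto out;

	/* first lock: expected to succeed if nobody else holds the lock */
	if (flock(fd1, LOCK_EX | LOCK_NB) == -1)
		goto out;

	/* second lock on a separate open file description of the same dir */
	if (flock(fd2, LOCK_EX | LOCK_NB) == 0)
		ret = 0;
	else if (errno == EWOULDBLOCK)
		ret = 1;
out:
	/* closing the descriptors also drops any lock held through them */
	if (fd1 >= 0)
		close(fd1);
	if (fd2 >= 0)
		close(fd2);
	return ret;
}
```

On a local Linux filesystem, calling this on any readable directory that nobody else has locked should return 1, which is the situation the patch turns into an error instead of a hang.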

An alternative would be to check whether we already have a mountpoint with the same path. This would produce a better error message (as a user, "hugepage dir is locked by self twice" doesn't tell me anything useful), at the cost of slightly more complicated code.
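
A rough sketch of what such a check could look like, assuming we scan a /proc/mounts-style stream before taking any locks. The function name, the MAX_HUGE_MOUNTS limit and the buffer sizes are made up for illustration and are not existing EAL code; the sketch also does not decode escaped characters (e.g. \040 for spaces) in mount paths.

```c
#include <stdio.h>
#include <string.h>

#define MAX_HUGE_MOUNTS 16 /* arbitrary limit for this sketch */

/*
 * Hypothetical helper (not part of DPDK): scan a /proc/mounts-style stream
 * and report whether two hugetlbfs entries share the same mount path.
 *
 * Returns 1 and copies the offending path into `dup` if a duplicate is
 * found, 0 otherwise.
 */
int find_duplicate_hugetlbfs_mount(FILE *mounts, char *dup, size_t dup_len)
{
	char seen[MAX_HUGE_MOUNTS][256];
	char line[1024], dir[256], type[64];
	int n = 0, i;

	while (fgets(line, sizeof(line), mounts) != NULL) {
		/* fields: device, mount point, fs type, options, ... */
		if (sscanf(line, "%*s %255s %63s", dir, type) != 2)
			continue;
		if (strcmp(type, "hugetlbfs") != 0)
			continue;

		/* compare against every hugetlbfs mount path seen so far */
		for (i = 0; i < n; i++) {
			if (strcmp(seen[i], dir) == 0) {
				snprintf(dup, dup_len, "%s", dir);
				return 1;
			}
		}
		if (n < MAX_HUGE_MOUNTS)
			strcpy(seen[n++], dir);
	}
	return 0;
}
```

With the two /dev/hugepages entries from the commit message as input, this would flag /dev/hugepages, so the error could name the mountpoint and say that it is mounted more than once.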

I'm not sure which way I want to go here. Normally, hugetlbfs shouldn't stay locked for long, so I'm wary of adding LOCK_NB here and feel slightly uneasy about this patch. Do you have any opinions?

Also, do other OSes' EALs need a similar fix?

--
Thanks,
Anatoly
