On 4/12/21 10:38 AM, Eric Dumazet wrote:
[ ... ]
> Yes, I think this is the real issue here. This smells like some memory
> corruption.
>
> In my traces, packet is correctly received in AF_PACKET queue.
>
> I have checked the skb is well formed.
>
> But the user space seems to never call poll() and recvmsg() on this
> af_packet socket.
>

After sprinkling the kernel with debug messages:

424   00:01:33.674181 sendto(6, "E\0\1H\0\0\0\0@\21y\246\0\0\0\0\377\377\377\377\0D\0C\00148\346\1\1\6\0\246\336\333\v\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0RT\0\
424   00:01:33.693873 close(6) = 0
424   00:01:33.694652 fcntl64(5, F_SETFD, FD_CLOEXEC) = 0
424   00:01:33.695213 clock_gettime64(CLOCK_MONOTONIC, 0x7be18a18) = -1 EFAULT (Bad address)
424   00:01:33.695889 write(2, "udhcpc: clock_gettime(MONOTONIC) failed\n", 40) = -1 EFAULT (Bad address)
424   00:01:33.697311 exit_group(1) = ?
424   00:01:33.698346 +++ exited with 1 +++

I only see that after adding debug messages in the kernel, so I guess
there must be a heisenbug somewhere.

Anyway, indeed, I see (another kernel debug message):

__do_sys_clock_gettime: Returning -EFAULT on address 0x7bacc9a8

So udhcpc doesn't even try to read the reply because it crashes after
sendto() when trying to read the current time. Unless I am missing
something, that means that the problem happens somewhere on the send
side.

To make things even more interesting, it looks like the failing system
call isn't always clock_gettime().

Guenter