Hello Trampas,

thanks for the hints. I initialized the system tick counter to 120 seconds
before the 2^32 wrap, and mqtt received pbuf=NULL after roughly 120 seconds
plus the 120-second keep-alive.

The ChibiOS sys_arch.c port implements sys_now() (current time in
milliseconds) with the following simplified conversion:
  return ((u32_t)chVTGetSystemTimeX() - 1) / 10 + 1;
since the system tick is 100 µs.

I guess this might cause the problem: the result wraps back to 0 once it
reaches about (2^32)/10, leaving the lwIP timers waiting for a value higher
than that, which never arrives.
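
The arithmetic behind that guess can be sketched as follows. This is a
minimal host-side sketch, not the port's code: ticks_to_ms() repeats the
conversion quoted above on a plain 32-bit value, and time_before() is a
hypothetical helper that models a wrap-safe 32-bit comparison of the kind
timer code commonly uses (it is not copied from lwIP). Such a comparison
only works if the clock runs through the full 32-bit range.

```c
#include <stdint.h>

/* Same arithmetic as the quoted sys_now(): 100 us ticks to ms, rounded up. */
static uint32_t ticks_to_ms(uint32_t ticks)
{
    return (ticks - 1u) / 10u + 1u;
}

/* Hypothetical wrap-safe test: nonzero if timestamp a lies before b.
 * Assumes the clock counts through all 2^32 values before wrapping. */
static int time_before(uint32_t a, uint32_t b)
{
    return (uint32_t)(a - b) > 0x7fffffffu;
}
```

With these, ticks_to_ms(0xFFFFFFFF) gives 429496730, the largest value the
conversion can return, and a few ticks later ticks_to_ms(1) gives 1. A
deadline of 429496730 + 1000 = 429497730 ms is therefore reported as still
in the future both before and after the tick counter wraps, and the clock
never gets there.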

To support my guess, I turned on another debug option, and the last lwIP
timer message I see is:
sys_timeout: 2000C5DC abs_time=429497730 handler=ip_reass_tmr arg=805B28C
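
Following the uint64_t suggestion, one possible shape for a fix is to
accumulate the 32-bit tick deltas into a 64-bit total and divide that, so
the millisecond value only wraps at 2^32. This is a minimal sketch under
assumptions: fake_ticks and read_ticks32() are hypothetical stand-ins for
the 100 µs tick source so the snippet can run on a PC (in the port the
reading would come from chVTGetSystemTimeX()), and sys_now_wide() is a
made-up name. It has to be called at least once per tick-counter period
(about 5 days) and would need locking if called from more than one thread.

```c
#include <stdint.h>

/* Hypothetical stand-in for the RTOS tick source (32-bit, 100 us per tick). */
static uint32_t fake_ticks;
static uint32_t read_ticks32(void) { return fake_ticks; }

static uint64_t total_ticks;  /* 64-bit running total of elapsed ticks */
static uint32_t last_ticks;   /* previous 32-bit reading */

/* Millisecond clock that wraps at 2^32 ms instead of (2^32)/10 ms. */
static uint32_t sys_now_wide(void)
{
    uint32_t t = read_ticks32();

    /* Unsigned subtraction yields the correct delta across a tick wrap. */
    total_ticks += (uint32_t)(t - last_ticks);
    last_ticks = t;

    return (uint32_t)(total_ticks / 10u);
}
```

For example, a reading of 4294967290 ticks gives 429496729 ms, and a
following reading of 20 ticks (26 ticks later, after the counter wrapped)
gives 429496731 ms rather than jumping back towards zero.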


Adam

On Fri, May 28, 2021 at 13:45, Trampas Stern <[email protected]> wrote:

> Increase the counter to a uint64_t.
>
> You can also start the counter at something other than zero to prove root
> cause faster.
>
> Trampas
>
> On Fri, May 28, 2021 at 7:08 AM Adam Baron <[email protected]> wrote:
>
>> Hi Tomek :),
>>
>> I'll try to add it. Thanks.
>>
>> However, I feel it is more likely related to a uint32 counter of some
>> kind overflowing, since the TCP_PCBs are no longer freed after 2^32
>> ticks.
>>
>> Adam
>>
>> On Fri, May 28, 2021 at 9:44, Tomasz W <[email protected]> wrote:
>>
>>> Hi (Cześć)
>>> Look at this:
>>> https://lists.nongnu.org/archive/html/lwip-devel/2020-12/msg00014.html
>>> In my case it solved the problem of the web server dying after a few days.
>>>
>>>
>>> On Fri, May 28, 2021 at 08:58, Adam Baron <[email protected]> wrote:
>>> >
>>> > Hello all,
>>> >
>>> > I have a small STM32F4 application running on the devel branch of
>>> lwIP. It includes httpd, sntp, an smtp client, and an mqtt client. All runs
>>> well until the fifth day, when the mqtt client starts to receive pbuf=NULL
>>> and disconnects. My reconnect routine reconnects it shortly afterwards, but
>>> it receives pbuf=NULL again soon after.
>>> >
>>> > Later on I also noticed this in the log: memp_malloc: out of memory in
>>> pool TCP_PCB.
>>> > I have MEMP_NUM_TCP_PCB defined as 30, which seems enough for
>>> normal operation. I also raised it to 50, but ended up with the same problem.
>>> > In the statistics, NUM_TCP_PCB increases and decreases as it should,
>>> but once uptime passes 5 days it stays high with an error flag triggered.
>>> >
>>> > Interestingly, it happens exactly after 2^32 milliseconds of uptime.
>>> I tried to keep OpenOCD connected so I could peek in, but so far I have not
>>> managed to keep OpenOCD running that long without dropping the
>>> connection.
>>> >
>>> > Does anyone have any ideas please?
>>> >
>>> > Thanks in advance,
>>> > --
>>> > 731435556
>>> > Adam Baron
>>> > _______________________________________________
>>> > lwip-users mailing list
>>> > [email protected]
>>> > https://lists.nongnu.org/mailman/listinfo/lwip-users
>>>
>>>
>>>
>>> --
>>> Regards
>>> Tomek
>>>
>>
>>
>>
>> --
>> 731435556
>> Adam Baron
>



-- 
731435556
Adam Baron