Re: Port of project from NuttX 7.30 to 10.1 RC1: Unexpected IRQ

Alan Carvalho de Assis Thu, 27 May 2021 07:51:43 -0700

I think one benefit of renaming many of those "up_something" functions
to "stm32_something", "esp32_something", etc. is that it is now easy
for software to find the right function.


I think many IDEs cannot handle function search correctly for NuttX
because they have no heuristic to know that if I'm searching for a
function inside a board or an arch, the search shouldn't return a
function with the same name from another board or another arch.

So, at the end of the day, the modifications you are complaining about
will make life better for all users.
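
As a rough illustration of the point above (a minimal sketch only: the
symbol tables and file paths below are made up for the example and are
not taken from the NuttX tree), this is how a plain name-based lookup
behaves before and after such a rename:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical entry in the symbol index a code-search tool might build.
 * All names and paths here are invented for illustration.
 */

struct symbol_s
{
  const char *name;  /* Function name as the tool sees it */
  const char *path;  /* File that defines it */
};

/* Before the rename: every arch defines the same "up_" name. */

const struct symbol_s g_before[2] =
{
  { "up_something", "stm32/something.c" },
  { "up_something", "esp32/something.c" },
};

/* After the rename: the prefix identifies the chip. */

const struct symbol_s g_after[2] =
{
  { "stm32_something", "stm32/something.c" },
  { "esp32_something", "esp32/something.c" },
};

/* Count how many definitions a plain name lookup returns. */

size_t count_matches(const struct symbol_s *index, size_t n,
                     const char *name)
{
  size_t count = 0;
  size_t i;

  for (i = 0; i < n; i++)
    {
      if (strcmp(index[i].name, name) == 0)
        {
          count++;
        }
    }

  return count;
}
```

With these tables, looking up "up_something" in g_before gives two
candidates the tool has to choose between, while looking up
"stm32_something" in g_after gives exactly one.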

BR,

Alan

On 5/27/21, Sebastien Lorquet <sebast...@lorquet.fr> wrote:
> I still wonder what the purpose of this variable rename is. Sorry to say,
> but it just looks cosmetic while critically breaking everything that was
> made before, and this kind of thing is a nightmare for migration when
> you can't follow the project day to day. Boards can be external to the
> project, and are a supported feature, so they should continue to work
> reliably even if you change the internal sauce!
>
> At one point there was too much traffic on the mailing list and I just
> stopped reading it. I marked several hundred messages as read
> without having the time to go through them. It seems that this change
> was made during that time.
>
> Sebastien
>
> On 27/05/2021 at 09:38, Sebastien Lorquet wrote:
>> Boom, that was the extrastuff. The board now boots. We're going to run
>> a lot of functional tests to make sure everything is okay, but I don't
>> have this strange hardfault at boot anymore.
>>
>> Thank you.
>>
>> I did not find this page despite searching through a lot of
>> documentation, mainly the "official" ReadTheDocs-like documentation.
>>
>> I suggest you link to this doc in the getting started manuals.
>>
>> Sebastien
>>
>>
>> On 26/05/2021 at 18:42, Abdelatif Guettouche wrote:
>>> Maybe this one could help:
>>> https://cwiki.apache.org/confluence/display/NUTTX/NuttX+9.1#NuttX9.1-CompatibilityConcerns
>>>
>>>
>>>
>>>> I am using the flat (monolithic) build and I see no place that defines
>>>> this flag, at all.
>>>> I don't even see a place in the codebase that defines this flag.
>>> __KERNEL__ is defined in tools/Config.mk (line:100)
>>>
>>>> The fact that mm_initialize only shows one region is weird... where is
>>>> the heap for the main RAM at 0x20000000?
>>>
>>> CONFIG_MM_REGIONS needs to be set up correctly if you have multiple
>>> heap regions.
>>>
>>> On Wed, May 26, 2021 at 5:22 PM Sebastien Lorquet
>>> <sebast...@lorquet.fr> wrote:
>>>> Hello,
>>>>
>>>> Thanks for the remarks.
>>>>
>>>> I am using the flat (monolithic) build and I see no place that defines
>>>> this flag, at all.
>>>>
>>>> I don't even see a place in the codebase that defines this flag.
>>>>
>>>> I see nothing related to mm, nor anything outdated in my Make.defs,
>>>> which is from my old setup, yes, but still similar to a recent one.
>>>>
>>>> Sebastien
>>>>
>>>> On 26/05/2021 at 18:08, raiden00pl wrote:
>>>>> If you use CONFIG_BUILD_FLAT=y, make sure that the __KERNEL__ flag is
>>>>> set here:
>>>>> https://github.com/apache/incubator-nuttx/blob/master/include/nuttx/mm/mm.h#L85
>>>>>
>>>>>
>>>>> I remember that at some point I had a similar hardfault in mm that
>>>>> didn't make sense, and it was due to an outdated board Make.defs.
>>>>>
>>>>> On Wed, 26 May 2021 at 17:21, Sebastien Lorquet <sebast...@lorquet.fr>
>>>>> wrote:
>>>>>
>>>>>> Update: stack dump and register analysis are in fact pointing to a
>>>>>> crash in mm_alloc
>>>>>>
>>>>>> I have enabled memory management debug:
>>>>>>
>>>>>> mm_initialize: Heap: start=0x10000000 size=65536
>>>>>> mm_addregion: Region 1: base=0x10000154 size=65184
>>>>>> stm32_netinitialize: Enabling PHY power
>>>>>> stm32_netinitialize: PHY reset...
>>>>>> stm32_netinitialize: PHY reset done.
>>>>>> stm32_netinitialize: Configuring PHY int
>>>>>> F
>>>>>> mm_free: Freeing 0x70fb460b
>>>>>> irq_unexpected_isr: ERROR irq: 3
>>>>>> up_assert: Assertion failed at file:irq/irq_unexpectedisr.c line: 50
>>>>>> up_registerdump: R0: 00000001 2000737c c00000f2 08000101 00000000 00000000 00000000 200073c8
>>>>>> up_registerdump: R8: 00000000 00000000 00000000 00000000 00000000 200073c8 080126ad 080126f8
>>>>>> up_registerdump: xPSR: 21000000 PRIMASK: 00000000 CONTROL: 00000000
>>>>>> up_registerdump: EXC_RETURN: fffffff9
>>>>>> up_dumpstate: sp:         200072c8
>>>>>> up_dumpstate: stack base: 20007078
>>>>>> up_dumpstate: stack size: 00000400
>>>>>>
>>>>>> The fact that mm_initialize only shows one region is weird...
>>>>>> where is the heap for the main RAM at 0x20000000?
>>>>>>
>>>>>> The mm_free(0x70fb460b) is not what causes the hardfault (that comes
>>>>>> later), but what the hell is this invalid address!
>>>>>>
>>>>>> This is the first call to mm_free, here is the backtrace:
>>>>>>
>>>>>> Breakpoint 1, mm_free (heap=0x200060b4 <g_mmheap>, mem=0x70fb460b) at mm_heap/mm_free.c:85
>>>>>> 85        if (!mem)
>>>>>> (gdb) bt
>>>>>> #0  mm_free (heap=0x200060b4 <g_mmheap>, mem=0x70fb460b) at mm_heap/mm_free.c:85
>>>>>> #1  0x0801264a in mm_free_delaylist (heap=0x200060b4 <g_mmheap>) at mm_heap/mm_malloc.c:82
>>>>>> #2  0x08012672 in mm_malloc (heap=0x200060b4 <g_mmheap>, size=24) at mm_heap/mm_malloc.c:115
>>>>>> #3  0x08012a32 in mm_zalloc (heap=0x200060b4 <g_mmheap>, size=24) at mm_heap/mm_zalloc.c:45
>>>>>> #4  0x080123ac in zalloc (size=24) at umm_heap/umm_zalloc.c:68
>>>>>> #5  0x080399fa in inode_alloc (name=0x8059a78 "") at inode/fs_inodereserve.c:78
>>>>>> #6  0x08039a5c in inode_root_reserve () at inode/fs_inodereserve.c:129
>>>>>> #7  0x080398cc in inode_initialize () at inode/fs_inode.c:92
>>>>>> #8  0x08039284 in fs_initialize () at fs_initialize.c:47
>>>>>> #9  0x08007eb4 in nx_start () at init/nx_start.c:600
>>>>>> #10 0x0800421e in __start () at chip/stm32_start.c:338
>>>>>>
>>>>>> As previously analyzed, this happens in fs_initialize through
>>>>>> inode_root_reserve, so I was on the right track.
>>>>>>
>>>>>> Caller shows mm_free called with that weird address:
>>>>>>
>>>>>> (gdb) f 1
>>>>>> #1  0x0801264a in mm_free_delaylist (heap=0x200060b4 <g_mmheap>) at mm_heap/mm_malloc.c:82
>>>>>> 82            mm_free(heap, address);
>>>>>> (gdb) list
>>>>>> 77
>>>>>> 78            /* The address should always be non-NULL since that was checked in the
>>>>>> 79             * 'while' condition above.
>>>>>> 80             */
>>>>>> 81
>>>>>> 82            mm_free(heap, address); <-- address == 0x70fb460b
>>>>>> 83          }
>>>>>> 84      #endif
>>>>>> 85      }
>>>>>> 86
>>>>>>
>>>>>> (gdb) print &g_mmheap
>>>>>> $3 = (struct mm_heap_s *) 0x200060b4 <g_mmheap>
>>>>>> (gdb) print g_mmheap
>>>>>> $4 = {mm_impl = 0x0}
>>>>>>
>>>>>> this is not good!
>>>>>>
>>>>>> This is not a timing or IRQ related issue but a heap issue.
>>>>>>
>>>>>> R15 = 080126f8 translates to here:
>>>>>>
>>>>>>
>>>>>> https://github.com/apache/incubator-nuttx/blob/master/mm/mm_heap/mm_malloc.c#L199
>>>>>>
>>>>>>
>>>>>>
>>>>>> => this free() has corrupted a badly initialized heap, and the next
>>>>>> malloc fails, giving a hardfault because that address is invalid.
>>>>>>
>>>>>> Horrific mess!
>>>>>>
>>>>>> ==>
>>>>>>
>>>>>> I think that my old board code does not initialize the board
>>>>>> properly. I probably have to check for differences between my code
>>>>>> and the stm32f429i-disco built-in board (on which I based my board).
>>>>>>
>>>>>> Sebastien
>>>>>>
>>>>>> On 25/05/2021 at 21:26, Nathan Hartman wrote:
>>>>>>> On Tue, May 25, 2021 at 12:02 PM Sebastien Lorquet
>>>>>>> <sebast...@lorquet.fr> wrote:
>>>>>>>
>>>>>>>> Back to the business
>>>>>>>>>> After this we managed to recompile our project using the latest
>>>>>>>>>> NuttX sources, but it fails when trying to init the PHY IRQ on
>>>>>>>>>> our STM32F427 board: we get "unexpected IRQ".
>>>>>>>>>>
>>>>>>>>>> Yes I know that's pretty vague :-)
>>>>>>>>>>
>>>>>>>>>> Is there anything obvious I should have been careful with in this
>>>>>>>>>> domain, before I dig out the JTAG probe to fix it (tomorrow)?
>>>>>>>>> I would first start by looking through the Release Notes between
>>>>>>>>> v7.30 and v10.1. Many big improvements and bug fixes happened and
>>>>>>>>> some of them are mentioned in Compatibility Concerns along with
>>>>>>>>> some changes you might need to make to configuration etc.
>>>>>>>>>
>>>>>>>>> Also another thing you can try: Has this board and PHY worked
>>>>>>>>> correctly with v7.30? If so, you can bisect and with very few
>>>>>>>>> tests (I'm guessing fewer than 20) find the exact commit that
>>>>>>>>> broke it.
>>>>>>>> Release notes are hard to read but I did not find anything special
>>>>>>>> about PHY interrupts.
>>>>>>>>
>>>>>>>> Note that it may not be the phy interrupt. Here is my log:
>>>>>>>>
>>>>>>>> stm32_netinitialize: Enabling PHY power
>>>>>>>> stm32_netinitialize: PHY reset...
>>>>>>>> stm32_netinitialize: PHY reset done.
>>>>>>>> stm32_netinitialize: Configuring PHY int
>>>>>>>> F
>>>>>>>> irq_unexpected_isr: ERROR irq: 3
>>>>>>>> up_assert: Assertion failed at file:irq/irq_unexpectedisr.c line: 50
>>>>>>>> up_registerdump: R0: 00000001 2000737c c00000f2 08000101 00000000 00000000 00000000 200073c8
>>>>>>>> up_registerdump: R8: 00000000 00000000 00000000 00000000 00000000 200073c8 080126ad 080126f8
>>>>>>>> up_registerdump: xPSR: 21000000 PRIMASK: 00000000 CONTROL: 00000000
>>>>>>>> up_registerdump: EXC_RETURN: fffffff9
>>>>>>>>
>>>>>>>> A lot of OS initialization happens at the point marked by the
>>>>>>>> letter F.
>>>>>>>>
>>>>>>>> It seems that an unexpected IRQ happens in this interval, around the
>>>>>>>> time the filesystem is initialized. The backtrace goes down to
>>>>>>>> memory allocation routines through the initialization of the root
>>>>>>>> inode.
>>>>>>>>
>>>>>>>> My guess is that AN external IRQ is triggered (possibly not the PHY
>>>>>>>> IRQ) but the ISR handler for that one is not ready yet. I will add
>>>>>>>> debug messages.
>>>>>>>>
>>>>>>>>
>>>>>>>> I would expect that situation to be a simple NOP, but it seems that
>>>>>>>> undefined handlers are set to this function "irq_unexpected_isr"
>>>>>>>>
>>>>>>>> Is that a new behaviour? Or a default config that I did not set
>>>>>>>> properly when porting our old defconfig?
>>>>>>>>
>>>>>>>> Sebastien
>>>>>>>>
>>>>>>>>> Nathan
>>>>>>> Did you try disabling the PHY (or networking) in Kconfig to see if
>>>>>>> removing it from the build will eliminate the hardfault?
>>>>>>>
>>>>>>> Have you seen this about hardfault debugging:
>>>>>>>
>>>>>>> https://cwiki.apache.org/confluence/plugins/servlet/mobile?contentId=139629445#content/view/139629445
>>>>>>
>>>>>>
>>>>>>> Nathan
>>>>>>>
>
