Hello project followers, I have an update: I found and fixed the issue that caused our experimental firmware to crash on boot when the modem is still "hot" from a previous power cycle, i.e., when it has only been off for a few seconds rather than minutes.

The issue was a pure firmware bug, nothing to blame on the hardware. The GSM protocol stack implementation we are using is built atop Condat's framework called GPF, which is a layer above or a wrapper around Nucleus. GPF manages several dynamic memory allocation pools, both the fully dynamic kind (like classic malloc) and the "partition" kind that cannot get fragmented because all allocation and freeing is done in terms of preset partitions. Each memory pool (either DM or PM, i.e., dynamic or partition) needs a Nucleus control block, and GPF's OS layer (the one which I had to reconstruct from disassembly of binary objects around this time a year ago) is responsible for calling NU_Create_Memory_Pool() or NU_Create_Partition_Pool() to initialize these control blocks.

The trouble happened because most of these control blocks are themselves allocated dynamically: only the very first dynamic memory pool's control block is statically allocated in the bss segment, and all subsequently needed ones are allocated from that first pool. And it just so happens that Nucleus includes an "error check" feature whereby if a control block passed to a NU_Create_<whatever>() function already has its signature word filled in, the "create" function fails on the assumption that someone tried to re-initialize an already active object.

The power cycle dependency happened because of data retention in the external SRAM - the one inside the weird Samsung K5A3281 flash+RAM chip used in Openmoko's modem. If the modem hasn't been powered off long enough, this SRAM will retain (some of) its content from the previous power cycle. If enough bits have survived that the memory where the control blocks in question happen to be allocated still holds the "magic" signature word values (32 bits in each control block), the new boot cycle crashes spectacularly as the firmware fails to properly create its GPF memory management structures.
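
To illustrate the failure mode described above, here is a minimal host-side sketch in C. It is not Nucleus or GPF code: the struct layout, the magic value, the return codes, and the names pool_cb, pool_create, power_off, simulated_boot and hot_boot_demo are all made up for illustration. The only behavior it borrows from the description is that a "create" call refuses a control block whose signature word is already set. A long power-off is modelled as memory reading back all zeros, which is a simplification, since real decayed SRAM holds arbitrary bits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins, NOT the real Nucleus definitions. */
#define POOL_MAGIC   0x504F4F4CUL  /* made-up 32-bit signature word */
#define POOL_OK      0
#define POOL_INVALID (-1)

struct pool_cb {
    uint32_t signature;  /* holds POOL_MAGIC while the pool is live */
    void    *start;
    uint32_t size;
};

/* Mimics the described error check: a control block that already
 * carries the signature is assumed to be an active object. */
int pool_create(struct pool_cb *cb, void *start, uint32_t size)
{
    if (cb->signature == POOL_MAGIC)
        return POOL_INVALID;
    cb->start = start;
    cb->size = size;
    cb->signature = POOL_MAGIC;
    return POOL_OK;
}

/* Stands in for RAM that the boot code does not zero. */
static struct pool_cb retained_cb;
static unsigned char pool_mem[256];

/* Model a power-off. A long one lets the content decay (modelled as
 * zeros, an assumption); a short one leaves every bit in place. */
void power_off(int long_enough)
{
    if (long_enough)
        memset(&retained_cb, 0, sizeof retained_cb);
}

/* Model a boot that creates the pool without clearing its control block. */
int simulated_boot(void)
{
    return pool_create(&retained_cb, pool_mem, sizeof pool_mem);
}

/* Walk through cold boot, hot reboot, and cold boot again. */
void hot_boot_demo(void)
{
    power_off(1);
    assert(simulated_boot() == POOL_OK);       /* cold: block is clean */
    power_off(0);
    assert(simulated_boot() == POOL_INVALID);  /* hot: stale signature */
    power_off(1);
    assert(simulated_boot() == POOL_OK);       /* cold again: succeeds */
}
```

Calling hot_boot_demo() from a small host test program exercises all three cases. The middle one corresponds to the "hot modem" boot: nothing has cleared the control block, so the stale signature from the previous run makes the create call fail.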

The issue never occurred with TI's original TCS211 firmware because they put the "raw" memory for the pools in the bss segment, which is explicitly zeroed out by the firmware's early init code on every boot. But in our FreeCalypso fw I moved those "raw" memory chunks into separate int.ram and ext.ram sections which are not part of the zeroed-out bss segment: I figured why waste CPU cycles zeroing out memory whose initial content is not supposed to be depended on...

My current fix: I added a bzero() call to zero out just these specific control blocks right before passing them to the NU_Create_<xxx>_Pool() functions. With this fix, the "hot modem" boot crash no longer occurs. An effective test is to send a tgtreset command to a running fw via fc-tmsh: it reboots the Calypso via its watchdog timer without cycling the power to anything. Prior to the fix I just made, this reliably triggered the boot crash.

I haven't done any further troubleshooting on the other issue yet (AT+COPS failing to connect to the GSM network); I'm going to take another look at it now.

Happy hacking,
Mychaela
aka Space Falcon