So close.  So very close...

I've been expending much effort in fixing up the many suspend/resume issues that have been crippling the poor GTA01. At this point, the major ones are all resolved, but I've encountered a difficult one.

So I have questions. This email is long, so for those with shorter attention spans I'll cut to the chase:

a) Where is the list of known problems with the different GSM modem firmwares, and the list of fixes in each firmware?

b) Is there a document that describes what we (or anyone else, for that matter) know about the problems or unexpected behavior of our version of the S3C2410 SoC UARTs, especially with regard to auto flow control and overruns?

c) What are the real-world parameters for the GSM modem with regard to flow-control:

  1) How much data can it buffer internally?

2) What does it do when its internal buffers fill -- does it throw data away? Does it transmit despite being flow-controlled? Is there an AT command available to alter or control how it behaves? Can it report internal buffer overflows? What could trigger it to transmit despite being flow-controlled?

3) Do different firmware versions for this modem behave differently in regard to flow control, and if so, how so?

d) If I were to acquire a scope, are the CTS, RTS, and other signals between the GSM and the UART accessible on the debug board? Are there pads on the phone itself?

e) Can we do anything with gsmd, and/or phonekit, to work around the problem in the meantime?


See below for details on the issue.


-----------------------------


The device works now - most of the time. Once in a while, when waking from suspend on an incoming SMS or call, the device encounters an overrun error on the serial port.

This is a Very Bad Thing, but it's made even worse since it does not seem as if gsmd can recover, and the app (phonekit?) that runs the show can't seem to recover when gsmd barfs. So this chain of errors leaves the device dead, and you don't know when this has happened.

Here's a trace from some kernel debugging code. This trace shows a good exchange waking to receive an SMS, followed by a failed attempt:

4481.625: ->  .
4481.630: <- ..OK..
4481.635: -> AT+CMGD=1,0.
4481.725: <- ..OK..

Normal communication with the GSM; we send a newline, modem says "OK", we say "AT+CMGD=1,0", modem says "OK" -- you get the idea. (The numbers on the left represent seconds since some distant point in time; resolution is in steps of .005 seconds.)

4511.315: Ps SijSIJG

Normal suspend occurs here - this jibberish means something, but its all good -- this is what we expect and want to see.

4511.325: Pr GSi
4511.420: S  M
4511.420: Pr JSij
4511.420: <- ..%CPRI: 1,2....+CMTI: "SM",1..

Normal resume here - reverse order from the suspend, the "S M" line that is interjected simply marks a debug point I put in that proves that hardware flow control is being turned back on during the resume. So we know that the hardware flow control is enabled during the resume sequence. The last line shows the reason for the resume: we have an incoming SMS.

4513.555: ->  .
4513.560: <- ..OK..
4513.685: -> AT+CMGR=1.
4513.810: <- ..+CMGR: 0,,33..07912160130300F4040B918174570101F00003804091
4513.815: <- 3231010A0FF3701E744787E92C10B94E2EFF00....OK..
4513.935: ->  .
4513.940: <- ..OK..
4513.945: -> AT+CMGD=1,0.
4514.020: <- ..OK..

The SMS is read and handled.  That's all good.

Now compare that to the next entries in the log. We suspend, as before. We resume, as before. The reason for the resume is clear (incoming data from the modem). But notice the "E# O" line -- that signifies that at that point in the data stream the S3C2410 UART reported an Overrun error. The data that follows seems to confirm that -- if you count the characters, you'll come up with 16 characters, and you'll notice that the CMTI message is truncated (one character short; the message is supposed to be terminated by "\r\n", the "\n" is missing (the non-printing characters are replaced by a period in the log, but I went and checked what's really there in the binary data)).

4593.280: Ps SijSIJG
4593.290: Pr GSi
4593.385: S  M
4593.385: Pr JSij
4593.385: <- ..%CPRI: 1,2..
4593.610: E# O
4593.610: <- ..+CMTI: "SM",1.
4704.930: ->  .
4704.935: <- ..OK..
4707.935: -> AT.
4707.940: <- ..OK..

So that's it. At this point, the phone is dead. The GSM is working, and you'll see some data from it from time to time. In fact, if you look at the time stamps, you can see that it appears that gsmd sent something to the device 100 seconds later, and the GSM responded -- but gsmd didn't see it, and terminated; it declared the modem dead! And when that happens, then whatever is running the GUI on the phone reports an I/O error and doesn't even attempt to restart gsmd. Aaargh!! So this simple overrun results in total failure of the phone parts of the device! [Log snippets from both of those will be tacked to the end of this email.]

----

So, first of all we have a major problem down in the kernel -- there's no reason for an overflow to have occurred. Auto-flow-control was enabled, and confirmed to be enabled by reading the status back from the UART (that's the gibberish parts you see up there). We can see exactly where the 16-byte FIFO filled up and the overrun occurred, but there's no reason for it do have done so unless there's a hardware issue with the UART, or there's a bug in the GSM itself.

Second, there's no reason that gsmd can't deal with this. It should have recovered from a short read, and re-established synchronization with the modem (it's not dead -- use "cu" to connect at this point and you can talk to it all day; it's clearly gsmd that got confused and can't talk!).

Thirdly, when gsmd exits, it was my understanding that the dialer or whatever the "Home" application is would restart it; that doesn't seem to happen.

----

Logs follow:

==============================================================

Sat Apr 19 20:49:37 2008 <1> machine_gta01.c:142:atcmd_wakeup_modem() try to wake up Sat Apr 19 20:49:37 2008 <1> atcmd.c:654:atcmd_submit() submitting command `AT+CMGD=1,0' Sat Apr 19 20:49:37 2008 <1> machine_gta01.c:134:wakeup_timer() Create wake up timer
Sat Apr 19 20:49:37 2008 <1> atcmd.c:265:ml_parse() buf=`OK'(2)
Sat Apr 19 20:49:37 2008 <1> atcmd.c:634:remove_timer() Get respond before timeout, remove timer! Sat Apr 19 20:49:37 2008 <1> atcmd.c:227:atcmd_done() Calling final cmd->cb() Sat Apr 19 20:49:37 2008 <1> machine_gta01.c:88:null_wakeup_cb() The wake up callback!!
Sat Apr 19 20:49:37 2008 <1> atcmd.c:237:atcmd_done() Clearing mlbuf
Sat Apr 19 20:49:37 2008 <1> atcmd.c:593:discard_timer() Create discard timer
Sat Apr 19 20:49:37 2008 <1> atcmd.c:265:ml_parse() buf=`OK'(2)
Sat Apr 19 20:49:37 2008 <1> atcmd.c:634:remove_timer() Get respond before timeout, remove timer! Sat Apr 19 20:49:37 2008 <1> atcmd.c:227:atcmd_done() Calling final cmd->cb() Sat Apr 19 20:49:37 2008 <1> usock.c:56:usock_cmd_enqueue() enqueueing usock cmd 0x31ed8 for user 0x31fa8
Sat Apr 19 20:49:37 2008 <1> atcmd.c:237:atcmd_done() Clearing mlbuf
Sat Apr 19 20:49:37 2008 <1> usock.c:1665:gsmd_usock_user_cb() successfully sent cmd 0x31ed8 to user 0x31fa8, freeing
Sat Apr 19 20:51:23 2008 <1> atcmd.c:265:ml_parse() buf=`%CPRI: 1,2'(10)
Sat Apr 19 20:51:23 2008 <1> atcmd.c:334:ml_parse() extd reply `%CPRI: 1,2' to cmd `NONE', must be unsolicited Sat Apr 19 20:51:23 2008 <1> unsolicited.c:70:usock_evt_send() entering evt=13 Sat Apr 19 20:51:23 2008 <1> usock.c:56:usock_cmd_enqueue() enqueueing usock cmd 0x320b0 for user 0x31e70 Sat Apr 19 20:51:23 2008 <1> usock.c:56:usock_cmd_enqueue() enqueueing usock cmd 0x3b4d8 for user 0x31fa8 Sat Apr 19 20:51:23 2008 <1> usock.c:1665:gsmd_usock_user_cb() successfully sent cmd 0x3b4d8 to user 0x31fa8, freeing Sat Apr 19 20:51:23 2008 <1> usock.c:1665:gsmd_usock_user_cb() successfully sent cmd 0x320b0 to user 0x31e70, freeing Sat Apr 19 20:53:13 2008 <1> gsmd.c:116:alive_interval_tmr_cb() interval expired, starting next alive inquiry Sat Apr 19 20:53:13 2008 <1> atcmd.c:651:atcmd_submit() extra-submiting command Sat Apr 19 20:53:13 2008 <1> machine_gta01.c:142:atcmd_wakeup_modem() try to wake up Sat Apr 19 20:53:13 2008 <1> atcmd.c:654:atcmd_submit() submitting command `AT' Sat Apr 19 20:53:13 2008 <1> machine_gta01.c:134:wakeup_timer() Create wake up timer Sat Apr 19 20:53:16 2008 <1> machine_gta01.c:96:wakeup_timeout() Wakeup time out!! Sat Apr 19 20:53:46 2008 <1> gsmd.c:75:alive_tmr_cb() gsmd_alive timer expired
Sat Apr 19 20:53:46 2008 <8> gsmd.c:78:alive_tmr_cb() modem dead!


==============================================================

** (neod:2470): DEBUG: mainmenu powersave reset
** (neod:2470): DEBUG: mainmenu powersave reset
** (neod:2470): DEBUG: mainmenu powersave reset
** (neod:2470): DEBUG: mainmenu powersave reset
** (neod:2470): DEBUG: mainmenu powersave reset

(openmoko-messages:2760): Pango-WARNING **: pango_layout_set_markup_with_accel: Error on line 1: Character ' ' is not valid at the start of an entity name; the & character begins an entity; if this ampersand isn't supposed to be an entity, escape it as &amp;
** (matchbox-panel-2:2474): DEBUG: battery_applet: signal_filter
** (matchbox-panel-2:2474): DEBUG: (unknown dbus message, ignoring)
** (matchbox-panel-2:2474): DEBUG: usb_applet: signal_filter
** (matchbox-panel-2:2474): DEBUG: (unknown dbus message, ignoring)
** (neod:2470): DEBUG: power pressed for 0
** (neod:2470): DEBUG: mainmenu powersave reset
** (neod:2470): DEBUG: requested to set display brightness 100
** (neod:2470): DEBUG: writing '5156' to '/sys/class/backlight/gta01-bl/brightness'
** (neod:2470): DEBUG: active Window = 20971523 ('Messages')

(neod:2470): Gtk-CRITICAL **: gtk_window_is_active: assertion `GTK_IS_WINDOW (window)' failed
** (matchbox-panel-2:2474): DEBUG: CIPHER event: data = 62944, 2, 0
** (matchbox-panel-2:2474): DEBUG: gsm_applet_update_cipher_status: status = 62944
** (matchbox-panel-2:2474): DEBUG: battery_applet: timeout
** (matchbox-panel-2:2474): DEBUG: battery_applet: timeout
** (matchbox-panel-2:2474): DEBUG: battery_applet: timeout

** (phone-kit:2472): WARNING **: Unhandled IO condition, bailing

==============================================================

Reply via email to