> On Mon, Jan 27, 2014 at 5:26 PM, Michal Rybárik <[email protected]> wrote:
> > Hi Pavel,
> >
> > thank you for your answer - it inspired me a lot, and we're now much closer
> > to a resolution (I hope). It seems that there is something wrong with memory
> > allocation for RTP frames (probably in res_rtp_asterisk.c). I explain the
> > details below, and I hope that one of the Asterisk gurus will help us.
> >
> > First I have to correct something I wrote before. The frame with src=RTP
> > that caused the segfault didn't come from DAHDI; it came from the IP network
> > (SIP). I also verified this by dropping UDP RTP packets on the network: the
> > RTP frames in the V.21 detection function then disappeared too. I'm not
> > sure, but it seems that frames from the network are stored in memory by the
> > res_rtp_asterisk.c module (or something closely related to it), and that is
> > probably where our bug lives.
> >
> > You were right when you wrote that there's something wrong with datalen. As
> > far as I know, an a-law sample is a 13-bit integer, usually stored in a
> > 16-bit integer for easier manipulation. We cannot store a 13-bit integer in
> > an 8-bit integer without losing information. Also, libspandsp expects 16-bit
> > samples for V.21 detection. The Asterisk module res_fax_spandsp calls the
> > spandsp function modem_connect_tones_rx(), which is declared as:
> >
> >   int modem_connect_tones_rx(modem_connect_tones_rx_state_t *s, const int16_t amp[], int len)
> >
> > where "amp" is an array of 16-bit integers (samples), and "len" is the
> > number of samples (not the number of bytes!), as you can see from the
> > modem_connect_tones_rx() source code. When Asterisk passes the "amp" pointer
> > to modem_connect_tones_rx() together with "len" = 160, libspandsp will read
> > a 16-bit integer 160 times, starting from the pointer address, so it will
> > read 320 bytes.
> >
> > Let's look again at the ast_frame that caused the segfault:
> > - frametype = 2
> > - datalen = 160
> > - samples = 160
> > - mallocd = 1
> > - mallocd_hdr_len = 562
> > - offset = 64
> > - src = RTP
> > - flags = 1
> > - ts = 9140
> > - len = 20
> > - seqno = 1489
> > - data.ptr = 0xb4ef4f30
> >
> > I am not sure about mallocd_hdr_len and the other values, but I think that
> > 160 bytes of space (datalen) is _definitely_ not enough for 160 alaw/slin
> > samples.
> >
> > As far as I know, a segfault happens when an application tries to access
> > memory that doesn't belong to it. If we have 160 bytes allocated and we try
> > to read 320 bytes from this memory, we will probably also read something we
> > didn't expect. If this memory is at the border of the application's memory
> > region, we could end up reading memory that does not belong to this
> > application, and this will cause a segfault. Definitely.
> >
> > So now it seems that the problem is neither in res_fax_spandsp nor in
> > libspandsp, but somewhere in Asterisk, where memory for RTP frames (coming
> > from the IP network) is allocated.
> >
>
> In res_rtp_asterisk, the packet is read from the socket in
> ast_rtp_read. This is also the place where the read data is converted
> into a frame of the appropriate type. The payload for the voice frame
> is obtained from the actual RTP packet itself:
>
>     rtp->f.src = "RTP";
>     rtp->f.mallocd = 0;
>     rtp->f.datalen = res - hdrlen;
>     rtp->f.data.ptr = rtp->rawdata + hdrlen + AST_FRIENDLY_OFFSET;
>     rtp->f.offset = hdrlen + AST_FRIENDLY_OFFSET;
>     rtp->f.seqno = seqno;
>
> Later on, an ast_frdup most likely pulled the frame off of the RTP
> engine and re-malloc'd it; however, this is code that runs all the
> time and wouldn't have perturbed the datalen. Your problem is most
> likely coming from whatever RTP packet was sent to Asterisk.
> You may want to look at a pcap of the message traffic that caused the
> problem to determine what about the packet is causing the issue.
>
> It may be necessary for something in either res_rtp_asterisk or
> res_fax_spandsp to verify that the number of samples in the RTP packet
> (or what the voice frame has in it before it gets handed off) matches
> what is expected.

Hi Matthew,

thanks for another piece of the mosaic; it seems to be more and more
complete and is starting to form a picture :-).

Is it possible that the peer causing the segfault uses a 10 ms
packetization time instead of 20 ms, which libspandsp requires for
correct operation?

What does Asterisk do when the peer allows only 10 ms packetization
while we are using 20 ms? Will it do a conversion, buffering two
packets and presenting them as one larger frame, or will it just
forward the short packets to the application? If the latter is correct
(which I believe it is), I think we have a candidate case for the
crash: it's caused by someone who sends 10 ms packets instead of 20 ms
ones.

But I don't know the details of the RTP engine: whether it will even
allow the connection in such a case, or whether it will handle it
similarly to a failure to find a common codec. In that case, I can
imagine a scenario where the peer offers 20 ms packetization but
actually sends 10 ms packets, thus fooling the RTP engine and causing
exactly this problem.

So, yes, a pcap of such a failure would be really great! As I wrote, I
can't generate one, because in my environment the crash is VERY rare,
and the pcap files would probably fill the whole disk without catching
anything :-). Maybe Michal will have more luck with this?
With regards,
Pavel

>
> --
> Matthew Jordan
> Digium, Inc. | Engineering Manager
> 445 Jan Davis Drive NW - Huntsville, AL 35806 - USA
> Check us out at: http://digium.com & http://asterisk.org

--
asterisk-dev mailing list
