Sorry, I was quite busy the past two days, but here is what I can gather, by my rough estimation, as to why the library gets stuck...
The problem is easily reproducible from server to client, and it also affects cases where the same logic is used to handle inbound messages in server implementations. The loop that starts at line 1096 runs `forever` and is where the problem begins: if none of the required data is found within a few hundred reads, the context switching becomes very bad. The CPU usage I saw is what made me look into this further, to find out what was going on and how to replicate it.

I have since been able to craft a message with my server implementation that reliably reproduces the behavior in VLC and ffplay, as I outlined previously. I have also outlined a few other issues I saw while testing some niche scenarios in my server, such as long-running requests, which is how I came to the points about polling for reading. Nonetheless, I will try to explain as best I can, but the issue is fairly obvious if you just feed the message to VLC and step through the response.

libavformat/rtsp.c, lines 1096-1113
<http://git.libav.org/?p=libav.git;a=blob;f=libavformat/rtsp.c;h=2200f6ec0709ebd1167d271e165b2420a68ccaa1;hb=HEAD#l1097>:

    for (;;) {
        ret = ffurl_read_complete(rt->rtsp_hd, &ch, 1);
        av_dlog(s, "ret=%d c=%02x [%c]\n", ret, ch, ch);
        if (ret != 1)
            return AVERROR_EOF;
        if (ch == '\n')
            break;
        if (ch == '$') {
            /* XXX: only parse it if first char on line ? */
            if (return_on_interleaved_data) {
                return 1;
            } else
                ff_rtsp_skip_packet(s);
        } else if (ch != '\r') {
            if ((q - buf) < sizeof(buf) - 1)
                *q++ = ch;
        }
    }

Given a response from a server like the one I indicated previously, the client receives the message and begins to parse it, reading one byte at a time. Each byte read from the socket determines what action is taken next.
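To make the byte-by-byte dispatch concrete, here is a minimal sketch that replays the same decisions over an in-memory buffer instead of a socket. None of these names are libav API: `mem_src`, `mem_read`, `skip_packet`, `line_result` and `read_line` are hypothetical stand-ins, and `skip_packet` only approximates ff_rtsp_skip_packet on the assumption that it takes a channel byte and a 16-bit big-endian length from the three bytes after the '$'. It covers only the branch where return_on_interleaved_data is false.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory stand-in for the RTSP socket. */
struct mem_src {
    const uint8_t *data;
    size_t len;
    size_t pos;
};

/* Read exactly n bytes or fail, loosely mimicking a "read complete" call. */
static int mem_read(struct mem_src *s, uint8_t *out, size_t n)
{
    if (s->len - s->pos < n)
        return -1;
    memcpy(out, s->data + s->pos, n);
    s->pos += n;
    return (int)n;
}

/* Assumed framing: channel byte, then 16-bit big-endian length.
 * Skips that many bytes (clamped to what the buffer holds) and returns
 * the length it believed, or -1 if the header itself was incomplete. */
static long skip_packet(struct mem_src *s)
{
    uint8_t hdr[3];
    size_t want, avail;

    if (mem_read(s, hdr, 3) != 3)
        return -1;
    want  = ((size_t)hdr[1] << 8) | hdr[2];
    avail = s->len - s->pos;
    s->pos += (want < avail) ? want : avail;
    return (long)want;
}

struct line_result {
    char line[128];  /* text collected before the line ended           */
    long bogus_len;  /* length taken after a '$', or -1 if none seen    */
    int  starved;    /* 1 if the data ran out before a '\n' was found   */
};

/* Same decisions as the quoted loop, with return_on_interleaved_data == 0. */
static void read_line(struct mem_src *s, struct line_result *r)
{
    char *q = r->line;
    uint8_t ch;

    r->bogus_len = -1;
    r->starved = 0;
    for (;;) {
        if (mem_read(s, &ch, 1) != 1) {
            r->starved = 1;  /* a real blocking socket would wait here */
            break;
        }
        if (ch == '\n')
            break;
        if (ch == '$') {
            r->bogus_len = skip_packet(s);
        } else if (ch != '\r') {
            if ((size_t)(q - r->line) < sizeof(r->line) - 1)
                *q++ = (char)ch;
        }
    }
    *q = '\0';
}
```

Feeding it the text `Price: $AB rest of line\r\nCSeq: 2\r\n` makes `skip_packet` treat 'A' as the channel and the bytes 'B' and ' ' as the length 0x4220 (16928), which is far more than remains in the buffer, so the line terminator is swallowed and `starved` is set. On a real socket the equivalent is a read that keeps waiting for data the server never intended to send.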
It seems there are quite a few points of failure, but I will outline what my 'issue' is below. There are two interesting cases in which this logic can be observed reliably:

1) The server sends back an encapsulated RTP or RTCP packet before the 'PLAY' response is received. The 'return_on_interleaved_data' check is false, so an attempt is made to skip the RTSP packet. Since this is not a framed packet, ffmpeg reads the length incorrectly and RTSP data is skipped. The loop then reads again and continues to count lines until a terminating sequence such as '\r\n' or a '$' is encountered, which may never happen.

2) At any point during interleaved communication, a TCP re-transmission occurs whose payload section starts with '$' as the byte read from the underlying socket, though not necessarily as the first byte of the PDU in the TCP packet. The client will 'return 1' if 'return_on_interleaved_data' is set, at which point the next read attempts the same logic, eventually tries to skip the packet, and ends up like situation 1 anyway.

What I also observe during this time is that the source sending to ffmpeg re-transmits data more frequently than it does with my client, which I can only assume is due to the way the sends and receives are happening. I notice this seems to occur a lot with interleaved RTSP, and one possible reason is as follows: because the receive buffer is full, an 'ack' occurs for a segment which is only partially transmitted, and because each receive reads only a single byte rather than the entire segment, this sometimes happens multiple times for the same PDU. Here is how I think about it, and why I recommend certain socket flags.
The read loop from rtsp.c quoted above ensures this occurs multiple times: another receive occurs, which further fills the socket's buffer with the remainder of the data from a previous transmission, increasing what is available to read on the socket (as reported by FIONREAD) without consuming it. If you need an easy way to visualize this, just send a lot of data to yourself over a TCP socket and receive 1 byte at a time (or less than what was sent), then look at the resulting TCP-level traffic and how it can resemble what I am describing above.

It would seem this happens in RTSP more than in other types of connections because of how clients have to interpret the application header, which contains '$', and then use the next octets to determine channel and length, when the data may actually be part of the payload of a previous partial segment. This means that clients are usually not receiving whole segments but only partial segments, and then deciding how to handle the PDU contents, which means they should also be setting TCP_NODELAY and TCP_USER_TIMEOUT if available, which most are not.

If you want a way to reproduce this with ffmpeg, it seems to occur most often when trying to receive from the same socket that is being used for sending: a collision occurs, and repeated polling for a read prevents the write from occurring, or from occurring completely. This can be reproduced by making a 'GET_PARAMETER' request to a server while interleaving, e.g. when using rtsp_send_cmd_with_content_async at the same time as reading with ff_rtsp_read_reply.

Hence why I was talking about the polling, which comes from udp_read_packet in 'rtsp.c' among other places (where it probably should have been defined in a way that allowed its logic to be re-used without being re-defined, but nonetheless). What I meant is that functions like ff_rtp_send_rtcp_feedback don't poll for write. The underlying result is that whatever is using ffmpeg, e.g. VLC, starts to think it can't write to the RTSP socket because the socket is latent or failed, when in fact the socket is just already being used by ffmpeg for another send operation. The 'pts_delay' is increased, is never reset lower, and may additionally take a long time to time out.

This can resolve itself with time, like any connection issue. However, the problem is that the timeout is never adjusted again when writing to decrease the back-off that occurred when reading timed out. The library then reacts to situations where there is no more legitimate data from a sender for a short period with a large poll delay, which causes latency if no data is received during the adjusted timeout and could cause the connection to time out because:

1) RTSP data (GET_PARAMETER) can't be sent because a `read` is already occurring
2) RTCP data can't be sent because a `read` is already occurring

This `read` comes from either an outbound RTSP request, an incoming RTSP message, or RTP interleaved data on the same file descriptor. It is not connection-related, but there is no way to tell without first determining whether there is an outbound connection or whether the socket can be written to with 0-byte data.

The bottom line is that a server can reliably cause the RTSP client to enter a state where it consumes more data than it should and never returns control until the underlying connection is aborted. If a server uses this code to process messages, it can also be exploited through the message problems cited above. The re-transmission issue is more or less a by-product of the above as well, IMHO, but I would be glad to hear what you think anyway.

So in short, I guess the problem can be simplified to:

1) "Rtsp parsing logic is incorrect when '$' appears"
2) "RtspClient does not properly share resources concurrently"

but I'm not sure that conveys the seriousness of the issue in toto. Hopefully I have provided enough information.
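For the two socket-level points made earlier (setting the TCP flags, and polling for write before sending), here is a minimal POSIX sketch of what I have in mind. It is not taken from libav: `set_latency_opts` and `send_when_writable` are hypothetical helper names, the timeout values are whatever the caller picks, and TCP_USER_TIMEOUT is only applied where the platform defines it (it is Linux-specific).

```c
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Disable Nagle's algorithm and, where available, bound how long sent
 * data may stay unacknowledged before the connection is dropped.
 * Returns 0 on success, -1 on failure (errno set by setsockopt). */
static int set_latency_opts(int fd, unsigned int user_timeout_ms)
{
    int one = 1;

    if (setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one)) < 0)
        return -1;
#ifdef TCP_USER_TIMEOUT
    if (setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT,
                   &user_timeout_ms, sizeof(user_timeout_ms)) < 0)
        return -1;
#else
    (void)user_timeout_ms;  /* option not available on this platform */
#endif
    return 0;
}

/* Wait until fd reports writable, then send once.
 * Returns the number of bytes sent, 0 if the wait timed out,
 * or -1 on error. Call with len > 0 so that 0 always means timeout. */
static ssize_t send_when_writable(int fd, const void *buf, size_t len,
                                  int timeout_ms)
{
    struct pollfd p = { .fd = fd, .events = POLLOUT, .revents = 0 };
    int n;

    do {
        n = poll(&p, 1, timeout_ms);
    } while (n < 0 && errno == EINTR);  /* retry if interrupted by a signal */

    if (n < 0)
        return -1;
    if (n == 0)
        return 0;                       /* not writable within the timeout */
    if (p.revents & (POLLERR | POLLHUP | POLLNVAL))
        return -1;
    return send(fd, buf, len, 0);
}
```

The point of the second helper is only that a caller gets a clear "not writable yet" answer with its own timeout, separate from whatever back-off the read side has accumulated. It does not by itself serialize readers and writers on the same descriptor.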
If you need any more information, just let me know!

Sincerely,
Julius

_______________________________________________
libav-devel mailing list
https://lists.libav.org/mailman/listinfo/libav-devel
