Re: [libav-devel] libav-devel Digest, Vol 47, Issue 37

Julius Friedman Tue, 20 Jan 2015 18:18:12 -0800

Sorry, I was quite busy the past two days, but here is what I can gather by
my rough estimation as to why the library gets stuck...


The problem is easily reproducible from server to client and also affects
cases when the logic is used to handle inbound messages in server
implementations.

The loop that starts at line 1096 is executed `forever` and is where the
problem begins. If none of the required data is found within a few hundred
reads, the context switching becomes very bad, and the CPU usage it
exhibited made me look further into what was going on and how to replicate
it.

I have since been able to craft a message using my server implementation
which does replicate the behavior reliably through VLC and ffplay which I
outlined previously.

I have also outlined a few other issues I saw when testing some niche
scenarios in my server, such as long-running requests, which is how I came
to the points about polling for reading. Nonetheless, I will try to explain
as best I can, but the issue is fairly obvious if you just feed the message
to VLC and step through the response.

libavformat/rtsp.c, lines 1096-1113:
<http://git.libav.org/?p=libav.git;a=blob;f=libavformat/rtsp.c;h=2200f6ec0709ebd1167d271e165b2420a68ccaa1;hb=HEAD#l1097>

    for (;;) {
        ret = ffurl_read_complete(rt->rtsp_hd, &ch, 1);
        av_dlog(s, "ret=%d c=%02x [%c]\n", ret, ch, ch);
        if (ret != 1)
            return AVERROR_EOF;
        if (ch == '\n')
            break;
        if (ch == '$') {
            /* XXX: only parse it if first char on line ? */
            if (return_on_interleaved_data) {
                return 1;
            } else
                ff_rtsp_skip_packet(s);
        } else if (ch != '\r') {
            if ((q - buf) < sizeof(buf) - 1)
                *q++ = ch;
        }
    }


Given a response from a server such as the one I indicated previously, the
client receives the message and begins to parse it, reading a byte at a
time. If a byte can be read, the action to take is determined from the
value of that byte.

It seems there are quite a few points of failure but I will outline what my
'issue' is below:

There are two interesting and reliable cases for this logic to be observed:

1) When the server sends back an encapsulated RTP or RTCP packet before the
'PLAY' response is received, the 'return_on_interleaved_data' check is
false and an attempt is made to skip the packet. Since this is not a framed
packet, the length is read incorrectly by ffmpeg and RTSP data is skipped.
The loop then reads again and continues to count lines until a terminating
sequence such as '\r\n' or '$' is encountered, which may never happen.

2) At any point during interleaved communication when a TCP re-transmission
occurs in which the payload section contains '$' as the byte read from the
underlying socket, but not necessarily as the first byte in the PDU of the
TCP packet.

The client will 'return 1' if 'return_on_interleaved_data' is set, at which
point the next read performs this same logic, which eventually tries to
skip the packet and ends up like situation 1 anyway.
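
For reference, RFC 2326 (section 10.12) frames interleaved data as '$', a
one-byte channel identifier, and a two-byte length in network byte order.
The following is only a rough sketch of a stricter check along the lines of
the XXX comment in the loop above: it treats '$' as a frame start only at
the beginning of a line and only once the full 4-byte header is available.
The names `InterleavedHeader` and `parse_interleaved_header` are
hypothetical and are not part of libavformat.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type for illustration; not a libavformat structure. */
typedef struct {
    uint8_t  channel;   /* interleaved channel identifier */
    uint16_t length;    /* payload length in bytes, host byte order */
} InterleavedHeader;

/*
 * Hypothetical helper. Inspects buf[pos] and returns:
 *    1 if an interleaved frame header starts at pos (*hdr is filled in),
 *    0 if the byte at pos does not start a frame,
 *   -1 if more bytes are needed before a decision can be made.
 * '$' is accepted only at the start of a line, i.e. when pos is 0 or the
 * previous byte is '\n'.
 */
static int parse_interleaved_header(const uint8_t *buf, size_t len,
                                    size_t pos, InterleavedHeader *hdr)
{
    assert(buf != NULL && hdr != NULL);

    if (pos >= len)
        return -1;
    if (buf[pos] != '$')
        return 0;
    if (pos > 0 && buf[pos - 1] != '\n')
        return 0;               /* '$' in the middle of a line */
    if (len - pos < 4)
        return -1;              /* header not fully received yet */

    hdr->channel = buf[pos + 1];
    hdr->length  = (uint16_t)((buf[pos + 2] << 8) | buf[pos + 3]);
    return 1;
}
```

A caller would skip `hdr.length` bytes only after a return value of 1, and
would wait for more data on -1 instead of guessing a length.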

What I also observe during this time is that the source sending to ffmpeg
re-transmits data more frequently than it does with my client, which I can
only assume is due to the way the sends and receives are occurring.

I notice this seems to occur a lot with interleaved RTSP, and one possible
reason for this is as follows:

Because the receive buffer is full, an 'ack' occurs for a segment which is
only partially transmitted, and because each receive reads only a single
byte and not the entire segment, this sometimes happens multiple times for
the same PDU.

Here is how I think about it and why I recommend certain socket flags.

The above logic ensures this occurs multiple times, and each further
receive fills the socket's buffer with the remainder of the data from a
previous transmission, increasing what is available to read on the socket
(as reported by FIONREAD) without consuming it.

If you need an easy way to visualize this, you can just send a lot of data
to yourself with a TCP socket and receive 1 byte at a time (or less than
what was sent) to see the TCP-level traffic which occurs as a result and
how it can be similar to what I am describing above.
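
A minimal sketch of the receiving side of that experiment follows. To stay
self-contained it assumes a Linux-like system and uses a UNIX-domain
socketpair() in place of a real TCP connection, so it only shows the
unread backlog reported by FIONREAD after a single-byte read and not the
TCP-level traffic itself. The function name is hypothetical.

```c
#include <assert.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper. Writes `total` bytes into one end of a socket pair,
 * reads a single byte from the other end, and returns how many bytes
 * FIONREAD still reports as queued. Returns -1 on any error.
 */
static int backlog_after_one_byte_read(size_t total)
{
    int sv[2];
    char payload[1024];
    char ch;
    int pending = -1;

    assert(total > 0 && total <= sizeof(payload));
    memset(payload, 'x', sizeof(payload));

    /* Assumption: AF_UNIX stands in for the TCP socket in the text. */
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    if (write(sv[0], payload, total) != (ssize_t)total ||
        read(sv[1], &ch, 1) != 1 ||
        ioctl(sv[1], FIONREAD, &pending) < 0)
        pending = -1;

    close(sv[0]);
    close(sv[1]);
    return pending;
}
```

Repeating the one-byte read in a loop while the sender keeps writing shows
the queued amount growing instead of draining.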

It would seem this happens in RTSP more so than in other types of
connections because of how clients need to interpret the application
header, which begins with '$' and then uses the next octets to determine
channel and length, when the data may in fact be part of the payload of a
previous partial segment.

This means that clients are usually not receiving whole segments but only
partial segments and then determining how to handle the PDU contents, which
means they should also be setting TCP_NODELAY and, if available,
TCP_USER_TIMEOUT, which most do not.
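
As a sketch of what setting those two options could look like on a
connected TCP socket: TCP_NODELAY is widely available, while
TCP_USER_TIMEOUT is Linux-specific and takes a value in milliseconds, so
it is guarded with #ifdef here. The helper name and the timeout parameter
are hypothetical, and whether these flags actually help in this situation
remains the suggestion made above rather than something this code shows.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/*
 * Hypothetical helper. Disables Nagle's algorithm and, where the platform
 * defines it, sets TCP_USER_TIMEOUT to user_timeout_ms milliseconds.
 * Returns 0 on success and -1 if any setsockopt() call fails.
 */
static int configure_rtsp_socket(int fd, unsigned int user_timeout_ms)
{
    int one = 1;

    assert(fd >= 0);

    if (setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one)) < 0)
        return -1;

#ifdef TCP_USER_TIMEOUT
    if (setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT,
                   &user_timeout_ms, sizeof(user_timeout_ms)) < 0)
        return -1;
#else
    (void)user_timeout_ms;      /* option not available on this platform */
#endif
    return 0;
}
```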

If you want to reproduce this with ffmpeg, it seems to occur most often
when trying to receive from the same socket that is being used for sending:
a collision occurs, and repeated polling for a read prevents the write from
occurring at all or from completing. This can be reproduced by making a
'GET_PARAMETER' request to a server while interleaving.

E.g. when using rtsp_send_cmd_with_content_async at the same time as
reading with ff_rtsp_read_reply.

This is why I was talking about the polling, which comes from 'rtsp.c' at
udp_read_packet among other places (where it probably should have been
defined in a way which allowed its logic to be re-used without being
re-defined, but nonetheless). What I meant is that functions like

ff_rtp_send_rtcp_feedback

don't poll for write. The underlying result is that whatever is using
ffmpeg, e.g. VLC, starts to think it can't write to the RTSP socket because
the socket is latent or failed, when in fact the socket is just being used
by ffmpeg already for another send operation. The 'pts_delay' is then
increased and never reset lower, and it may additionally take a long time
to time out.
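
To illustrate what "polling for write" means here, the following is a
minimal sketch using POSIX poll() with POLLOUT before a write. Both helper
names are hypothetical; this is not how libavformat is structured, only
the general pattern.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/*
 * Hypothetical helper. Waits up to timeout_ms for fd to become writable.
 * Returns 1 if writable, 0 on timeout, -1 on error or hang-up.
 */
static int wait_writable(int fd, int timeout_ms)
{
    struct pollfd p;
    int ret;

    assert(fd >= 0);
    p.fd = fd;
    p.events = POLLOUT;
    p.revents = 0;

    do {
        ret = poll(&p, 1, timeout_ms);
    } while (ret < 0 && errno == EINTR);    /* retry if interrupted */

    if (ret <= 0)
        return ret;             /* 0: timed out, -1: poll() failed */
    if (p.revents & (POLLERR | POLLHUP | POLLNVAL))
        return -1;
    return (p.revents & POLLOUT) ? 1 : 0;
}

/*
 * Hypothetical helper. Writes only once the descriptor reports writable.
 * Returns the write() result, 0 if still not writable after timeout_ms,
 * or -1 on error.
 */
static ssize_t send_when_writable(int fd, const void *buf, size_t len,
                                  int timeout_ms)
{
    int ready = wait_writable(fd, timeout_ms);

    if (ready <= 0)
        return ready;
    return write(fd, buf, len);
}
```

A return value of 0 lets the caller tell "socket busy for now" apart from
a failed connection, which is the distinction described above.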

This can resolve itself with time, like any connection issue. However, the
timeout is never adjusted again when writing to decrease the back-off which
occurred when reading timed out. As a result, when there is no legitimate
data from a sender for a small period of time, the library reacts with a
large poll delay, which will cause latency if no data is received during
the adjusted timeout and could cause the connection to time out because:

1) RTSP data can't be sent (GET_PARAMETER) because a `read` is
already occurring

2) RTCP data can't be sent because a `read` is already occurring

This `read` is either for an outbound RTSP request, an incoming RTSP
message, or RTP interleaved data on the same file descriptor, and is not
connection related, but there is no way to tell without first determining
whether there is an outbound connection or whether the socket can be
written to with 0-byte data.

The bottom line is that a server can reliably cause the RTSP client to
enter a state where it consumes more data than it should and never returns
control until the underlying connection is aborted. If a server uses the
same code to process messages, it can also be exploited through the message
problems cited above.

The re-transmission issue is more or less a by-product of the above also,
IMHO, but I would be glad to hear what you think anyway.

So in short, I guess the problem can be simplified to

1) "Rtsp parsing logic is incorrect when '$' appears"
And
2) "RtspClient does not properly share resources concurrently"

but I'm not sure that states the seriousness of the issue in toto;
hopefully I have provided enough information.

If you need any more information, just let me know!

Sincerely,
Julius
