On Monday, September 3, 2001, at 06:49 PM, Graham Leggett wrote:

Chuck Murcko wrote:

Consider a caching proxy designed to handle streaming media (it need
only know that streaming data differs from other data). Let's say we can
also specify a maximum size (i.e. a circular buffer) for this streaming
data, and that we can consult a connection pool to know whether we're
already connected to the origin server for it.

Ok (the penny drops)...! :)


Yes, though I suspect this penny drops into module-2.1/ (not yet existing). 8^)


Possible problems with the circular buffer idea:

o Imagine two clients connect to a stream at roughly the same time. The
circular cache kicks in, and the single stream is split and sent to the
two clients. But - one client is on a slow line, the other on a fast
line - eventually the slower client lags so far behind it "falls off"
the tail end of the stream. At this point, what do we do? Force close
the "slow" stream connection?


I should explain that I'm looking at this problem from the standpoint of generic operation. That is, if I hit a point somewhere down the page where a generic solution fails, then we're talking about a protocol-specific streaming media proxy, which would need to be done as a protocol extension to the proxy or as a self-contained server. Game over.


I'm thinking it's an interesting question to ask ourselves (for the 2.1 timeframe - this isn't for 2.0): what, if anything, can we do generically in mod_proxy to make it better able to deal with streams over HTTP?

Well, this is the crux of the streaming proxy problem, the lagging client. Do you block everyone else on the slowest guy? Do you buffer him and continue on with the others? Or do you just decide he's a loser and drop him?

It's important to remember here that we're talking about streaming media via HTTP via TCP. This is data which belongs on a UDP or similar transport, but here we are, for better or worse.

Isn't the input stream going to be arriving at (on average) the required data rate to play the stream?

If so, we can choose to simply write data to all clients at that rate, using just a connection buffer rather than a cache, and let fixed-size per-client connection buffers overrun if a client is too slow. Yes, you get data dropped on the floor and a crappy stream, but you'd get that anyway if you were connected directly to the origin server and couldn't keep up with the stream. I'm talking about enough buffer per client that this doesn't happen unnecessarily, only when a client genuinely can't keep up (there's a rough sketch of what I mean below).

If not, game over.
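
For the "if so" case, here's a rough sketch of the kind of per-client fixed-size buffer I mean. This is plain C, the names are made up for illustration, and it isn't proposed mod_proxy code; the point is just that when a client falls behind, the oldest unsent bytes get overwritten and lost rather than blocking anyone else:

/* Minimal per-client ring buffer sketch (illustrative only; names are
 * hypothetical, not part of mod_proxy).  The proxy writes incoming
 * stream data into each client's buffer; if a client is too slow to
 * drain it, the oldest unsent bytes are overwritten and simply lost -
 * the "dropped on the floor" behaviour described above. */
#include <stddef.h>
#include <string.h>

typedef struct {
    char   *data;   /* fixed-size backing store           */
    size_t  size;   /* capacity in bytes                  */
    size_t  head;   /* next byte to send to the client    */
    size_t  len;    /* bytes currently buffered (<= size) */
} client_buf;

/* Append stream data; on overrun, discard the oldest unsent bytes. */
static void client_buf_put(client_buf *b, const char *src, size_t n)
{
    if (n >= b->size) {              /* burst larger than the buffer: */
        src += n - b->size;          /* keep only the newest bytes    */
        n = b->size;
        b->head = 0;
        b->len = 0;
    }
    if (b->len + n > b->size) {      /* overrun: drop oldest data     */
        size_t drop = b->len + n - b->size;
        b->head = (b->head + drop) % b->size;
        b->len -= drop;
    }
    size_t tail = (b->head + b->len) % b->size;
    size_t first = (tail + n <= b->size) ? n : b->size - tail;
    memcpy(b->data + tail, src, first);
    memcpy(b->data, src + first, n - first);
    b->len += n;
}

/* Pull up to n bytes for a non-blocking write to the client socket. */
static size_t client_buf_get(client_buf *b, char *dst, size_t n)
{
    if (n > b->len)
        n = b->len;
    size_t first = (b->head + n <= b->size) ? n : b->size - b->head;
    memcpy(dst, b->data + b->head, first);
    memcpy(dst + first, b->data, n - first);
    b->head = (b->head + n) % b->size;
    b->len -= n;
    return n;
}

A real version would live in the proxy's output path and size the buffer from a directive, but the drop-on-overrun behaviour is the whole idea.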

o How do we detect whether we actually have a stream, and not just a
very big file? Maybe we should have a special cache module called
mod_mp3 that is able to intelligently read the headers of a stream,
split streams, and make sure a new mp3 header is placed on streams that
are chopped in the middle. This could be implemented as a filter/handler
combination.


Do we really need to? We are not permanently caching this anyway, so could we not treat any of this type of traffic the way we'd treat a stream? Most likely that ends up being the same thing we do now for a single client, but with only one origin server connection. We'd need to be more careful about making sure all clients get all data, though.


Doing this in, say, a mod_mp3 or mod_qt is a possibility, as you say. But it's game over for the generic approach.

o How are we to be sure that a client will be able to pick up a stream
in mid flow and not blow up by expecting a header of some kind?


I'd think we'd build an appropriate response header in the normal way, on top of the origin server's response header, and send it before we started sending the stream data to the client. If there's a header inside the stream itself that has to be regenerated as well, oh well, game over: we can't support that without knowing the protocol. Mod_mp3.
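
To illustrate the first part, this is roughly the kind of response head I'd imagine sending to a client that picks the stream up mid-flow: copy the Content-Type from the origin response, omit Content-Length since the length is unknown, and let the connection close delimit the body. Illustrative only; the function and the header choices are assumptions, not something mod_proxy does today.

#include <stdio.h>

/* Hypothetical helper: build the response head sent to a client that
 * joins an already-open stream.  The body length is unknown, so no
 * Content-Length is sent and closing the connection ends the response. */
static int build_stream_head(char *out, size_t outlen,
                             const char *origin_content_type)
{
    return snprintf(out, outlen,
                    "HTTP/1.1 200 OK\r\n"
                    "Content-Type: %s\r\n"
                    "Cache-Control: no-cache\r\n"
                    "Connection: close\r\n"
                    "\r\n",
                    origin_content_type ? origin_content_type
                                        : "application/octet-stream");
}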


We should separate the potential cache issues from the proxy issues. Having the cache get involved in this sort of thing looks like diminishing returns to me: the more we think of doing with streams, the more likely it is that we'll have to know something about the stream protocol, which is game over/mod_mp3. Plus, the benefits of involving the cache are not very clear to me. Connection fanout, on the other hand, is a clear advantage that might be possible generically, and it's a purely proxy issue AFAICS. It would use connection pooling, with additional code to do the actual fanout (see the sketch below).
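
As a sketch of what the fanout itself could look like - plain POSIX sockets rather than the Apache filter/brigade machinery, and every name here is hypothetical:

#include <errno.h>
#include <unistd.h>
#include <sys/socket.h>

/* One pass of a fanout loop: read a chunk from the single origin
 * connection and push it to every attached client.  A client whose
 * socket errors out is dropped; a client that can't accept the data
 * right now simply loses it, per the "slow client" discussion above.
 * Returns the number of clients still attached, or -1 if the origin
 * connection has gone away. */
static int fanout_once(int origin_fd, int client_fds[], int nclients)
{
    char buf[8192];
    ssize_t got = recv(origin_fd, buf, sizeof(buf), 0);
    if (got <= 0)
        return -1;                        /* origin closed or errored */

    for (int i = 0; i < nclients; ) {
        ssize_t sent = send(client_fds[i], buf, (size_t)got,
                            MSG_DONTWAIT);
        if (sent < 0 && errno != EAGAIN && errno != EWOULDBLOCK) {
            close(client_fds[i]);         /* hard error: drop client */
            client_fds[i] = client_fds[--nclients];
            continue;                     /* re-check the swapped-in fd */
        }
        /* Partial or deferred writes are simply tolerated here; the
         * slow client misses those bytes in this simplistic version. */
        i++;
    }
    return nclients;
}

In a real implementation this would sit behind the connection pool, be driven from a poll loop, and give each client a fixed-size buffer like the one sketched earlier so short stalls don't cost data - but the core of the generic win is exactly this: one origin connection, many client writes.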

If there's a useful set of protocols that this simple, generic approach works for, it might be worth considering; I suspect that set would include anything that works today through proxy_http.c. If not, we can forget about it and think in terms of protocol-specific filters/handlers.

Thoughts?

Regards,
Chuck
