2009/7/9 Rainer Jung <rainer.j...@kippdata.de>:
> On 08.07.2009 15:55, Paul Querna wrote:
>> On Wed, Jul 8, 2009 at 3:05 AM, Graham
>> Dumpleton<graham.dumple...@gmail.com> wrote:
>>> 2009/7/8 Graham Leggett <minf...@sharp.fm>:
>>>> Paul Querna wrote:
>>>>
>>>>> It breaks the 1:1 connection-to-thread (or process) mapping, which
>>>>> is critical to a low memory footprint with thousands of connections.
>>>>> Maybe I'm just insane, but all of the servers taking market share,
>>>>> like lighttpd, nginx, etc., use this model.
>>>>>
>>>>> It also prevents all variations of the Slowloris stupidity, because it's
>>>>> damn hard to overwhelm the actual connection processing if it's all
>>>>> async and doesn't block a worker.
>>>> But as you've pointed out, it makes our heads bleed, and locks slow us 
>>>> down.
>>>>
>>>> At the lowest level, the event loop should be completely async, and be
>>>> capable of supporting an arbitrary (probably very high) number of
>>>> concurrent connections.
>>>>
>>>> If one connection slows or stops (deliberately or otherwise), it won't
>>>> block any other connections on the same event loop, which will continue
>>>> as normal.
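Just to make that concrete, here is a rough single-threaded sketch of
that kind of loop (Python, using the selectors module; the address and
the canned response are made up, so treat it as illustration rather
than a design). A connection that stalls simply stays registered and
never holds up the loop, so every other connection keeps being served:

    import selectors
    import socket

    sel = selectors.DefaultSelector()

    def accept(server_sock):
        # only called when the listening socket is readable, so this never blocks
        conn, _ = server_sock.accept()
        conn.setblocking(False)
        sel.register(conn, selectors.EVENT_READ, handle)

    def handle(conn):
        # only called when this connection is readable; a stalled client
        # never triggers this and costs nothing but its registration
        data = conn.recv(4096)
        if data:
            conn.sendall(b"HTTP/1.0 200 OK\r\nContent-Length: 2\r\n\r\nok")
        sel.unregister(conn)
        conn.close()

    server = socket.socket()
    server.bind(("127.0.0.1", 8080))   # illustrative address only
    server.listen(128)
    server.setblocking(False)
    sel.register(server, selectors.EVENT_READ, accept)

    while True:
        for key, _ in sel.select():    # a slow client just never shows up here
            key.data(key.fileobj)
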
>>> But for a multiprocess web server this breaks down if you then run a
>>> blocking-type application model on top. Specifically, the greedy nature
>>> of accepting connections may mean a process accepts more connections
>>> than it has high-level threads to handle. If the high-level threads end
>>> up blocking, then any accepted connections for the blocking high-level
>>> application, for which request headers are still being read or are
>>> pending, will be blocked as well, even though another server process
>>> may be idle. In the current Apache model a process will only accept a
>>> connection if it knows it is able to process it at that time. If a
>>> process doesn't have the threads available, then a different process
>>> will pick it up instead. I have previously commented on how this causes
>>> problems with nginx for potentially blocking applications running in
>>> nginx worker processes. See:
>>>
>>>  http://blog.dscpl.com.au/2009/05/blocking-requests-and-nginx-version-of.html
>>>
>>> To prevent this you are forced to run an event-driven system for
>>> everything, and blocking-type applications can't be run in the same
>>> process. Thus, anything like that has to be shoved out into a separate
>>> process. FastCGI was mentioned for that, but frankly I believe FastCGI
>>> is getting a bit crufty these days. It perhaps really needs to be
>>> modernised, with the byte-level protocol layout simplified to get rid
>>> of the varying-size length indicator bytes. These may have been
>>> warranted when networks were slower and the amount of body data being
>>> passed around was smaller, but I can't see that the extra complexity is
>>> warranted any more. FastCGI also can't handle things like end-to-end
>>> 100-continue processing, and perhaps has other problems as well in
>>> respect of handling logging outside of a request context, etc.
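To illustrate the length-indicator point: this is roughly the encoding
FastCGI uses for each name/value pair in an FCGI_PARAMS record, sketched
in Python, where a length under 128 takes one byte and anything longer
takes four bytes with the high bit of the first byte set.

    import struct

    def encode_length(n):
        if n < 0x80:
            return bytes([n])                     # short form: one byte
        return struct.pack(">I", n | 0x80000000)  # long form: four bytes, MSB set

    def encode_pair(name, value):
        return encode_length(len(name)) + encode_length(len(value)) + name + value

    # a typical CGI variable as it would appear inside an FCGI_PARAMS body
    print(encode_pair(b"SCRIPT_NAME", b"/app").hex())
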
>>>
>>> So, I personally would really love to see a good review of FastCGI,
>>> AJP and any other similar/pertinent protocols done, to distill what is
>>> required in these modern times and what would be a better mechanism.
>>> The implementations of FastCGI could also perhaps be modernised. Of
>>> course, even though FastCGI may not be the most elegant of systems, it
>>> is probably too entrenched to get rid of. The only way perhaps might be
>>> if an improved version formed the basis of the internal communications
>>> for a completely restructured internal model for Apache 3.0 based on
>>> serf, one which had segregation between processes handling static files
>>> and applications, with user separation etc etc.
>>
>> TBH, I think the best way to modernize FastCGI or AJP is to just proxy
>> HTTP over a daemon socket, then you solve all the protocol issues...
>> and just treat it like another reverse proxy.  The part we really need
>> to write is the backend process manager, to spawn/kill more of these
>> workers.
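As a rough sketch of what that looks like from the front end's side
(Python; the socket path is made up and nothing here is Apache-specific),
the whole "protocol" is just ordinary HTTP written to a Unix domain
socket:

    import socket

    SOCK_PATH = "/var/run/backend.sock"   # hypothetical daemon socket

    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.connect(SOCK_PATH)
    s.sendall(b"GET /app HTTP/1.0\r\nHost: backend.local\r\n\r\n")

    chunks = []
    while True:
        chunk = s.recv(4096)
        if not chunk:
            break
        chunks.append(chunk)
    s.close()

    print(b"".join(chunks).decode("latin-1", "replace"))
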
>
> Though there is one nice feature in the AJP protocol: since it knows
> it's serving via a reverse proxy, the back end patches some
> communication data as if it were the front end. So if the context on the
> back end asks for the port, protocol, host name etc., it automatically
> gets data that looks like that of the front end. That way cookies,
> self-referencing links etc. work right.
>
> Most of that can be simulated with HTTP too by appropriate configuration
> (yes, there are a lot of proxy options for this), but in AJP it's
> automatic. Some parts are not configurable right now, e.g. the
> client IP: you always have to introduce something that's aware of
> the X-Forwarded-For header. Another example would be whether the
> communication to the reverse proxy was via https. You can transport all
> that info via custom headers, but the backend usually doesn't know how
> to handle them.
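For example, a backend spoken to over plain HTTP ends up doing something
like this by hand (a sketch; the header names and the
take-the-proxy's-word-for-it trust model are conventions, not anything
the protocol guarantees):

    def effective_client(headers, peer_addr, peer_scheme="http"):
        # recover the original client address and scheme from reverse-proxy
        # headers; fall back to the values of the immediate peer (the proxy)
        fwd_for = headers.get("X-Forwarded-For")
        fwd_proto = headers.get("X-Forwarded-Proto")
        client_ip = fwd_for.split(",")[0].strip() if fwd_for else peer_addr
        scheme = fwd_proto.strip() if fwd_proto else peer_scheme
        return client_ip, scheme

    # a request that was terminated as HTTPS on the front end
    print(effective_client({"X-Forwarded-For": "203.0.113.7",
                            "X-Forwarded-Proto": "https"}, "127.0.0.1"))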

Yes, these are the sort of things which it would be nice to have be
transparent. Paul's comment is valid though, in that HTTP itself could
be used as the protocol. Right now you couldn't do that over a UNIX
socket for a local backend process, though, and you lose the ability to
feed error logging back into the main Apache error logs in a similar
local setup. So, in some respects what I see is a better FastCGI being
used for communicating with local processes only. Anything else would
use normal mod_proxy to another server, thus in effect getting rid of
the external mode in FastCGI. For the local stuff, what it then comes
down to is better process management and dealing with running as a
distinct user in a better way. Solving the problem of how to log errors
to distinct error logs in a mass virtual hosting environment would be
good as well.

Graham
