Apache 2.0 for multi protocol usage

Harrie Hazewinkel Wed, 04 Apr 2001 14:58:35 -0700

Hi all,


I have done some research to check whether Apache 2.0
would be suitable for multi protocol usage. A BoF (BOF10)
is organised tomorrow, April 5, at 9:00 to discuss
a proposed design.

The results and a proposed design for Apache are in the attached
document, and a server running with this concept is at
http://klomp.covalent.net:8080/ (Ryan Bloom made the patch).

I would like to invite anyone interested to join
in discussing this topic,

Harrie


Apache 2.0 for multi protocol usage

Summary:
An analysis of internet protocol server commonalities and proposals for
enhancing the Apache 2.0 framework for multi protocol support.

Author:
        Harrie Hazewinkel
Acknowledgements:
        Ben Laurie,
        Roy Fielding,
        Greg Stein,
        Ryan Bloom (patch, Apache implementation)

1.  Introduction

One of the promises of the Apache 2.0 design is to be able to go beyond the
HTTP protocol. Some design and implementation is already part of the
Apache 2.0 work to ensure that the server can handle multiple protocols. 
This document first analyses that design, based on a survey and prototyping
effort covering a wide range of popular internet protocols. Second, it
proposes a number of small changes to make Apache 2.0 an even more promising
multi protocol server.

After comparing the common infrastructure currently provided in Apache 2.0
against the commonalities of the protocols examined, our conclusion is that
Apache 2.0 should provide more common infrastructure. In this document
the authors propose a common infrastructure that better supports the multi
protocol features.

The proposed changes leverage the existing Apache 2.0 infrastructure
and enhance it.

This document is set up in three steps: 1) the protocols, their features and
a comparison, 2) an Apache design based upon an abstract model of the
protocols, 3) an Apache implementation design (what needs to be changed in the
infrastructure).

Having said that, Apache is still biased towards HTTP, not least in terms of
terminology and variable names. Therefore, the implementation design (C code)
also leans towards HTTP.


2.  Protocol overview.

HTTP is a request-based stateless protocol; therefore, the Apache server relies
on the request record for all protocol related information. This won't work
once the Apache server starts to bring in different protocols. FTP is also
request-based, but unlike HTTP it is a stateful protocol: knowledge of
previous requests is required.
For example, during an FTP session the information provided by the requests of
the login procedure needs to be maintained, and previous 'change directory'
commands imply that the next requests work on the new directory. This type of
information is seen as a layer between the connection and the requests.

This section provides a global overview of the protocols investigated. All
these protocols could be implemented as part of the multi protocol Apache
server. For more detailed information on the protocols, refer to the
Appendix "Protocol descriptions". The selected protocols are those that access
and transfer information resources. Such resources could be files, emails, or
even CGI (dynamic content).

2.1. Common message sequence

This section describes a common message sequence that is applicable
to many protocols.

    1) Accepting the connection

    2) Sending welcome message
        (optional)

    3) Reading request 

    4) Sending response

    5) Close connection


The above schema is seen as the common message sequence of many protocols.
At the moment a connection is accepted, an optional welcome message may be
sent. This depends on the protocol; for instance, HTTP has no welcome
message, but FTP does. All protocols can cycle through multiple
request/response combinations. A protocol may also produce multiple responses
to a request. In the above schema it is recognized that state information
may be required. This should be maintained by the protocol implementation,
and the state depends on the current and previously exchanged requests.

In most cases the first requests perform authentication for the
connection and may be followed by multiple requests, which could even be
concurrent. When a protocol consists of only one request/response and is
stateless, each individual request must be authenticated, as in HTTP.
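
The sequence above can be sketched in C as a small session loop. This is
only an illustration under stated assumptions: protocol_t, run_session and
the two toy protocols are hypothetical names, not Apache code, and the
requests are passed in as strings instead of being read from a socket.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_MSG 64

/* Hypothetical protocol description; not an Apache structure. */
typedef struct {
    const char *welcome;  /* NULL if the protocol sends no welcome message */
    /* Handles one request, may update *state, writes the response into
     * reply (MAX_MSG bytes). Returns 0 when the connection should close. */
    int (*handle)(const char *request, int *state, char *reply);
} protocol_t;

static void put(char *dst, const char *src)
{
    strncpy(dst, src, MAX_MSG - 1);
    dst[MAX_MSG - 1] = '\0';
}

/* Steps 2-5 for an already accepted connection (step 1).
 * Returns the number of messages sent, welcome message included. */
int run_session(const protocol_t *p, const char **requests, size_t n,
                char out[][MAX_MSG], size_t cap)
{
    size_t sent = 0;
    int state = 0;  /* session state, kept between requests */

    assert(p != NULL && p->handle != NULL);
    if (p->welcome != NULL && sent < cap)        /* step 2, optional */
        put(out[sent++], p->welcome);
    for (size_t i = 0; i < n && sent < cap; i++) {
        int more = p->handle(requests[i], &state, out[sent]);  /* steps 3, 4 */
        sent++;
        if (!more)
            break;                               /* step 5 */
    }
    return (int)sent;
}

/* Toy stateful protocol: requests are refused until a login was seen. */
int ftp_like_handle(const char *request, int *state, char *reply)
{
    if (strcmp(request, "USER anonymous") == 0) {
        *state = 1;
        put(reply, "230 logged in");
        return 1;
    }
    if (strcmp(request, "QUIT") == 0) {
        put(reply, "221 bye");
        return 0;
    }
    put(reply, *state ? "200 ok" : "530 not logged in");
    return 1;
}

/* Toy stateless protocol: one request, one response, then close. */
int http_like_handle(const char *request, int *state, char *reply)
{
    (void)request;
    (void)state;
    put(reply, "200 ok");
    return 0;
}

const protocol_t ftp_like = { "220 welcome", ftp_like_handle };
const protocol_t http_like = { NULL, http_like_handle };
```

The same LIST request gets a different response before and after the login
request, which is exactly the state that has to live somewhere between the
connection and the individual requests.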

The common message sequence does not explicitly mention protocols that are
used below the application protocol layer. For instance, security protocols
like TLS and SASL are in this model considered part of the 'accepting the
connection' phase (1). In order to know which state the connection is in, the
security protocol state information should be stored on the state layer.
This approach will also allow protocols like PPP (a link layer protocol)
to be used by the proposed model. This could be described as stacking
of protocol layers.
However, when security protocols like SASL have explicit request and
response types defined at the application protocol level, they are considered
just ordinary requests and responses. Also, the security protocols are
positioned differently within the protocol stack compared to most protocols.
Below is an example of how these protocols are considered within the
protocol stack.

        application                         application

    |---------|---------|              +-------------------+
    |         |         |              |                   |
    |         |  HTTPS  |              |        HTTP       |
    |         |         |              |                   |
    |  HTTP   +---------|              |         +---------|
    |         |         |   equal to   |         |         |
    |         |   SSL   |              |         |   SSL   |
    |         |         |              |         |         |
    +---------+---------+              +---------+---------+
    |                   |              |                   |
    |        TCP        |              |        TCP        |


In the picture (left) you see a protocol stack where HTTP is positioned
on top of TCP, and HTTP(S) on top of SSL, which is on top of TCP.
(One could argue that HTTPS includes SSL, but for clarity I named it
HTTPS.)

From an application perspective, there is no difference between HTTP
and HTTPS. The application would see HTTPS as HTTP plus SSL, where its
interface to the application protocol layer is HTTP. Therefore, the
protocol stack on the left merges into the protocol stack on the
right. SSL is then not much more than a 'presentation' filter in HTTP.

Also, in our perspective, HTTP/0.9, HTTP/1.0 and HTTP/1.1 are still
the same protocol. This is already the case in the current Apache.



2.2. Protocol feature comparison

The previous section points out major differences in protocol behaviour;
therefore, the association between a connection endpoint and the protocol
used must be defined beforehand.

The following table provides a summary of the various features of the
protocols. For more detailed information, refer to Appendix A.

   +----------------------+------+------+------+------+------+------+------+
   | Feature              | HTTP | FTP  | BEEP | POP  | IMAP | RTSP | SNMP |
   +----------------------+------+------+------+------+------+------+------+
   | stateless            | yes  | no   | no   | no   | no   | no   | yes  |
   +----------------------+------+------+------+------+------+------+------+
   | sessions/channels    | no   | yes  | yes  | yes  | yes  | yes  | yes  |
   | (context in which    |      |      |      |      |      |      |      |
   | request sequences    |      |      |      |      |      |      |      |
   | are exchanged)       |      |      |      |      |      |      |      |
   +----------------------+------+------+------+------+------+------+------+
   | sessions in sessions | no   | no   | yes  | no   | yes  | no   | no   |
   +----------------------+------+------+------+------+------+------+------+
   | connection oriented  | may  | yes  | yes  | yes  | yes  | may  | no   |
   | (keeps connection    |      |      |      |      |      |      | (1)  |
   | between requests)    |      |      |      |      |      |      |      |
   +----------------------+------+------+------+------+------+------+------+
   | bi-directional       | no   | no   | yes  | no   | no   | yes  | sort |
   | (client and server   |      |      |      |      |      |      | of   |
   | can issue requests)  |      |      |      |      |      |      | (2)  |
   +----------------------+------+------+------+------+------+------+------+
   | data out-of-band (3) | no   | yes  | no   | no   | no   | may  | no   |
   +----------------------+------+------+------+------+------+------+------+
   | welcome message      | no   | yes  | no   | yes  | -    | no   | no   |
   +----------------------+------+------+------+------+------+------+------+
   | have equal requests  | -    | yes  | may  | yes  | yes  | -    | no   |
   | with other protocols |      |      | (5)  |      |      |      |      |
   | (example LIST) (4)   |      |      |      |      |      |      |      |
   +----------------------+------+------+------+------+------+------+------+
   | has alternative      |      |      |      | yes  | yes  |      | yes  |
   | auth. mechanism      |      |      |      |      |      |      |      |
   | to user/password     |      |      |      |      |      |      |      |
   +----------------------+------+------+------+------+------+------+------+
   (1) SNMP is often UDP oriented, but a TCP mapping is experimental.
   (2) An SNMP agent can send a notification.
   (3) Out-of-band data means that the protocol uses a separate connection
       entirely dedicated to data transfer. For FTP this is called the
       data connection.
   (4) Equal requests means that by parsing the request-line of a request
       message it cannot be determined which protocol it is.
       Of course, even if protocols have a similar request type, the context
       (in which the request type is used) is completely different.
   (5) BEEP exchanges are either MSG/ERR, MSG/RPY, or MSG/ANS.

   
2.3. Conclusion

- All protocols have a request/response paradigm.

- All protocols have one or more authentication mechanisms.

- Some protocols have a welcome message and some don't.
  Therefore, the Apache web server must dedicate an address endpoint (for IPv4:
  IP address and port number) to each protocol, i.e. protocol sharing of a
  connection is not possible. Many protocols (like FTP, POP) send a welcome
  message as soon as a connection has been made. Even if the Apache server
  could generate a common generic welcome message, the protocol in use could
  well be one that does not allow a welcome message.

- Some protocols have text-based request information and share common
  keywords (request types), i.e. keywords indicating the request type.
  This means that protocol sharing for connections is not possible.
  Therefore, connection endpoints MUST be uniquely associated with a
  particular protocol. It is not possible to determine the protocol based
  upon the requests invoked over the connection.

- All protocols use requests to transfer authentication information.
  Some have dedicated requests for authentication, such as FTP; some carry it
  inside the request that requests information, such as HTTP.

- All protocols show a separation into 1) connection information, 2) state
  information, 3) protocol requests/responses. Some protocols have the state
  information implicitly.

- A number of common protocol elements are not easily supported in Apache
  2.0, and would-be protocol modules would benefit from an extension of
  the current framework. Expanding the infrastructure in Apache has
  several benefits: less code to write for a module, easier portability,
  and a smaller footprint.


3.0. Apache design

This section describes the current design of Apache (section 3.1) and a
conceptual design that supports multiple protocols in Apache 2.0 while
making as much common infrastructure as possible available to protocol
module implementations.

3.1. The current Apache infrastructure.

The figure below depicts a summary of the current infrastructure of Apache
that is available for protocols. The infrastructure leans heavily on
the HTTP protocol. In the figure you see (from top to bottom) 1) a server
record that provides information on the virtual host, 2) a connection
record that has information on the connection between the local server and
the remote client, 3) a request record that has all the request specific
information including authentication information.

    +------------+ This record must have all the generic information of the
    | server     | service provided on the connection endpoint.
    |            | For instance, server root, access mechanism required.
    |            |
    |            |     +--+--+--+--+--+--+--+--+--+
    |            |---->|  |  |  |  |  |  |  |  |  |server_config
    +------------+     +--+--+--+--+--+--+--+--+--+
          A
          | which server handles this connection??
          |
    +------------+ This record will have information regarding the
    | connection | connection that has been set up.
    |            | For instance, remote address, transport type (TCP/UDP).
    |            |
    |            |
    |            |
    +------------+
          A
          | to which connection belongs the request?
          |
    +------------+ This record has all the request specific information and
    | request    | all protocol state information.
    |            | For instance, method (request type), status (response type),
    |            | username, password.
    |            |     +--+--+--+--+--+--+--+--+--+
    |            |---->|  |  |  |  |  |  |  |  |  |request_config
    +------------+     +--+--+--+--+--+--+--+--+--+

In the above figure one sees the arrays, connected to the records, in which
module specific information is stored.

The figure below shows how virtual hosts are associated with a
connection endpoint. Multiple virtual hosts are connected to one (or possibly
multiple) connection endpoints, but they all have the same protocol.

      +----------+    +----------+
   -->| server1  |--->| server2  |-------+
      | FQDN/IPa |    | FQDN/IPa |       |
      +----------+    +----------+       |
         |               |               |
         V               |               V
   +------------+        |           +----------+
   | connection |<-------+           | server3  |
   | endpoint   |<-------------------| FQDN/IPa |
   +------------+                    +----------+


3.2. Service/Protocol/Connection endpoint.

The Apache server must handle multiple protocols. Each connection endpoint of
the server must be explicitly associated with a particular protocol. Apache
uses the virtual hosting configuration to identify endpoints, so each virtual
host must declare which protocol it uses in order to define the protocol for
its endpoint.

Below is a simplified figure illustrating the layout of the virtual
host configuration data structures for an example configuration.

   +------------+    +----------+    +----------+    +----------+
   | endpoint 1 |--->| server 1 |--->| server 2 |--->| server 3 |
   | address A  |    | name A   |    | name Q   |    | name R   |
   | port 80    |    | HTTP     |    | HTTP     |    | HTTP     |
   +------------+    +----------+    +----------+    +----------+

   +------------+    +----------+
   | endpoint 2 |--->| server 4 |
   | address A  |    | name A   |
   | port 110   |    | POP      |
   +------------+    +----------+

   +------------+    +----------+
   | endpoint 3 |--->| server 5 |
   | address B  |    | name B   |
   | port 110   |    | POP      |
   +------------+    +----------+

   +------------+    +----------+    +----------+
   | endpoint 4 |--->| server 6 |--->| server 7 |
   | address B  |    | name B   |    | name P   |
   | port XX    |    | BEEP     |    | BEEP     |
   +------------+    +----------+    +----------+

   +------------+    +----------+
   | endpoint 5 |--->| server 8 |
   | address A  |    | name A   |
   | port 21    |    | FTP      |
   +------------+    +----------+

   +------------+    +----------+
   | endpoint 6 |--->| server 9 |
   | address B  |    | name B   |
   | port 21    |    | FTP      |
   +------------+    +----------+

        NameVirtualHost A:80
        NameVirtualHost B:XX

        <VirtualHost A:80>
                ServerName A
        </VirtualHost>

        <VirtualHost A:80>
                ServerName Q
        </VirtualHost>

        <VirtualHost A:80>
                ServerName R
        </VirtualHost>

        <VirtualHost A:110>
                ServerName A
                SetProtocol POP
        </VirtualHost>

        <VirtualHost B:110>
                ServerName B
                SetProtocol POP
        </VirtualHost>

        <VirtualHost B:XX>
                ServerName B
                SetProtocol BEEP
        </VirtualHost>

        <VirtualHost B:XX>
                ServerName P
                SetProtocol BEEP
        </VirtualHost>

        <VirtualHost A:21>
                ServerName A
                SetProtocol FTP
        </VirtualHost>

        <VirtualHost B:21>
                ServerName B
                SetProtocol FTP
        </VirtualHost>

The above illustrates a few points:

  * Each connection endpoint is associated with a list of one or more
    servers. For address-based virtual hosting there is only one
    server; for name-based there may be more than one.

  * Multiple protocols may run on different ports on the same address,
    but all the name-based virtual hosts on a given port must run the
    same protocol.

When a connection arrives, Apache initially chooses the server
configuration at the head of the list hanging off the associated
endpoint. This identifies the protocol for that connection. Later on,
when a name has been supplied by the client, the server configuration
is switched to the appropriate structure in the list.
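
A minimal sketch of this two-step selection, assuming a simplified data
layout: vserver_t, endpoint_t and both functions are hypothetical names
for illustration and do not correspond to Apache's actual structures.
Falling back to the head of the list for an unknown name is also an
assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified virtual host entry. */
typedef struct vserver {
    const char *name;        /* ServerName */
    const char *protocol;    /* value of the proposed SetProtocol, or "HTTP" */
    struct vserver *next;    /* next server on the same endpoint */
} vserver_t;

/* Hypothetical connection endpoint with its list of servers. */
typedef struct {
    const char *address;
    int port;
    vserver_t *servers;      /* head of the list */
} endpoint_t;

/* Step 1: on accept, the head of the list identifies the protocol. */
const char *endpoint_protocol(const endpoint_t *ep)
{
    assert(ep != NULL && ep->servers != NULL);
    return ep->servers->protocol;
}

/* Step 2: once the client has supplied a name, switch to the matching
 * server; without a match the head of the list stays selected. */
const vserver_t *select_server(const endpoint_t *ep, const char *name)
{
    assert(ep != NULL && ep->servers != NULL);
    if (name != NULL) {
        for (const vserver_t *s = ep->servers; s != NULL; s = s->next) {
            if (strcmp(s->name, name) == 0)
                return s;
        }
    }
    return ep->servers;
}
```

For endpoint 1 of the example configuration, the protocol is known to be
HTTP as soon as the connection arrives, while the choice between servers
A, Q and R is deferred until the client names one.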

At the moment Apache allows you to set up almost any virtual hosting
configuration, including nonsensical ones like name-based FTP virtual
hosts, or a mixture of HTTP and POP3 on the same port. There should be
some means for Apache to detect and complain about these bad
configurations; however, this is not discussed any further here.
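
Purely as an illustration of what such a check could look like, the
sketch below walks the server list of one endpoint. Everything in it is
an assumption: the vserver_t layout, the name_based_ok flag (imagined to
be set by the protocol module) and the status codes are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified virtual host entry. */
typedef struct vserver {
    const char *name;
    const char *protocol;
    int name_based_ok;       /* assumed: protocol can carry a host name */
    struct vserver *next;    /* next server on the same endpoint */
} vserver_t;

typedef enum {
    CONFIG_OK,
    CONFIG_MIXED_PROTOCOLS,    /* e.g. HTTP and POP on one port */
    CONFIG_NAMES_UNSUPPORTED   /* e.g. name-based FTP virtual hosts */
} config_status_t;

/* Checks the list of servers hanging off a single endpoint. */
config_status_t check_endpoint(const vserver_t *head)
{
    assert(head != NULL);
    for (const vserver_t *s = head->next; s != NULL; s = s->next) {
        if (strcmp(s->protocol, head->protocol) != 0)
            return CONFIG_MIXED_PROTOCOLS;
    }
    /* More than one server means name-based virtual hosting. */
    if (head->next != NULL && !head->name_based_ok)
        return CONFIG_NAMES_UNSUPPORTED;
    return CONFIG_OK;
}
```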


3.3. The request processing

The message processing cycle is the core of all protocol entities.
At the moment a protocol entity accepts a message it starts processing
the required cycle (message handling) for the protocol.

As shown in section 2 "Protocol overview", the message sequences show
3 levels in the protocol cycle:
    1) the connection; this level will have the connection information,
                such as remote address and local address.
    2) the state; this level must have the authentication information and
                the state that is needed by the protocol acting upon the
                connection.
    3) the requests; this level will have the detailed request information.

As soon as a connection is made to the Apache server, it must create a
connection record to maintain the information regarding the connection. After
that it must create a state record, which is initially empty. The state record
information is then collected from the individual requests that are
processed. The protocol entity, therefore, must see each request
as the core element. The protocol module handling the connection (and thus
the requests) must keep track of the state the protocol is in. The state
information must be kept at the state record level. The handling of multiple
request threads or channels over a connection must be provided by the protocol
module by extending the state record.
Within the state record a linked list of request records may be maintained.

Therefore, the information must be presented as follows in the Apache server.
The figure below depicts the various levels in which information may reside.
Each component will have specific associated information. The links between the
components should avoid duplication of information kept while processing a
request. The request is always the event on which the protocol entity acts.
Also, each server, state, request, and connection record will have a hook into
a modules array in which every module may store additional information that is
required for that record/level. This already exists, for instance, as server
config records in Apache 1.3.

    +------------+ This record must have all the generic information of the
    | server     | service provided on the connection endpoint.
    |            | For instance, server root, access mechanism required.
    |            |
    |            |     +--+--+--+--+--+--+--+--+--+
    |            |---->|  |  |  |  |  |  |  |  |  |server_config
    +------------+     +--+--+--+--+--+--+--+--+--+
          A
          | which server handles this connection??
          |
    +------------+ This record will have information regarding the
    | connection | connection that has been set up.
    |            | For instance, remote address, transport type (TCP/UDP).
    |            |
    |            |
    |            |
    +------------+
          A
          | to which connection belongs the state?
          |
    +------------+ This record must have all the information of the authen-
    | state      | tication of the connection that will have multiple requests.
    |            | For instance, user name, password, authentication type.
    |            |
    |            |     +--+--+--+--+--+--+--+--+--+
    |            |---->|  |  |  |  |  |  |  |  |  |state_config
    +------------+     +--+--+--+--+--+--+--+--+--+
          A
          | to which state belongs the request?
          |
    +------------+ This record should have all information for the particular
    | request    | request being made.
    |            | For instance, protocol, request type, response type.
    |            |
    |            |     +--+--+--+--+--+--+--+--+--+
    |            |---->|  |  |  |  |  |  |  |  |  |request_config
    +------------+     +--+--+--+--+--+--+--+--+--+

It is recognized that not all protocol specific information can be captured by
the commonly provided request and state records. Therefore, these records are
expanded via a request_config and state_config that allow modules to store
module specific data in their 'privately' allocated records. In the case of
protocol modules this means that protocol specific information can be stored
here when the common infrastructure does not support its specific protocol
features.
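
The four levels and their per-module vectors could be laid out roughly
as follows. This is a sketch under assumptions, not the proposed
implementation: the sketch_* types, the fixed MAX_MODULES vector and the
accessor names are all hypothetical, and the real records carry far more
fields.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_MODULES 8   /* assumed fixed number of module slots */

typedef struct {
    const char *server_root;
    void *server_config[MAX_MODULES];
} sketch_server_rec;

typedef struct {
    sketch_server_rec *server;       /* which server handles this connection */
    const char *remote_ip;
} sketch_conn_rec;

typedef struct {
    sketch_conn_rec *connection;     /* to which connection belongs the state */
    const char *user;                /* authentication lives here, once */
    const char *auth_type;
    void *state_config[MAX_MODULES];
} sketch_state_rec;

typedef struct {
    sketch_state_rec *state;         /* to which state belongs the request */
    const char *method;
    int status;
    void *request_config[MAX_MODULES];
} sketch_request_rec;

/* Each module owns one slot of a config vector, selected by its index. */
void sketch_set_config(void **vec, int module_index, void *data)
{
    assert(module_index >= 0 && module_index < MAX_MODULES);
    vec[module_index] = data;
}

void *sketch_get_config(void **vec, int module_index)
{
    assert(module_index >= 0 && module_index < MAX_MODULES);
    return vec[module_index];
}

/* Information is reached through the links instead of being copied
 * into every request. */
const char *request_user(const sketch_request_rec *r)
{
    return r->state->user;
}

const char *request_remote_ip(const sketch_request_rec *r)
{
    return r->state->connection->remote_ip;
}
```

Two requests that share a state record see the same user without either
of them storing it, which is the duplication the links are meant to avoid.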


3.3.2. IMAP example

Below is an example for IMAP, which is a protocol that allows
multiple threads within a single connection. The example is used to
demonstrate the usage and necessity of the module specific information records.
The figure below depicts the usage of the state_config to give every
protocol module an easier way to maintain state information, or to maintain
multiple states when necessary, as in IMAP where multiple threads can run on
one authenticated connection.
The figure depicts a situation in which 2 concurrent threads
handle requests in one state record. The IMAP protocol module specific
part of the state-array must handle the various states of the multiple
threads that may run over the connection. This approach could also be
used by, for instance, BEEP to manage multiple channels.

       +------------+
       | server     |     +--+--+--+--+--+--+--+--+--+
       |            |---->|  |  |  |  |  |  |  |  |  |server_config
       +------------+     +--+--+--+--+--+--+--+--+--+
             A
             |
             |
       +------------+
       | connection |
       |            |
       +------------+
             A
             |
             |
       +------------+           IMAP
       | state      |     +--+--+--+--+--+--+--+--+--+
       |            |---->|  |  |  |  |  |  |  |  |  |state_config
       +------------+     +--+--+--+--+--+--+--+--+--+
             A    A              |
             |    |              V
             |    |          +--------+   +--------+
             |    |          | thread |-->| thread |
             |    +------+   +--------+   +--------+
             |           |    |            |
             |    +-----------+  +---------+
             |    |      |       |
             |    |      |       |
             |    |      |       V
             |    |    +------------+
             |    |    | request    |     +--+--+--+--+--+--+--+--+--+
             |    |    |            |---->|  |  |  |  |  |  |  |  |  |
             |    |    +------------+     +--+--+--+--+--+--+--+--+--+
             |    V                                       request_config
       +------------+
       | request    |     +--+--+--+--+--+--+--+--+--+
       |            |---->|  |  |  |  |  |  |  |  |  |request_config
       +------------+     +--+--+--+--+--+--+--+--+--+

The approach used in the example where the state_config handles protocol
module specific information should also be applied to the request level.
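
The figure can be read as the following data layout. It is a sketch
only: IMAP_SLOT, the sk_* types and the imap_* helpers are hypothetical
names, and a real module would obtain its slot index from the server
rather than hard-code it.

```c
#include <assert.h>
#include <stddef.h>

#define IMAP_SLOT 0   /* assumed index of the IMAP module in state_config */

typedef struct sk_request {
    const char *command;
    struct sk_request *next;
} sk_request;

/* One thread of requests running over the authenticated connection. */
typedef struct sk_thread {
    int id;
    int proto_state;         /* protocol state of this thread */
    sk_request *requests;    /* requests belonging to this thread */
    struct sk_thread *next;
} sk_thread;

typedef struct {
    const char *user;        /* shared by all threads */
    void *state_config[4];
} sk_state_rec;

/* The module's slot in state_config holds the head of its thread list. */
void imap_add_thread(sk_state_rec *st, sk_thread *t)
{
    t->next = st->state_config[IMAP_SLOT];
    st->state_config[IMAP_SLOT] = t;
}

sk_thread *imap_find_thread(sk_state_rec *st, int id)
{
    for (sk_thread *t = st->state_config[IMAP_SLOT]; t != NULL; t = t->next) {
        if (t->id == id)
            return t;
    }
    return NULL;
}

void imap_queue_request(sk_thread *t, sk_request *r)
{
    assert(t != NULL && r != NULL);
    r->next = t->requests;
    t->requests = r;
}

int imap_count_threads(sk_state_rec *st)
{
    int n = 0;
    for (sk_thread *t = st->state_config[IMAP_SLOT]; t != NULL; t = t->next)
        n++;
    return n;
}
```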


3.4. Conclusion

The proposed design for Apache 2.0 provides more common infrastructure for
protocol modules. The newly introduced layer is used for authentication and
state information of the protocols. In theory all protocols have this, but
in practice it may be a zero-layer (empty). This design does not prohibit that.

With the new layer, it also becomes clear that some information is at the
wrong level or HTTP-based. HTTP has all authentication information in the
request record, since it is indeed always embedded in the HTTP request.
Due to the stateless nature of HTTP this extra abstraction layer was not
required.

The information records:

connection endpoint -- lasts as long as the server is operational.

server record       -- lasts as long as the server is operational.

connection record   -- lasts from the time that a connection is accepted until
                       it is closed.

state record        -- lasts from the time that a user logs in (or successfully
                       authenticates) until logout. At this level also the
                       protocol state information needs to be maintained.
                       This record must be separated from the connection
                       record, since a connection can have multiple states
                       (think of threading or user change for the connection).
                       This record is separated from the request record, since
                       a request for a particular protocol would only change
                       the state of the protocol, and authentication
                       information is shared over multiple requests.
                       The authentication information may be user/password as
                       provided in the requests, but a special authentication
                       mechanism may be used. This means that a user and
                       password are not always available.

request record      -- lasts from the time that the request is started until
                       the response is finished (this handles push protocols,
                       but that comes later in this document).

The change needed is separating the authentication from the request record
and putting it in the state layer.

The lifetime of the individual records is important because of the 'pool'
lifetime as used within Apache.
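
The nesting of lifetimes can be illustrated with a toy pool in which
destroying a pool also destroys its children, which is how APR sub-pools
behave. The toy_pool type below is an assumption made to keep the sketch
free of the APR dependency; it allocates no user memory and only records,
through an optional flag, when a pool is destroyed.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct toy_pool {
    struct toy_pool *parent;
    struct toy_pool *children;   /* first child */
    struct toy_pool *sibling;    /* next child of the same parent */
    int *destroyed_flag;         /* optional, set to 1 on destroy */
} toy_pool;

toy_pool *toy_pool_create(toy_pool *parent, int *destroyed_flag)
{
    toy_pool *p = calloc(1, sizeof *p);
    assert(p != NULL);
    p->parent = parent;
    p->destroyed_flag = destroyed_flag;
    if (parent != NULL) {
        p->sibling = parent->children;
        parent->children = p;
    }
    return p;
}

/* Destroys p and everything below it; p is already detached. */
static void destroy_subtree(toy_pool *p)
{
    while (p->children != NULL) {
        toy_pool *c = p->children;
        p->children = c->sibling;
        destroy_subtree(c);
    }
    if (p->destroyed_flag != NULL)
        *p->destroyed_flag = 1;
    free(p);
}

void toy_pool_destroy(toy_pool *p)
{
    assert(p != NULL);
    if (p->parent != NULL) {
        /* Unlink p from its parent's child list first. */
        toy_pool **link = &p->parent->children;
        while (*link != p)
            link = &(*link)->sibling;
        *link = p->sibling;
    }
    destroy_subtree(p);
}
```

With a connection pool as parent of a state pool, and the state pool as
parent of each request pool, finishing a request releases only the
request's data, while closing the connection releases everything that
was allocated for its state and any requests still outstanding.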


4. Apache implementation details

This section provides implementation details for the Apache server to
support multiple protocols. The designs provided are only for the
connection record, request record and the state record.
These implementation details are a proposal for how the Apache server
can provide common infrastructure for the various protocols that could be
implemented.

All three structures will contain a pool and a void * configuration vector for
modules to add their own data. For example, the request record (currently)
has the request_config.
In addition, each structure will have a pointer to its upper layer, so the
request record points to the state record, which points to the connection
record, which points to a server record (as shown in previous figures).

Conceptually each structure will have two sections. The first is the protocol
section, where data that is protocol neutral (independent) is stored. The
second is Apache specific information: this is where Apache stores the
information that it requires to run correctly. The goal of the protocol
neutral section is to allow any-to-any protocol translation.

Protocol Independent   --  Anything that most protocols need
Protocol Specific      --  Anything that one particular protocol needs
                           In this example, that will be HTTP. Ideally, this
                           information should go into the protocol module
                           specific data storage.

NOTE: Most of the fields described below are from a terminology perspective
HTTP biased.


4.1 Conn_rec

The connection record maintains the information of the connection on which
a message exchange takes place. In principle this is completely application
protocol independent. The connection record is envisioned to be on top of the
IP layer or equivalent.


4.1.2  Protocol Independent

    apr_socket_t *client_socket;   /* connection to client */

    /* Information about the client */
    apr_sockaddr_t *local;
    apr_sockaddr_t *remote;
    char *remote_ip;
    char *remote_host;  /* any lookups can obviously lead to multiple names;
                           i.e. if host and IP are in the sense of DNS it
                           might make sense to use that.
                           It may be considered to drop this item. */
                               
    char *local_ip;
    char *local_host;   /* any lookups can obviously lead to multiple names;   
                           i.e. if host and IP are in the sense of DNS it
                           might make sense to use that.   
                           It may be considered to drop this item, since this
                           item should be retrieved from the server record. */

    
    void *vhost_lookup_data;  /* Used for the equivalent of a Host header */
    
    /* XXX on this level for IPv4 it is hard to envision more
     * than a FQDN, (an IP address) and a port number to uniquely
     * pinpoint a machine. Also see above XXX. I.e. if it is
     * truly protocol independent and we stick to IP then there
     * is little room. (Note: would we ever integrate Apache 2.0
     * into the lower levels of say, kannel.org then this is
     * different of course - but I guess that sort of integration
     * is not realistic.).
     */

    unsigned aborted:1;  /* is the connection still valid? */
    signed int double_reverse:2;

    /* Information about this connection */
    long id;
    void *conn_config;

    apr_table_t *notes; /* to send notes from one module to another */

    ap_filter_t *input_filters;
    ap_filter_t *output_filters;

    long remain; /* used to determine how much data to read from the request
                  * body */

    char *remote_logname;  /* Only set if doing rfc1413 lookups. */
    signed int pipeline;        /* Can we pipeline responses?  (Used to be
                                 * called keepalive.)
                                 * -1 fatal error, 0 undecided, 1 yes */


4.2.2  Protocol Specific  (HTTP)

The following items are currently in the connection record, but
with the proposed design these items must go into the state_config
record for the HTTP protocol module. They are protocol specific.

    unsigned keptalive;         /* Did we use HTTP Keep-Alive? */
    int keepalives;             /* how many times have we used it. */

4.3    state_rec

The state record is used to store authentication information as well as state
information of the protocol that acts upon the connection.

4.3.1  Protocol Independent

    /* This needs to be abstracted out better, probably with pointers and
     * structures and stuff.
     * XXX Other common things are an entry for a group or class list. This
     * can also be generalized as simple key/value pairs. Much neater, though,
     * is the idea of using a (few) function pointers to get_id/key/../verify.
     * It would be neat to have one generic interface/function for this.
     * The auth_type would then decide what the real implementation would be.
     * For protocols like SASL we should maybe change the naming of the
     * user and password fields.
     */
    char *user;
    char *password;
    char *auth_type;

    const char *hostname;  /* Host as set by full URI or Host: */
    /* XXX some protocols do not quite set the hostname
     *     after the slash - i.e. a phone number URL - but
     *     they all have the concept of a high level host. So
     *     the concept certainly fits here.
     */
    ap_filter_t *input_filters;
    ap_filter_t *output_filters;

    char *protocol;   /* The well known name for the protocol. */
    int proto_num;    /* The version number of the protocol.   */
    /* The protocol and its version are defined in the state_rec
     * because most protocols do not allow changes of protocol, or
     * even of the version, over a connection.
     * HTTP does allow this, but for HTTP each request
     * needs to have its own state_rec anyway, because each HTTP
     * request carries its own authentication information.
     */
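
The function pointer idea from the comment on the user/password fields
could look roughly like the sketch below. It is only an illustration: the
names auth_provider, plain_cred, state_auth and state_auth_check are
invented here and are not part of Apache or of the attached patch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical generic credential interface: each auth scheme supplies
 * its own get_id/verify functions. None of these names exist in Apache. */
typedef struct auth_provider {
    const char *auth_type;                        /* e.g. "plain", "SASL" */
    const char *(*get_id)(const void *cred);      /* identity of the peer */
    int (*verify)(const void *cred, const char *secret); /* 1 = accepted  */
} auth_provider;

/* Example scheme: a plain user/password pair. */
typedef struct plain_cred {
    const char *user;
    const char *password;
} plain_cred;

static const char *plain_get_id(const void *cred)
{
    return ((const plain_cred *)cred)->user;
}

static int plain_verify(const void *cred, const char *secret)
{
    const plain_cred *c = cred;
    return c->password != NULL && secret != NULL
        && strcmp(c->password, secret) == 0;
}

static const auth_provider plain_provider = {
    "plain", plain_get_id, plain_verify
};

/* The state record would then hold only the provider and opaque data. */
typedef struct state_auth {
    const auth_provider *provider;
    const void *cred;
} state_auth;

/* Generic entry point: the provider decides the real implementation. */
static int state_auth_check(const state_auth *a, const char *secret)
{
    return a->provider->verify(a->cred, secret);
}
```

With such an interface the AAA code only calls state_auth_check(); a SASL
module would register its own provider without the record needing fields
named user and password.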

4.4    Request_rec

The request record maintains the information of the request and the associated
response.

NOTE: The terminology (field names) proposed for this record still
      depends too much on HTTP.

4.4.1  Protocol Independent

NOTE: Request information is always protocol specific. The
      'protocol independent' part used here is meant as the common
      information used by requests of the various protocols.

    apr_table_t *headers_in;   /* The in-coming headers */
    apr_table_t *headers_out;  /* The outgoing headers */
    apr_table_t *headers_err;  /* The error headers */

    /* XXX charset is often mixed with the type (as MIME puts
     *     them on the same line).
     */
    const char *content_type;

    /* XXX How far in MIME do we want to go; protocols
     *     use these in different ways and most of them
     *     can have multiple values defined
     */
    const char *content_encoding;
    apr_array_header_t *content_language;

    int no_cache;   /* May be able to be removed through use of cache filter */

    char *unparsed_uri;
    uri_components parsed_uri;
    char *args;       /* Can we remove this? See parsed_uri.query */
    char *uri;        /* ditto;              see parsed_uri.path  */

    /* XXX are there cases where one would generalize this
     * with an output filter ? 
     */
    const char *response_line; /* Was the status_line */

    const char *request_type;  /* was the method; is also command */
    int method_number(char *); /* This should become a function that
                                * provides a unique number for the
                                * request type to be used within a
                                * protocol module. tokenized function??
                                */

    /* XXX is this general - we could use terminology
     *    from the wrec work; in which case you have
     *    originating/non-originating and destined/non
     *    destined. This can also act as an internal
     *    redirect marker.
     */
    int proxyreq;   /* Is this a proxy request? */

    /* These should go into the state layer of the protocol module. */
    request_rec *main, *next, *prev;
        
    apr_time_t request_time; /* time request started */

    int allowed;
    apr_array_header_t allowed_xmethods;
    apr_method_list_t allowed_methods;

    int sent_bodyct;       /* byte count for body data */

    long bytes_sent;       /* body byte count, for easy access */

    apr_time_t mtime;      /* time the resource was last modified */

    long remaining;        /* bytes left to read */
    long read_length;      /* bytes that have already been read */

    const char *boundary;  /* XXX (1) */
    const char *range;
    apr_off_t clength;

    apr_table_t *subprocess_env;   /* env used for any tables */
    apr_table_t *notes;

    const char *handler;   /* The handler string */

    char *filename;        /* XXXX and (1) - is this not more
                              something you move to the HTTP
                              protocol (which has the concept
                              of a file with length, size, etc.,
                              i.e. the URI -> URL distinction)?
                            */
    char *path_info;
    apr_finfo_t finfo;

    void *per_dir_config;

    int eos_sent;
    ap_filter_t *input_filters;
    ap_filter_t *output_filters;

4.4.4  HTTP module specific

    int status;    /* The status of the current request */

    char *the_request;

    int no_local_copy;

    int assbackwards;
    int header_only;

    int chunked;

    int read_body;         /* How the request body should be read */
    int read_chunked;      /* reading chunked data */

    unsigned expecting_100; /* is client waiting for a 100 response? */

    char *vlist_validator; /* variant list validator (if negotiated) */


In HTTP the state and the request recs share the same lifetime, but that is
going to be the exception rather than the rule. Therefore, the proposed
design strives to handle the more common case early. This means that
the API will change subtly but significantly between b1 and b2. For
existing protocol modules, however, the migration is straightforward and the
few record name changes are easily validated by the compiler.


6. Conclusion

The current design of Apache with respect to multi protocol usage does
not provide enough common infrastructure. Most protocol modules must 
implement infrastructure like state information themselves.

Due to the differences and similarities of many protocols, it is not
possible to do protocol sharing over the same connection. With protocol
sharing it could well be possible that one cannot determine the protocol
from the request type pending on the connection.

In the current design of Apache the authentication information, such
as username and password, is inside the connection request. Protocol modules
must therefore maintain private copies of the authentication information.
The proposed design will provide this as common infrastructure.

The proposed structure would also support requirements like multi-threaded
requests of IMAP or concurrent channels like BEEP that go over one connection.

Obviously with the new structures, some functions will also change APIs.
The obvious change currently is that the AAA hooks will change from using
a request_rec to using a state_rec.
If modules must access this information via encapsulation functions,
their code will not change if the information gets transferred to another level.

Other API changes will need to be evaluated on a case-by-case basis.
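
As an illustration of the encapsulation functions mentioned above, the
following minimal sketch hides where the user name is stored. The records
are heavily reduced and the accessor name example_request_user is invented
for this example; it is not an existing Apache function.

```c
#include <stddef.h>

/* Hypothetical, heavily reduced records for illustration only. */
typedef struct example_state_rec {
    const char *user;               /* authentication info lives here */
} example_state_rec;

typedef struct example_request_rec {
    example_state_rec *state;       /* request points at its state record */
} example_request_rec;

/* Modules call this accessor instead of touching a field directly, so
 * moving 'user' to another level only changes this one function. */
static const char *example_request_user(const example_request_rec *r)
{
    if (r == NULL || r->state == NULL)
        return NULL;
    return r->state->user;
}
```

A module that previously read the user field from the request would call
the accessor instead and would not notice that the field moved into the
state record.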


-------------------------------------------------------------------------------
Appendix A:  Protocol overview.

This appendix contains descriptions of protocols that potentially
could be implemented to make the Apache server a multi protocol server.
The selected protocols are those that access and transfer information
resources. Such resources could be files, objects, emails, or even
cgi (dynamic content).
The following protocols are described here: HTTP, FTP, BEEP, POP,
IMAP, RTSP, SNMP.

1. HyperText Transfer Protocol (HTTP)

HTTP is a stateless protocol that is based on the request/response
paradigm. Due to the stateless nature of HTTP, each request contains
the authentication information in its header, even though multiple
requests may be sent over the same transport connection.

A client sends a request to the server in the form of a request
method, URI, and protocol version, followed by a MIME-like message
containing request modifiers, client information, and possible body
content over a connection with a server. The server responds with a
status line, including the message's protocol version and a success
or error code, followed by a MIME-like message containing server
information, entity meta-information, and possible entity-body
content.

Most HTTP communication is initiated by a user agent and consists of
a request to be applied to a resource on some origin server. In the
simplest case, this may be accomplished via a single connection
between the user agent and the origin server.


1.1. Message format

HTTP messages conform to the following BNF:
       generic-message = start-line
                         *message-header
                         CRLF
                         [ message-body ]

       start-line   = Request-Line | Status-Line

       Request-Line = Method SP Request-URI SP HTTP-Version CRLF

       Status-Line  = HTTP-Version SP Status-Code SP Reason-Phrase CRLF
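
The Request-Line production above can be parsed with a few lines of C. This
is a minimal sketch, not the parser Apache uses; the request_line type, the
function name and the buffer sizes are assumptions made for the example.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical holder for the three Request-Line tokens. */
typedef struct request_line {
    char method[16];
    char uri[256];
    char version[16];
} request_line;

/* Parse "Method SP Request-URI SP HTTP-Version CRLF".
 * Returns 0 on success, -1 if the line does not match. */
static int parse_request_line(const char *line, request_line *out)
{
    char tail[3] = "";
    int n = sscanf(line, "%15[^ ] %255[^ ] %15[^ \r\n]%2[\r\n]",
                   out->method, out->uri, out->version, tail);

    /* All three tokens and the terminating CRLF must be present. */
    if (n != 4 || strcmp(tail, "\r\n") != 0)
        return -1;
    return 0;
}
```

Since the RTSP Request-Line has the same shape, the same routine would
accept it with RTSP-Version in the third token.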

1.2. Message sequence

The protocol can be represented by the following handling sequence.

    1) Accepting the connection

    2) Reading the request
                (processing; authenticating if required by user/password)

    3) Generating the response

    4) Closing the connection (or keep-alive go to 2)


2. File Transfer Protocol (FTP)

FTP is a user level protocol for file transfer between host computers. The
protocol is stateful and is based upon the request/response paradigm. FTP
uses two different connection types between a client and a server to transfer
files. The control connection is persistent during an FTP session and is used
to exchange FTP commands and associated replies.
These requests involve getting directory listings, retrieving documents
or sending documents. The data connection, which is managed via the FTP
commands, is only available when bulk data (files or file listings) has
to be transferred.

2.1. Message format

FTP messages:

    FTP requests consist of a keyword and parameters that are
    transferred over the control connection.

    FTP responses consist of a status code followed by a
    human-readable message.
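
A response of this form can be split into its status code and text as in
the sketch below. The ftp_reply type and parse_ftp_reply name are made up
for this example, and only single-line replies are handled.

```c
#include <ctype.h>

typedef struct ftp_reply {
    int code;            /* three-digit status code, e.g. 220 */
    const char *text;    /* human-readable message, points into the line */
} ftp_reply;

/* Parse a single-line reply such as "220 Service ready".
 * Returns 0 on success, -1 if the line does not start with three digits
 * followed by a space. Multi-line replies ("220-...") are not handled. */
static int parse_ftp_reply(const char *line, ftp_reply *out)
{
    if (!isdigit((unsigned char)line[0]) ||
        !isdigit((unsigned char)line[1]) ||
        !isdigit((unsigned char)line[2]) ||
        line[3] != ' ')
        return -1;

    out->code = (line[0] - '0') * 100
              + (line[1] - '0') * 10
              + (line[2] - '0');
    out->text = line + 4;
    return 0;
}
```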


2.2. Message sequence

    1) Accepting the connection

    2) Sending welcome message (response type 220)

    3) Reading USER request
                (processing; keep state with user)

    4) Sending response

    5) Reading PASS request
                (processing; authenticate user/password keep state)

    6) Sending response

    7) Reading request
                (processing; as previous authenticated user)
                (a new USER request may be given that causes a user change)

    8) Sending response (if response not to QUIT request go to 7)

    9) Close connection



3. Blocks Extensible Exchange Protocol (BEEP)

BEEP is a generic application protocol for connection-oriented, asynchronous
interactions. The protocol permits simultaneous and independent exchanges
within the context of a single application user-identity. BEEP entities are
equal peers, but within a session there is a listener and an initiator. For
a BEEP connection one entity is termed client (this entity initiates a
connection) and the other entity server.

Message exchanges occur in the context of a channel which has a particular
"profile" which determines the channel's purpose, for example transport
security, user authentication, data exchange, etc. A channel is part of a
session. One session has multiple channels. A session is established by
setting up a connection, after which both peers advertise the profiles they
support. The client provides multiple profiles and the server selects one
profile.

3.1. Message format

All messages (or frames) have the following format:

   frame      = data | mapping

   data       = header payload trailer

   header     = msg | rpy | err | ans | nul

   msg        = "MSG" SP common          CR LF
   rpy        = "RPY" SP common          CR LF
   ans        = "ANS" SP common SP ansno CR LF
   err        = "ERR" SP common          CR LF
   nul        = "NUL" SP common          CR LF

   common     = channel SP msgno SP more SP seqno SP size
   channel    = 0..2147483647
   msgno      = 0..2147483647
   more       = "." | "*"
   seqno      = 0..4294967295
   size       = 0..2147483647
   ansno      = 0..2147483647

   payload    = *OCTET

   trailer    = "END" CR LF

   mapping    = ;; each transport mapping may define additional frames
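
The header production of the grammar above maps naturally onto sscanf. The
following is a sketch under assumptions: the beep_header type and the
parse_beep_header function are invented for this example, and payload and
trailer handling are left out.

```c
#include <stdio.h>
#include <string.h>

typedef struct beep_header {
    char type[4];            /* "MSG", "RPY", "ANS", "ERR" or "NUL" */
    unsigned long channel;
    unsigned long msgno;
    char more;               /* "." or "*" as in the grammar */
    unsigned long seqno;
    unsigned long size;
    unsigned long ansno;     /* only meaningful for "ANS" */
} beep_header;

/* Parse one header line. Returns 0 on success, -1 on a malformed header. */
static int parse_beep_header(const char *line, beep_header *h)
{
    static const char *const types[] = { "MSG", "RPY", "ANS", "ERR", "NUL" };
    int n, i, known = 0, is_ans;

    h->type[0] = '\0';
    h->ansno = 0;
    n = sscanf(line, "%3s %lu %lu %c %lu %lu %lu",
               h->type, &h->channel, &h->msgno, &h->more,
               &h->seqno, &h->size, &h->ansno);

    for (i = 0; i < 5; i++)
        if (strcmp(h->type, types[i]) == 0)
            known = 1;
    is_ans = strcmp(h->type, "ANS") == 0;

    /* "ANS" carries the extra ansno field, all other types do not. */
    if (!known || n != (is_ans ? 7 : 6))
        return -1;
    if (h->more != '.' && h->more != '*')
        return -1;
    /* Upper bounds taken from the grammar. */
    if (h->channel > 2147483647UL || h->msgno > 2147483647UL ||
        h->size > 2147483647UL || h->ansno > 2147483647UL)
        return -1;
    return 0;
}
```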


3.2. Message sequence

BEEP allows three styles of exchange: 
- MSG/RPY: the client sends a "MSG" message asking the server to
      perform some task, the server performs the task and replies with
      a "RPY" message (termed a positive reply).
- MSG/ERR: the client sends a "MSG" message, the server does not
      perform any task and replies with an "ERR" message (termed a
      negative reply).
- MSG/ANS: the client sends a "MSG" message, the server, during the
      course of performing some task, replies with zero or more "ANS"
      messages, and, upon completion of the task, sends a "NUL"
      message, which signifies the end of the reply.


    1) Accepting the connection

    2) Sending RPY response (seq #0)
       Reading RPY response (seq #0) (Both client and server directly
                                      send a welcome message)

    3) Sending request/reading response
       Reading request/sending response
                      (messages may be exchanged continuously)

    4) Sending CLOSE request
       Reading CLOSE request
                      (messages may be exchanged continuously)
                      (a close message does not have a response,
                       nor does it have a specific request type;
                       it is carried in the data)

    5) Close connection


4. Post Office Protocol (POP)

POP is an application protocol that allows users to retrieve mail from
their maildrops. The protocol is meant to be lightweight, relieving
end systems of the heavy resource usage that would go with SMTP.

POP is a stateful protocol based upon the request/response paradigm.
The server listens for connections initiated by a client. Once a connection
is established the server sends a greeting message and goes into the
'authorization' state. After authorization has completed successfully the
server goes into the 'transaction' state. At termination by the client
('QUIT' command) the server goes into the 'update' state and releases the
resources that were required for the connection.


4.1. Message format

The message format used by POP is defined in RFC 822.
Message:

    POP requests (commands) are line based; they consist of a keyword
    followed by arguments and are terminated by a CRLF pair.

    POP responses are also line based, but may span multiple lines depending
    on the request. Such a line consists of a status indicator, a keyword and
    optional information, and is likewise terminated by a CRLF pair.


4.2. Message sequence

    1) Accepting the connection

    2) Sending welcome message (positive response; goto AUTHORIZATION state)

    3) Reading USER request
                (processing; keep state with user)

    4) Sending positive response

    5) Reading PASS request
                (processing; authenticate user/password; goto TRANSACTION state)

    6) Sending positive response

    7) Reading request
                (processing; as previous authenticated user)

    8) Sending response (if the request was not QUIT, go to 7;
                         if the request was QUIT, the server enters the UPDATE state)

    9) Close connection

NOTE: If the AUTH command is used in a request, it replaces the sequence of
      USER/PASS requests for authentication.
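
The sequence above is essentially a small state machine, which is the kind
of per-connection state a state record would have to carry. A minimal
sketch follows; the type and function names are invented, the auth_ok
parameter stands in for a real password check, and neither QUIT before
login nor the AUTH command is modelled.

```c
#include <string.h>

typedef enum { POP_AUTHORIZATION, POP_TRANSACTION, POP_UPDATE } pop_state;

typedef struct pop_session {
    pop_state state;
    int have_user;      /* a USER request was accepted, PASS expected */
} pop_session;

/* Advance the session for one request keyword.
 * Returns 1 for a positive response, 0 for a negative one. */
static int pop_step(pop_session *s, const char *keyword, int auth_ok)
{
    switch (s->state) {
    case POP_AUTHORIZATION:
        if (strcmp(keyword, "USER") == 0) {
            s->have_user = 1;           /* keep state with user */
            return 1;
        }
        if (strcmp(keyword, "PASS") == 0 && s->have_user) {
            s->have_user = 0;
            if (auth_ok) {
                s->state = POP_TRANSACTION;
                return 1;
            }
        }
        return 0;
    case POP_TRANSACTION:
        if (strcmp(keyword, "QUIT") == 0) {
            s->state = POP_UPDATE;
            return 1;
        }
        /* USER/PASS are only accepted before login. */
        if (strcmp(keyword, "USER") == 0 || strcmp(keyword, "PASS") == 0)
            return 0;
        return 1;       /* processed as the authenticated user */
    case POP_UPDATE:
    default:
        return 0;       /* session is finishing, nothing is accepted */
    }
}
```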


5. Internet Message Access Protocol (IMAP)

IMAP is a protocol that is used for manipulating mail messages on a server.
It permits an IMAP client to invoke requests upon remote message
folders, called "mailboxes". The protocol enables remote mailboxes to be
managed as if they were local. IMAP also provides capabilities to work
off-line and re-synchronise with the server.

An IMAP4 session consists of the establishment of a client/server connection,
an initial greeting from the server, and client/server interactions. These
client/server interactions consist of a client command, server data, and a
server completion result response.

Interactions transmitted by client and server are in the form of lines; that
is, strings that end with a CRLF.  The protocol receiver of an IMAP4 client
or server is either reading a line, or is reading a sequence of octets with
a known count followed by a line. Interactions do not have to be
strictly one request/response after another; multiple requests may be
outstanding at a time and responses do not have to be returned in order.
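
Because responses may come back out of order, each IMAP command line starts
with a client-chosen tag and the completion response repeats it. The sketch
below shows one way to track outstanding requests by tag. All names, the
table size and the tag length limit are assumptions made for this example.

```c
#include <stddef.h>
#include <string.h>

#define IMAP_MAX_PENDING 8
#define IMAP_TAG_LEN     16

/* Copy the leading tag of an IMAP line into tag[].
 * Returns a pointer to the rest of the line, or NULL if there is no tag. */
static const char *imap_split_tag(const char *line, char *tag, size_t taglen)
{
    const char *sp = strchr(line, ' ');
    size_t n;

    if (sp == NULL || sp == line)
        return NULL;
    n = (size_t)(sp - line);
    if (n >= taglen)
        return NULL;
    memcpy(tag, line, n);
    tag[n] = '\0';
    return sp + 1;
}

/* Table of requests that have been read but not yet completed. */
typedef struct imap_pending {
    char tags[IMAP_MAX_PENDING][IMAP_TAG_LEN];
    int used[IMAP_MAX_PENDING];
} imap_pending;

/* Remember an outstanding request. Returns 0, or -1 if no room. */
static int imap_pending_add(imap_pending *p, const char *tag)
{
    int i;

    if (strlen(tag) >= IMAP_TAG_LEN)
        return -1;
    for (i = 0; i < IMAP_MAX_PENDING; i++) {
        if (!p->used[i]) {
            strcpy(p->tags[i], tag);
            p->used[i] = 1;
            return 0;
        }
    }
    return -1;
}

/* Mark a request as completed, in whatever order completions arrive.
 * Returns 1 if the tag was outstanding, 0 otherwise. */
static int imap_pending_complete(imap_pending *p, const char *tag)
{
    int i;

    for (i = 0; i < IMAP_MAX_PENDING; i++) {
        if (p->used[i] && strcmp(p->tags[i], tag) == 0) {
            p->used[i] = 0;
            return 1;
        }
    }
    return 0;
}
```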

5.1. Message format


The message format used by IMAP is defined in RFC 822.


5.2. Message sequence

    1) Accepting the connection

    2) Sending welcome message (OK greeting; non-authenticated state)

    3) Reading LOGIN or AUTHENTICATE request
                (processing; authenticated state)

    4) Sending response

    5) Reading request
                (processing)

    6) Sending response (if the response is not to a LOGOUT request, go to 5)

    7) Close connection

NOTE: If a pre-authenticated connection is used, step 2 will directly
      send a pre-authenticated greeting and go into the authenticated
      state. (Steps 3 and 4 will be skipped.)



6. Real Time Streaming Protocol (RTSP)

RTSP is an application level protocol that uses an HTTP message format to
exchange information. It retains most components of the HTTP protocol
with the following exceptions:
    - both client and server can issue requests
    - data is usually carried out of band
    - the server needs to maintain state
    - a session is not tied to a connection; it can be resumed.

6.1. Message format

RTSP messages conform to the following BNF:
       generic-message = start-line
                         *message-header
                         CRLF
                         [ message-body ]

       start-line   = Request-Line | Status-Line

       Request-Line = Method SP Request-URI SP RTSP-Version CRLF

       Status-Line  = RTSP-Version SP Status-Code SP Reason-Phrase CRLF

6.2. Message sequence


7. Simple Network Management Protocol (SNMP)

SNMP is an application protocol used to convey management information.
The protocol is asymmetric, that is, an SNMP manager can query an SNMP agent
in a request/response paradigm. In addition an SNMP agent may send
notifications to an SNMP manager if it detects events of special interest.

Currently, there are 3 different versions of SNMP. Version 1 and the
Community-Based version 2 use community based authentication, in which the
community is a single string. Version 3 has a stronger authentication scheme
and has a user based approach.

The information that is accessed (in comparison with previous protocols)
is not directed towards files, but instead to managed objects that are
ordered lexicographically based on Object Identifiers.
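
The ordering of Object Identifiers is a plain lexicographical comparison of
their sub-identifiers, which is what a protocol module needs when walking
the managed objects. A minimal sketch, with oid_compare as an invented
name:

```c
#include <stddef.h>

/* Compare two OIDs, given as arrays of sub-identifiers.
 * Returns -1, 0 or 1 if a sorts before, equal to or after b. */
static int oid_compare(const unsigned long *a, size_t alen,
                       const unsigned long *b, size_t blen)
{
    size_t i;
    size_t n = alen < blen ? alen : blen;

    /* The first differing sub-identifier decides the order. */
    for (i = 0; i < n; i++) {
        if (a[i] != b[i])
            return a[i] < b[i] ? -1 : 1;
    }
    /* One OID is a prefix of the other: the shorter one sorts first. */
    if (alen == blen)
        return 0;
    return alen < blen ? -1 : 1;
}
```

So 1.3.6.1.2.1.1 sorts before 1.3.6.1.2.1.1.1, which in turn sorts before
1.3.6.1.2.1.2.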

7.1. The message format

The SNMP message format is Basic Encoding Rules (BER) encoded. This is
an encoding scheme used in combination with the Abstract Syntax Notation
One (ASN.1). It may be considered a binary format.

7.2. Message sequence

    1) Reading the request
                (processing; authenticating)

    2) Generating the response

NOTE: SNMP normally uses the UDP transport domain, which does not
      have a connection. However, similar information as with a
      connection must be maintained during processing, such as the
      local transport endpoint and remote transport endpoint.

NOTE: The correct operation of an SNMP protocol module over a server farm
has huge implications. First, within a single Apache instance it
is quite possible that SNMP requests are handled by various children.
It may even process concurrent SNMP requests, which makes SNMP-SET requests
very complex. Locking mechanisms over all children are required to make
the SNMP-SET atomic. Secondly, there is a problem when a load-balancer is used.
An SNMP manager that needs to go via the load-balancer is not able to know
which system it retrieves the responses from. This causes problems when
performing an snmp walk over the MIB tree, or the manager does not really know
whose CPU load it retrieved.


-------------------------------------------------------------------------------
Appendix B:  Current Hooks

These are basically book-keeping hooks.  They are used if the module needs to
do some setup before actually running the server.

pre_config              -- run just before reading the config file
post_config             -- run just after reading the config file
open_logs               -- used to open any specified logs
optional_fn_retrieve    -- used to retrieve any functions registered as optional
child_init              -- called as soon as the child is started

These are the key hooks for protocol modules.  By definition, a protocol 
module implements the process_connection hook.

pre_connection          -- do any setup required just before processing, but
                           after accepting
process_connection      -- Run the correct protocol
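
To illustrate how process_connection can "run the correct protocol", the
toy model below calls the registered functions in turn until one of them
does not decline. This is not the Apache hook API: every name here is
invented, and the protocol string on the toy connection stands in for
whatever the real server would use to select a protocol.

```c
#include <stddef.h>
#include <string.h>

#define TOY_OK        0
#define TOY_DECLINED (-1)
#define TOY_MAX_HOOKS 8

/* Toy connection record for the example only. */
typedef struct toy_conn {
    const char *protocol;
    const char *handled_by;
} toy_conn;

typedef int (*process_connection_fn)(toy_conn *c);

static process_connection_fn toy_hooks[TOY_MAX_HOOKS];
static size_t toy_nhooks;

/* Register a protocol module's function. Returns 0, or -1 if full. */
static int toy_hook_process_connection(process_connection_fn fn)
{
    if (toy_nhooks == TOY_MAX_HOOKS)
        return -1;
    toy_hooks[toy_nhooks++] = fn;
    return 0;
}

/* The first function that does not decline has handled the connection. */
static int toy_run_process_connection(toy_conn *c)
{
    size_t i;

    for (i = 0; i < toy_nhooks; i++) {
        int rv = toy_hooks[i](c);
        if (rv != TOY_DECLINED)
            return rv;
    }
    return TOY_DECLINED;
}

static int toy_http_process(toy_conn *c)
{
    if (strcmp(c->protocol, "http") != 0)
        return TOY_DECLINED;
    c->handled_by = "http module";
    return TOY_OK;
}

static int toy_ftp_process(toy_conn *c)
{
    if (strcmp(c->protocol, "ftp") != 0)
        return TOY_DECLINED;
    c->handled_by = "ftp module";
    return TOY_OK;
}
```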

These hooks are used while actually processing a request

post_read_request       -- called after reading the request, before any other 
                           phase
translate_name          -- translate the URI into a filename
header_parser           -- lets modules look at the headers; not used by most
                           modules, because they use post_read_request for this.
access_checker          -- check for module-specific restrictions
check_user_id           -- check the user-id and password
auth_checker            -- check if the resource requires authorization
type_checker            -- determine and/or set the doc type
fixups                  -- last chance to modify things before generating 
                           content
handler                 -- Generate the content
log_transaction         -- log information about the request

Assorted hooks that don't really fit anyplace.

http_method             -- retrieve the http method from a request.  (legacy)
default_port            -- retrieve the default port for the server
get_suexec_identity     -- Get the user to run a CGI request as.  (Required so 
                           that CGI works for multiple protocols)
insert_filter           -- An opportunity for modules to insert filters into
                           the filter chain.

Each module may also implement its own hooks and optional functions, as
mod_cache and mod_include do respectively. The concept of additional hooks
and optional functions allows an arbitrary module to use a particular function
if the module providing that function is present. That way, for instance, a
protocol module can add extra hooks that may be used by, for instance, a CGI
module.
