For some backgrounders, check out the PDFs from this site: http://www.cs.berkeley.edu/~mdw/proj/seda/
Seda is a Staged Event-Driven Architecture designed for efficient networking. Using this architecture and a native library to interface with UNIX non-blocking IO, the SEDA team was able to create a web server that scaled better than Apache. In fact, this *Java* implemented server (Haboob by name) was able to sustain a load of over 10,000 concurrent users in one VM. I originally started looking into this system while playing with a Non-Blocking IO implementation for Avalon Phoenix. The papers do raise some very interesting points, that do change the way we think about a problem space. Before you completely dismiss this RT because Cocoon does not control the socket mechanisms, there are some architectural details that we can adopt. For instance, the difference between SEDA and FSM approaches (which can be similar) is the difference in ease of programming. A Finite State Machine (FSM) is composed of a group of states and transitions between states. As the number of states grow, the number of *possible* transitions grows exponentially. This does not mean that all transitions are supported in a system, but it does mean that you have to explicitly declare the transitions you do support. This is key. SEDA is composed of Stages in an overall pipeline--with stages able to be sequenced or executed in parallel. Each stage takes care of only one aspect of the connection and read/write process. They implemented the Haboob web server (only minimal support for dynamic pages) in a little over 3,000 Source Lines of Code (SLOC, although they use another acronym than means the same thing--Non-Commenting Source Statements [NCSS]), of which ~600 were devoted to the HTTP protocol. The overall pipeline was something like this: +---+ | | SR CM-FIO \ m/ +-HP-PC-+ / h\ SL HS-WP SR: Socket Read SL: Socket Listen HP: HTTP Parse PC: Page Cache CM: Cache Miss FIO: File I/O HS: HTTP Send WP: Write Packet m: Cache Miss h: Cache Hit The stages you see that are parallel are run concurrently. Each Stage handles just the portion of the code that is necessary to it. Also, each stage has an event queue, a handler, a controller, and a thread pool that is dynamically sized by the SEDA system. While Cocoon is forced to assume it is running in one thread per request, we can take leverage that knowledge to remove the need for the event queue and thread pool. What this does allow is a different approach to thinking about the problem space. For most web applications, the FSM approach is overkill. What the designer is concerned with is the organization of information, and the flow of information. I don't know if it is still available, but "Web Style Guide" by Patrick J. Lynch and Sarah Horton, published by Yale University Press (http://www.yale.edu/yup) has some good general guidelines for organizing a web page. Cocoon has to be able to support each of these approaches. They have a grid that helps prioritize how to organize information. By knowing your audience, it helps you organize your site. You find the relative intersection of the length of time a user has in contact with the site and how linear the information needs to be. They categorize users into 5 categories: Teaching, Self-Education, Browsing, Training, and Reference. General browsing is in the center of the graph. However, if your site is aimed at Teaching, your users are going to be at your site for a while, and the information needs to be arranged linearly. The book favors nice links to go to the next or previous page in the sequence. Training is almost the same as Teaching, except the audience attention span is shorter. Self-Education and Reference favor more ad-hoc hyperlinked hierarchies, with a random access approach to information (Reference favors shorter attention spans). In the end what this means is that Flow management is only required for some audiences. For instance, a slide presentation requires the Flow to be managed in a linear fashion. However reference sites and portals favor the ad hoc approach that the current Sitemap enables. What does this mean for the lay person? One size does *not* fit all. Sitemaps and Flowmaps have different purposes. Now that we are beyond simple publishing, we have to concider how to choose between a Flow map or a Sitemap. Again considering the wide range of needs for sites, we have to worry about how our particular site needs to work. To adequately describe this, we have to use a case study. Since I am most familiar with approval systems, we will use a news approval and publishing system for a case study. NEWS APPROVAL SYSTEM: --------------------- The news portal site needs are fairly simple. It needs to allow people to read the published news without logging in. If the news site is like Slashdot, the anonymous user also does not need to log in to view comments on the news story. However, once the user wants to submit a story or write a comment, the server needs to manage the user's session. That being said, some sites will have the policy that all users need to be logged in at this stage, while others will allow "anonymous" posters. Finally, in order to *review* a pending news item, the reviewer must be both logged in, and have the proper authority. In this use case the set of pages or links a user can access are additive. That is, an unmanaged user can only access a small portion of the site, while a known user has more pages he can access (like managing his account, or posting articles). A reviewer can go farther and approve news articles for publishing. Let's say that a reviewer cannot review his own posts (reasonable in other systems esp.). In this system, we have a combination of linear processes and random access processes. As an example, any user can read the news. This is a constant in the system. Therefore, our root in this system is the Sitemap. It allows random access to all news articles, and *mounts* the FlowMaps for the different processes to certain URLs. By renaming each aspect of this site into distinct stages, you have a list of stages like this: Default(D): The default stage when a user accesses the site (view news) Session Managed(SM): The stage when a user wants to write comments--can be merged with the next stage if a site requires it Logged In(LI): The stage where the user is authenticated News Submission(NS): The stage where the user submits a news article for publishing News Review(NR): The stage where the reviewer has opened the article and is looking at it News Approval(NA): The stage where the article is approved News Rejection(RJ): The stage where the article is rejected Rejection Notice(RN): The stage where the reviewer selects a canned reason for the rejection and adds any aditional comments Inbox(I): The stage where the reviewer views the list of pending articles Comment submission (CS): The stage where the user writes a comment Comment view(CV): The stage where the user views the comments for a news article In the system, they would be arranged like this: D---D----+----D--D CV<+ | | / | | | SM+ | |nav. | \ | | | CS>+ | | | |news NS>+-+ | / |news | LI+ + NA-+->+ \ / a/ | nav. I----NR-+ | | r\ | | RJ-RN>+ +-<-------------+ a=accept, r=reject Notice the parallelism in the stages, at each juncture, the Default stage is available at the same time as the other stages. Also notice that you only have other stages available when you are Session Managed, or Logged In. This also has other bearings on the system. For instance, you now have some new URLs available to you now that the session is managed, or that you are logged in. This requires a master control that maps the different sitemaps to base urls. This in effect reduces the size of each sitemap and eases the tree like implementation of a natural heirarchy--with a sitemap based on user privelege, it can make certain aspects of designing a complex simple easier. However notice the approval section. Notice the two different flows: one for the news article and one for navigation. These are two different views of the same problem space super-imposed. The FlowMap only needs to represent the Navigational Stages. The controller at each stage will decide the path to take if there are multiple paths. For instance, in the review flow, you have to decide whether to accept or reject. If you accept it, the flow map controller will send you to the next navigational stage (NA). Ok, I think I have to get back to work now :( so party on people, and fire away at it. Remember the distinction between States and Stages is that a Stage has a controller associated with it (next stage determination). Also the Stages are kind of like a navigational pipeline as opposed to a state-managed heirarchy. -- "They that give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." - Benjamin Franklin --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, email: [EMAIL PROTECTED]