[Jakarta-httpclient Wiki] Trivial Update of "HttpAsyncThreadingDesign" by RolandWeber

Apache Wiki Sun, 27 Jan 2008 09:57:11 -0800

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Jakarta-httpclient 
Wiki" for change notification.


The following page has been changed by RolandWeber:
http://wiki.apache.org/jakarta-httpclient/HttpAsyncThreadingDesign

The comment on the change is:
page moved

------------------------------------------------------------------------------
- #pragma section-numbers 2
+ #DEPRECATED
  
- = Threads and Synchronization in HttpDispatch =
+ This page has been [http://wiki.apache.org/HttpComponents/HttpDispatchThreadingDesign moved]
+ to the new [http://wiki.apache.org/HttpComponents/ HttpComponents Wiki].
  
- == About ==
+ ##
  
- The purpose of this document is to provide design documentation for the use of
- threads and synchronization in !HttpDispatch
- that is separate from the source code. Unlike the source code, this design document
- not only reflects the current implementation, but also lists design alternatives
- and gives a rationale for design decisions. And there are pictures here!
- [[BR]]
- Note that !HttpDispatch is the working title for what was formerly referred 
to as
- [http://jakarta.apache.org/httpcomponents/http-async/index.html HttpAsync].
- There are some leftover references to the old name on this page, in 
particular the page name and labels in the pictures.
- 
- ''Work on !HttpDispatch is currently suspended.''
- The code mentioned below is archived
- [http://svn.apache.org/repos/asf/jakarta/httpcomponents/httpasync/branches/suspended-at-HttpCoreAlpha4/ here].
- It compiles against !HttpCore alpha 4.
- A lot of progress has been made in !HttpCore and !HttpConn since it was 
originally developed.
- The code is therefore outdated, but can still serve as a starting point to 
pick up development.
- If you feel like spending time on !HttpDispatch, just send a mail to the 
developer list.
- 
- ----
- [[TableOfContents]]
- ----
- 
- 
- == Background ==
- 
- The purpose of the !HttpDispatch component or module
- is to provide an API that allows applications to execute HTTP requests 
asynchronously. That means the
- application creates a request, hands the request over to !HttpDispatch, and 
later picks up the response.
- Typically, applications also want to be notified when a response becomes 
available.
- There is a selection of UseCases that address asynchronous communication.
- [[BR]]
- There will always be at least two threads required, one on the application 
side
- and one background thread on the !HttpDispatch side. On this high level of 
abstraction,
- it doesn't matter whether there are one or many threads on either side. There 
may
- also be several applications using !HttpDispatch at the same time, or several 
components
- of one large application.
- [[BR]]
- Executing a request involves several steps. Each step needs to be executed by 
either an
- application thread or a background thread (from !HttpDispatch). As part of 
the design, it is
- necessary to define which step should be executed by which kind of thread. 
Although it is
- possible to defer such decisions to runtime, threading issues will be easier 
to handle if
- the assignment is static.
- The following figure shows the steps required to execute a request.
- 
- attachment:responsibilities.png
- [[BR]]
- 
- Steps that necessarily have to be executed by an application thread are shown 
to the left.
- Only the application can decide which request should be executed and what to 
do with the
- response.
- To the right are steps that have to be executed by a background thread.
- Sending the request and waiting for the response are there since it is the 
purpose
- of !HttpDispatch to offload such tasks from applications. Notification for 
incoming responses
- has to be triggered by the thread that was waiting for the response.
- Receiving the response header is assigned to the background thread too, 
because
- it is a precondition for notification, as explained below.
- The steps in the middle column can reasonably be assigned to either side.
- 
- Assigning the steps to application threads or background threads is one thing.
- Another question is the responsibility for the code that gets executed.
- Some of the steps in no man's land are implemented by application code,
- indicated by the red backdrop.
- While the code for the pre- and postprocessing is not necessarily written
- by the application developer, it is the application that decides which
- interceptors will be executed in these steps. Interceptors are also a
- plugin point for application code, therefore the responsibility for what
- is done in these two steps is with the application.
- It is arguable whether "send request" should be considered application code,
- since it can involve a request entity provided by the application developer.
- In HttpClient, the request entities included with the package were usually
- sufficient, so this step is not marked as executing application code here.
- 
- The order of the steps from top to bottom is roughly chronological,
- but some are independent and can be executed in a different order.
- For example, a request must be created before it can be preprocessed.
- But the connection for sending the request can be allocated before
- or after preprocessing, or even before the request is created.
- The table below shows the sequences in which some of the steps have
- to be executed, one sequence in each column.
- Postprocessing has to be done before chasing redirects, since there might
- be cookies in the response that need to be stored for the followup request.
- Reading the response header should be done before notification, because a
- notification before status code and headers of the response are known would
- be very inconvenient to use. The other sequences are obvious.
- 
- ||<^> create request[[BR]] preprocess[[BR]] send request[[BR]] receive response header[[BR]] postprocess[[BR]] interpret final response[[BR]] ||<^> allocate connection[[BR]] send request[[BR]] receive response header[[BR]] read response body[[BR]] consume response[[BR]] release connection[[BR]] ||<^> receive response header[[BR]] notify[[BR]] handle notification[[BR]] ||<^> receive response header[[BR]] postprocess[[BR]] chase redirects[[BR]] ||
- 
- 
- == API ==
- 
- The application programming interface (API) for !HttpDispatch in package 
{{{org.apache.http.async}}}
- defines three interfaces. The following figure shows their place with respect 
to the steps that
- have to be executed.
- 
- attachment:interfaces.png
- 
- Two of the interfaces are application-facing. {{{HttpDispatcher}}} is used to 
transfer control
- over a request to !HttpDispatch. Since this is done by a call from an 
application thread, the
- implementation can then execute code in that application thread. Eventually, 
the request has
- to be passed to the background threads that handle the asynchronous 
communication. The application
- obtains an instance of the second interface as a result of the call to 
{{{HttpDispatcher}}}.
- [[BR]]
- Instances of {{{HttpHandle}}} are specific to a request. When the application 
tries to access
- the response to a specific request, it does so through the {{{HttpHandle}}} 
for that request.
- When the application is done with processing a response to a specific 
request, it indicates
- that to the {{{HttpHandle}}} for that request. If the application has to 
cancel a specific request,
- it does so through the {{{HttpHandle}}} for that request. Again, the 
implementation has the
- opportunity to execute some of the steps in the calling application thread.
- Thread synchronization is a particular issue here, since several application 
threads may be
- calling the same instance of {{{HttpHandle}}} concurrently.
- [[BR]]
- The third interface {{{HttpNotificationHandler}}} is used by background 
threads
- to notify applications of incoming responses, or of problems encountered 
while executing a
- request. It would have been possible to define notifications in terms of 
specific objects for
- thread synchronization. While background threads would not have had to 
execute application code
- for notification in that case, the flexibility for application developers 
would have been
- significantly reduced. Instead, a background thread calls directly into 
application code,
- which can then use suitable means to relay the notification to application 
threads. The thread
- calling into application code is symbolized by the cyan border around the red 
box for
- "handle notification".
- Implementing the {{{HttpNotificationHandler}}} interface requires '''special 
care'''
- by application developers, since a misbehaving notification handler can
- take down background threads and thereby stall other requests as well.
- 
- The step "chase redirect" is shown in brackets since it is not yet part of 
the API.
- If it becomes part of the API, it will probably not be in the 
{{{HttpHandle}}} interface,
- although its position in the figure might trick you into expecting that. 
There are
- too many problems to be solved first, so let's not worry about chasing 
redirects now.
- 
- 
- === Synchronization Details ===
- 
- {{{HttpDispatcher}}} has a method {{{sendRequest}}} to transfer control
- of a request and obtain a handle. {{{abortAll}}} can be used to cancel all
- requests (handles) currently controlled by the dispatcher, but it leaves
- the dispatcher operational.
- {{{shutdown}}} (''not yet implemented'') will cancel all requests and
- stop operation of the dispatcher. It releases resources such as
- background threads. Dispatcher implementations may have methods
- that allow reinitialization, but that is not part of the interface.
- 
- {{{HttpHandle}}} has a method {{{awaitResponse}}} which will block
- the calling thread until the response is available or until an error
- is encountered. By using notifications, the caller can make sure that
- it will be blocked only momentarily, if at all.
- [[BR]]
- {{{close}}} indicates that processing of the response has finished
- and that the connection over which the response is being received
- can be used for another request. When the handle is closed while
- the response has not been read completely, the rest of the response
- may be consumed.
- [[BR]]
- {{{abort}}} can be called at any time to abort processing of the
- request. If the request is not yet sent, it will be removed from
- the relevant queue gracefully. If it is sent but the response not
- yet received, the response will be discarded. Aborting a handle
- never consumes the rest of the response, but it has a negative effect
- on keep-alive and pipelining. After being aborted, the handle behaves
- as if an error was encountered.
- [[BR]]
- {{{isLinked}}} indicates whether the handle is still linked to the
- dispatcher and its connection. Closing or aborting the handle will
- unlink it. Note that access to {{{isLinked}}} can not be synchronized:
- even if it returns true, you can't be sure that the handle is still
- linked by the time you call another method. Once a handle is unlinked,
- it remains unlinked.
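The handle semantics described above can be illustrated with a minimal sketch. It is not the real {{{HttpHandle}}} interface: the class name {{{SketchHandle}}}, the {{{deliver}}} method, the use of a plain string as the response, and the unchecked exception on error are all assumptions made for illustration. The sketch only shows that the first outcome wins, that an aborted handle behaves as if an error was encountered, and that unlinking is a one-way transition.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical stand-in for a handle; NOT the real HttpHandle interface. */
class SketchHandle {
    private final CountDownLatch done = new CountDownLatch(1);
    private final AtomicReference<Object> outcome = new AtomicReference<>();
    private final AtomicBoolean linked = new AtomicBoolean(true);

    /** Called by a background thread (hypothetical): the first outcome wins. */
    void deliver(String response) {
        complete(response);
    }

    /** Blocks the calling thread until a response or an error is available. */
    String awaitResponse() {
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        Object result = outcome.get();
        if (result instanceof RuntimeException) {
            throw (RuntimeException) result;
        }
        return (String) result;
    }

    /** Unlinks the handle; the flag is only ever set to false. */
    void close() {
        linked.set(false);
    }

    /** Unlinks the handle and makes it behave as if an error was encountered. */
    void abort() {
        linked.set(false);
        complete(new IllegalStateException("request aborted"));
    }

    /** A snapshot only: the answer may be stale by the time the caller acts on it. */
    boolean isLinked() {
        return linked.get();
    }

    private void complete(Object result) {
        if (outcome.compareAndSet(null, result)) {
            done.countDown();
        }
    }
}
```

A caller that writes {{{if (handle.isLinked()) handle.awaitResponse();}}} performs a check-then-act sequence, so it still has to be prepared for the error case: another thread may abort the handle between the two calls.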
- 
- {{{HttpNotificationHandler}}} has methods {{{notifyResponse}}} and 
{{{notifyProblem}}},
- which are called for incoming responses and encountered problems, 
respectively.
- There will be at most one notification for either the response or a fatal 
problem.
- If {{{notifyResponse}}} is called but throws a runtime exception, that is a 
fatal problem.
- But there will be no problem notification, since the response notification 
has already
- been given. On the application side, the handle will behave as if an error 
was encountered.
- [[BR]]
- There can be several notifications about non-fatal problems before
- the final notification, but not afterwards. Imagine a server that
- receives the request header, sends an error response immediately,
- and closes the connection while the dispatcher still tries to send the
- request body. This triggers an exception on sending, but the response
- from the server is available. {{{notifyProblem}}} may be called
- for a non-fatal problem then. Its return value indicates whether
- the problem should be handled as a fatal one, or whether processing
- should resume and another notification given.
- [[BR]]
- Notifications are triggered exclusively by operations of the background 
threads.
- Aborting a request at any time does ''not'' trigger a notification, even 
though
- the handle will behave as if an error was encountered.
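The rule of at most one final notification can be sketched with a single atomic flag. The class {{{NotificationGuard}}} and its method names are hypothetical and do not appear in the !HttpDispatch code; the callbacks are plain {{{Runnable}}}s standing in for calls to {{{notifyResponse}}} and {{{notifyProblem}}}. Marking the handle as failed for a fatal problem that arrives after the response notification is an assumption of this sketch.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical guard: at most one final notification per request. */
class NotificationGuard {
    private final AtomicBoolean finalGiven = new AtomicBoolean(false);
    private volatile boolean failed = false;

    /** Background thread: give the response notification unless a final one was given. */
    void deliverResponse(Runnable notifyResponse) {
        if (!finalGiven.compareAndSet(false, true)) {
            return;
        }
        try {
            notifyResponse.run();
        } catch (RuntimeException e) {
            // Fatal, but the final notification has already been given,
            // so no problem notification follows.
            failed = true;
        }
    }

    /** Background thread: notify a fatal problem only if nothing final was given yet. */
    void deliverFatalProblem(Runnable notifyProblem) {
        failed = true; // assumption: the handle reports the error either way
        if (finalGiven.compareAndSet(false, true)) {
            notifyProblem.run();
        }
    }

    /** What the application side would observe through the handle. */
    boolean handleSeesError() {
        return failed;
    }
}
```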
- 
- All methods in {{{HttpDispatcher}}} and {{{HttpHandle}}} are thread safe.
- All methods in {{{HttpNotificationHandler}}} must be thread safe.
- They also must return quickly to keep the background threads available
- for tasks related to other requests. In particular, no blocking or
- time-consuming method of {{{HttpHandle}}} may be called during a notification.
- {{{HttpHandle.abort}}} may be called. Some implementations may also allow
- {{{HttpHandle.close}}} to be called, but that is not guaranteed by the API.
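One way to satisfy these rules is a notification handler that does nothing except hand the handle over to an application-owned queue. The following is a sketch under assumptions: the interface {{{SketchNotificationHandler}}}, its method signatures, the {{{nonFatal}}} parameter, and the convention that returning {{{true}}} means "treat as fatal" are all hypothetical and not taken from the real {{{HttpNotificationHandler}}}.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical shape of a notification handler; signatures are assumptions. */
interface SketchNotificationHandler<H> {
    void notifyResponse(H handle);

    /** @return true to treat the problem as fatal (assumed convention) */
    boolean notifyProblem(H handle, Exception problem, boolean nonFatal);
}

/** Relays notifications to application threads without blocking the caller. */
class QueueingHandler<H> implements SketchNotificationHandler<H> {
    // Unbounded queue: offer() never blocks the background thread.
    private final BlockingQueue<H> ready = new LinkedBlockingQueue<>();

    @Override
    public void notifyResponse(H handle) {
        ready.offer(handle);
    }

    @Override
    public boolean notifyProblem(H handle, Exception problem, boolean nonFatal) {
        if (nonFatal) {
            return false; // let processing resume, another notification will follow
        }
        ready.offer(handle); // an application thread will see the error on the handle
        return true;
    }

    /** Application thread: blocks until some handle is ready. */
    H nextReady() throws InterruptedException {
        return ready.take();
    }

    /** Application thread: returns a ready handle, or null if none is queued. */
    H pollReady() {
        return ready.poll();
    }
}
```

An application thread would loop on {{{nextReady()}}} and only then call the blocking methods of the handle, so the background thread is never held up by application work.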
- 
- 
- === Application Considerations ===
- 
- Applications using !HttpDispatch have one very important responsibility which has
- not been mentioned so far. It may sound trivial, but it really isn't:
- 
-  Applications '''must''' process responses as they arrive.
- 
- Due to the asynchronous nature of !HttpDispatch, an application can generate 
several
- requests and pass them to a dispatcher. !HttpDispatch does ''not'' guarantee 
that these
- requests will be sent in order. Responses may arrive in any order (even 
different
- from the order in which requests are sent), and each response with an entity 
locks up
- one connection until it is processed.
- [[BR]]
- Theoretically, notification is optional. An application thread can block on 
the
- handle for a request until that specific response arrives. But since the order
- in which requests are sent is not guaranteed, it can happen that other 
responses
- which are not processed by the application lock up all connections, and that 
the
- one request on which the application waits will never be sent. Even if this
- deadlock scenario does not occur, blocked connections will degrade 
performance.
- [[BR]]
- Probability theory tells us that what can happen will happen eventually.
- Murphy's Law tells us that what can go wrong will go wrong, at the worst possible moment.
- Therefore, applications that generate more than one request per thread at a 
time
- '''must''' use notification in order to process responses on arrival.
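The lock-up scenario can be modelled with a counting semaphore. This is a deliberately simplified sketch, not dispatcher code: the class {{{ConnectionBudget}}} and its methods are hypothetical, and each permit stands for one connection that stays occupied from sending a request until the application closes the corresponding handle.

```java
import java.util.concurrent.Semaphore;

/** Hypothetical model: one permit per connection available to the dispatcher. */
class ConnectionBudget {
    private final Semaphore free;

    ConnectionBudget(int connections) {
        this.free = new Semaphore(connections);
    }

    /** A request can only be sent while a connection is free. */
    boolean trySend() {
        return free.tryAcquire();
    }

    /** Processing a response (closing its handle) frees the connection again. */
    void closeHandle() {
        free.release();
    }
}
```

With two connections and two unprocessed responses, a third request cannot be sent. An application thread that blocks on the handle of that third request without processing the other two responses waits forever.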
- 
- 
- == Blocking IO Implementation ==
- 
- This section presents design alternatives for implementing the !HttpDispatch 
interfaces.
- An implementation is also referred to as a ''dispatcher'', since each 
implementation
- of {{{HttpDispatcher}}} requires a matching implementation of {{{HttpHandle}}} and
- will make use of {{{HttpNotificationHandler}}}, which is implemented by 
applications.
- [[BR]]
- In the figures below, fat lines indicate threads running from top to bottom.
- This is not necessarily one thread on either side. The fat red line to the 
left
- stands for all application threads, while the fat cyan line to the right 
stands
- for all background threads.
- Objects for thread synchronization are represented by a queue-like symbol. 
Thinner lines
- in the respective color connect the synchronization objects to the thread 
lines.
- Big queue objects are used for passing handles, small queue objects for 
synchronizing
- on a specific handle.
- 
- There are two big queue symbols in each design alternative. One is used to pass the
- handles for newly created requests from the application side to the background threads.
- That object is under control of the dispatcher.
- The second one is used to pass handles from the notification handler to the 
application side.
- That happens under control of the application, indicated by the red backdrop 
of the symbol.
- Applications can use any number of actual objects there, for example to route 
handles to
- different application threads.
- [[BR]]
- There are two small queue symbols in each design alternative. One is used to 
pass the
- response (or error) from the background threads to the application threads. 
The other
- is used to indicate completion of response processing to the background 
threads, which
- can then release or re-use the connection that was locked up by that 
response. Both of
- these synchronization objects are under control of the dispatcher.
- 
- 
- === Red Design ===
- 
- This extreme design is based on the following premises:
-  * Background threads are a shared resource that should be used only for what 
is absolutely necessary.
-  * Application code is unstable and should be executed by application threads 
whenever possible.
- 
- attachment:reddesign.png
- 
- Preprocessing and postprocessing are done by application threads because these 
steps
- execute application code. Consuming the response is also done by an 
application thread,
- because it is a potentially long-running task that does not necessarily have 
to be executed
- by a background thread.
- [[BR]]
- With this design, notification handling does not have access to the 
postprocessed response.
- The notification handler can not close the handle either.
- Errors in preprocessing will not generate load in the background threads.
- The code for pre- and postprocessing can use blocking operations, including 
user interaction.
- Only an application thread will be blocked, but the dispatcher continues 
operation.
- 
- 
- === Cyan Design ===
- 
- This extreme design is based on the following premise:
-  * If it can be done by a background thread, let it be done by a background 
thread.
- 
- attachment:cyandesign.png
- 
- Preprocessing and postprocessing are done by background threads, as is 
consuming the response.
- Postprocessing is done before notification, since that is the last chance to 
detect and report
- a problem in a background thread.
- [[BR]]
- The notification handler has access to the postprocessed response, and it can 
close the handle.
- Errors in preprocessing will trigger a problem notification.
- Pre- and postprocessing are subject to the same restrictions as notification 
handling.
- In particular, they can not use long-running blocking operations, since they 
would block a
- background thread and thereby interfere with processing of other requests and 
responses.
- 
- 
- === Consolidated Design ===
- 
- After discussion on the developer mailing list, the following design choices 
have been made for the initial implementation.
- They are subject to review, discussion, and change.
- 
-  1. Preprocessing can be switched between application thread and background 
thread through a parameter.[[BR]] The default is to preprocess in the 
application thread, since that keeps bad requests that fail to preprocess out 
of the dispatcher.
-  1. Postprocessing can be switched between application thread and background 
thread through a parameter.[[BR]] The default is to postprocess in the 
background thread, since it is unpredictable which of several application 
threads would be the one that does the postprocessing.
-  1. Consuming of the remaining response body is done in the background 
thread, since that step is logically tied to connection management.[[BR]] 
Applications that don't want the background thread to consume the response body 
can consume it explicitly before closing the handle.
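The first design choice, a parameter that selects the thread for preprocessing, might look roughly like the sketch below. Everything here is hypothetical: the class {{{SketchDispatcher}}}, the boolean flag, the use of strings for requests and responses, and the {{{CompletableFuture}}} return type are illustration only and do not reflect the archived implementation.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.UnaryOperator;

/** Hypothetical dispatcher with a switch for where preprocessing runs. */
class SketchDispatcher implements AutoCloseable {
    private final ExecutorService background =
        Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "dispatch-bg");
            t.setDaemon(true);
            return t;
        });
    private final boolean preprocessInBackground;
    private final UnaryOperator<String> preprocessor;

    SketchDispatcher(boolean preprocessInBackground, UnaryOperator<String> preprocessor) {
        this.preprocessInBackground = preprocessInBackground;
        this.preprocessor = preprocessor;
    }

    CompletableFuture<String> sendRequest(String request) {
        if (preprocessInBackground) {
            // Preprocessing failures surface later, through the returned future.
            return CompletableFuture.supplyAsync(
                () -> "sent:" + preprocessor.apply(request), background);
        }
        // Default: preprocess in the calling application thread, so a bad
        // request fails here and never reaches the background thread.
        String prepared = preprocessor.apply(request);
        return CompletableFuture.supplyAsync(() -> "sent:" + prepared, background);
    }

    @Override
    public void close() {
        background.shutdownNow();
    }
}
```

Passing a preprocessor that records {{{Thread.currentThread().getName()}}} makes the difference between the two settings visible.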
- 
- 
- 
- 
- == Non-blocking IO Implementation ==
- 
- The blocking IO implementation promises maximum performance. Its major 
drawback is that it requires at least as many background threads as there are 
connections, since a dedicated thread needs to wait for incoming responses on 
each connection.
- That may be acceptable in client applications, for example a web spider. For 
server side applications like proxies, this resource inefficiency is typically 
not acceptable.
- [[BR]]
- Non-blocking IO allows a single thread to wait for an incoming message on 
''any'' connection. Although it is possible to switch sockets between blocking 
and non-blocking modes, this can not be used to mix non-blocking IO for waiting 
with blocking IO for receiving. The socket behavior can only be specified for 
both directions, sending and receiving.
- When pipelining, the socket can be used for sending requests at any time, so the operation mode must not be changed.
- An extra mixed-mode dispatcher that excludes pipelining hardly seems worth 
the effort.
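The idea of a single thread waiting on ''any'' connection can be shown with {{{java.nio}}} selectors. The sketch below uses in-process {{{Pipe}}} channels instead of sockets so that it runs without a network; the class {{{SelectAnyDemo}}} and its method are hypothetical and unrelated to any dispatcher code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;

/** Hypothetical demo: one thread waits for input on any of several channels. */
class SelectAnyDemo {
    /** Opens the given number of pipes, writes to one, returns "index:message". */
    static String firstReadable(int channels, int writeTo, String message) {
        Pipe[] pipes = new Pipe[channels];
        try (Selector selector = Selector.open()) {
            for (int i = 0; i < channels; i++) {
                pipes[i] = Pipe.open();
                // A channel must be non-blocking before it can be registered.
                pipes[i].source().configureBlocking(false);
                pipes[i].source().register(selector, SelectionKey.OP_READ, i);
            }
            byte[] bytes = message.getBytes(StandardCharsets.US_ASCII);
            pipes[writeTo].sink().write(ByteBuffer.wrap(bytes));

            selector.select(); // blocks until at least one channel is readable
            SelectionKey key = selector.selectedKeys().iterator().next();
            ByteBuffer buffer = ByteBuffer.allocate(256);
            ((ReadableByteChannel) key.channel()).read(buffer);
            buffer.flip();
            return key.attachment() + ":" + StandardCharsets.US_ASCII.decode(buffer);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            for (Pipe pipe : pipes) {
                if (pipe != null) {
                    closeQuietly(pipe);
                }
            }
        }
    }

    private static void closeQuietly(Pipe pipe) {
        try {
            pipe.source().close();
            pipe.sink().close();
        } catch (IOException ignored) {
            // nothing useful to do in a demo
        }
    }
}
```

While a channel is registered with a selector, calling {{{configureBlocking(true)}}} on it throws {{{IllegalBlockingModeException}}}, which is consistent with the point above that the mode cannot simply be flipped back for receiving.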
- 
- 
- ''This is the place for discussing {{{java.nio}}} based dispatchers.''
- 
- The foundation for implementing HTTP communication with NIO is already available in
- [http://jakarta.apache.org/httpcomponents/httpcore/jakarta-httpcore-nio/index.html HttpCore-NIO].
- 

