On Thu, Mar 6, 2008 at 7:14 AM, Jon Blower <[EMAIL PROTECTED]> wrote:
[...]
>  We have an existing RESTful web application that involves clients
>  downloading multiple streams of data simultaneously.  Our current
>  implementation is based on servlets and we are experiencing
>  scalability problems with the number of threads involved in serving
>  multiple large data streams simultaneously.  I recently came across
>  Restlet and was attracted by the potential to use NIO under the hood
>  to enable more scalable large file transfers.

Cool.

>  In our case we are not necessarily serving large files that already
>  exist on disk: we are essentially creating the files ourselves on the
>  fly (so they are of unknown length when the file transfer starts).  I
>  was wondering if anyone could offer advice on how to support the
>  serving of such data streams through Restlet in a scalable manner
>  (ideally without creating a new thread on the server for each file
>  transfer)?

What do you mean by "large files"?  I.e., are you talking about
generating content that is merely large relative to a web page (i.e.,
measured in megabytes), or something like complete hi-def video (GBs
in size), or something both large and nominally endless like live
video streams?

For the first case, if they are small enough I'd start by just fully
rendering the contents to a Representation as usual and profiling how
well you can do with the existing Jetty connector (with tuning, etc.).
As you add more simultaneous clients, add more servers.  Also, run
your experiments with the new Grizzly connector and track that as it
and v1.1+ stabilize.
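
Roughly what I mean by "fully rendering", as a rough sketch against
the Restlet 1.1 Resource/Variant API (the resource class and its
generated payload are made up for illustration; double-check the
method names against whichever version you land on):

    import org.restlet.Context;
    import org.restlet.data.MediaType;
    import org.restlet.data.Request;
    import org.restlet.data.Response;
    import org.restlet.resource.Representation;
    import org.restlet.resource.Resource;
    import org.restlet.resource.StringRepresentation;
    import org.restlet.resource.Variant;

    /** Hypothetical resource that builds its (megabyte-scale) payload up front. */
    public class GeneratedDataResource extends Resource {

        public GeneratedDataResource(Context context, Request request, Response response) {
            super(context, request, response);
            getVariants().add(new Variant(MediaType.TEXT_PLAIN));
        }

        @Override
        public Representation represent(Variant variant) {
            // Generate the whole payload in memory, then hand it to the
            // connector as an ordinary Representation (assumes the content
            // comfortably fits in the heap).
            StringBuilder data = new StringBuilder();
            for (int i = 0; i < 100000; i++) {
                data.append(i).append(',');
            }
            return new StringRepresentation(data, MediaType.TEXT_PLAIN);
        }
    }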

For the second case (or where the content sizes are like the first
case but you have lots of slow clients), I'd actually have that part
of my origin servers either be fronted by a reverse caching proxy
(e.g., Squid), or generate and dump the contents from the origin
server into a local file and redirect the client to fetch that content
from, e.g., lighttpd (+mod_secdownload).  Depending on the nature of
your client applications, the potential reuse of the generated
content, etc., you can tune how you clean up the caches.
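
As a concrete sketch of the dump-and-redirect variant (the docroot
path, base URL, and helper name are all made up; adjust to however you
lay out the lighttpd side):

    import java.io.BufferedOutputStream;
    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.OutputStream;
    import java.util.UUID;

    public class StaticDumpHelper {

        // Hypothetical docroot served by lighttpd, and its public base URL.
        private static final File DOC_ROOT = new File("/var/www/generated");
        private static final String BASE_URL = "http://static.example.com/generated/";

        /**
         * Writes the generated content under the static server's docroot and
         * returns the URL the origin server should redirect the client to
         * (e.g., with a 303 See Other).
         */
        public static String dumpAndGetUrl(byte[] generatedContent) throws IOException {
            String name = UUID.randomUUID().toString() + ".dat";
            File target = new File(DOC_ROOT, name);
            OutputStream out = new BufferedOutputStream(new FileOutputStream(target));
            try {
                out.write(generatedContent);
            } finally {
                out.close();
            }
            // A periodic job can delete old files here; that's your "cache cleanup" knob.
            return BASE_URL + name;
        }
    }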

For the last case, if I controlled the clients then I'd probably have
the clients request good-sized chunks of the data in a loop and fall
back to the appropriate combination of the first two approaches.  Of
course, that presumes you can generate those chunks more or less
independently (i.e., with minimal state needed to keep the continuity
from chunk to chunk).  If you have heavy amounts of state, and/or you
don't control the clients, then I'd want to know a good bit more
before making any recommendation.
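
If the clients were mine, the fetch loop could be as dumb as the
following (plain java.net here; the "chunk" query parameter and the
convention that a non-200 status ends the sequence are just
assumptions for the sketch):

    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class ChunkedFetchClient {

        /** Pulls chunk 0, 1, 2, ... from baseUri until the server says stop. */
        public static void fetchAll(String baseUri, OutputStream sink) throws IOException {
            byte[] buffer = new byte[8192];
            for (int chunk = 0; ; chunk++) {
                URL url = new URL(baseUri + "?chunk=" + chunk);
                HttpURLConnection conn = (HttpURLConnection) url.openConnection();
                try {
                    // Assumed convention: anything other than 200 means no more chunks.
                    if (conn.getResponseCode() != HttpURLConnection.HTTP_OK) {
                        break;
                    }
                    InputStream in = conn.getInputStream();
                    try {
                        int n;
                        while ((n = in.read(buffer)) != -1) {
                            sink.write(buffer, 0, n);
                        }
                    } finally {
                        in.close();
                    }
                } finally {
                    conn.disconnect();
                }
            }
        }
    }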

Hope this helps,
John
