sorry it's a bit long...

Santiago Gala wrote:
>
[snip]
>
> - Default thread count be lowered to 5, with a maximum of 10, and 2 free
> threads.

wouldn't this have an impact on other classes that (will) make use
of the ThreadManager ?

I quickly checked the source tree and it appears that - at the moment -
ThreadManager is only used by the URLFetcher, however this could
change in the future. If we want to take care of this, we could use a
separate thread pool for the download of remote files.

Better, why not instantiate multiple ThreadManagers, one for each
domain, with a singleton key based on the domain name of the
downloads it will handle?

ThreadManager.getInstance( url ) would extract the domain name
from the url and return the appropriate singleton. This singleton would
have a moderate number of threads (2 - 5 ?) available. Of course
this doesn't prevent jetspeed from flooding itself by downloading
from many different domains (this also has to be addressed).
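To make the per-domain idea concrete, here is a minimal sketch. It is not the existing ThreadManager API: DomainPools, domainOf and THREADS_PER_DOMAIN are hypothetical names, and it assumes the standard java.util.concurrent executors stand in for our own thread pool. It only shows the lookup (URL to host to one shared pool per host) and does nothing about the total number of threads across domains.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: one small thread pool per host, looked up by URL.
final class DomainPools {
    // Assumed value, picked from the "2 - 5" range suggested above.
    private static final int THREADS_PER_DOMAIN = 3;

    private static final Map<String, ExecutorService> POOLS = new ConcurrentHashMap<>();

    private DomainPools() {}

    /** Returns the lower-cased host part of the URL; this is the singleton key. */
    static String domainOf(String url) {
        String host = URI.create(url).getHost();
        if (host == null) {
            throw new IllegalArgumentException("no host in URL: " + url);
        }
        return host.toLowerCase(Locale.ROOT);
    }

    /** Returns the pool shared by all downloads from the URL's host. */
    static ExecutorService getInstance(String url) {
        return POOLS.computeIfAbsent(domainOf(url),
                d -> Executors.newFixedThreadPool(THREADS_PER_DOMAIN));
    }
}
```

Two URLs on the same host get the same pool, so at most THREADS_PER_DOMAIN downloads hit that server at once. The number of pools still grows with the number of domains.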

> - We should play with the refresh rate of the DiskCacheDaemon. Possibly
> 1 hour is too fast.

I know that some formats embed an expiry time; we could use it
when available, and simply let the user configure the default delay for
the others (for now).
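A minimal sketch of that rule, assuming the parser hands us the embedded expiry as an optional duration. RefreshPolicy and delayFor are hypothetical names, not existing Jetspeed classes.

```java
import java.time.Duration;
import java.util.Optional;

// Hypothetical sketch: prefer the document's own expiry, else the configured default.
final class RefreshPolicy {
    private final Duration defaultDelay; // user-configured fallback

    RefreshPolicy(Duration defaultDelay) {
        this.defaultDelay = defaultDelay;
    }

    /** Uses the embedded expiry when it is present and positive, else the default. */
    Duration delayFor(Optional<Duration> embeddedExpiry) {
        return embeddedExpiry
                .filter(d -> !d.isNegative() && !d.isZero())
                .orElse(defaultDelay);
    }
}
```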

It would be nice if sites providing a large number of feeds (moreover for
example) could provide a list of documents updated in the last hour,
the last 4 hours, etc... We would just have to download this list
periodically and refetch the documents updated since our last update.
Of course this isn't our responsibility, and must be done by the content
provider. Has any of you heard about such an initiative?

> A longer term proposal would be to implement a maximum number of
> concurrent requests PER SERVER. A way to achieve that would be to use
> the W3C libwww protocol implementation, a drop in HTTP package. I will
> test it.

http://www.w3.org/Library/  ?

If I'm right, this is a native library, how would you interface it with
jetspeed ?

> BTW: Has anybody experienced such a problem? I got it in a machine that
> is deployed in a fast corporate network. With my modem, I only got reset
> connections from 10.am, but never overflowed network54.com

yes... on a 256k connection, jetspeed acts as a very effective DoS tool :-)
on our MRTG graphs, we could clearly see the updates every hour...

> What do you think about that?

well... something has to be done :-)

For the download/refresh scheduling, we could use turbine's scheduling
system (in which version is it available ?) and write some glue to submit
refresh jobs to a proper "download job queuing system" that would solve
the "parallel download tornados" problem. You submit your download
request and it is placed in an appropriate queue, where a variable number
of threads picks them according to a set of rules (max total downloads,
max per-site download, maybe max bandwidth, ...). We should make it
possible to synchronously wait for job completion.
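To illustrate what I mean by a download job queue, here is a rough sketch under some assumptions: DownloadQueue, maxTotal and maxPerHost are hypothetical names, it uses java.util.concurrent rather than turbine's scheduler, and it only covers the "max total" and "max per-site" rules (no bandwidth limit). It is also deliberately naive: a worker waiting for a per-host slot still occupies one of the global slots.

```java
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;

// Hypothetical sketch of a download queue with a global and a per-host limit.
final class DownloadQueue {
    private final ExecutorService workers;   // size caps total concurrent downloads
    private final int maxPerHost;            // cap on concurrent downloads per host
    private final Map<String, Semaphore> perHost = new ConcurrentHashMap<>();

    DownloadQueue(int maxTotal, int maxPerHost) {
        this.workers = Executors.newFixedThreadPool(maxTotal);
        this.maxPerHost = maxPerHost;
    }

    /** Queues a job for the given host; the Future allows waiting for completion. */
    <T> Future<T> submit(String host, Callable<T> job) {
        Semaphore slots = perHost.computeIfAbsent(
                host.toLowerCase(Locale.ROOT), h -> new Semaphore(maxPerHost));
        return workers.submit(() -> {
            slots.acquire();               // wait for a free slot on this host
            try {
                return job.call();
            } finally {
                slots.release();           // free the slot even if the job fails
            }
        });
    }

    /** Stops accepting jobs; already queued jobs still run. */
    void shutdown() {
        workers.shutdown();
    }
}
```

A caller that needs the result immediately submits the job and calls get() on the returned Future, which blocks until the download has finished; a background refresh simply ignores the Future.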

Is there any library we can use that already does this ?

I'm willing to contribute code/ideas in this area... and I'll be able to
work on this after ApacheCon (read: early November), if nobody has
completed it by then :-)

  alex



