Re: [fossil-users] Fossil server load control

2014-03-12 Thread Andreas Kupries
On Wed, Mar 12, 2014 at 6:40 AM, Richard Hipp d...@sqlite.org wrote:
 A new feature was recently added to Fossil that allows it to deny expensive
 requests (such as blame or tarball on a large repository) if the server
 load average is too high.  See
 http://www.fossil-scm.org/fossil/doc/tip/www/server.wiki#loadmgmt for
 further information.

Interesting.

 I am pleased to announce that this new feature has passed its first test.

 About three hours ago, a single user in Beijing began downloading multiple
 copies of the same System.Data.SQLite tarball.  As of this writing, he has
 so far attempted to download that one tarball 11,784 times (at last count) -
 a rate of about one per second, and each request takes about 3.1 seconds of
 CPU time in order to compute the 80MB tarball.

 And if you have alternative suggestions about how to keep a light-weight
 host running smoothly under a massive Fossil request load, please post
 follow-up comments.

How sensible do you think it would be to have a (limited-size)
(in-memory|disk) cache to hold the most recently requested tarballs?
That way a high-demand tarball, etc. would be computed only once and
then served statically from the cache.

Note that I actually see this as a possible complement to the load mgmt feature.
The cache would help if demand is high for a small number of
revisions, whereas load mgmt would kick in and restrict load if the
access pattern of revisions is sufficiently random/spread out to
negate the cache (i.e. cause it to thrash).

Side note: While the same benefits could be had by putting a regular
web cache in front of the fossil server, e.g. squid or the like, this
would require more work to set up and administer, and it might be a
problem for the truly dynamic parts of the fossil web UI. An integrated
cache just for the assets which are expensive to compute and yet
(essentially) static does not have these issues.

I mentioned in-memory and disk ... I can see a two-level scheme here:
a smaller in-memory cache, with LRU eviction, for the really high-demand
pieces, and a larger disk cache for the things not so much in demand at
the moment, but possibly in the future. The disk cache could actually be
much larger (disks are large and cheap these days); this would help with
random-access attacks, as they would become asymptotically more difficult
as the disk cache over time extends its net of quickly served assets.
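
To make that two-level idea a bit more concrete, here is a rough sketch in
Python (purely illustrative -- Fossil itself is written in C, and the
generate() callback below is just a hypothetical stand-in for the expensive
tarball construction): a small in-memory LRU in front of a larger on-disk
store, both keyed by the hash of the requested artifact.

    # Illustrative sketch only (not Fossil code): a small in-memory LRU in
    # front of a larger on-disk store, both keyed by the hash of the
    # requested artifact.  generate() stands in for the expensive step of
    # actually building the tarball.
    import os
    from collections import OrderedDict

    class TwoLevelCache:
        def __init__(self, cache_dir, mem_entries=4, disk_budget=10 * 2**30):
            self.mem = OrderedDict()            # hash -> bytes, in LRU order
            self.mem_entries = mem_entries      # level 1: a handful of hot items
            self.cache_dir = cache_dir          # level 2: large, cheap, persistent
            self.disk_budget = disk_budget      # bytes allowed on disk
            os.makedirs(cache_dir, exist_ok=True)

        def _path(self, key):
            return os.path.join(self.cache_dir, key)

        def get(self, key, generate):
            if key in self.mem:                 # level-1 hit
                self.mem.move_to_end(key)
                return self.mem[key]
            path = self._path(key)
            if os.path.exists(path):            # level-2 hit: promote to memory
                with open(path, "rb") as f:
                    data = f.read()
                os.utime(path, None)            # refresh timestamp for LRU eviction
            else:                               # miss: compute once, then persist
                data = generate()
                with open(path, "wb") as f:
                    f.write(data)
                self._evict_disk()
            self.mem[key] = data
            while len(self.mem) > self.mem_entries:
                self.mem.popitem(last=False)    # drop least recently used
            return data

        def _evict_disk(self):
            # Drop the oldest files until the disk cache fits its size budget.
            names = os.listdir(self.cache_dir)
            total = sum(os.path.getsize(self._path(n)) for n in names)
            for n in sorted(names, key=lambda n: os.path.getmtime(self._path(n))):
                if total <= self.disk_budget:
                    break
                total -= os.path.getsize(self._path(n))
                os.remove(self._path(n))

The eviction policy and size limits would of course need tuning on a real
server; the point is only that the second request for a popular tarball
becomes a cheap file read instead of a multi-second recomputation.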



-- 
Andreas Kupries
Senior Tcl Developer
Code to Cloud: Smarter, Safer, Faster(tm)
F: 778.786.1133
andre...@activestate.com
http://www.activestate.com
Learn about Stackato for Private PaaS: http://www.activestate.com/stackato

EuroTcl'2014, July 12-13, Munich, GER


Re: [fossil-users] Fossil server load control

2014-03-12 Thread Richard Hipp
On Wed, Mar 12, 2014 at 1:13 PM, Andreas Kupries
andre...@activestate.com wrote:

 On Wed, Mar 12, 2014 at 6:40 AM, Richard Hipp d...@sqlite.org wrote:

  And if you have alternative suggestions about how to keep a light-weight
  host running smoothly under a massive Fossil request load, please post
  follow-up comments.

 How sensible do you think it would be to have a (limited-size)
 (in-memory|disk) cache to hold the most recently requested tarballs?
 That way a high-demand tarball, etc. would be computed only once and
 then served statically from the cache.


It's on my to-do list, actually.  The idea is to have a separate database
that holds the cache.  And yes it is complementary to the load management
feature.
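
As a minimal illustration of what such a separate cache database might look
like -- the schema, the file name, and the generate() hook here are
hypothetical, not Fossil's actual design, and Python's sqlite3 module is used
only for brevity:

    # Hypothetical sketch (not Fossil's actual schema): a separate SQLite
    # database holding recently requested tarballs, keyed by artifact hash
    # plus output format, with simple LRU pruning.
    import sqlite3, time

    def open_cache(path="tarball-cache.db"):
        db = sqlite3.connect(path)
        db.execute("""CREATE TABLE IF NOT EXISTS cache(
                        key      TEXT PRIMARY KEY,  -- e.g. '<hash>/tarball'
                        lastused REAL,              -- for LRU pruning
                        content  BLOB)""")
        return db

    def cache_get(db, key, generate, max_entries=20):
        row = db.execute("SELECT content FROM cache WHERE key=?", (key,)).fetchone()
        if row is not None:
            db.execute("UPDATE cache SET lastused=? WHERE key=?", (time.time(), key))
            db.commit()
            return row[0]
        data = generate()                      # the expensive step, done only once
        db.execute("INSERT OR REPLACE INTO cache VALUES(?,?,?)",
                   (key, time.time(), data))
        db.execute("""DELETE FROM cache WHERE key NOT IN
                      (SELECT key FROM cache ORDER BY lastused DESC LIMIT ?)""",
                   (max_entries,))
        db.commit()
        return data

Since the cache lives in an ordinary database file on disk, every process
that serves a request can share it.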



 Side note: While the same benefits could be had by putting a regular
 web cache in front of the fossil server, 


No, they can't actually, at least not by any technology I'm aware of.  The
problem is that these requests must be authenticated.  Downloads might be
authorized only for certain users.  If an authorized user does a download,
and squid caches it, some other unauthorized user might be able to obtain
the download from the cache.

Even if downloads are currently authorized for anybody (which is the common
case, at least on public repos), I don't think you want them being cached,
since to do so would mean that turning off public downloads would be
ineffective until the caches all expired.

 I mentioned in-memory and disk ... I can see a two-level scheme here:
 a smaller in-memory cache, with LRU eviction, for the really high-demand
 pieces, and a larger disk cache for the things not so much in demand at
 the moment, but possibly in the future. The disk cache could actually be
 much larger (disks are large and cheap these days); this would help with
 random-access attacks, as they would become asymptotically more difficult
 as the disk cache over time extends its net of quickly served assets.


The current Fossil implementation runs a separate process for each HTTP
request.  So an in-memory cache wouldn't be helpful.  It has to be
disk-based.

-- 
D. Richard Hipp
d...@sqlite.org


Re: [fossil-users] Fossil server load control

2014-03-12 Thread Stephan Beal
On Wed, Mar 12, 2014 at 6:13 PM, Andreas Kupries
andre...@activestate.com wrote:

 How sensible do you think it would be to have a (limited-size)
 (in-memory|disk) cache to hold the most recently requested tarballs?
 That way a high-demand tarball, etc. would be computed only once and
 then served statically from the cache.


FWIW: i was scratching down ideas for this very thing today for the
libfossil CGI demos because i don't like the memory cost of generating ZIP
files from script code. Caching the (say) 10 most recent ZIPs could
alleviate some of my load concerns. It need not be a syncable table, nor
one which survives a rebuild.

Note that I actually see this as a possible complement to the load mgmt
 feature.
 The cache would help if demand is high for a small number of
 revisions, whereas load mgmt would kick in and restrict load if the
 access pattern of revisions is sufficiently random/spread out to
 negate the cache (i.e. cause it to thrash).


+1


 would require more work to set up and administer, and it might be a
 problem for the truly dynamic parts of the fossil web UI. An integrated
 cache just for the assets which are expensive to compute and yet
 (essentially) static does not have these issues.


In my experience, most proxies won't cache requests which have URL
parameters. Whether or not that's generally true, i can't say. For static
content (lots of what fossil serves is static), the URLs can/should be
written as /path/arg1/arg2, rather than /path?arg1=...&arg2=..., to make
them potentially more cacheable.


-- 
- stephan beal
http://wanderinghorse.net/home/stephan/
http://gplus.to/sgbeal
Freedom is sloppy. But since tyranny's the only guaranteed byproduct of
those who insist on a perfect world, freedom will have to do. -- Bigby Wolf


Re: [fossil-users] Fossil server load control

2014-03-12 Thread Ramon Ribó
 The current Fossil implementation runs a separate process for each HTTP
 request.  So an in-memory cache wouldn't be helpful.  It has to be disk-
 based.

Does not FastCGI do exactly the opposite?

RR



Re: [fossil-users] Fossil server load control

2014-03-12 Thread Richard Hipp
On Wed, Mar 12, 2014 at 1:26 PM, Stephan Beal sgb...@googlemail.com wrote:


 In my experience, most proxies won't cache requests which have URL
 parameters. Whether or not that's generally true, i can't say. For static
 content (lots of what fossil serves is static), the URLs can/should be
 written as /path/arg1/arg2, rather than /path?arg1=...&arg2=..., to make
 them potentially more cacheable.


With a few carefully chosen exceptions, Fossil always sets Cache-control:
no-cache in the header of its replies, due in large part to those pesky
authentication cookies.

-- 
D. Richard Hipp
d...@sqlite.org


Re: [fossil-users] Fossil server load control

2014-03-12 Thread Stephan Beal
On Wed, Mar 12, 2014 at 6:31 PM, Ramon Ribó ram...@compassis.com wrote:


  
 The current Fossil implementation runs a separate process for each HTTP
  request.  So an in-memory cache wouldn't be helpful.  It has to be disk-
  based.

 Does not FastCGI do exactly the opposite?


FastCGI requires that there be some sort of state object which can be reset
between calls and fed into each child. Fossil doesn't have such a state
object (it has one, but not one which can simply be reset/re-used), so
FastCGI can't really do its magic with fossil. libfossil (currently under
construction and moving along nicely) provides such a construct.

-- 
- stephan beal
http://wanderinghorse.net/home/stephan/
http://gplus.to/sgbeal
Freedom is sloppy. But since tyranny's the only guaranteed byproduct of
those who insist on a perfect world, freedom will have to do. -- Bigby Wolf