Re: [fossil-users] Fossil behind proxy

2010-05-30 Thread Paul Serice
On Sat, 2010-05-29 at 21:59 -0400, Richard Hipp wrote:
 The http://www.sqlite.org/ and http://www.fossil-scm.org/ websites
 are both run off of the same server ... This server takes over a
 quarter million requests per day, 10GB of traffic/day, and it does
 so using less than 3% of the CPU on a virtual machine that is a
 1/20th slice of a real server. ... How much more efficient does that
 need to be?

Lots ... if it's CGI under Windows.


Thank You,
Paul Serice


___
fossil-users mailing list
fossil-users@lists.fossil-scm.org
http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users


Re: [fossil-users] Fossil behind proxy

2010-05-30 Thread Owen Shepherd
On 30 May 2010 02:59, Richard Hipp d...@sqlite.org wrote:

 CGI ... is highly inefficient.  The http://www.sqlite.org/ and
 http://www.fossil-scm.org/ websites are both run off of the same server
 (check the IP addresses on the domains).  The HTTP server there is a simple
 home-brew job implemented as a single file of C code.

 http://www.sqlite.org/docsrc/artifact/d53e8146bf7977

 It is run off of inetd.  For each inbound HTTP request, a new process is
 created which runs the program implemented by the C file shown above.  That
 simple little program either delivers static content or, if the file
 requested has the execute permission bit turned on, runs the file as CGI.
 Very simple.  This server takes over a quarter million requests per day,
 10GB of traffic/day, and it does so using less than 3% of the CPU on a
 virtual machine that is a 1/20th slice of a real server.

 How much more efficient does that need to be?  Sure, it won't scale up to
 Google or Facebook loads, but it doesn't need to.


Most web applications are written in scripting languages; Fossil is a bit of
an unusual case. In the general case of a scripting language, spawning a new
interpreter for every request is rather disastrous performance-wise. It also
brings the issue of rate limiting: You can easily end up with hundreds of
CGI handlers running.


 Why don't I use nginx or apache with SCGI or FastCGI and be even more
 efficient, you ask?  One word:  Simplicity.

 With my setup, there are no servers (other than inetd).  Everything runs
 on-demand.  With no servers running, that means there are no servers to
 crash and require restarting, no servers to configure, no servers to pick up
 performance problems after running a few days due to memory fragmentation or
 resource leaks, and no servers accidentally leaving open TCP ports that can
 be attacked by miscreants.  Oh, and did I mention that my setup runs in a
 chroot jail for additional security?  I'm guessing nginx doesn't do that.

 When I design software, I really try hard to make it simple.  Take for
 example, Fossil.  There are (currently) three ways to set up a Fossil
 server.  (1) You can type fossil server REPOSITORY.  (2) You can do a
 simple 1-line edit to your /etc/inetd.conf file.  (3) You can create a
 2-line CGI script and drop it in any cgi-bin.  Each of these techniques can
 be (and is) described using 20 or 30 lines of text and one code example.  None
 of them involves editing more than 2 lines in a single file.  Now consider a
 hypothetical SCGI solution.  To get SCGI going, you first have to arrange
 to start the Fossil SCGI server (perhaps with the fossil scgi REPOSITORY
 command) and have it restart automatically when your machine reboots.  You
 have to choose a communications port.  Then you have to edit configuration
 files on your web server to get it to talk to the fossil SCGI server.  So,
 to implement an SCGI solution, you'll need to edit a minimum of two
 configuration files (and probably more if my guess about the complexity of
 nginx is correct).  So the setup for SCGI is at least twice as complex as
 CGI.

 But SCGI will be faster, right?  Well, no.  SCGI will be about the same
 speed, or maybe just a little slower, because the way the fossil scgi
 command will work (assuming I implement it) will be that the Fossil server
 will accept the incoming SCGI request from the web server.  The fossil
 server will then fork a copy of itself to handle the request, set
 environment variables, then call the existing CGI processing logic to do the
 work.  So SCGI and CGI are going to do the same amount of work and run at
 about the same speed.  The difference is that SCGI will use more resources
 when it is idle (because there is a server hanging around waiting for
 incoming requests, rather than being demand-launched) and SCGI will be at
 least twice as hard to set up and configure.


SCGI should still be slightly more efficient than traditional CGI because it
is exec which tends to be the expensive system call. Of course, we aren't
expecting Fossil to be a heavily accessed server.


 None of the above really solves your problem.  But perhaps it will help you
 to understand why there is not already a fossil scgi command, and why
 statements to the effect that CGI is highly inefficient are not really
 meaningful.

 If I had a web server at hand that would do SCGI, I might consider adding
 the fossil scgi command for you.  But as I don't, I have no way to test
 the fossil scgi command.  But I did outline above (vaguely) the solution
 for you:  Using code very much like the existing HTTP server in fossil,
 implement a command that listens for SCGI requests, then forks a copy of
 itself to handle each request, each request being handled using the existing
 CGI processing logic.  How hard can that be, really?  As an alternative,
 I'll bet you can easily come up with a perl/python/ruby/tcl script that
 implements an SCGI server that execs fossil cgi to handle each 

Re: [fossil-users] Fossil behind proxy

2010-05-29 Thread Owen Shepherd
On 30 May 2010 00:53, Michael McDaniel fos...@autosys.us wrote:

  I wound up running lighttpd for the sole purpose of serving fossil
  via cgi scripts.  lighttpd is pretty lightweight on resources.

 ~Michael


The idea has crossed my mind, but the idea of having to maintain another set
of configuration files frankly horrifies me ;-)

I've been snooping at the Fossil source, and it looks like setting things up
to support SCGI shouldn't be *too* hard, but I'm still not really sure where
to begin.


Re: [fossil-users] Fossil behind proxy

2010-05-29 Thread Richard Hipp
On Sat, May 29, 2010 at 7:21 PM, Owen Shepherd owen.sheph...@e43.eu wrote:

 We are currently experimenting with setting up a Fossil server, but have
 encountered a bit of an issue: Fossil doesn't seem to support being operated
 behind a proxy. As we wish to run Fossil on port 80, and to do so it must
 sit behind our primary web server, this is a bit of an issue.

 The ideal solution for us would be to run Fossil as an SCGI or FastCGI
 service (I would lean towards SCGI as it is a much simpler protocol) and
 have our web server dispatch requests to that, but this is at present not
 possible. We cannot run Fossil as a CGI because we use Nginx, which does not
 support it (with the valid reason that very little uses CGI these days and
 that it is highly inefficient).


CGI ... is highly inefficient.  The http://www.sqlite.org/ and
http://www.fossil-scm.org/ websites are both run off of the same server
(check the IP addresses on the domains).  The HTTP server there is a simple
home-brew job implemented as a single file of C code.

http://www.sqlite.org/docsrc/artifact/d53e8146bf7977

It is run off of inetd.  For each inbound HTTP request, a new process is
created which runs the program implemented by the C file shown above.  That
simple little program either delivers static content or, if the file
requested has the execute permission bit turned on, runs the file as CGI.
Very simple.  This server takes over a quarter million requests per day,
10GB of traffic/day, and it does so using less than 3% of the CPU on a
virtual machine that is a 1/20th slice of a real server.

How much more efficient does that need to be?  Sure, it won't scale up to
Google or Facebook loads, but it doesn't need to.
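
For scale, the figures quoted above can be turned into per-second rates. This is a back-of-the-envelope sketch in Python, assuming the load is spread evenly over the day and that 10GB means 10 x 10^9 bytes; neither assumption is stated in the message, and "over a quarter million" makes the request rate a lower bound.

```python
# Rough averages derived from the quoted daily totals.
SECONDS_PER_DAY = 24 * 60 * 60

requests_per_day = 250_000      # "over a quarter million requests per day"
bytes_per_day = 10 * 10**9      # "10GB of traffic/day", assumed decimal gigabytes

requests_per_second = requests_per_day / SECONDS_PER_DAY
avg_bytes_per_request = bytes_per_day / requests_per_day
avg_megabits_per_second = bytes_per_day * 8 / SECONDS_PER_DAY / 1e6

print(f"{requests_per_second:.2f} requests/s")             # 2.89 requests/s
print(f"{avg_bytes_per_request / 1000:.0f} kB per request")  # 40 kB per request
print(f"{avg_megabits_per_second:.2f} Mbit/s")              # 0.93 Mbit/s
```

Real traffic is bursty, so peak rates will be higher than these averages.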

Why don't I use nginx or apache with SCGI or FastCGI and be even more
efficient, you ask?  One word:  Simplicity.

With my setup, there are no servers (other than inetd).  Everything runs
on-demand.  With no servers running, that means there are no servers to
crash and require restarting, no servers to configure, no servers to pick up
performance problems after running a few days due to memory fragmentation or
resource leaks, and no servers accidentally leaving open TCP ports that can
be attacked by miscreants.  Oh, and did I mention that my setup runs in a
chroot jail for additional security?  I'm guessing nginx doesn't do that.

When I design software, I really try hard to make it simple.  Take for
example, Fossil.  There are (currently) three ways to set up a Fossil
server.  (1) You can type fossil server REPOSITORY.  (2) You can do a
simple 1-line edit to your /etc/inetd.conf file.  (3) You can create a
2-line CGI script and drop it in any cgi-bin.  Each of these techniques can
be (and is) described using 20 or 30 lines of text and one code example.  None
of them involves editing more than 2 lines in a single file.  Now consider a
hypothetical SCGI solution.  To get SCGI going, you first have to arrange
to start the Fossil SCGI server (perhaps with the fossil scgi REPOSITORY
command) and have it restart automatically when your machine reboots.  You
have to choose a communications port.  Then you have to edit configuration
files on your web server to get it to talk to the fossil SCGI server.  So,
to implement an SCGI solution, you'll need to edit a minimum of two
configuration files (and probably more if my guess about the complexity of
nginx is correct).  So the setup for SCGI is at least twice as complex as
CGI.

But SCGI will be faster, right?  Well, no.  SCGI will be about the same
speed, or maybe just a little slower, because the way the fossil scgi
command will work (assuming I implement it) will be that the Fossil server
will accept the incoming SCGI request from the web server.  The fossil
server will then fork a copy of itself to handle the request, set
environment variables, then call the existing CGI processing logic to do the
work.  So SCGI and CGI are going to do the same amount of work and run at
about the same speed.  The difference is that SCGI will use more resources
when it is idle (because there is a server hanging around waiting for
incoming requests, rather than being demand-launched) and SCGI will be at
least twice as hard to set up and configure.
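
The flow described above (accept an SCGI request, set environment variables, hand off to CGI logic) can be sketched in Python. This is an illustration only, not Fossil code: the function names parse_scgi_request and run_cgi_handler are hypothetical, the socket accept loop is left out, and the sketch launches a separate CGI program rather than forking a copy of the server. It relies on the SCGI wire format, in which the headers arrive as one netstring of NUL-terminated name/value pairs followed by the request body.

```python
import os
import subprocess


def parse_scgi_request(data: bytes) -> tuple[dict[str, str], bytes]:
    """Split one complete SCGI request into (headers, body)."""
    # The header block is a netstring: b"<length>:<payload>,"
    length_text, sep, rest = data.partition(b":")
    if not sep or not length_text.isdigit():
        raise ValueError("missing netstring length prefix")
    header_len = int(length_text)
    if rest[header_len:header_len + 1] != b",":
        raise ValueError("header netstring is truncated or lacks its comma")
    # Payload layout: name NUL value NUL name NUL value NUL ...
    fields = rest[:header_len].split(b"\0")
    if fields[-1] == b"":
        fields.pop()  # drop the empty piece after the final NUL
    if len(fields) % 2 != 0:
        raise ValueError("header name without a value")
    headers = {
        fields[i].decode("latin-1"): fields[i + 1].decode("latin-1")
        for i in range(0, len(fields), 2)
    }
    body_len = int(headers.get("CONTENT_LENGTH", "0"))
    body_start = header_len + 1
    return headers, rest[body_start:body_start + body_len]


def run_cgi_handler(argv: list[str], headers: dict[str, str], body: bytes) -> bytes:
    """Run a CGI program with the SCGI headers as environment variables."""
    env = {**os.environ, **headers, "GATEWAY_INTERFACE": "CGI/1.1"}
    env.pop("SCGI", None)  # protocol marker, not a CGI variable
    result = subprocess.run(
        argv, input=body, env=env, stdout=subprocess.PIPE, check=True
    )
    return result.stdout  # CGI response: header lines, blank line, content
```

A server loop would read one request from the web server's connection, call parse_scgi_request, and pass the result to run_cgi_handler along with whatever command serves the repository. Which Fossil command line to use for argv is left open here and has not been tested.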

None of the above really solves your problem.  But perhaps it will help you
to understand why there is not already a fossil scgi command, and why
statements to the effect that CGI is highly inefficient are not really
meaningful.

If I had a web server at hand that would do SCGI, I might consider adding
the fossil scgi command for you.  But as I don't, I have no way to test
the fossil scgi command.  But I did outline above (vaguely) the solution
for you:  Using code very much like the existing HTTP server in fossil,
implement a command that listens for SCGI requests, then forks a copy of
itself to handle each request, each request being handled using the existing
CGI processing logic.  How hard can that be, really?  As an alternative,
I'll bet you