On Aug 26, 6:55 pm, Alvaro Lopez Ortega <[EMAIL PROTECTED]> wrote:
> Graham Dumpleton wrote:
> > FWIW, if would do much better to Cherokee's credibility, presuming you
> > speak for its development team, if you present proper informed
> > analysis rather than conjecture and FUD.
>
> Graham,
>
> First, I am glad you are interested on Cherokee and the decisions we are
> making. In fact, I could not think of anybody else who could comment on
> this better than you (as mod_wsgi author).
>
> So, first of all, thanks for the long Apache internals tutorial. It has
> been enlightening (seriously, even if it is a program I do not fancy
> very much, there were a couple of interesting points I did not know).
>
> However, there are a number of things you said I can not agree with.
> The first and most important one, is the design flaw. Let's try to
> forget about Apache for a second. Let's forget about its performance
> issues (a few of them might be Layer-8 issues as you pointed) and its
> ancient NCSA inheritance. Let's focus for a second on information
> systems design from a high level point of view. From that perspective,
> keeping the application logic, data and representation independent from
> each other (MVC-like) is a good idea; I do not think there is much to
> discuss about it. The very same principle applies to this case, keeping
> the transport layer (the web server) and the application logic
> independent IS a good idea as well, regardless of any concrete
> implementation.
>
> There is a bunch of examples and similes that support this affirmation.
> Well known and accepted programming paradigms, the Unix way of working:
> applications with a single purpose communicating with each other (never
> embedding), or even the current web technology where people keep their
> data, formating (css) and application logic independent for the shake of
> maintainability.
All well and good, but in practice the majority of people will not
care about that one bit. People just want something that works, and
even if it could be argued that the way something is implemented
internally is not pure in one sense or another, that will almost
never come into it. Give them something that, at the level they want
to work, is easy to use and has the features they need, and they will
be more than happy.
> Besides that, there was another thing you wrote that caught my
> attention. As we all know, size matters (in this world, big is bad
> though). You said that the Python interpreter is not that big, and you
> were partially right, although you missed the Principle of relativity.
>
> Putting it in context will change your perception:
>
> --------------
> $ ps -eo rss,comm | grep python
> 3620 python
Be very careful about this figure. Even when Python is installed so
as to provide a shared library, the 'python' executable may not use
it. I have seen this on various platforms and distributions. The end
result is that 'python' is statically linked against the Python
library, and so the library shows up as private memory of the
process. When the 'python' executable is properly linked against the
shared Python library, the library is shared and doesn't show up as
private memory of the process.

One can tell whether the 'python' executable is using the shared
library on most UNIX platforms by running:

  ldd /usr/bin/python

If you don't see a reference to a libpythonX.Y.so, it isn't using the
shared library.

The result is that processes created by running the 'python'
executable can actually show more memory usage than when the Python
interpreter is embedded in Apache.
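As a rough sketch, the same 'ldd' check can be scripted from Python.
The function names below are made up for illustration, and the regular
expression assumes the usual libpythonX.Y.so naming on Linux style
systems. Platforms without 'ldd' (MacOS X uses 'otool -L' instead)
simply get None back.

```python
import re
import shutil
import subprocess
import sys

# Matches names such as libpython2.5.so.1.0 or libpython3.11.so.1.0.
LIBPYTHON_RE = re.compile(r"libpython\d+\.\d+[a-z]*\.so")


def links_shared_libpython(ldd_output):
    """Return True if the given 'ldd' output mentions a shared libpython."""
    return any(LIBPYTHON_RE.search(line) for line in ldd_output.splitlines())


def check_interpreter(path=sys.executable):
    """Run 'ldd' on an executable.

    Returns True or False, or None if 'ldd' is unavailable or fails
    (an assumption made here so the sketch degrades gracefully).
    """
    if shutil.which("ldd") is None:
        return None
    result = subprocess.run(["ldd", path], capture_output=True, text=True)
    if result.returncode != 0:
        return None
    return links_shared_libpython(result.stdout)
```

Calling check_interpreter() with no arguments inspects the interpreter
that is running the script. A result of False suggests the Python
library was linked statically into the executable.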
A long time ago I started a blog post where I was going to go into a
fair bit of detail about this, but never finished it. I will quote
some of that blog post here though.
"""
As a starting point, when I start up Apache 2.2 on my Mac OS X (PPC)
system with just the standard Apache modules, the amount of memory an
Apache child process uses is:

  RPRVT  RSHRD  RSIZE  VSIZE
   336K  2.14M   764K  43.1M

The important figure here is RPRVT, which is the resident private
memory size, or the amount of memory which is unique to the process.
This is distinct from RSHRD, which is the amount of memory which is
actually shared with other processes. That RSHRD may be large is not
as big a deal, as you only count it once for all running processes,
whereas the value of RPRVT for each Apache child process is added
together to determine how much memory is being used by Apache overall.

When mod_wsgi is compiled, the Apache module which is produced is less
than 100 KBytes in size. When actually loaded into Apache, the amount
of memory used by an Apache child process increases to:

  RPRVT  RSHRD  RSIZE  VSIZE
   520K  3.55M  1.08M  43.8M

What we see is a small increase in the amount of private memory in use
and a more substantial increase in the amount of shared memory. The
reason for the increase in the amount of shared memory in use is that
the mod_wsgi module itself will be loaded as a shared object. This
only accounts for a small amount though. The bulk of the increase
actually derives from the Python library which the mod_wsgi module is
dependent on and which is also being loaded as a shared object. As to
the increase in private memory use, this comes from the initialisation
of mod_wsgi and Python, and the creation of the initial Python
interpreter.
"""
So for MacOS X (PPC), and with that specific version of Python at
least, the results are quite different from what your 'ps' figure
suggests.
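To make the RPRVT versus RSHRD accounting from the blog extract
concrete, here is a minimal sketch. The helper names and the figure of
ten child processes are hypothetical, and counting RSHRD exactly once
is a simplification, since not every shared region is mapped by every
process.

```python
def parse_size_kb(text):
    """Convert a size string from 'top' (e.g. '520K', '3.55M') to kilobytes."""
    units = {"K": 1.0, "M": 1024.0, "G": 1024.0 * 1024.0}
    return float(text[:-1]) * units[text[-1].upper()]


def estimate_total_kb(private_sizes, shared_size):
    """Sum private memory for every process and count shared memory once."""
    private_kb = sum(parse_size_kb(size) for size in private_sizes)
    return private_kb + parse_size_kb(shared_size)


# Hypothetical example: ten child processes, each with the RPRVT quoted
# above for an Apache child with mod_wsgi loaded, plus RSHRD counted once.
children = ["520K"] * 10
total_kb = estimate_total_kb(children, "3.55M")
print("approximate total: %.1f KB" % total_kb)
```

With those inputs the estimate comes to a bit under 9 MB for all ten
processes together, rather than ten times the resident size of one
process.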
Even on my newer Mac OS X (Intel) box, which uses a 64 bit Apache and
Python (2.5), the results aren't too drastically different for the
case where mod_wsgi has been loaded.
PID   COMMAND  %CPU    TIME #TH #PRTS #MREGS RPRVT RSHRD RSIZE VSIZE
94219 httpd    0.0% 0:00.00   1    10    236  472K 6776K 1676K   27M
94218 httpd    0.0% 0:00.01  18    44    270  720K 6664K 1708K   37M
94217 httpd    0.0% 0:00.30   1    15    236   84K 6776K 6756K   27M
These results aren't directly comparable, as I don't recollect how many
threads were created in the example for the original blog post. All
the same, the three processes shown in this 'top' output are
interesting in themselves.

The process with ID 94217 is the Apache parent process. Being the
Apache parent process, only very minimal setup has been done in
respect of the Apache modules which are loaded, as the bulk of the
setup is done in the child processes after they fork. With how
mod_wsgi currently works, however, one thing that has been done in the
Apache parent process is the initialisation of the Python interpreter.
Even though this has been done, private memory used is only 84K. It is
less than in the blog entry because the blog entry wasn't looking at
the parent process but at those forked from it.

The process with ID 94219 is an Apache child process. It has one
thread as Apache is running with the prefork MPM in this instance.
Private memory usage has increased, coming from a mix of the
initialisation of other Apache modules in the child process and some
of what mod_wsgi is doing, which results in the importing of some
additional Python modules.

The process with ID 94218 is a mod_wsgi daemon mode process. Its
memory usage is higher because of the additional threads it has
created to handle requests, and the subsequent per thread overhead in
the thread libraries as well as in the Apache/mod_wsgi data structures
used to manage them.
As you can see, where the shared Python library is definitely used, a
lot less memory is being used than running the command line 'python'
executable might suggest. On the same system as the latter output,
running 'python' on the command line actually yields:
PID   COMMAND  %CPU    TIME #TH #PRTS #MREGS RPRVT RSHRD RSIZE VSIZE
5314  Python   0.0% 0:00.01   1    13     37 1152K 1136K 2272K   19M
For MacOS X it can get a bit confusing at this point. If one looks at
mod_wsgi.so one sees:
$ ls -las mod_wsgi.so
1264 -rwxr-xr-x 1 grahamd staff 646360 26 Aug 20:55 mod_wsgi.so
This is rather large because it happens to have object code in there
for four different architectures.
$ file mod_wsgi.so
mod_wsgi.so: Mach-O universal binary with 4 architectures
mod_wsgi.so (for architecture ppc7400): Mach-O bundle ppc
mod_wsgi.so (for architecture ppc64): Mach-O 64-bit bundle ppc64
mod_wsgi.so (for architecture i386): Mach-O bundle i386
mod_wsgi.so (for architecture x86_64): Mach-O 64-bit bundle x86_64
Only the code for the specific architecture actually gets loaded.
Looking at library dependencies we have:
$ otool -L mod_wsgi.so
mod_wsgi.so:
        /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 111.1.1)
        /System/Library/Frameworks/Python.framework/Versions/2.5/Python (compatibility version 2.5.0, current version 2.5.1)
        /usr/lib/libgcc_s.1.dylib (compatibility version 1.0.0, current version 1.0.0)
Thus one can clearly see it has a dependency on the Python shared
library (framework), and that framework isn't small, although the
framework is again built for multiple architectures.
$ ls -las /System/Library/Frameworks/Python.framework/Versions/2.5/Python
9976 -rwxr-xr-x 1 root wheel 5106240 16 Apr 16:00 /System/Library/Frameworks/Python.framework/Versions/2.5/Python
$ file /System/Library/Frameworks/Python.framework/Versions/2.5/Python
/System/Library/Frameworks/Python.framework/Versions/2.5/Python: Mach-O universal binary with 4 architectures
/System/Library/Frameworks/Python.framework/Versions/2.5/Python (for architecture ppc7400): Mach-O dynamically linked shared library ppc
/System/Library/Frameworks/Python.framework/Versions/2.5/Python (for architecture ppc64): Mach-O 64-bit dynamically linked shared library ppc64
/System/Library/Frameworks/Python.framework/Versions/2.5/Python (for architecture i386): Mach-O dynamically linked shared library i386
/System/Library/Frameworks/Python.framework/Versions/2.5/Python (for architecture x86_64): Mach-O 64-bit dynamically linked shared library x86_64
The 'python' command line executable looks odd though.
$ ls -las python
8 lrwxr-xr-x 1 root wheel 72 16 Nov 2007 python -> ../../System/Library/Frameworks/Python.framework/Versions/2.5/bin/python
$ ls -lasL python
80 -rwxr-xr-x 1 root wheel 38112 10 Oct 2007 python
It is actually really small. As far as I can work out it is just a
launcher program, but I am not entirely sure what it is launching. If
one looks at the files a running instance has open, one gets:
$ lsof -p 95719
COMMAND   PID    USER  FD TYPE DEVICE SIZE/OFF    NODE NAME
Python  95719 grahamd txt  REG   14,2    38144   99686 /System/Library/Frameworks/Python.framework/Versions/2.5/Resources/Python.app/Contents/MacOS/Python
Python  95719 grahamd txt  REG   14,2    90268   98161 /System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/lib-dynload/readline.so
Python  95719 grahamd txt  REG   14,2  5106240 5659691 /System/Library/Frameworks/Python.framework/Versions/2.5/Python
So, it has got the framework open, and what looks to be happening is
that it is almost launching the framework like an application itself.
The reason it doesn't show as the same size as the framework is that
it only loads the code for the architecture it needs. The end result
though is much like if the library was linked statically, and so it
shows much more memory use than if Python is embedded in Apache.
Now, it may be the case that MacOS X manages memory somewhat
differently to Linux and therefore does a better job of it. One would
need to sit down and do a proper analysis of what is happening on
Linux to know.
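As a starting point for that sort of analysis on Linux, one could
total up the private and shared figures the kernel reports in
/proc/<pid>/smaps. The following is only a sketch: the function names
are made up, and treating the Private_* total as a stand in for RPRVT
and the Shared_* total as a stand in for RSHRD is an assumption, not
an exact equivalence.

```python
import re

# The four per-mapping fields of interest in /proc/<pid>/smaps.
FIELD_RE = re.compile(
    r"^(Private_Clean|Private_Dirty|Shared_Clean|Shared_Dirty):\s+(\d+) kB$"
)


def summarize_smaps(text):
    """Total private and shared resident memory (in kB) from smaps content."""
    totals = {"private_kb": 0, "shared_kb": 0}
    for line in text.splitlines():
        match = FIELD_RE.match(line.strip())
        if match:
            name, value = match.group(1), int(match.group(2))
            key = "private_kb" if name.startswith("Private") else "shared_kb"
            totals[key] += value
    return totals


def summarize_pid(pid):
    """Read /proc/<pid>/smaps (Linux only, needs permission to read it)."""
    with open("/proc/%d/smaps" % pid) as handle:
        return summarize_smaps(handle.read())
```

Running summarize_pid() against a command line 'python' process and
against an Apache child with mod_wsgi loaded would give two sets of
numbers that can be compared on the same terms.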
> $ ps -eo rss,comm | grep cherokee
> 1856 cherokee
>
> $ ps -eo rss,comm | grep apache2
> 2068 apache2
> 2216 apache2
> 2220 apache2
> --------------
>
> 0 1Mb 2Mb 3Mb 4Mb 5Mb 6Mb 7Mb
> | | | | | | | |
> Che |=============> |
> Py |===========================> |
> Apa |===================================================> |
>
> For the record: Python was a completely empty interpreter with no
> imported modules at all. Cherokee was using the default set up for
> serving static content, and Apache was configured with a minimum number
> of modules (even logging was commented out).
>
> So, despite what you suggest, if a bare minimum interpreter is twice as
> big as the web server, I wouldn't personally call it "small". IMO Python
> rocks anyway, but calling it small may be too much.
But to put that in further context, it is not uncommon for a process
holding a Python web framework instance to be anywhere between 40 and
100MB for a typical site. When one looks at that, the Python
interpreter and web server overheads are both relatively small
components. This is why for large Python web applications the
comparisons you are drawing really don't matter. For small Python web
applications it also doesn't really matter, as at that scale overall
memory usage is small enough not to be an issue anyway, so whether you
have blown out your web server process size wouldn't matter to most
people as long as it gets the job done.
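The arithmetic behind that can be sketched quickly. This takes the
3620 KB 'ps' figure for the bare interpreter at face value, which, as
discussed above, probably overstates it, so the percentages are an
upper bound. The function name is just for illustration.

```python
def overhead_percent(overhead_kb, process_mb):
    """Percentage of a process's memory accounted for by a fixed overhead."""
    return 100.0 * overhead_kb / (process_mb * 1024.0)


# 3620 KB is the RSS of the bare interpreter from the quoted 'ps' output.
# 40 and 100 MB are the typical web application process sizes mentioned.
for process_mb in (40, 100):
    print("%d MB process: %.1f%% interpreter overhead"
          % (process_mb, overhead_percent(3620, process_mb)))
```

That works out at roughly 9% of a 40MB process and 3.5% of a 100MB
process, before allowing for any of it being shared.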
> Anyway! Let's put the swords down.
Oh, but technical discussions like this are always fun. It wasn't
really meant as an attack in the first place, but more of a sharing of
information. :-)
Graham