Re: [Libvir] "virsh list" command of libvirt consumes a lot of CPU in the domain-0

2008-03-12 Thread Daniel Veillard
On Tue, Mar 11, 2008 at 04:48:57PM +0100, [EMAIL PROTECTED] wrote:
> 
>Hi all,
>I know that this is not a libvirt issue but this badly impacts libvirt
>usage.
>Is anyone aware of any status on this issue ? Daniel ?
>Here is some history I could get from the libvirt mailing list :
>* October 12, 2006 (Daniel Berrange).
>I've been trying to track down just why talking to XenD is resulting
>in so much CPU time being
>consumed by both xend & xenstored. As a test case, I'm running 'virsh
>dominfo demo' which results in
>a single HTTP request to Xend to fetch domain info, e.g. 'GET
>/xend/domains/demo'
>Run this in a tight loop & I'll see xenstored taking > 50% CPU, and
>XenD taking another 11%

  yes this is a serious performance issue in xend.

[...]
>single read in XenD. Now if I
>monitor the status of 20 domains, once per second that's causing 40 MB
>of writes & 40 MB of reads
>every second which is utterly ridiculous & completely non scalable for
>enterprise deployment :-(

  agreed

[...]
>> Xen 3.0.3 has a serious performance bug
>> (see http://lists.xensource.com/archives/html/xen-devel/2006-10/msg00487.html)
>> This bug is fixed in Xen 3.0.4
>No it isn't. The performance bug is actually at least x2 worse in Xen
>3.0.4

I was told that xenstored had been rewritten for the Xen Enterprise version.
There is little I can do at that level, honestly. In libvirt we need the
xend access only to make the binding between the domain ID and the domain
name. All updated information is provided by the hypervisor, including
the ID list. Something we could do is try to cache the ID <-> name
domain association. This is a bit risky, since there is no way libvirt can
learn that a binding has changed (either because of an xm rename command
or because the domain was destroyed and a new domain created with the same ID).
If we keep the association in the daemon and use a timeout flush, maybe
this could be worked out, but it's really a bad workaround. The
proper thing would be to fix this long-standing bug in xenstored; it's a bit
depressing that no progress has been made on the open source version.
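
To make the timeout-flush idea above concrete, here is a minimal sketch in
Python. It is only an illustration: DomainNameCache, its lookup callback and
the 5 second timeout are hypothetical names and values, not libvirt or xend
APIs. The lookup callback stands in for the expensive request to xend.

```python
import time


class DomainNameCache:
    """Hypothetical cache of domain ID -> name bindings with a timeout flush."""

    def __init__(self, lookup, ttl_seconds=5.0, clock=time.monotonic):
        self._lookup = lookup      # expensive call, e.g. a request to xend
        self._ttl = ttl_seconds    # assumed value; a real timeout needs tuning
        self._clock = clock
        self._entries = {}         # domain_id -> (name, expiry time)

    def name_for(self, domain_id):
        """Return the cached name, refreshing it once the entry has expired."""
        now = self._clock()
        entry = self._entries.get(domain_id)
        if entry is not None and now < entry[1]:
            return entry[0]
        name = self._lookup(domain_id)
        self._entries[domain_id] = (name, now + self._ttl)
        return name

    def prune(self, live_ids):
        """Drop bindings for IDs the hypervisor no longer reports."""
        live = set(live_ids)
        for domain_id in list(self._entries):
            if domain_id not in live:
                del self._entries[domain_id]
```

The risk described above remains: within the timeout window a renamed domain,
or a new domain reusing an old ID, would still be reported under the stale
name. Pruning against the hypervisor's ID list only catches IDs that have
disappeared.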

Daniel

-- 
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard  | virtualization library  http://libvirt.org/
[EMAIL PROTECTED]  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine  http://rpmfind.net/



[Libvir] "virsh list" command of libvirt consumes a lot of CPU in the domain-0

2008-03-11 Thread jean-paul . pigache
Hi all,
I know that this is not a libvirt issue but this badly impacts libvirt 
usage.
Is anyone aware of any status on this issue ? Daniel ?

Here is some history I could get from the libvirt mailing list :

* October 12, 2006 (Daniel Berrange).
I've been trying to track down just why talking to XenD is resulting in so
much CPU time being consumed by both xend & xenstored. As a test case, I'm
running 'virsh dominfo demo' which results in a single HTTP request to Xend
to fetch domain info, e.g. 'GET /xend/domains/demo'.
Run this in a tight loop & I'll see xenstored taking > 50% CPU, and XenD
taking another 11%:
  PID USER  PR NI VIRT  RES  SHR S %CPU %MEM   TIME+ COMMAND
 2647 root  16  0 6188  840  464 R   52  0.0 0:55.04 xenstored
11600 root  18  0 259m 7568 1516 S   11  0.2 0:04.53 python
It's not surprising that xend is consuming time since we are making many
requests per second, but for an operation which is only doing reads, having
so much time attributed to xenstored seems very excessive. So I ran oprofile
& collected some data about xenstored:
CPU: AMD64 processors, speed 2211.33 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit
mask of 0x00 (No unit mask) count 10
samples  %        image name   symbol name
347226   45.9445  ext3         (no symbols)
264664   35.0200  jbd          (no symbols)
 31778    4.2048  libc-2.5.so  memset
 10763    1.4241  xenstored    main
  8884    1.1755  libc-2.5.so  _int_malloc
  7053    0.9332  libc-2.5.so  vfprintf
  4264    0.5642  xenstored    initialize_set
So roughly 80% of xenstored's CPU time is attributed to the ext3 &
journalling modules, suggesting xenstored is doing a lot of disk I/O.
strace'ing the xenstored process shows the only file it is opening is:
# strace -p 2647 -e trace=open,rename,unlink
Process 2647 attached - interrupt to quit
open("/var/lib/xenstored/tdb.0x62aa80", O_WRONLY|O_CREAT|O_TRUNC, 0640) = 13
open("/var/lib/xenstored/tdb.0x62aa80", O_RDWR) = 15
rename("/var/lib/xenstored/tdb.0x62aa80", "/var/lib/xenstored/tdb") = 0
unlink("/var/lib/xenstored/tdb.0x62aa80") = -1 ENOENT (No such file or directory)
open("/var/lib/xenstored/tdb.0x62b2b0", O_WRONLY|O_CREAT|O_TRUNC, 0640) = 13
open("/var/lib/xenstored/tdb.0x62b2b0", O_RDWR) = 14
rename("/var/lib/xenstored/tdb.0x62b2b0", "/var/lib/xenstored/tdb") = 0
unlink("/var/lib/xenstored/tdb.0x62b2b0") = -1 ENOENT (No such file or directory)
...
So basically it is repeatedly copying its persistent TDB database over and
over again. The TDB on this system is 128 KB in size and each individual
HTTP GET on /xend/domains/demo is resulting in 16 copies being made.
Do the maths - 128 KB * 16 == 2 MB of reads, and 2 MB of writes - for a
single read in XenD. Now if I monitor the status of 20 domains, once per
second, that's causing 40 MB of writes & 40 MB of reads every second, which
is utterly ridiculous & completely non-scalable for enterprise deployment :-(
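
The arithmetic above can be checked with a few lines of Python. This is just
a worked example using the figures from this message (128 KB TDB, 16 copies
per GET, 20 domains polled once per second). The function and constant names
are made up for illustration.

```python
TDB_SIZE_KB = 128        # size of the TDB file on the test system
COPIES_PER_GET = 16      # TDB copies observed per HTTP GET
DOMAINS = 20             # number of domains being monitored
POLLS_PER_SECOND = 1     # each domain is polled once per second


def io_per_get_mb(tdb_kb=TDB_SIZE_KB, copies=COPIES_PER_GET):
    """MB copied per GET, in each direction (read and write)."""
    return tdb_kb * copies / 1024


def io_per_second_mb(domains=DOMAINS, polls=POLLS_PER_SECOND):
    """MB copied per second, in each direction, for the whole poll loop."""
    return io_per_get_mb() * domains * polls


print(io_per_get_mb())     # 2.0
print(io_per_second_mb())  # 40.0
```

The cost grows linearly with the TDB size, the number of domains and the
polling rate.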
There are two problems I see here:
1. Why the need for xenstored to be doing any of this I/O in the first
place?
If the DB needs to be kept on disk at all, it really needs to have a much
saner update/transactional model to only update bits which actually change,
rather than re-creating the entire DB on every transaction. But it strikes
me that the DB could potentially be kept entirely in memory, removing the
disk I/O completely. Sure, you wouldn't be able to restart the daemon then,
but even today you can't restart xenstored & expect things to still be
working.
2. Why does XenD create sooo many transactions in XenStored for a read op?
Having instrumented Xend it seems that the root cause of the problem is the
xen.xend.xenstore.xstransact class. This allows one to start a transaction,
do a bunch of reads/writes & then commit the transaction. At the same time
though it has a bunch of static 'convenience' methods for read & write which
will implicitly start & commit a transaction. Well, 90% of the code in XenD
seems to be using these 'convenience' methods instead of explicitly starting
a transaction to cover a piece of work - the result is a simple GET causes
16 transactions and an 'xm create' results in 80 transactions. These
convenience methods are utterly destroying performance.
Clearly we can't address these for 3.0.3, but I think both of these areas
need serious work in 3.0.4 if we want a scalable control plane in Dom0.
Fixing the XenD bit looks particularly hard because any single method using
the convenience xenstored read functions can be called under many different
contexts, some of which need transactions, others which don't. It ought to
be possible to trace back all the calls & make it possible to pass explicit
xstransact objects into all calls & then kill off the convenience methods.
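
To illustrate the second problem, here is a toy sketch of why per-call
'convenience' reads multiply the transaction count, compared with passing one
explicit transaction through a piece of work. CountingStore and Transaction
are hypothetical stand-ins written for this example. They are not the real
xstransact API, and the four keys are arbitrary.

```python
class Transaction:
    """Toy transaction; a clean exit counts as one commit on the store."""

    def __init__(self, store):
        self.store = store

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            self.store.commits += 1
        return False

    def read(self, key):
        return self.store.data[key]


class CountingStore:
    """Toy stand-in for a transactional store that counts commits."""

    def __init__(self, data):
        self.data = dict(data)
        self.commits = 0

    def transaction(self):
        return Transaction(self)

    def read(self, key):
        # 'Convenience' read: implicitly starts and commits a transaction.
        with self.transaction() as t:
            return t.read(key)


KEYS = ["name", "memory", "vcpus", "state"]
store = CountingStore({k: k.upper() for k in KEYS})

# One implicit transaction per value read.
implicit = [store.read(k) for k in KEYS]
implicit_commits = store.commits

# One explicit transaction covering the same piece of work.
store.commits = 0
with store.transaction() as t:
    explicit = [t.read(k) for k in KEYS]
explicit_commits = store.commits

print(implicit_commits, explicit_commits)  # 4 1
```

Both paths return the same values, but the convenience path costs one commit
per read. If each commit copies the whole database, as the strace output
suggests, that is the multiplier seen for a single GET.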

* Answer, same day (October 12, 2006)

Yes, xenstored is very simple minded in many respects. We will certainly be
improving this during 3.0.4 development -- I think we can get the costs down
very significantly for commonplace operations without enormous effort.

* April 25, 2007 (Danie