Hi,

Following the recent discussion (http://gerrit.ovirt.org/#/c/28712/), and as part
of the ongoing focus on scalability and performance
(http://gerrit.ovirt.org/#/c/17694/ and many others),
I took the chance to run a very quick and dirty benchmark to see how much it
really costs to do XML processing in the sampling threads (thanks to Nir for the
kickstart!) and, more generally, how much XML processing costs.

Please find attached the test script and the example XML
(a real one, generated by VDSM master on my RHEL 6.5 box).

On my laptop:

$ lscpu 
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    2
Core(s) per socket:    2
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 58
Model name:            Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz
Stepping:              9
CPU MHz:               1359.375
CPU max MHz:           3600.0000
CPU min MHz:           1200.0000
BogoMIPS:              5786.91
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              4096K
NUMA node0 CPU(s):     0-3

8 GiB of RAM, running a GNOME desktop and the usual development stuff.

xmlbench.py linuxvm1.xml MODE 300

MODE is either 'md' (minidom) or 'cet' (cElementTree); the last argument is the
number of threads (one per simulated VM).
This runs that many threads fast and loose, without synchronization.
We can actually see this behaviour if a customer just mass-starts VMs.
In general I expect some clustering of the sampling activity, not a nice, evenly
interleaved time sequence.

CPU measurement: I just opened a terminal and ran 'htop' in it.
CPU profile: clustered around the sampling interval. Usage is negligible most of
the time, with peaks on sampling as shown below.

300 VMs
minidom: ~38% CPU
cElementTree: ~5% CPU

500 VMs
minidom: ~48% CPU
cElementTree: ~6% CPU

1000 VMs
python thread error :)

  File "/usr/lib64/python2.7/threading.py", line 746, in start
    _start_new_thread(self.__bootstrap, ())
thread.error: can't start new thread
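
For illustration only, here is a minimal sketch of how one sampling cycle for
1000 VMs could run on a small fixed pool instead of 1000 threads. It is not what
the gerrit patches implement. It assumes Python 3 (for concurrent.futures), and
SAMPLE_XML, sample_vm and sample_all are hypothetical names, not VDSM code.

```python
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a domain XML; the real benchmark uses linuxvm1.xml.
SAMPLE_XML = "<domain type='kvm'><name>vm%d</name><vcpu>2</vcpu></domain>"


def sample_vm(vm_id):
    """Parse one VM's XML; stands in for a single sampling step."""
    root = ET.fromstring(SAMPLE_XML % vm_id)
    return vm_id, root.findtext('name')


def sample_all(num_vms, pool_size=8):
    """Run one sampling cycle for num_vms VMs on pool_size threads."""
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        # map() preserves input order, so results line up with the VM ids.
        return list(pool.map(sample_vm, range(num_vms)))


if __name__ == "__main__":
    results = sample_all(1000)
    print(len(results), results[0], results[-1])
```

The number of threads is bounded by pool_size regardless of the number of VMs,
so the "can't start new thread" limit is never approached; the price is that
samples within a cycle are serialized across the pool.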


I think this is further proof (if we need any more) that
* we _really need_ to move away from the one-thread-per-VM model ->
  http://gerrit.ovirt.org/#/c/29189/ and friends! Let's fire up the discussion!
* we should move to cElementTree anyway in the near future: faster processing,
  better scaling, nicer API.
  It is also a pet peeve of mine; I do have some patches floating around, but we
  still need some preparation work in the virt package.
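
To make the "nicer API" point concrete, here is a small sketch that extracts the
disk target device names from a domain XML with both libraries. DOMAIN_XML is a
hypothetical, heavily trimmed document (not the attached linuxvm1.xml), and the
two helper functions are made up for this example. It imports
xml.etree.ElementTree, which on Python 3.3+ uses the C accelerator
automatically; on Python 2.7 the equivalent import is xml.etree.cElementTree.

```python
import xml.dom.minidom
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed domain XML; not the attached linuxvm1.xml.
DOMAIN_XML = """
<domain type='kvm'>
  <name>linuxvm1</name>
  <devices>
    <disk type='file' device='disk'>
      <target dev='vda' bus='virtio'/>
    </disk>
    <disk type='file' device='cdrom'>
      <target dev='hdc' bus='ide'/>
    </disk>
  </devices>
</domain>
"""


def disk_targets_minidom(text):
    """Collect disk target names by walking the DOM by tag name."""
    dom = xml.dom.minidom.parseString(text)
    targets = []
    for disk in dom.getElementsByTagName('disk'):
        for target in disk.getElementsByTagName('target'):
            targets.append(target.getAttribute('dev'))
    return targets


def disk_targets_etree(text):
    """Collect the same names with a single path expression."""
    root = ET.fromstring(text)
    return [t.get('dev') for t in root.findall('./devices/disk/target')]


if __name__ == "__main__":
    print(disk_targets_minidom(DOMAIN_XML))
    print(disk_targets_etree(DOMAIN_XML))
```

Both functions return the same list; the ElementTree version expresses the
lookup as one path instead of nested loops.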


-- 
Francesco Romani
RedHat Engineering Virtualization R & D
Phone: 8261328
IRC: fromani
#!/usr/bin/env python

import sys
import threading
import time
#import lxml.etree
import xml.dom.minidom
import xml.etree.cElementTree
import xml.etree.ElementTree


class Worker(threading.Thread):
    def __init__(self, func, xml, delay, numruns):
        super(Worker, self).__init__()
        self.daemon = True
        self.func = func
        self.xml = xml
        self.delay = delay
        self.numruns = numruns

    def mustgo(self):
        # numruns=None means run forever; otherwise allow exactly numruns runs
        if self.numruns is None:
            return True
        if self.numruns <= 0:
            return False
        self.numruns -= 1
        return True

    def run(self):
        print '%s delay=%i starting!' %(self.name, self.delay)
        while self.mustgo():
            time.sleep(self.delay)
            print '%s go' %(self.name)
            self.func(self.xml)
        print '%s done!' %(self.name)


PARSERS = {
    'md': xml.dom.minidom.parseString,
#    'lx': lxml.etree.fromstring,
    'et': xml.etree.ElementTree.fromstring,
    'cet': xml.etree.cElementTree.fromstring
}


def runner(xml, mode, nthreads, delay, numruns):
    workers = []
    for i in range(nthreads):
        w = Worker(PARSERS[mode], xml, delay, numruns)
        w.start()
        workers.append(w)

    if numruns is None:
        while True:
            time.sleep(1.0)
    else:
        for w in workers:
            w.join()


def _usage():
    print "usage: xmlbench xmlpath mode nthreads [delay [numruns]]"
    print "available modes: %s" % ' '.join(PARSERS.keys())

def _main(args):
    if len(args) < 3:
        _usage()
        sys.exit(1)
    else:
        xmlpath = args[0]
        mode = args[1]
        nthreads = int(args[2])
        delay = int(args[3]) if len(args) > 3 else 15
        numruns = int(args[4]) if len(args) > 4 else None
        if mode not in PARSERS:
            _usage()
            sys.exit(2)
        with open(xmlpath, 'rt') as xml:
            runner(xml.read(), mode, nthreads, delay, numruns)

if __name__ == "__main__":
    _main(sys.argv[1:])

Attachment: linuxvm1.xml
Description: XML document
