Hi Kirill,

Kirill Korotaev wrote:
> 2. CPU scheduler
> - Do you have any benchmarks of the CKRM CPU scheduler? For example, the Java Volano benchmark? What is the overhead of the CKRM CPU scheduler on an SMP system compared to the native Linux CPU scheduler?
I tried lat_ctx from lmbench and the Volano benchmark, and I could not measure any performance regression. At least, the overhead is below the measurement error.

In addition, it is not yet another brand new CPU scheduler, but an extension of the O(1) scheduler. The basic strategy is to compute a load average for each class; if the controller detects a load imbalance relative to the guarantee (weight) of each class, it shortens process time slices to rebalance the load. The core of the CPU controller is about 300 lines, plus a 50-line patch against the O(1) scheduler.

= Volano benchmark =

- vanilla 2.6.14
    java.vendor = Sun Microsystems Inc.
    java.vendor.url = http://java.sun.com/
    java.version = 1.4.2_10
    java.class.version = 48.0
    java.compiler = null
    os.name = Linux
    os.version = 2.6.14-default
    os.arch = i386
    VolanoMark version = 2.5.0.9
    Messages sent = 20000
    Messages received = 380000
    Total messages = 400000
    Elapsed time = 19.328 seconds
    Average throughput = 20695 messages per second

- 2.6.14 + ckrm-f0.4 + cpurc-v0.2
    java.vendor = Sun Microsystems Inc.
    java.vendor.url = http://java.sun.com/
    java.version = 1.4.2_10
    java.class.version = 48.0
    java.compiler = null
    os.name = Linux
    os.version = 2.6.14-f0.4-v0.2-default
    os.arch = i386
    VolanoMark version = 2.5.0.9
    Messages sent = 20000
    Messages received = 380000
    Total messages = 400000
    Elapsed time = 19.249 seconds
    Average throughput = 20780 messages per second

= lat_ctx -s 0 -N 3000 2 4 8 16 32 64 128 256 =

- vanilla 2.6.14
    "size=0k ovr=1.25
    2 0.60
    4 0.63
    8 0.79
    16 0.91
    32 0.97
    64 0.99
    128 1.54
    256 2.25

- 2.6.14 + ckrm-f0.4 + cpurc-v0.2
    "size=0k ovr=0.88
    2 0.68
    4 0.68
    8 0.80
    16 0.84
    32 0.82
    64 0.80
    128 1.08
    256 1.99
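To make the time-slice strategy described above more concrete, here is a rough user-space sketch in C. It is not the cpurc code. The struct, the function name, and the exact scaling rule (shrink the slice by the ratio of fair share to actual share, with a floor) are all assumptions made for illustration, as is the choice to shorten only the slices of a class that exceeds its guarantee.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-class bookkeeping; the names are illustrative only. */
struct cpu_class {
    unsigned int weight;  /* guarantee (share) assigned to the class */
    unsigned int load;    /* class load average, already summed over CPUs */
};

/*
 * Return the time slice (in ms) for a task in classes[idx].
 *
 * A class keeps base_slice_ms while its fraction of the total load does
 * not exceed its fraction of the total weight. Otherwise the slice is
 * scaled by (fair share / actual share), never below min_slice_ms.
 */
unsigned int scaled_timeslice_ms(const struct cpu_class *classes, size_t n,
                                 size_t idx, unsigned int base_slice_ms,
                                 unsigned int min_slice_ms)
{
    unsigned long long total_weight = 0, total_load = 0;
    unsigned long long actual, fair, slice;
    size_t i;

    assert(idx < n);

    for (i = 0; i < n; i++) {
        total_weight += classes[i].weight;
        total_load += classes[i].load;
    }

    /* Nothing to balance if the system or this class is idle. */
    if (total_load == 0 || total_weight == 0 || classes[idx].load == 0)
        return base_slice_ms;

    /*
     * load / total_load > weight / total_weight, cross-multiplied to
     * stay in integer arithmetic.
     */
    actual = (unsigned long long)classes[idx].load * total_weight;
    fair = (unsigned long long)classes[idx].weight * total_load;
    if (actual <= fair)
        return base_slice_ms;

    slice = base_slice_ms * fair / actual;
    if (slice < min_slice_ms)
        slice = min_slice_ms;
    return (unsigned int)slice;
}
```

With two classes weighted 75 and 25 that both carry the same load, the first stays at the base slice while the second is cut to half of it, which pushes the CPU split toward the configured guarantees.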
> - Is the CKRM CPU scheduler SMP-scalable? Do you have any benchmark results available?
Due to hardware availability, I have only tested it on a 2-CPU machine. The only place a linear algorithm is used is in adding up the class load average across CPUs. It may be a cache-unfriendly loop, but it runs only once a second per task, when the time slice is refilled. That is fine for a small number of CPUs, but it would become a bottleneck on a large number of CPUs.

To avoid this problem, I plan to subdivide the CPUs into groups, as CPUSETS does, and have the CPU controller work independently within each group. The benefit is that the CPU controller no longer needs to add up the class load over every CPU, only over the CPUs in the group.

Thanks,
MAEDA Naoaki

_______________________________________________
ckrm-tech mailing list
https://lists.sourceforge.net/lists/listinfo/ckrm-tech