I see really slow vmalloc performance on 2.6.35-rc3:

# tracer: function_graph
#
# CPU  DURATION                  FUNCTION CALLS
# |     |   |                     |   |   |   |
 3)   3.581 us    |  vfree();
 3)               |  msr_io() {
 3) ! 523.880 us  |    vmalloc();
 3)   1.702 us    |    vfree();
 3) ! 529.960 us  |  }
 3)               |  msr_io() {
 3) ! 564.200 us  |    vmalloc();
 3)   1.429 us    |    vfree();
 3) ! 568.080 us  |  }
 3)               |  msr_io() {
 3) ! 578.560 us  |    vmalloc();
 3)   1.697 us    |    vfree();
 3) ! 584.791 us  |  }
 3)               |  msr_io() {
 3) ! 559.657 us  |    vmalloc();
 3)   1.566 us    |    vfree();
 3) ! 575.948 us  |  }
 3)               |  msr_io() {
 3) ! 536.558 us  |    vmalloc();
 3)   1.553 us    |    vfree();
 3) ! 542.243 us  |  }
 3)               |  msr_io() {
 3) ! 560.086 us  |    vmalloc();
 3)   1.448 us    |    vfree();
 3) ! 569.387 us  |  }

msr_io() is from arch/x86/kvm/x86.c and allocates at most 4K (yes, it should use kmalloc()). The memory is vfree()ed immediately afterwards. There are 96 entries in /proc/vmallocinfo, and the whole thing is single-threaded, so there should be no contention.

Here's the perf report:

63.97% qemu [kernel] [k] rb_next
                       |
                       --- rb_next
                          |
                          |--70.75%-- alloc_vmap_area
                          |          __get_vm_area_node
                          |          __vmalloc_node
                          |          vmalloc
                          |          |
                          |          |--99.15%-- msr_io
                          |          |          kvm_arch_vcpu_ioctl
                          |          |          kvm_vcpu_ioctl
                          |          |          vfs_ioctl
                          |          |          do_vfs_ioctl
                          |          |          sys_ioctl
                          |          |          system_call
                          |          |          __GI_ioctl
                          |          |          |
                          |          |           --100.00%-- 0x1dfc4a8878e71362
                          |          |
                          |           --0.85%-- __kvm_set_memory_region
                          |                     kvm_set_memory_region
                          |                     kvm_vm_ioctl_set_memory_region
                          |                     kvm_vm_ioctl
                          |                     vfs_ioctl
                          |                     do_vfs_ioctl
                          |                     sys_ioctl
                          |                     system_call
                          |                     __GI_ioctl
                          |
                           --29.25%-- __get_vm_area_node
                                     __vmalloc_node
                                     vmalloc
                                     |
                                     |--98.89%-- msr_io
                                     |          kvm_arch_vcpu_ioctl
                                     |          kvm_vcpu_ioctl
                                     |          vfs_ioctl
                                     |          do_vfs_ioctl
                                     |          sys_ioctl
                                     |          system_call
                                     |          __GI_ioctl
                                     |          |
                                     |           --100.00%-- 0x1dfc4a8878e71362


It seems completely wrong - iterating 8 levels of a binary tree shouldn't take half a millisecond.
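
As a rough sanity check on the depth figure, here is a back-of-the-envelope sketch in plain userspace C. It is not kernel code; min_tree_levels() and rb_max_levels() are made-up helper names, and the second one assumes the textbook red-black bound of at most 2 * floor(log2(n + 1)) levels for n nodes.

```c
#include <assert.h>
#include <limits.h>

/* Fewest levels a binary tree needs to hold n nodes: smallest h with 2^h - 1 >= n. */
static unsigned int min_tree_levels(unsigned long n)
{
	unsigned int levels = 0;
	unsigned long capacity = 0;	/* nodes that fit in `levels` full levels */

	while (capacity < n) {
		capacity = 2 * capacity + 1;
		levels++;
	}
	return levels;
}

/* Textbook red-black worst case: at most 2 * floor(log2(n + 1)) levels. */
static unsigned int rb_max_levels(unsigned long n)
{
	unsigned int log2_floor = 0;
	unsigned long v;

	assert(n < ULONG_MAX);		/* n + 1 must not wrap around */
	v = n + 1;
	while (v > 1) {
		v >>= 1;
		log2_floor++;
	}
	return 2 * log2_floor;
}
```

For the 96 entries in /proc/vmallocinfo, min_tree_levels(96) gives 7 and rb_max_levels(96) gives 12, so a root-to-leaf descent touches roughly a dozen nodes at most.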

--
error compiling committee.c: too many arguments to function
