RE: [PATCH] apic: test tsc deadline timer

2011-10-13 Thread Liu, Jinsong
Avi Kivity wrote:
 On 10/09/2011 05:32 PM, Liu, Jinsong wrote:
 Updated test case for kvm tsc deadline timer
 https://github.com/avikivity/kvm-unit-tests, as attached. 
 
 
 Applied, thanks.

Which tree? I didn't find it at git://github.com/avikivity/kvm-unit-tests.git

Thanks,
Jinsong
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: virtio localdisk performance

2011-10-13 Thread benoit ROUSSELLE
Hello,

Can someone running with virtio tell me whether it is normal to have such
a difference with dd between the host and a guest machine?
host dd is 500MB/s
guest dd is 150MB/s

All other questions are optional for now :D

Thanks in advance,
Benoit.

PS: would replacing virtio with vhost and a more recent kvm help?

On Wed, Oct 12, 2011 at 2:21 PM, benoit ROUSSELLE brousse...@gmail.com wrote:
 Dear All,

 I am doing some disk io performance testing on the following environment:
 Dell AMD R515 with:
 - debian6 (2.6.32) on host and guests.
 - raid1 mirror on a perc h700
 - lvm used to create virtual disks volumes
 - virtio enabled per default on 2.6.32 and used for nic and disk drivers

 For my tests I use the following command:
 dd bs=6M count=500 if=/dev/zero of=toto.txt

 On local host i get 500MB/s
 On 1 guest 150MB/s
 On 2 guests in parallel 70MB/s

 Here are my questions:
 1) Is this the expected behavior ?
 2) What can i do next to improve local disk performance ?
 3) If i want to do shared storage with best disk io performance, what
 kind of storage should i use ?

 Thanks in advance :D
 Benoit.



Re: virtio localdisk performance

2011-10-13 Thread Michael Tokarev
On 13.10.2011 13:05, benoit ROUSSELLE wrote:
 Hello,
 
 Can someone running with virtio tell me if this is normal to have such
 a difference with dd between host and guest machine ?
 host dd is 500MB/s
 guest dd is 150MB/s
 
 All other questions are optionnal for now :D

Well, one question is mandatory: what is your kvm command line?
Without it, there's no way to answer any other questions.

/mjt


[PATCH 1/7] kvm tools: Fix spelling mistake

2011-10-13 Thread Sasha Levin
Suggested-by: Ingo Molnar mi...@elte.hu
Signed-off-by: Sasha Levin levinsasha...@gmail.com
---
 tools/kvm/guest_compat.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/tools/kvm/guest_compat.c b/tools/kvm/guest_compat.c
index c5bacb8..fac53f5 100644
--- a/tools/kvm/guest_compat.c
+++ b/tools/kvm/guest_compat.c
@@ -91,7 +91,7 @@ int compat__print_all_messages(void)
 
 	msg = list_first_entry(&messages, struct compat_message, list);
 
-	printf("\n\n*** Compatability Warning ***\n\n\t%s\n\n%s\n",
+	printf("\n\n*** Compatibility Warning ***\n\n\t%s\n\n%s\n",
		msg->title, msg->desc);
 
	list_del(&msg->list);
-- 
1.7.7



[PATCH 2/7] kvm tools: Use compat message per device instead of per instance

2011-10-13 Thread Sasha Levin
This prevents multiple messages for the same type of device.

Suggested-by: Ingo Molnar mi...@elte.hu
Signed-off-by: Sasha Levin levinsasha...@gmail.com
---
 tools/kvm/include/kvm/virtio-9p.h |1 -
 tools/kvm/virtio/9p.c |6 --
 tools/kvm/virtio/balloon.c|7 ---
 tools/kvm/virtio/blk.c|7 ---
 tools/kvm/virtio/console.c|8 +---
 tools/kvm/virtio/net.c|7 ---
 tools/kvm/virtio/rng.c|7 ---
 7 files changed, 25 insertions(+), 18 deletions(-)

diff --git a/tools/kvm/include/kvm/virtio-9p.h b/tools/kvm/include/kvm/virtio-9p.h
index 07084c3..3096df2 100644
--- a/tools/kvm/include/kvm/virtio-9p.h
+++ b/tools/kvm/include/kvm/virtio-9p.h
@@ -45,7 +45,6 @@ struct p9_dev {
struct virtio_pci   vpci;
 
struct virtio_9p_config *config;
-   int compat_id;
u32 features;
 
/* virtio queue */
diff --git a/tools/kvm/virtio/9p.c b/tools/kvm/virtio/9p.c
index 22a2732..b9b92b8 100644
--- a/tools/kvm/virtio/9p.c
+++ b/tools/kvm/virtio/9p.c
@@ -20,6 +20,7 @@
 #include <net/9p/9p.h>
 
 static LIST_HEAD(devs);
+static int compat_id = -1;
 
 /* Warning: Immediately use value returned from this function */
 static const char *rel_to_abs(struct p9_dev *p9dev,
@@ -1148,7 +1149,7 @@ static int init_vq(struct kvm *kvm, void *dev, u32 vq, u32 pfn)
	struct virt_queue *queue;
	void *p;
 
-	compat__remove_message(p9dev->compat_id);
+	compat__remove_message(compat_id);
 
	queue		= p9dev->vqs[vq];
	queue->pfn	= pfn;
@@ -1247,7 +1248,8 @@ int virtio_9p__register(struct kvm *kvm, const char *root, const char *tag_name)
 
	list_add(&p9dev->list, &devs);
 
-	p9dev->compat_id = compat__add_message("virtio-9p device was not detected",
+	if (compat_id != -1)
+		compat_id = compat__add_message("virtio-9p device was not detected",
				"While you have requested a virtio-9p device, "
				"the guest kernel didn't seem to detect it.\n"
				"Please make sure that the kernel was compiled "
diff --git a/tools/kvm/virtio/balloon.c b/tools/kvm/virtio/balloon.c
index 0b79132..643392b 100644
--- a/tools/kvm/virtio/balloon.c
+++ b/tools/kvm/virtio/balloon.c
@@ -44,12 +44,12 @@ struct bln_dev {
u16 stat_count;
int stat_waitfd;
 
-   int compat_id;
struct virtio_balloon_config config;
 };
 
 static struct bln_dev bdev;
 extern struct kvm *kvm;
+static int compat_id = -1;
 
 static bool virtio_bln_do_io_request(struct kvm *kvm, struct bln_dev *bdev, struct virt_queue *queue)
 {
@@ -226,7 +226,7 @@ static int init_vq(struct kvm *kvm, void *dev, u32 vq, u32 pfn)
	struct virt_queue *queue;
	void *p;
 
-	compat__remove_message(bdev->compat_id);
+	compat__remove_message(compat_id);
 
	queue		= bdev->vqs[vq];
	queue->pfn	= pfn;
@@ -280,7 +280,8 @@ void virtio_bln__init(struct kvm *kvm)
.get_size_vq= get_size_vq,
};
 
-	bdev.compat_id = compat__add_message("virtio-balloon device was not detected",
+	if (compat_id != -1)
+		compat_id = compat__add_message("virtio-balloon device was not detected",
				"While you have requested a virtio-balloon device, "
				"the guest kernel didn't seem to detect it.\n"
				"Please make sure that the kernel was compiled "
diff --git a/tools/kvm/virtio/blk.c b/tools/kvm/virtio/blk.c
index c508123..272613b 100644
--- a/tools/kvm/virtio/blk.c
+++ b/tools/kvm/virtio/blk.c
@@ -44,7 +44,6 @@ struct blk_dev {
struct virtio_pci   vpci;
struct virtio_blk_configblk_config;
struct disk_image   *disk;
-   int compat_id;
u32 features;
 
struct virt_queue   vqs[NUM_VIRT_QUEUES];
@@ -53,6 +52,7 @@ struct blk_dev {
 };
 
 static LIST_HEAD(bdevs);
+static int compat_id;
 
 static void virtio_blk_do_io_request(struct kvm *kvm, void *param)
 {
@@ -154,7 +154,7 @@ static int init_vq(struct kvm *kvm, void *dev, u32 vq, u32 pfn)
	struct virt_queue *queue;
	void *p;
 
-	compat__remove_message(bdev->compat_id);
+	compat__remove_message(compat_id);
 
	queue		= bdev->vqs[vq];
	queue->pfn	= pfn;
@@ -220,7 +220,8 @@ void virtio_blk__init(struct kvm *kvm, struct disk_image *disk)
 
	list_add_tail(&bdev->list, &bdevs);
 
-	bdev->compat_id = compat__add_message("virtio-blk device was not detected",
+   if 

[PATCH 3/7] kvm tools: Improve compat message

2011-10-13 Thread Sasha Levin
Suggested-by: Ingo Molnar mi...@elte.hu
Signed-off-by: Sasha Levin levinsasha...@gmail.com
---
 tools/kvm/virtio/9p.c  |7 ---
 tools/kvm/virtio/balloon.c |7 ---
 tools/kvm/virtio/blk.c |7 ---
 tools/kvm/virtio/console.c |7 ---
 tools/kvm/virtio/net.c |7 ---
 tools/kvm/virtio/rng.c |7 ---
 6 files changed, 24 insertions(+), 18 deletions(-)

diff --git a/tools/kvm/virtio/9p.c b/tools/kvm/virtio/9p.c
index b9b92b8..8dbd016 100644
--- a/tools/kvm/virtio/9p.c
+++ b/tools/kvm/virtio/9p.c
@@ -1251,9 +1251,10 @@ int virtio_9p__register(struct kvm *kvm, const char *root, const char *tag_name)
	if (compat_id != -1)
		compat_id = compat__add_message("virtio-9p device was not detected",
				"While you have requested a virtio-9p device, "
-				"the guest kernel didn't seem to detect it.\n"
-				"Please make sure that the kernel was compiled "
-				"with CONFIG_NET_9P_VIRTIO.");
+				"the guest kernel did not initialize it.\n"
+				"Please make sure that the guest kernel was "
+				"compiled with CONFIG_NET_9P_VIRTIO=y enabled "
+				"in its .config");
 
	return err;
 
diff --git a/tools/kvm/virtio/balloon.c b/tools/kvm/virtio/balloon.c
index 643392b..1691b79 100644
--- a/tools/kvm/virtio/balloon.c
+++ b/tools/kvm/virtio/balloon.c
@@ -283,7 +283,8 @@ void virtio_bln__init(struct kvm *kvm)
	if (compat_id != -1)
		compat_id = compat__add_message("virtio-balloon device was not detected",
				"While you have requested a virtio-balloon device, "
-				"the guest kernel didn't seem to detect it.\n"
-				"Please make sure that the kernel was compiled "
-				"with CONFIG_VIRTIO_BALLOON.");
+				"the guest kernel did not initialize it.\n"
+				"Please make sure that the guest kernel was "
+				"compiled with CONFIG_VIRTIO_BALLOON=y enabled "
+				"in its .config");
 }
diff --git a/tools/kvm/virtio/blk.c b/tools/kvm/virtio/blk.c
index 272613b..8605951 100644
--- a/tools/kvm/virtio/blk.c
+++ b/tools/kvm/virtio/blk.c
@@ -223,9 +223,10 @@ void virtio_blk__init(struct kvm *kvm, struct disk_image *disk)
	if (compat_id != -1)
		compat_id = compat__add_message("virtio-blk device was not detected",
				"While you have requested a virtio-blk device, "
-				"the guest kernel didn't seem to detect it.\n"
-				"Please make sure that the kernel was compiled "
-				"with CONFIG_VIRTIO_BLK.");
+				"the guest kernel did not initialize it.\n"
+				"Please make sure that the guest kernel was "
+				"compiled with CONFIG_VIRTIO_BLK=y enabled "
+				"in its .config");
 }
 
 void virtio_blk__init_all(struct kvm *kvm)
diff --git a/tools/kvm/virtio/console.c b/tools/kvm/virtio/console.c
index ef1ef0d..36997bd 100644
--- a/tools/kvm/virtio/console.c
+++ b/tools/kvm/virtio/console.c
@@ -192,7 +192,8 @@ void virtio_console__init(struct kvm *kvm)
	if (compat_id != -1)
		compat_id = compat__add_message("virtio-console device was not detected",
				"While you have requested a virtio-console device, "
-				"the guest kernel didn't seem to detect it.\n"
-				"Please make sure that the kernel was compiled "
-				"with CONFIG_VIRTIO_CONSOLE.");
+				"the guest kernel did not initialize it.\n"
+				"Please make sure that the guest kernel was "
+				"compiled with CONFIG_VIRTIO_CONSOLE=y enabled "
+				"in its .config");
 }
diff --git a/tools/kvm/virtio/net.c b/tools/kvm/virtio/net.c
index 9c39a87..bfa2c7b 100644
--- a/tools/kvm/virtio/net.c
+++ b/tools/kvm/virtio/net.c
@@ -416,7 +416,8 @@ void virtio_net__init(const struct virtio_net_params *params)
	if (compat_id != -1)

[PATCH 4/7] kvm tools: Add basic help when starting 'kvm' without args

2011-10-13 Thread Sasha Levin
Suggested-by: Ingo Molnar mi...@elte.hu
Signed-off-by: Sasha Levin levinsasha...@gmail.com
---
 tools/kvm/builtin-help.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/tools/kvm/builtin-help.c b/tools/kvm/builtin-help.c
index e70044a..344ff39 100644
--- a/tools/kvm/builtin-help.c
+++ b/tools/kvm/builtin-help.c
@@ -34,7 +34,8 @@ static void list_common_cmds_help(void)
 
 static void kvm_help(void)
 {
-	printf("\n usage: %s\n\n", kvm_usage_string);
+	printf("\n To start a simple non-privileged shell run 'kvm run'\n\n"
+	       " usage: %s\n\n", kvm_usage_string);
	list_common_cmds_help();
	printf("\n %s\n\n", kvm_more_info_string);
 }
-- 
1.7.7

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 5/7] kvm tools: Add help and info messages to 'kvm setup'

2011-10-13 Thread Sasha Levin
Suggested-by: Ingo Molnar mi...@elte.hu
Signed-off-by: Sasha Levin levinsasha...@gmail.com
---
 tools/kvm/builtin-setup.c |   12 +++-
 1 files changed, 11 insertions(+), 1 deletions(-)

diff --git a/tools/kvm/builtin-setup.c b/tools/kvm/builtin-setup.c
index 3e569e7..a2e893c 100644
--- a/tools/kvm/builtin-setup.c
+++ b/tools/kvm/builtin-setup.c
@@ -47,6 +47,8 @@ static void parse_setup_options(int argc, const char **argv)
 
 void kvm_setup_help(void)
 {
+	printf("\nkvm setup creates a new rootfs and stores it under ~/.kvm-tools/ .\n"
+	       "This can be used later by the '-d' parameter of 'kvm run'.\n");
usage_with_options(setup_usage, setup_options);
 }
 
@@ -205,10 +207,18 @@ int kvm_setup_create_new(const char *guestfs_name)
 
 int kvm_cmd_setup(int argc, const char **argv, const char *prefix)
 {
+   int r;
+
parse_setup_options(argc, argv);
 
if (instance_name == NULL)
kvm_setup_help();
 
-   return do_setup(instance_name);
+	r = do_setup(instance_name);
+	if (r == 0)
+		pr_info("Your new rootfs named %s has been created.\n"
+			"You can now start it by running 'kvm run -d %s'\n",
+			instance_name, instance_name);
+
+   return r;
 }
-- 
1.7.7



[PATCH 6/7] kvm tools: Modify 'kvm setup' parameters

2011-10-13 Thread Sasha Levin
This patch makes the name the default parameter, which makes usage
easier since it is the only parameter available at the moment anyway.

Suggested-by: Ingo Molnar mi...@elte.hu
Signed-off-by: Sasha Levin levinsasha...@gmail.com
---
 tools/kvm/builtin-setup.c |   12 
 1 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/tools/kvm/builtin-setup.c b/tools/kvm/builtin-setup.c
index a2e893c..4410653 100644
--- a/tools/kvm/builtin-setup.c
+++ b/tools/kvm/builtin-setup.c
@@ -25,13 +25,11 @@
 static const char *instance_name;
 
 static const char * const setup_usage[] = {
-	"kvm setup [-n name]",
+	"kvm setup [name]",
NULL
 };
 
 static const struct option setup_options[] = {
-	OPT_GROUP("General options:"),
-	OPT_STRING('n', "name", &instance_name, "name", "Instance name"),
OPT_END()
 };
 
@@ -40,8 +38,12 @@ static void parse_setup_options(int argc, const char **argv)
while (argc != 0) {
argc = parse_options(argc, argv, setup_options, setup_usage,
PARSE_OPT_STOP_AT_NON_OPTION);
-		if (argc != 0)
+		if (argc != 0 && instance_name)
			kvm_setup_help();
+		else
+			instance_name = argv[0];
+		argv++;
+		argc--;
}
 }
 
@@ -219,6 +221,8 @@ int kvm_cmd_setup(int argc, const char **argv, const char 
*prefix)
		pr_info("Your new rootfs named %s has been created.\n"
			"You can now start it by running 'kvm run -d %s'\n",
			instance_name, instance_name);
+	else
+		perror("Error creating rootfs");
 
return r;
 }
-- 
1.7.7



[PATCH 7/7] kvm tools: Add option to list rootfs in 'kvm list'

2011-10-13 Thread Sasha Levin
Previously we would only list running instances; now that we create
rootfs instances, it is appropriate to add a way to list them as
well.

Suggested-by: Ingo Molnar mi...@elte.hu
Signed-off-by: Sasha Levin levinsasha...@gmail.com
---
 tools/kvm/builtin-list.c |   68 -
 1 files changed, 66 insertions(+), 2 deletions(-)

diff --git a/tools/kvm/builtin-list.c b/tools/kvm/builtin-list.c
index fcf9bb0..43a0aa8 100644
--- a/tools/kvm/builtin-list.c
+++ b/tools/kvm/builtin-list.c
@@ -4,6 +4,7 @@
 #include "kvm/kvm.h"
 #include "kvm/parse-options.h"
 
+#include <dirent.h>
 #include <stdio.h>
 #include <string.h>
 #include <signal.h>
@@ -11,12 +12,18 @@
 
 #define PROCESS_NAME "kvm"
 
+static bool run;
+static bool rootfs;
+
 static const char * const list_usage[] = {
	"kvm list",
NULL
 };
 
 static const struct option list_options[] = {
+	OPT_GROUP("General options:"),
+	OPT_BOOLEAN('i', "run", &run, "List running instances"),
+	OPT_BOOLEAN('r', "rootfs", &rootfs, "List rootfs instances"),
OPT_END()
 };
 
@@ -58,9 +65,66 @@ cleanup:
return 0;
 }
 
-int kvm_cmd_list(int argc, const char **argv, const char *prefix)
+static int kvm_list_running_instances(void)
 {
	printf("  PID GUEST\n");
 
-   return kvm__enumerate_instances(print_guest);
+   return kvm__enumerate_instances(print_guest);
+}
+
+static int kvm_list_rootfs(void)
+{
+   char name[PATH_MAX];
+   DIR *dir;
+   struct dirent *dirent;
+
+	snprintf(name, PATH_MAX, "%s%s", HOME_DIR, KVM_PID_FILE_PATH);
+   dir = opendir(name);
+   if (dir == NULL)
+   return -1;
+
+	printf("  ROOTFS\n");
+
+	while ((dirent = readdir(dir))) {
+		if (dirent->d_type == DT_DIR &&
+		    strcmp(dirent->d_name, ".") &&
+		    strcmp(dirent->d_name, ".."))
+			printf("%s\n", dirent->d_name);
+	}
+
+   return 0;
+}
+
+static void parse_setup_options(int argc, const char **argv)
+{
+   while (argc != 0) {
+   argc = parse_options(argc, argv, list_options, list_usage,
+   PARSE_OPT_STOP_AT_NON_OPTION);
+   if (argc != 0)
+   kvm_list_help();
+   }
+}
+
+int kvm_cmd_list(int argc, const char **argv, const char *prefix)
+{
+   int r;
+
+   parse_setup_options(argc, argv);
+
+	if (!run && !rootfs)
+   kvm_list_help();
+
+   if (run) {
+   r = kvm_list_running_instances();
+		if (r < 0)
+			perror("Error listing instances");
+   }
+
+   if (rootfs) {
+   r = kvm_list_rootfs();
+		if (r < 0)
+			perror("Error listing rootfs");
+   }
+
+   return 0;
 }
-- 
1.7.7



Re: [PATCH 5/7] kvm tools: Add help and info messages to 'kvm setup'

2011-10-13 Thread Ingo Molnar

* Sasha Levin levinsasha...@gmail.com wrote:

 +	r = do_setup(instance_name);
 +	if (r == 0)
 +		pr_info("Your new rootfs named %s has been created.\n"
 +			"You can now start it by running 'kvm run -d %s'\n",
 +			instance_name, instance_name);

Btw., will 'kvm run' select the last-created rootfs by default?

The patches look good otherwise.

Thanks,

Ingo


Re: [PATCH V2] PPC: Fix race in mtmsr paravirt implementation

2011-10-13 Thread Alexander Graf

On 10/13/2011 11:47 AM, Bharat Bhushan wrote:

The current implementations of mtmsr and mtmsrd are racy, in that they do:

   * check (int_pending == 0)
   ---> host sets int_pending = 1 <---
   * write shared page
   * done

while instead we should check for int_pending after the shared page is written.

Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com


Thanks, applied to kvm-ppc-next.


Alex



Re: [PATCH 5/7] kvm tools: Add help and info messages to 'kvm setup'

2011-10-13 Thread Sasha Levin
On Thu, 2011-10-13 at 12:00 +0200, Ingo Molnar wrote:
 * Sasha Levin levinsasha...@gmail.com wrote:
 
  +	r = do_setup(instance_name);
  +	if (r == 0)
  +		pr_info("Your new rootfs named %s has been created.\n"
  +			"You can now start it by running 'kvm run -d %s'\n",
  +			instance_name, instance_name);
 
 Btw., will 'kvm run' select the last-created rootfs by default?
 
 The patches look good otherwise.

A simple 'kvm run' creates (if needed) and starts a rootfs named
'default'.

-- 

Sasha.



Re: [PATCH RFC V5 00/11] Paravirtualized ticketlocks

2011-10-13 Thread Peter Zijlstra
On Wed, 2011-10-12 at 17:51 -0700, Jeremy Fitzhardinge wrote:
 
 This is is all unnecessary complication if you're not using PV ticket
 locks, it also uses the jump-label machinery to use the standard
 add-based unlock in the non-PV case.
 
 if (TICKET_SLOWPATH_FLAG &&
     unlikely(static_branch(&paravirt_ticketlocks_enabled))) {
 	arch_spinlock_t prev;
 
 	prev = *lock;
 	add_smp(&lock->tickets.head, TICKET_LOCK_INC);
 
 	/* add_smp() is a full mb() */
 
 	if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG))
 		__ticket_unlock_slowpath(lock, prev);
 } else
 	__add(&lock->tickets.head, TICKET_LOCK_INC, UNLOCK_LOCK_PREFIX);

Not that I mind the jump_label usage, but didn't paravirt have an
existing alternative() thingy to do things like this? Or is the
alternative() stuff not flexible enough to express this?


RE: virtio localdisk performance

2011-10-13 Thread Martin Maurer
 -Original Message-
 From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On
 Behalf Of benoit ROUSSELLE
 Sent: Donnerstag, 13. Oktober 2011 11:06
 To: kvm@vger.kernel.org
 Subject: Re: virtio localdisk performance
 
 Hello,
 
 Can someone running with virtio tell me if this is normal to have such a
 difference with dd between host and guest machine ?
 host dd is 500MB/s
 guest dd is 150MB/s

Hi,

I just ran the same test on a Proxmox VE 2.0 beta (KVM 0.15), on the host and
also in a guest:
(hardware: Adaptec 5805 Z with 4 x WD 500 GB RE3 in raid10)

Host 155
Guest 120 (raw file on ext3)

Do you run the virtual disk with cache=none?

Martin



Re: virtio localdisk performance

2011-10-13 Thread benoit ROUSSELLE
As you can guess I am pretty new to KVM.
Here is what I have found:
/usr/bin/kvm -S -M pc-0.12 -enable-kvm -m 2048 -smp
1,sockets=1,cores=1,threads=1 -name linbrotestNC -uuid
5cd17775-c826-7422-6634-428876225868 -nodefaults -chardev
socket,id=monitor,path=/var/lib/libvirt/qemu/linbrotestNC.monitor,server,nowait
-mon chardev=monitor,mode=readline -rtc base=utc -boot nc -drive
file=/dev/vg/lin01-root,if=none,id=drive-virtio-disk0,boot=on,format=raw,cache=none
-device 
virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0
-drive 
file=/dev/vg/lin01-swap,if=none,id=drive-virtio-disk1,format=raw,cache=none
-device 
virtio-blk-pci,bus=pci.0,addr=0x5,drive=drive-virtio-disk1,id=virtio-disk1
-device virtio-net-pci,vlan=0,id=net0,mac=52:54:00:5a:92:58,bus=pci.0,addr=0x3
-net tap,fd=47,vlan=0,name=hostnet0 -chardev pty,id=serial0 -device
isa-serial,chardev=serial0 -usb -vnc 127.0.0.1:1 -k en-us -vga cirrus
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
(I got this info from ps; is there a better place to find the kvm
command line? I created the VM with virt-manager, but if you tell me it
is better to use virt-inst ... I will do so.)
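As an aside on where to find the command line: libvirt keeps the full generated command line in its per-guest log, and any running process's arguments can be read back from /proc. The log path below is the usual libvirt default and the fallback to the current shell is only so the demo runs anywhere:

```shell
# libvirt logs the generated qemu/kvm command line per guest
# (assumed default path): /var/log/libvirt/qemu/<guest-name>.log

# For any running process the same information is in /proc; the
# arguments are NUL-separated, so translate them to spaces:
pid=$(pgrep -o -f qemu-kvm || echo $$)   # fall back to this shell for the demo
tr '\0' ' ' < "/proc/$pid/cmdline"; echo
```

Either source avoids hunting through ps output by hand.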

As you can see I use LVM block devices with virtio enabled.
I have read some documentation about cache=none, and my machine is using it here.

Here is the definition of my disk in my machine's XML file:
  <devices>
    <emulator>/usr/bin/kvm</emulator>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none'/>
      <source dev='/dev/vg/lin01-root'/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </disk>


Thank you.

Benoit.

On Thu, Oct 13, 2011 at 11:19 AM, Michael Tokarev m...@tls.msk.ru wrote:
 On 13.10.2011 13:05, benoit ROUSSELLE wrote:
 Hello,

 Can someone running with virtio tell me if this is normal to have such
 a difference with dd between host and guest machine ?
 host dd is 500MB/s
 guest dd is 150MB/s

 All other questions are optionnal for now :D

 Well, one question is mandatory: what is your kvm command line?
 Without it, there's no way to answer any other questions.

 /mjt



Re: virtio localdisk performance

2011-10-13 Thread benoit ROUSSELLE
On Thu, Oct 13, 2011 at 1:20 PM, Martin Maurer mar...@proxmox.com wrote:
 Host 155
 Guest 120 (raw file on ext3)

 Do you run with the virtual disk with cache=no?

Yes, I did (see my previous post).
So I guess there must be something wrong in my setup, or maybe I
should use a more recent version of kvm?
I am running qemu-kvm 0.12.5.

Thanks,
Benoit.


VM installing into filesystem or plain partition ?

2011-10-13 Thread Lentes, Bernd
Hi,

I want to use KVM to run several virtual machines. Some I would like to migrate
from VMware, others I want to create from scratch.
I read that installing VMs into plain partitions (without a filesystem) is
faster than installing them into a partition with a FS.
Is that true? How big is the difference in speed?

I want to create a fileserver in a VM. Currently I don't know whether I will use
Windows Server 2008 or a Linux distro with Samba.
I'd like to install the VM into logical volumes using LVM. The host OS will be
SLES 11. A fileserver needs a backup of the user data.
For this purpose I'd like to create several snapshots of the LV the fileserver
resides on.
I think a snapshot of a plain LV, without a FS (if I choose to install the VM
into a plain LV), doesn't make sense. Or does this work?
So I think I need a FS in the LV for the fileserver. Is the loss of speed
severe?


Bernd


--
Bernd Lentes

Systemadministration
Institut für Entwicklungsgenetik
HelmholtzZentrum münchen
bernd.len...@helmholtz-muenchen.de
phone: +49 89 3187 1241
fax:   +49 89 3187 3826
http://www.helmholtz-muenchen.de/idg

We should not fear death, but a bad life. (Wir sollten nicht den Tod fürchten, sondern das schlechte Leben.)


Helmholtz Zentrum München
Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH)
Ingolstädter Landstr. 1
85764 Neuherberg
www.helmholtz-muenchen.de
Aufsichtsratsvorsitzende: MinDir´in Bärbel Brumme-Bothe
Geschäftsführer: Prof. Dr. Günther Wess und Dr. Nikolaus Blum
Registergericht: Amtsgericht München HRB 6466
USt-IdNr: DE 129521671


Re: virtio localdisk performance

2011-10-13 Thread Stefan Hajnoczi
On Wed, Oct 12, 2011 at 1:21 PM, benoit ROUSSELLE brousse...@gmail.com wrote:
 I am doing some disk io performance testing on the following environment:
 Dell AMD R515 with:
 - debian6 (2.6.32) on host and guests.
 - raid1 mirror on a perc h700
 - lvm used to create virtual disks volumes
 - virtio enabled per default on 2.6.32 and used for nic and disk drivers

 For my tests i use the following command:
 dd bs=6M count=500 if=/dev/zero of=toto.txt

 On local host i get 500MB/s
 On 1 guest 150MB/s
 On 2 guests in parallel 70MB/s

dd performs buffered I/O by default.  That means it just writes to the
page cache and the kernel decides when to write out dirty pages.

So your host probably has a bunch more RAM than the guest - dd
write(2) calls are simply dirtying memory.  Your guest has less RAM
and needs to do actual block I/O.  That's why the results are so
different.

You are not measuring virtio-blk performance here.  Use dd
oflag=direct to bypass the page cache and actually do block I/O.
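To make the distinction concrete, here is a small-scale sketch of the two measurements (file names are illustrative and the sizes are scaled down; scale them up for a real benchmark):

```shell
# Buffered write: mostly measures how fast the page cache absorbs dirty
# pages, not the disk
dd if=/dev/zero of=dd-buffered.img bs=1M count=16

# Direct I/O: bypasses the page cache so real block I/O is measured;
# some filesystems (e.g. tmpfs) do not support O_DIRECT, hence the fallback
dd if=/dev/zero of=dd-direct.img bs=1M count=16 oflag=direct ||
        echo "O_DIRECT not supported on this filesystem"
```

On a host with plenty of free RAM the first command can report throughput far above what the disk can sustain, which is exactly the effect described above.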

Stefan


Re: [PATCH RFC 2/2] kvm: set affinity hint for assigned device msi

2011-10-13 Thread Marcelo Tosatti
On Tue, Oct 11, 2011 at 08:38:28PM +0200, Michael S. Tsirkin wrote:
 To forward an interrupt to a vcpu that runs on
 a host cpu different from the current one,
 we need an ipi which likely will cost us as much
 as delivering the interrupt directly to that cpu would.
 
 Set the irq affinity hint to point there; the irq balancer
 can then take this into account and balance
 interrupts accordingly.
 
 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 ---
  virt/kvm/assigned-dev.c |8 +---
  virt/kvm/irq_comm.c |   17 -
  2 files changed, 21 insertions(+), 4 deletions(-)
 
 diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
 index f89f138..b579777 100644
 --- a/virt/kvm/assigned-dev.c
 +++ b/virt/kvm/assigned-dev.c
 @@ -142,9 +142,11 @@ static void deassign_host_irq(struct kvm *kvm,
 		for (i = 0; i < assigned_dev->entries_nr; i++)
 			disable_irq(assigned_dev->host_msix_entries[i].vector);
 
 -		for (i = 0; i < assigned_dev->entries_nr; i++)
 -			free_irq(assigned_dev->host_msix_entries[i].vector,
 -				 (void *)assigned_dev);
 +		for (i = 0; i < assigned_dev->entries_nr; i++) {
 +			u32 vector = assigned_dev->host_msix_entries[i].vector;
 +			irq_set_affinity_hint(vector, NULL);
 +			free_irq(vector, (void *)assigned_dev);
 +		}
 
 		assigned_dev->entries_nr = 0;
 		kfree(assigned_dev->host_msix_entries);
 diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
 index ac8b629..68b1f7c 100644
 --- a/virt/kvm/irq_comm.c
 +++ b/virt/kvm/irq_comm.c
 @@ -22,6 +22,7 @@
  
  #include <linux/kvm_host.h>
  #include <linux/slab.h>
 +#include <linux/interrupt.h>
  #include <trace/events/kvm.h>
  
  #include <asm/msidef.h>
 @@ -80,6 +81,17 @@ inline static bool kvm_is_dm_lowest_prio(struct kvm_lapic_irq *irq)
  #endif
  }
  
 +static void kvm_vcpu_host_irq_hint(struct kvm_vcpu *vcpu, int host_irq)
 +{
 +	const struct cpumask *mask;
 +	/* raw_smp_processor_id() is ok here: if we get preempted we can get a
 +	 * wrong value but we don't mind much. */
 +	if (host_irq >= 0 && unlikely(vcpu->cpu != raw_smp_processor_id())) {
 +		mask = get_cpu_mask(vcpu->cpu);
 +		irq_set_affinity_hint(host_irq, mask);
 +	}
 +}

Unsure about the internals of irq_set_affinity_hint, but AFAICS it's
exported so that irqbalance in userspace can make a decision.

If that is the case, then the irqbalance update rate should be high enough
to catch up with a vcpu migrating between cpus (which initially does
not appear a sensible arrangement).

The decision to have the host interrupt follow the vcpu seems a good
one, given that it saves an IPI and is potentially more cache friendly
overall.

And AFAICS it's more intelligent for the device assignment case than
anything irqbalance can come up with (note it depends on how the APIC is
configured; your patch ignores that).

 + 
  int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
   struct kvm_lapic_irq *irq, int host_irq)
  {
 @@ -102,6 +114,7 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
 		if (r < 0)
 			r = 0;
 		r += kvm_apic_set_irq(vcpu, irq);
 +		kvm_vcpu_host_irq_hint(vcpu, host_irq);
   } else if (kvm_lapic_enabled(vcpu)) {
   if (!lowest)
   lowest = vcpu;
 @@ -110,8 +123,10 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
   }
   }
  
 -	if (lowest)
 +	if (lowest) {
 		r = kvm_apic_set_irq(lowest, irq);
 +		kvm_vcpu_host_irq_hint(vcpu, host_irq);
 +	}
 + }
  
   return r;
  }
 -- 
 1.7.5.53.gc233e


Re: virtio localdisk performance

2011-10-13 Thread benoit ROUSSELLE
On Thu, Oct 13, 2011 at 4:27 PM, Stefan Hajnoczi stefa...@gmail.com wrote:
 dd performs buffered I/O by default.  That means it just writes to the
 page cache and the kernel decides when to write out dirty pages.

 So your host probably has a bunch more RAM than the guest - dd
 write(2) calls are simply dirtying memory.  Your guest has less RAM
 and needs to do actual block I/O.  That's why the results are so
 different.

 You are not measuring virtio-blk performance here.  Use dd
 oflag=direct to bypass the page cache and actually do block I/O.

You are right the change is impressive, but still there is a big difference:
dd oflag=direct bs=6M count=1000 if=/dev/zero of=titi.txt

i get on the host:
6291456000 bytes (6.3 GB) copied, 29.8403 s, 211 MB/s
and in the vm:
6291456000 bytes (6.3 GB) copied, 51.3302 s, 123 MB/s

Thanks,
Benoit.


Re: [QEMU PATCH] kvm: support TSC deadline MSR with subsection

2011-10-13 Thread Marcelo Tosatti
On Wed, Oct 12, 2011 at 12:26:12PM +0800, Liu, Jinsong wrote:
 Marcelo,
 
 I just test guest migration from v13 to v12, it failed w/ info
 savevm: unsupported version 13 for 'cpu' v12
 load of migration failed
 
 v13 is new qemu-kvm with tsc deadline timer co-work patch, v12 is old 
 qemu-kvm.

You should try the patch in the first message in this thread, which is a
replacement for the original tsc deadline timer patch.
 
 Marcelo Tosatti wrote:
  Jinsong, please test this qemu-kvm patch by migrating a guest which is
  currently using TSC deadline timer. Using subsections avoids breaking
  migration to older qemu versions when the guest does not make use of
  TSC deadline feature.
  
 
 Is subsection used to avoid breaking migration to older qemu?

Yes.



Re: [PATCH 5/7] kvm tools: Add help and info messages to 'kvm setup'

2011-10-13 Thread Pekka Enberg

On Thu, 13 Oct 2011, Ingo Molnar wrote:

* Sasha Levin levinsasha...@gmail.com wrote:


+   r = do_setup(instance_name);
+   if (r == 0)
+   pr_info("Your new rootfs named %s has been created.\n"
+   "You can now start it by running 'kvm run -d %s'\n",
+   instance_name, instance_name);


Btw., will 'kvm run' select the last-created rootfs by default?


No, it runs rootfs that's named 'default'. We can change that but I'm not 
completely convinced running that last-created rootfs is the right thing 
to do here. Hmm.



The patches look good otherwise.


Thanks for the review!

Pekka


Re: [PATCH 7/7] kvm tools: Add option to list rootfs in 'kvm list'

2011-10-13 Thread Pekka Enberg

On Thu, 13 Oct 2011, Sasha Levin wrote:

Previously we would only list running instances, now that we create
rootfs instances it's appropriate to add a method to list them as
well.

Suggested-by: Ingo Molnar mi...@elte.hu
Signed-off-by: Sasha Levin levinsasha...@gmail.com


I applied the patch but we really ought to print out both of them by 
default. Something like this:


 PID           GUEST
 1234          debian
 not running   default

Pekka


Re: [PATCH 5/7] kvm tools: Add help and info messages to 'kvm setup'

2011-10-13 Thread Ingo Molnar

* Pekka Enberg penb...@cs.helsinki.fi wrote:

 On Thu, 13 Oct 2011, Ingo Molnar wrote:
 * Sasha Levin levinsasha...@gmail.com wrote:
 
 +   r = do_setup(instance_name);
 +   if (r == 0)
 +   pr_info("Your new rootfs named %s has been created.\n"
 +   "You can now start it by running 'kvm run -d %s'\n",
 +   instance_name, instance_name);
 
 Btw., will 'kvm run' select the last-created rootfs by default?
 
 No, it runs rootfs that's named 'default'. We can change that but 
 I'm not completely convinced running that last-created rootfs is 
 the right thing to do here. Hmm.

I'm not convinced either - just wanted to raise the issue.

Also, i raised this in the other thread, why not use .kvmtool of the 
current directory? That way it's cwd local like Git and perf. A 'kvm 
run' (union?) mount the cwd or so - so this would be a natural 
equivalent to chroot. The $HOME/.kvmtool is a Qemu-ish global 
workflow.

Thanks,

Ingo


Re: [Autotest] [PATCH 06/11] virt: Introducing libvirt monitor

2011-10-13 Thread Lucas Meneghel Rodrigues
On Wed, Oct 12, 2011 at 4:48 AM, Amos Kong ak...@redhat.com wrote:
 On 10/12/2011 05:07 AM, Lucas Meneghel Rodrigues wrote:
 This is an initial implementation for a libvirt monitor.
 With it, we plan on making the libvirt test use all the
 monitor features, making most of the tests available for
 kvm available for libvirt.

 As of implementation details, it uses aexpect to get a
 virsh shell, and then the monitor methods are implemented
 by executing commands on that virsh shell.

 As of now, the libvirt vm class is still not using the
 monitor code, but we plan on making the move soon enough.

 Signed-off-by: Lucas Meneghel Rodrigues l...@redhat.com
 ---
   client/virt/libvirt_monitor.py |  322 ++++++++++++++++++++++++++++++++++++
   1 files changed, 322 insertions(+), 0 deletions(-)
   create mode 100644 client/virt/libvirt_monitor.py

 diff --git a/client/virt/libvirt_monitor.py b/client/virt/libvirt_monitor.py
 new file mode 100644
 index 000..05b838c
 --- /dev/null
 +++ b/client/virt/libvirt_monitor.py
 @@ -0,0 +1,322 @@
 +import re, tempfile, xml.dom.minidom, logging
 +import virt_utils, aexpect
 +from autotest_lib.client.bin import utils
 +
 +
  +class VirshMonitor:
  +    """
  +    Wraps Virsh monitor commands.
  +    """
 +
  +    def __init__(self, virsh_exec='virsh', name, vmname, password=None,
  +                 prompt=None, hostname='localhost', driver=None,
  +                 username=None, linesep="\\n"):
  +        """
  +        Connect to the hypervisor and get virsh prompt.
  +
  +        @param virsh_exec: Virsh executable
  +        @param name: Monitor identifier (a string)
  +        @param vmname: VM name
  +        @param password: Hypervisor user password
  +        @param prompt: Virsh prompt
  +        @param hostname: Hypervisor IP
  +        @param driver: Hypervisor driver type
  +        @param username: Hypervisor username
  +        @param linesep: The line separator to use when sending lines
  +                (e.g. '\\n' or '\\r\\n')
  +        """
 +        self.virsh_exec = virsh_exec
 +        self.name = name
 +        self.vmname = vmname
 +        self.password = password
 +        self.prompt = prompt
 +        self.hostname = hostname
 +        self.driver = driver
 +        self.username = username
 +        self.session = self.login()
  +        self.virsh_cmd = {"help": "help", "quit": "destroy " + self.vmname,
  +                          "stop": "suspend", "cont": "resume"}
 +        self.drive_map = {}
 +        self.network_info = []
 +        self.disk_info = []
 +        self._parse_domxml()
 +
 +
 +    def __del__(self):
  +        self.session.sendline("quit")
 +

 
  +        if "balloon" in command:
  +            new_mem = re.findall("balloon\s+(\d+)", command)[0]
  +            new_mem = str(int(new_mem) * 1024)
  +            output = self.session.cmd_output("setmem %s %s" %
  +                                             (self.vmname, new_mem))
  +            return
  +
  +        if "system_reset" in command:
  +            self.session.cmd_output("destroy %s" % self.vmname)
  +            self.session.cmd_output("start %s" % self.vmname)
  +            return

 This would make the qemu process exit, which is not the same as the qemu
 monitor cmd (system_reset). We may migrate a guest which is repeatedly
 rebooting; then migration would fail.

Ok, noted, but since there's no such API written for it yet, and
since we are still not supporting migration tests, I believe it's OK
to leave it like that for now.

 # grep system_reset virt/tests/*
 virt/tests/boot.py:    2) Send a reboot command or a system_reset
 monitor command (optional)
 virt/tests/boot.py:        if params[reboot_method] == system_reset:
 Binary file virt/tests/boot.pyc matches
 virt/tests/iofuzz.py:                        session =
 vm.reboot(method=system_reset)


 'system_reset' of the qemu monitor is only called for fakereboot in Libvirt,
 but a Libvirt developer told me they may add a new API for it.


  +        data = self.session.cmd_output(" %s \n" % self.virsh_cmd.get(
  +                                                       command, command))
 +        return data
 +
 +
  +    def is_responsive(self):
  +        """
  +        Return True if the monitor is responsive.
  +        """
  +        return True
  +
  +
  +    def quit(self):
  +        """
  +        Send quit without waiting for output.
  +        """
  +        self.cmd("quit")
 +
 +
  +    def screendump(self, filename, debug=True):
  +        """
  +        Request a screendump.
  +
  +        @param filename: Location for the screendump
  +        @return: The command's output
  +        """
  +        if debug:
  +            logging.debug("Requesting screendump %s" % filename)
  +        self.cmd("screenshot %s" % filename)

            cmd output is not returned.

Ok, that was easy enough to fix, thanks! I've updated my branch with
all the fixes pointed out so far.


  +    def info(self, what):
  +        """
  +        Request info about something and return the output.
  +        """

Re: [PATCH 05/11] virt: Introducing libvirt VM class

2011-10-13 Thread Lucas Meneghel Rodrigues

On 10/12/2011 05:14 AM, Daniel P. Berrange wrote:

On Tue, Oct 11, 2011 at 06:07:11PM -0300, Lucas Meneghel Rodrigues wrote:

This is a first attempt at providing a libvirt VM class,
in order to implement the needed methods for virt testing.
With this class, we will be able to implement a libvirt
test, that behaves similarly to the KVM test.

As of implementation details, libvirt_vm uses virsh
(a userspace program written on top of libvirt) to
do domain start, stop, verification of status and
other common operations. The reason why virsh was
used is to get more coverage of the userspace stack
that libvirt offers, and also to catch issues that
virsh users would catch.


Personally I would have recommended that you use the libvirt Python API.
virsh is a very thin layer over the libvirt API, which mostly avoids
adding any logic of its own, so once it has been tested once, there's
not much value in doing more. By using the Python API directly, you will
be able to do more intelligent handling of errors, since you'll get the
full libvirt python exception object instead of a blob of stuff on stderr.
Not to mention that it is so much more efficient, and robust against
any future changes in virsh.


Thanks Daniel. We've discussed that quite a few times during the 
development of the code. The plan is to add methods that will do 
the same as the API counterpart moving forward, so we can test both the 
thin layer that is virsh and the py API. We just wanted to focus on 
one implementation to get something functional first.


We'll keep you guys posted about progress on py bindings.

Cheers,

Lucas


RE: [QEMU PATCH] kvm: support TSC deadline MSR with subsection

2011-10-13 Thread Liu, Jinsong
Marcelo Tosatti wrote:
 On Wed, Oct 12, 2011 at 12:26:12PM +0800, Liu, Jinsong wrote:
 Marcelo,
 
 I just test guest migration from v13 to v12, it failed w/ info
 savevm: unsupported version 13 for 'cpu' v12
 load of migration failed
 
 v13 is new qemu-kvm with tsc deadline timer co-work patch, v12 is
 old qemu-kvm. 
 
 You should try the patch in the first message in this thread, which
 is a replacement for the original tsc deadline timer patch.

Sorry, I didn't notice your modification.
Just tested the modified version; it worked OK when migrating from new qemu (w/ tsc 
deadline timer patch) to old qemu.

Thanks,
Jinsong

=
From: Liu, Jinsong jinsong@intel.com

KVM add emulation of lapic tsc deadline timer for guest.
This patch is co-operation work at qemu side.

Use subsections to save/restore the field (mtosatti).

Signed-off-by: Liu, Jinsong jinsong@intel.com

diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index ae36489..29412dc 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -283,6 +283,7 @@
 #define MSR_IA32_APICBASE_BSP   (1 << 8)
 #define MSR_IA32_APICBASE_ENABLE(1 << 11)
 #define MSR_IA32_APICBASE_BASE  (0xfffff << 12)
+#define MSR_IA32_TSCDEADLINE0x6e0
 
 #define MSR_MTRRcap 0xfe
 #define MSR_MTRRcap_VCNT   8
@@ -687,6 +688,7 @@ typedef struct CPUX86State {
 uint64_t async_pf_en_msr;
 
 uint64_t tsc;
+uint64_t tsc_deadline;
 
 uint64_t mcg_status;
 
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index b6eef04..90a6ffb 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -59,6 +59,7 @@ const KVMCapabilityInfo kvm_arch_required_capabilities[] = {
 
 static bool has_msr_star;
 static bool has_msr_hsave_pa;
+static bool has_msr_tsc_deadline;
 static bool has_msr_async_pf_en;
 static int lm_capable_kernel;
 
@@ -568,6 +569,10 @@ static int kvm_get_supported_msrs(KVMState *s)
 has_msr_hsave_pa = true;
 continue;
 }
+if (kvm_msr_list->indices[i] == MSR_IA32_TSCDEADLINE) {
+has_msr_tsc_deadline = true;
+continue;
+}
 }
 }
 
@@ -881,6 +886,9 @@ static int kvm_put_msrs(CPUState *env, int level)
 if (has_msr_hsave_pa) {
 kvm_msr_entry_set(&msrs[n++], MSR_VM_HSAVE_PA, env->vm_hsave);
 }
+if (has_msr_tsc_deadline) {
+kvm_msr_entry_set(&msrs[n++], MSR_IA32_TSCDEADLINE, env->tsc_deadline);
+}
 #ifdef TARGET_X86_64
 if (lm_capable_kernel) {
 kvm_msr_entry_set(&msrs[n++], MSR_CSTAR, env->cstar);
@@ -1127,6 +1135,9 @@ static int kvm_get_msrs(CPUState *env)
 if (has_msr_hsave_pa) {
 msrs[n++].index = MSR_VM_HSAVE_PA;
 }
+if (has_msr_tsc_deadline) {
+msrs[n++].index = MSR_IA32_TSCDEADLINE;
+}
 
 if (!env->tsc_valid) {
 msrs[n++].index = MSR_IA32_TSC;
@@ -1195,6 +1206,9 @@ static int kvm_get_msrs(CPUState *env)
 case MSR_IA32_TSC:
 env->tsc = msrs[i].data;
 break;
+case MSR_IA32_TSCDEADLINE:
+env->tsc_deadline = msrs[i].data;
+break;
 case MSR_VM_HSAVE_PA:
 env->vm_hsave = msrs[i].data;
 break;
diff --git a/target-i386/machine.c b/target-i386/machine.c
index 9aca8e0..176d372 100644
--- a/target-i386/machine.c
+++ b/target-i386/machine.c
@@ -310,6 +310,24 @@ static const VMStateDescription vmstate_fpop_ip_dp = {
 }
 };
 
+static bool tscdeadline_needed(void *opaque)
+{
+CPUState *env = opaque;
+
+return env->tsc_deadline != 0;
+}
+
+static const VMStateDescription vmstate_msr_tscdeadline = {
+.name = cpu/msr_tscdeadline,
+.version_id = 1,
+.minimum_version_id = 1,
+.minimum_version_id_old = 1,
+.fields  = (VMStateField []) {
+VMSTATE_UINT64(tsc_deadline, CPUState),
+VMSTATE_END_OF_LIST()
+}
+};
+
 static const VMStateDescription vmstate_cpu = {
 .name = cpu,
 .version_id = CPU_SAVE_VERSION,
@@ -420,6 +438,9 @@ static const VMStateDescription vmstate_cpu = {
 } , {
 .vmsd = vmstate_fpop_ip_dp,
 .needed = fpop_ip_dp_needed,
+}, {
+.vmsd = vmstate_msr_tscdeadline,
+.needed = tscdeadline_needed,
 } , {
 /* empty */
 }


Re: [PATCH RFC V5 00/11] Paravirtualized ticketlocks

2011-10-13 Thread Jeremy Fitzhardinge
On 10/13/2011 03:54 AM, Peter Zijlstra wrote:
 On Wed, 2011-10-12 at 17:51 -0700, Jeremy Fitzhardinge wrote:
 This is all unnecessary complication if you're not using PV ticket
 locks, it also uses the jump-label machinery to use the standard
 add-based unlock in the non-PV case.

 if (TICKET_SLOWPATH_FLAG &&
     unlikely(static_branch(&paravirt_ticketlocks_enabled))) {
 	arch_spinlock_t prev;

 	prev = *lock;
 	add_smp(&lock->tickets.head, TICKET_LOCK_INC);

 	/* add_smp() is a full mb() */

 	if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG))
 		__ticket_unlock_slowpath(lock, prev);
 } else
 	__add(&lock->tickets.head, TICKET_LOCK_INC, UNLOCK_LOCK_PREFIX);
 Not that I mind the jump_label usage, but didn't paravirt have an
 existing alternative() thingy to do things like this? Or is the
 alternative() stuff not flexible enough to express this?

Yeah, that's a good question.  There are three mechanisms with somewhat
overlapping concerns:

  * alternative()
  * pvops patching
  * jump_labels

Alternative() is for low-level instruction substitution, and really only
makes sense at the assembler level with one or two instructions.

pvops is basically a collection of ordinary _ops structures full of
function pointers, but it has a layer of patching to help optimise it. 
In the common case, this just replaces an indirect call with a direct
one, but in some special cases it can inline code.  This is used for
small, extremely performance-critical things like cli/sti, but it
awkward to use in general because you have to specify the inlined code
as a parameterless asm.

Jump_labels is basically an efficient way of doing conditionals
predicated on rarely-changed booleans - so it's similar to pvops in that
it is effectively a very ordinary C construct optimised by dynamic code
patching.


So for _arch_spin_unlock(), what I'm trying to go for is that if you're
not using PV ticketlocks, then the unlock sequence is unchanged from
normal.  But also, even if you are using PV ticketlocks, I want the
fastpath to be inlined, with the call out to a special function only
happening on the slow path.  So the result is that if().  If the
static_branch is false, then the executed code sequence is:

nop5
addb $2, (lock)
ret

which is pretty much ideal.  If the static_branch is true, then it ends
up being:

jmp5 1f
[...]

1:  lock add $2, (lock)
test $1, (lock.tail)
jne slowpath
ret
slowpath:...

which is also pretty good, given all the other constraints.

While I could try to use inline patching to get a simple add for the non-PV
unlock case (it would be awkward without asm parameters), I wouldn't
be able to also get the PV unlock fastpath code to be (near) inline. 
Hence jump_label.

Thanks,
J


Re: [PATCH 5/7] kvm tools: Add help and info messages to 'kvm setup'

2011-10-13 Thread Pekka Enberg
On Thu, Oct 13, 2011 at 7:52 PM, Ingo Molnar mi...@elte.hu wrote:
 Btw., will 'kvm run' select the last-created rootfs by default?

 No, it runs rootfs that's named 'default'. We can change that but
 I'm not completely convinced running that last-created rootfs is
 the right thing to do here. Hmm.

 I'm not convinced either - just wanted to raise the issue.

Right. So 'kvm run' is supposed to set up and launch a 'default' rootfs
if no rootfs is specified.

 Also, i raised this in the other thread, why not use .kvmtool of the
 current directory? That way it's cwd local like Git and perf. A 'kvm
 run' (union?) mount the cwd or so - so this would be a natural
 equivalent to chroot. The $HOME/.kvmtool is a Qemu-ish global
 workflow.

Yeah, that definitely makes sense. 'kvm setup rootfs' wouldn't create
rootfs under $HOME/.kvmtool/rootfs but under $(PWD)/rootfs.

I guess we'll lose the ability to 'kvm list' all available rootfs
directories, though?

Pekka


Re: [Qemu-devel] [PATCH 2/2 V2] qemu-kvm: fix improper nmi emulation

2011-10-13 Thread Lai Jiangshan

 
 As explained in some other mail, we could then emulate the missing
 kernel feature by reading out the current in-kernel APIC state, testing
 if LINT1 is unmasked, and then delivering the NMI directly.
 

Only the thread of the VCPU can safely get the in-kernel LAPIC states,
so this approach will cause some troubles.

Since it is a kernel bug, I think we need to change the kernel.

Thanks,
lai


[PATCH 1/1 V3] kernel/kvm: fix improper nmi emulation

2011-10-13 Thread Lai Jiangshan
From: Kenji Kaneshige kaneshige.ke...@jp.fujitsu.com

Currently, NMI interrupt is blindly sent to all the vCPUs when NMI
button event happens. This doesn't properly emulate real hardware on
which NMI button event triggers LINT1. Because of this, NMI is sent to
the processor even when LINT1 is masked in LVT. For example, this
causes the problem that kdump initiated by NMI sometimes doesn't work
on KVM, because kdump assumes NMI is masked on CPUs other than CPU0.

With this patch, KVM_NMI ioctl is handled as follows.

- When in-kernel irqchip is enabled, KVM_NMI ioctl is handled as a
  request of triggering LINT1 on the processor. LINT1 is emulated in
  in-kernel irqchip.

- When in-kernel irqchip is disabled, KVM_NMI ioctl is handled as a
  request of injecting NMI to the processor. This assumes LINT1 is
  already emulated in userland.

(laijs) Add KVM_NMI API document

Signed-off-by: Kenji Kaneshige kaneshige.ke...@jp.fujitsu.com
Tested-by: Lai Jiangshan la...@cn.fujitsu.com
---
 Documentation/virtual/kvm/api.txt |   18 ++
 arch/x86/kvm/irq.h|1 +
 arch/x86/kvm/lapic.c  |7 +++
 arch/x86/kvm/x86.c|5 -
 4 files changed, 30 insertions(+), 1 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt 
b/Documentation/virtual/kvm/api.txt
index b0e4b9c..3162fc8 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1430,6 +1430,24 @@ is supported; 2 if the processor requires all virtual 
machines to have
 an RMA, or 1 if the processor can use an RMA but doesn't require it,
 because it supports the Virtual RMA (VRMA) facility.
 
+4.64 KVM_NMI
+
+Capability: KVM_CAP_USER_NMI
+Architectures: x86
+Type: vcpu ioctl
+Parameters: none
+Returns: 0 on success, -1 on error
+
+This ioctl injects NMI to the vcpu:
+
+   - When in-kernel irqchip is enabled, KVM_NMI ioctl is handled as a
+ request of triggering LINT1 on the processor. LINT1 is emulated in
+ in-kernel lapic irqchip.
+
+   - When in-kernel irqchip is disabled, KVM_NMI ioctl is handled as a
+ request of injecting NMI to the processor. This assumes LINT1 is
+ already emulated in userland lapic.
+
 5. The kvm_run structure
 
 Application code obtains a pointer to the kvm_run structure by
diff --git a/arch/x86/kvm/irq.h b/arch/x86/kvm/irq.h
index 53e2d08..0c96315 100644
--- a/arch/x86/kvm/irq.h
+++ b/arch/x86/kvm/irq.h
@@ -95,6 +95,7 @@ void kvm_pic_reset(struct kvm_kpic_state *s);
 void kvm_inject_pending_timer_irqs(struct kvm_vcpu *vcpu);
 void kvm_inject_apic_timer_irqs(struct kvm_vcpu *vcpu);
 void kvm_apic_nmi_wd_deliver(struct kvm_vcpu *vcpu);
+void kvm_apic_lint1_deliver(struct kvm_vcpu *vcpu);
 void __kvm_migrate_apic_timer(struct kvm_vcpu *vcpu);
 void __kvm_migrate_pit_timer(struct kvm_vcpu *vcpu);
 void __kvm_migrate_timers(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 57dcbd4..87fe36a 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1039,6 +1039,13 @@ void kvm_apic_nmi_wd_deliver(struct kvm_vcpu *vcpu)
kvm_apic_local_deliver(apic, APIC_LVT0);
 }
 
+void kvm_apic_lint1_deliver(struct kvm_vcpu *vcpu)
+{
+   struct kvm_lapic *apic = vcpu->arch.apic;
+
+   kvm_apic_local_deliver(apic, APIC_LVT1);
+}
+
 static struct kvm_timer_ops lapic_timer_ops = {
.is_periodic = lapic_is_periodic,
 };
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 84a28ea..615e6a7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2731,7 +2731,10 @@ static int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu 
*vcpu,
 
 static int kvm_vcpu_ioctl_nmi(struct kvm_vcpu *vcpu)
 {
-   kvm_inject_nmi(vcpu);
+   if (irqchip_in_kernel(vcpu-kvm))
+   kvm_apic_lint1_deliver(vcpu);
+   else
+   kvm_inject_nmi(vcpu);
 
return 0;
 }


[PATCH 1/1 V3] qemu-kvm: fix improper nmi emulation

2011-10-13 Thread Lai Jiangshan
From: Kenji Kaneshige kaneshige.ke...@jp.fujitsu.com

Currently, NMI interrupt is blindly sent to all the vCPUs when NMI
button event happens. This doesn't properly emulate real hardware on
which NMI button event triggers LINT1. Because of this, NMI is sent to
the processor even when LINT1 is masked in LVT. For example, this
causes the problem that kdump initiated by NMI sometimes doesn't work
on KVM, because kdump assumes NMI is masked on CPUs other than CPU0.

With this patch, inject-nmi request is handled as follows.

- When in-kernel irqchip is disabled, inject LINT1 instead of NMI
  interrupt.
- When in-kernel irqchip is enabled, send nmi event to kernel as the
  current code does. LINT1 should be emulated in kernel.

Signed-off-by: Kenji Kaneshige kaneshige.ke...@jp.fujitsu.com
Tested-by: Lai Jiangshan la...@cn.fujitsu.com
---
 hw/apic.c |   11 +++
 hw/apic.h |1 +
 monitor.c |6 +-
 3 files changed, 17 insertions(+), 1 deletions(-)

diff --git a/hw/apic.c b/hw/apic.c
index 69d6ac5..e4addbd 100644
--- a/hw/apic.c
+++ b/hw/apic.c
@@ -205,6 +205,17 @@ void apic_deliver_pic_intr(DeviceState *d, int level)
 }
 }
 
+void apic_deliver_nmi(DeviceState *d)
+{
+APICState *s = DO_UPCAST(APICState, busdev.qdev, d);
+
+if (kvm_enabled() && kvm_irqchip_in_kernel()) {
+cpu_interrupt(s->cpu_env, CPU_INTERRUPT_NMI);
+} else {
+apic_local_deliver(s, APIC_LVT_LINT1);
+}
+}
+
 #define foreach_apic(apic, deliver_bitmask, code) \
 {\
 int __i, __j, __mask;\
diff --git a/hw/apic.h b/hw/apic.h
index c857d52..3a4be0a 100644
--- a/hw/apic.h
+++ b/hw/apic.h
@@ -10,6 +10,7 @@ void apic_deliver_irq(uint8_t dest, uint8_t dest_mode,
  uint8_t trigger_mode);
 int apic_accept_pic_intr(DeviceState *s);
 void apic_deliver_pic_intr(DeviceState *s, int level);
+void apic_deliver_nmi(DeviceState *d);
 int apic_get_interrupt(DeviceState *s);
 void apic_reset_irq_delivered(void);
 int apic_get_irq_delivered(void);
diff --git a/monitor.c b/monitor.c
index cb485bf..0b81f17 100644
--- a/monitor.c
+++ b/monitor.c
@@ -2616,7 +2616,11 @@ static int do_inject_nmi(Monitor *mon, const QDict 
*qdict, QObject **ret_data)
 CPUState *env;
 
 for (env = first_cpu; env != NULL; env = env->next_cpu) {
-cpu_interrupt(env, CPU_INTERRUPT_NMI);
+if (!env->apic_state) {
+cpu_interrupt(env, CPU_INTERRUPT_NMI);
+} else {
+apic_deliver_nmi(env->apic_state);
+}
 }
 
 return 0;


Re: [PATCH 5/7] kvm tools: Add help and info messages to 'kvm setup'

2011-10-13 Thread Asias He
On 10/14/2011 02:22 AM, Pekka Enberg wrote:
 On Thu, Oct 13, 2011 at 7:52 PM, Ingo Molnar mi...@elte.hu wrote:
 Btw., will 'kvm run' select the last-created rootfs by default?

 No, it runs rootfs that's named 'default'. We can change that but
 I'm not completely convinced running that last-created rootfs is
 the right thing to do here. Hmm.

 I'm not convinced either - just wanted to raise the issue.
 
 Right. So 'kvm run' is supposed to setup and launch a 'default' rootfs
 if no rootfs is specified.
 
 Also, i raised this in the other thread, why not use .kvmtool of the
 current directory? That way it's cwd local like Git and perf. A 'kvm
 run' (union?) mount the cwd or so - so this would be a natural
 equivalent to chroot. The $HOME/.kvmtool is a Qemu-ish global
 workflow.
 
 Yeah, that definitely makes sense. 'kvm setup rootfs' wouldn't create
 rootfs under $HOME/.kvmtool/rootfs but under $(PWD)/rootfs.

I also think $(PWD)/rootfs is much better than $HOME/.kvmtool/rootfs.
I think 'kvm setup $absolute_rootfs_path' makes sense as well.

 I guess we'll lose the ability to 'kvm list' all available rootfs
 directories, though?

User can create and put the rootfs directories wherever he/she likes. We do
not need to track where the rootfs directories are, thus we do not need
to 'kvm list' them.

-- 
Asias He


Re: [Qemu-devel] [PATCH 2/2 V2] qemu-kvm: fix improper nmi emulation

2011-10-13 Thread Jan Kiszka
On 2011-10-14 02:53, Lai Jiangshan wrote:
 

 As explained in some other mail, we could then emulate the missing
 kernel feature by reading out the current in-kernel APIC state, testing
 if LINT1 is unmasked, and then delivering the NMI directly.

 
 Only the thread of the VCPU can safely get the in-kernel LAPIC states,
 so this approach will cause some troubles.

run_on_cpu() can help.

Jan





Re: [PATCH] PPC: Fix race in mtmsr paravirt implementation

2011-10-13 Thread Alexander Graf

Am 13.10.2011 um 07:40 schrieb Bharat Bhushan r65...@freescale.com:

 The current implementation of mtmsr and mtmsrd are racy in that it does:
 
  * check (int_pending == 0)
  --- host sets int_pending = 1 ---
  * write shared page
  * done
 
 while instead we should check for int_pending after the shared page is 
 written.
 
 Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
 ---
 arch/powerpc/kernel/kvm_emul.S |   22 ++
 1 files changed, 10 insertions(+), 12 deletions(-)
 
 diff --git a/arch/powerpc/kernel/kvm_emul.S b/arch/powerpc/kernel/kvm_emul.S
 index f2b1b25..65f853b 100644
 --- a/arch/powerpc/kernel/kvm_emul.S
 +++ b/arch/powerpc/kernel/kvm_emul.S
 @@ -85,15 +85,15 @@ kvm_emulate_mtmsrd_reg:
/* Put MSR back into magic page */
STL64(r31, KVM_MAGIC_PAGE + KVM_MAGIC_MSR, 0)
 
 +/* Check if we may trigger an interrupt */
  +andi.	r30, r30, MSR_EE
  +beq	no_check
  +
 	/* Check if we have to fetch an interrupt */
 	lwz	r31, (KVM_MAGIC_PAGE + KVM_MAGIC_INT)(0)
 	cmpwi	r31, 0
 	beq+	no_check
 
  -/* Check if we may trigger an interrupt */
  -andi.	r30, r30, MSR_EE
  -beq	no_check
 -

This chunk should actually be ok already. We're basically doing:

   if (likely(!int_pending) && !(new_msr & MSR_EE))
goto no_check;

Since we wrote shared.msr before the check, we're good, no?

SCRATCH_RESTORE
 
/* Nag hypervisor */
 @@ -167,22 +167,20 @@ maybe_stay_in_guest:
 kvm_emulate_mtmsr_reg2:
	ori	r30, r0, 0
 
 -/* Check if we have to fetch an interrupt */
 -lwz	r31, (KVM_MAGIC_PAGE + KVM_MAGIC_INT)(0)
 -cmpwi	r31, 0
 -beq+	no_mtmsr
 +/* Put MSR into magic page because we don't call mtmsr */
 +STL64(r30, KVM_MAGIC_PAGE + KVM_MAGIC_MSR, 0)
 
/* Check if we may trigger an interrupt */
 	andi.	r31, r30, MSR_EE
 	beq	no_mtmsr
 
 -b	do_mtmsr
 +/* Check if we have to fetch an interrupt */
 +lwz	r31, (KVM_MAGIC_PAGE + KVM_MAGIC_INT)(0)
 +cmpwi	r31, 0
 +bne-	do_mtmsr
 
 no_mtmsr:
 
 -/* Put MSR into magic page because we don't call mtmsr */
 -STL64(r30, KVM_MAGIC_PAGE + KVM_MAGIC_MSR, 0)
 -

This one looks good.

Alex

SCRATCH_RESTORE
 
/* Go back to caller */
 -- 
 1.7.0.4
 
 


RE: [PATCH] PPC: Fix race in mtmsr paravirt implementation

2011-10-13 Thread Bhushan Bharat-R65777


 -Original Message-
 From: Alexander Graf [mailto:ag...@suse.de]
 Sent: Thursday, October 13, 2011 2:36 PM
 To: Bhushan Bharat-R65777
 Cc: kvm-ppc@vger.kernel.org; bharatb.ya...@gmail.com; Bhushan Bharat-
 R65777
 Subject: Re: [PATCH] PPC: Fix race in mtmsr paravirt implementation
 
 
 Am 13.10.2011 um 07:40 schrieb Bharat Bhushan r65...@freescale.com:
 
  The current implementation of mtmsr and mtmsrd are racy in that it
 does:
 
   * check (int_pending == 0)
   --- host sets int_pending = 1 ---
   * write shared page
   * done
 
  while instead we should check for int_pending after the shared page is
 written.
 
  Signed-off-by: Bharat Bhushan <bharat.bhus...@freescale.com>
  ---
  arch/powerpc/kernel/kvm_emul.S |   22 ++
  1 files changed, 10 insertions(+), 12 deletions(-)
 
  diff --git a/arch/powerpc/kernel/kvm_emul.S
  b/arch/powerpc/kernel/kvm_emul.S index f2b1b25..65f853b 100644
  --- a/arch/powerpc/kernel/kvm_emul.S
  +++ b/arch/powerpc/kernel/kvm_emul.S
  @@ -85,15 +85,15 @@ kvm_emulate_mtmsrd_reg:
 /* Put MSR back into magic page */
 STL64(r31, KVM_MAGIC_PAGE + KVM_MAGIC_MSR, 0)
 
   +	/* Check if we may trigger an interrupt */
   +	andi.   r30, r30, MSR_EE
   +	beq     no_check
   +
  	/* Check if we have to fetch an interrupt */
  	lwz     r31, (KVM_MAGIC_PAGE + KVM_MAGIC_INT)(0)
  	cmpwi   r31, 0
  	beq+    no_check
  
   -	/* Check if we may trigger an interrupt */
   -	andi.   r30, r30, MSR_EE
   -	beq     no_check
  -
 
 This chunk should actually be ok already. We're basically doing:
 
    if (likely(!int_pending) && !(new_msr & MSR_EE))
        goto no_check;
 
 Since we wrote shared.msr before the check, we're good, no?

Actually I borrowed this from the wrtee implementation to keep it consistent
across mtmsr/mtmsrd and wrtee/wrteei.
I thought of it like this: if we are qualified to take an interrupt (EE set),
only then check for a pending interrupt, rather than first checking for a
pending interrupt and then checking whether we are qualified to take it.

If you think the earlier ordering is easier to understand, I will change the
patch.

Thanks
-Bharat

 
 SCRATCH_RESTORE
 
 /* Nag hypervisor */
  @@ -167,22 +167,20 @@ maybe_stay_in_guest:
  kvm_emulate_mtmsr_reg2:
 	ori     r30, r0, 0
 
  -	/* Check if we have to fetch an interrupt */
  -	lwz     r31, (KVM_MAGIC_PAGE + KVM_MAGIC_INT)(0)
  -	cmpwi   r31, 0
  -	beq+    no_mtmsr
  +	/* Put MSR into magic page because we don't call mtmsr */
  +	STL64(r30, KVM_MAGIC_PAGE + KVM_MAGIC_MSR, 0)
 
 	/* Check if we may trigger an interrupt */
 	andi.   r31, r30, MSR_EE
 	beq     no_mtmsr
 
  -	b       do_mtmsr
  +	/* Check if we have to fetch an interrupt */
  +	lwz     r31, (KVM_MAGIC_PAGE + KVM_MAGIC_INT)(0)
  +	cmpwi   r31, 0
  +	bne-    do_mtmsr
 
  no_mtmsr:
 
  -/* Put MSR into magic page because we don't call mtmsr */
  -STL64(r30, KVM_MAGIC_PAGE + KVM_MAGIC_MSR, 0)
  -
 
 This one looks good.
 
 Alex
 
 SCRATCH_RESTORE
 
 /* Go back to caller */
  --
  1.7.0.4
 
 




Re: [PATCH] PPC: Fix race in mtmsr paravirt implementation

2011-10-13 Thread Alexander Graf

On 10/13/2011 11:22 AM, Bhushan Bharat-R65777 wrote:



-Original Message-
From: Alexander Graf [mailto:ag...@suse.de]
Sent: Thursday, October 13, 2011 2:36 PM
To: Bhushan Bharat-R65777
Cc: kvm-ppc@vger.kernel.org; bharatb.ya...@gmail.com; Bhushan Bharat-
R65777
Subject: Re: [PATCH] PPC: Fix race in mtmsr paravirt implementation


On 13.10.2011 at 07:40, Bharat Bhushan <r65...@freescale.com> wrote:


The current implementation of mtmsr and mtmsrd are racy in that it

does:

  * check (int_pending == 0)
  --- host sets int_pending = 1 ---
  * write shared page
  * done

while instead we should check for int_pending after the shared page is

written.

Signed-off-by: Bharat Bhushan <bharat.bhus...@freescale.com>
---
arch/powerpc/kernel/kvm_emul.S |   22 ++
1 files changed, 10 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/kernel/kvm_emul.S
b/arch/powerpc/kernel/kvm_emul.S index f2b1b25..65f853b 100644
--- a/arch/powerpc/kernel/kvm_emul.S
+++ b/arch/powerpc/kernel/kvm_emul.S
@@ -85,15 +85,15 @@ kvm_emulate_mtmsrd_reg:
/* Put MSR back into magic page */
STL64(r31, KVM_MAGIC_PAGE + KVM_MAGIC_MSR, 0)

+	/* Check if we may trigger an interrupt */
+	andi.   r30, r30, MSR_EE
+	beq     no_check
+
	/* Check if we have to fetch an interrupt */
	lwz     r31, (KVM_MAGIC_PAGE + KVM_MAGIC_INT)(0)
	cmpwi   r31, 0
	beq+    no_check

-	/* Check if we may trigger an interrupt */
-	andi.   r30, r30, MSR_EE
-	beq     no_check
-

This chunk should actually be ok already. We're basically doing:

   if (likely(!int_pending) && !(new_msr & MSR_EE))
       goto no_check;

Since we wrote shared.msr before the check, we're good, no?

Actually I borrowed this from the wrtee implementation to keep it consistent
across mtmsr/mtmsrd and wrtee/wrteei.
I thought of it like this: if we are qualified to take an interrupt (EE set),
only then check for a pending interrupt, rather than first checking for a
pending interrupt and then checking whether we are qualified to take it.

If you think the earlier ordering is easier to understand, I will change the
patch.


The question is which one is the more likely case. I would assume it is 
less likely for an interrupt to be pending than for EE to be on. But the 
real point is that it shouldn't matter, so there's no need to change the 
code :).



Alex



[PATCH V2] PPC: Fix race in mtmsr paravirt implementation

2011-10-13 Thread Bharat Bhushan
The current implementation of mtmsr and mtmsrd are racy in that it does:

  * check (int_pending == 0)
  --- host sets int_pending = 1 ---
  * write shared page
  * done

while instead we should check for int_pending after the shared page is written.

Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
---
v2:
 - msr.ee and int_pending are checked as earlier.

 arch/powerpc/kernel/kvm_emul.S |   10 --
 1 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kernel/kvm_emul.S b/arch/powerpc/kernel/kvm_emul.S
index f2b1b25..3d64c57 100644
--- a/arch/powerpc/kernel/kvm_emul.S
+++ b/arch/powerpc/kernel/kvm_emul.S
@@ -167,6 +167,9 @@ maybe_stay_in_guest:
 kvm_emulate_mtmsr_reg2:
ori r30, r0, 0
 
+   /* Put MSR into magic page because we don't call mtmsr */
+   STL64(r30, KVM_MAGIC_PAGE + KVM_MAGIC_MSR, 0)
+
/* Check if we have to fetch an interrupt */
lwz r31, (KVM_MAGIC_PAGE + KVM_MAGIC_INT)(0)
cmpwi   r31, 0
@@ -174,15 +177,10 @@ kvm_emulate_mtmsr_reg2:
 
/* Check if we may trigger an interrupt */
andi.   r31, r30, MSR_EE
-   beq no_mtmsr
-
-   b   do_mtmsr
+   bne do_mtmsr
 
 no_mtmsr:
 
-   /* Put MSR into magic page because we don't call mtmsr */
-   STL64(r30, KVM_MAGIC_PAGE + KVM_MAGIC_MSR, 0)
-
SCRATCH_RESTORE
 
/* Go back to caller */
-- 
1.7.0.4




Re: [PATCH V2] PPC: Fix race in mtmsr paravirt implementation

2011-10-13 Thread Alexander Graf

On 10/13/2011 11:47 AM, Bharat Bhushan wrote:

The current implementation of mtmsr and mtmsrd are racy in that it does:

   * check (int_pending == 0)
   --- host sets int_pending = 1 ---
   * write shared page
   * done

while instead we should check for int_pending after the shared page is written.

Signed-off-by: Bharat Bhushan <bharat.bhus...@freescale.com>


Thanks, applied to kvm-ppc-next.


Alex
