On 04/09/2010 12:30 PM, Takuya Yoshikawa wrote:
This work is initially suggested by Avi Kivity for moving the
dirty bitmaps used by KVM to user space: This makes it possible
to manipulate the bitmaps from qemu without copying from KVM.
Note: We are now brushing up this code before sending
On 04/09/2010 12:32 PM, Takuya Yoshikawa wrote:
We will use this later in other parts.
s/rapper/wrapper/...
+static inline int kvm_dirty_bitmap_bytes(struct kvm_memory_slot *memslot)
+{
+ return ALIGN(memslot->npages, BITS_PER_LONG) / 8;
+}
+
'int' may overflow.
struct
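The overflow concern can be illustrated in plain C. This is a minimal userspace sketch: `struct memslot`, the `ALIGN` macro, and `dirty_bitmap_bytes` are simplified stand-ins, not the kernel's own definitions. With an `int` return type, a slot of 2^34 pages needs 2^31 bytes of bitmap, which no longer fits in a signed 32-bit value; returning the natural word width avoids the truncation.

```c
#include <assert.h>
#include <stddef.h>

#define BITS_PER_LONG (8 * (int)sizeof(long))
/* Round x up to a multiple of a (a power of two), like the kernel's ALIGN(). */
#define ALIGN(x, a) (((x) + (a) - 1) & ~((unsigned long)(a) - 1))

/* Stand-in for struct kvm_memory_slot; only npages matters here. */
struct memslot {
    unsigned long npages;
};

/* With an 'int' return type, 2^34 pages would truncate (the bitmap is
 * npages/8 bytes, i.e. 2^31); the natural width avoids that. */
static unsigned long dirty_bitmap_bytes(struct memslot *memslot)
{
    return ALIGN(memslot->npages, BITS_PER_LONG) / 8;
}
```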
On 04/09/2010 12:34 PM, Takuya Yoshikawa wrote:
For x86, we will change the allocation and free parts to do_mmap() and
do_munmap(). This patch makes it cleaner.
Should be done for all architectures. I don't want different ways of
creating dirty bitmaps for different architectures.
--
On 04/09/2010 12:35 PM, Takuya Yoshikawa wrote:
Currently, x86 vmalloc()s a dirty bitmap every time we switch
to the next dirty bitmap. To avoid this, we use the double buffering
technique: we also move the bitmaps to userspace, so that extra
bitmaps will not use the precious kernel
On 04/09/2010 12:38 PM, Takuya Yoshikawa wrote:
By this patch, bitmap allocation is replaced with do_mmap() and
bitmap manipulation is replaced with *_user() functions.
Note that this does not change the APIs between kernel and user space.
To get more advantage from this hack, we need to add a
On 04/12/2010 05:04 AM, Zhang, Xiantao wrote:
What was the performance hit? What was your I/O setup (image format,
using aio?)
The issue only happens when the vcpu number is over-committed (e.g. vcpu/pcpu > 2) and
physical cpus are saturated. For example, when running webbench in a windows OS in
On 04/06/2010 03:51 AM, Yoshiaki Tamura wrote:
Replaces the byte-based phys_ram_dirty bitmap with three bit-based phys_ram_dirty
bitmaps. On allocation, it sets all bits in the bitmap.
index c74b0a4..9733892 100644
--- a/exec.c
+++ b/exec.c
@@ -110,7 +110,7 @@ uint8_t *code_gen_ptr;
#if
On 04/06/2010 03:51 AM, Yoshiaki Tamura wrote:
Signed-off-by: Yoshiaki Tamura <tamura.yoshi...@lab.ntt.co.jp>
Signed-off-by: OHMURA Kei <ohmura@lab.ntt.co.jp>
---
static inline int cpu_physical_memory_get_dirty_flags(ram_addr_t addr)
{
-    return phys_ram_dirty[addr >> TARGET_PAGE_BITS];
+
On 04/12/2010 11:01 AM, Xiao Guangrong wrote:
- calculate the zapped page number properly in mmu_zap_unsync_children()
- calculate the freed page number properly in kvm_mmu_change_mmu_pages()
- restart the list walk if a child page was zapped
Signed-off-by: Xiao Guangrong <xiaoguangr...@cn.fujitsu.com>
---
On 04/12/2010 11:02 AM, Xiao Guangrong wrote:
- 'vcpu' is not used while marking the parent unsync, so remove it
- if it is already marked unsync, there is no need to walk its parents
Please separate these two changes.
The optimization looks good. Perhaps it can be done even nicer using
mutually
On 04/12/2010 11:03 AM, Xiao Guangrong wrote:
Usually, the OS changes the CR4.PGE bit to flush all global pages; in this
case, there is no need to reset the mmu, just flush the tlb
Signed-off-by: Xiao Guangrong <xiaoguangr...@cn.fujitsu.com>
---
arch/x86/kvm/x86.c |9 +
1 files changed, 9 insertions(+), 0
On 04/12/2010 11:05 AM, Xiao Guangrong wrote:
'multimapped' and 'unsync' in 'struct kvm_mmu_page' are just indication
fields; we can use flag bits instead of them
@@ -202,9 +202,10 @@ struct kvm_mmu_page {
* in this shadow page.
*/
DECLARE_BITMAP(slot_bitmap,
On 04/12/2010 11:06 AM, Xiao Guangrong wrote:
- chain all unsync shadow pages then we can fetch them quickly
- flush local/remote tlb after all shadow page synced
Signed-off-by: Xiao Guangrong <xiaoguangr...@cn.fujitsu.com>
---
arch/x86/include/asm/kvm_host.h |1 +
arch/x86/kvm/mmu.c
On 04/12/2010 11:53 AM, Xiao Guangrong wrote:
kvm->arch.n_free_mmu_pages = 0;
@@ -1589,7 +1589,8 @@ static void mmu_unshadow(struct kvm *kvm, gfn_t gfn)
!sp->role.invalid) {
pgprintk("%s: zap %lx %x\n",
__func__, gfn, sp->role.word);
-
On 04/12/2010 04:29 AM, Takuya Yoshikawa wrote:
Should be called __set_bit_user() since it is non-atomic.
Actually I first named it like that, but then noticed that in the uaccess
convention, the __ prefix means the version with less checking.
On the other hand, for the bitops family, __
On 04/12/2010 05:07 AM, Takuya Yoshikawa wrote:
(2010/04/12 2:13), Avi Kivity wrote:
On 04/09/2010 12:34 PM, Takuya Yoshikawa wrote:
For x86, we will change the allocation and free parts to do_mmap() and
do_munmap(). This patch makes it cleaner.
Should be done for all architectures. I don't
On 04/12/2010 05:15 AM, Takuya Yoshikawa wrote:
OK, but we have one problem: ia64. I checked all architectures' dirty bitmap
implementations and thought generalizing this work is not so hard except for
ia64. It's already too different from the other parts.
#ifdef CONFIG_IA64
unsigned long
On 04/12/2010 05:29 AM, Takuya Yoshikawa wrote:
TODO:
1. We want to use copy_in_user() for 32bit case too.
Definitely. Why doesn't it work now?
Sadly we don't have that for 32bit. We have to implement it ourselves.
I tested two temporary implementations for 32bit:
1. This version
On 04/12/2010 12:39 PM, Yoshiaki Tamura wrote:
Please put in some header file, maybe qemu-common.h.
OK. BTW, is qemu-kvm.h planned to go upstream?
No. Use kvm.h for kvm specific symbols (qemu-kvm.h includes it).
Should be nicer as a loop calling a helper to allocate each bitmap. This
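The suggested shape (one helper, called in a loop) might look like the sketch below. All the names here are hypothetical, not qemu's real symbols, and the "set all bits on allocation" policy comes from the patch description above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical dirty-flag indices; qemu's real constants differ. */
enum { MASTER_DIRTY_FLAG, VGA_DIRTY_FLAG, MIGRATION_DIRTY_FLAG, NUM_DIRTY_FLAGS };

static unsigned long *dirty_bitmaps[NUM_DIRTY_FLAGS];

/* Allocate one bitmap of 'nbits' bits with every bit set, matching the
 * "all pages start dirty" policy described above. */
static unsigned long *alloc_dirty_bitmap(size_t nbits)
{
    size_t bytes = (nbits + 7) / 8;
    unsigned long *bm = malloc(bytes);

    if (bm)
        memset(bm, 0xff, bytes);
    return bm;
}

/* One loop over the flag types instead of three open-coded allocations. */
static int alloc_all_dirty_bitmaps(size_t nbits)
{
    int i;

    for (i = 0; i < NUM_DIRTY_FLAGS; i++) {
        dirty_bitmaps[i] = alloc_dirty_bitmap(nbits);
        if (!dirty_bitmaps[i])
            return -1;
    }
    return 0;
}
```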
On 04/12/2010 12:07 AM, Andre Przywara wrote:
On SVM we set the instruction length of skipped instructions
to hard-coded, well known values, which could be wrong when (bogus,
but valid) prefixes (REX, segment override) are used.
Newer AMD processors (Fam10h 45nm and better, aka. PhenomII or
On 04/12/2010 12:22 PM, Xiao Guangrong wrote:
Hi Avi,
Avi Kivity wrote:
hlist_for_each_entry_safe() is supposed to be safe against removal of
the element that is pointed to by the iteration cursor.
If we destroy the next pointer, hlist_for_each_entry_safe() is unsafe.
List
On 04/12/2010 04:57 AM, wzt@gmail.com wrote:
coalesced_mmio_write() does not check the len value; if len is negative,
memcpy(ring->coalesced_mmio[ring->last].data, val, len); will cause a
stack buffer overflow.
How can len be negative? It can only be between 1 and 8.
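Even granting that callers only pass 1..8 today, the reported fix amounts to a range check before the memcpy. This sketch uses simplified stand-in types (the real `struct kvm_coalesced_mmio` lives in the KVM UAPI headers); the check itself is the point.

```c
#include <assert.h>
#include <string.h>

#define COALESCED_MMIO_DATA_SIZE 8   /* data[] in the ring entry is 8 bytes */

/* Simplified stand-in for struct kvm_coalesced_mmio. */
struct mmio_entry {
    unsigned long phys_addr;
    unsigned int len;
    unsigned char data[COALESCED_MMIO_DATA_SIZE];
};

/* Reject lengths outside 1..8 before copying into the fixed 8-byte
 * data field, making the overflow impossible by construction. */
static int mmio_entry_fill(struct mmio_entry *e, const void *val, int len)
{
    if (len < 1 || len > COALESCED_MMIO_DATA_SIZE)
        return -1;                    /* would overflow data[] */
    memcpy(e->data, val, (size_t)len);
    e->len = (unsigned int)len;
    return 0;
}
```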
On 04/12/2010 01:29 PM, Alexander Graf wrote:
On 12.04.2010, at 12:20, Avi Kivity wrote:
On 04/12/2010 12:07 AM, Andre Przywara wrote:
On SVM we set the instruction length of skipped instructions
to hard-coded, well known values, which could be wrong when (bogus,
but valid) prefixes
On 04/12/2010 01:58 PM, Yoshiaki Tamura wrote:
Is it necessary to update migration and vga bitmaps?
We can simply update the master bitmap, and update the migration and vga
bitmaps only when they need it. That can be done in a different patch.
Let me explain the role of the master bitmap
On 04/12/2010 01:42 PM, Xiao Guangrong wrote:
Hi Avi,
Thanks for your comments.
Avi Kivity wrote:
Later we have:
kvm_x86_ops->set_cr4(vcpu, cr4);
vcpu->arch.cr4 = cr4;
vcpu->arch.mmu.base_role.cr4_pge = (cr4 & X86_CR4_PGE) &&
!tdp_enabled;
All
On 04/12/2010 03:22 PM, Xiao Guangrong wrote:
But kvm_mmu_zap_page() will only destroy sp == tpos == pos; n points at
pos->next already, so it's safe.
kvm_mmu_zap_page(sp) not only zaps sp but also zaps all of sp's unsync child
pages; if n is just sp's unsync child, just at the same
On 04/12/2010 03:27 PM, Gleb Natapov wrote:
Currently both SVM and VMX have their own DR handling code. Move it to
x86.c.
The standard process is to make them identical first and finally merge
identical code, but I guess we can skip it in this case (Jan?)
On 04/12/2010 07:52 PM, Gleb Natapov wrote:
On Mon, Apr 12, 2010 at 06:09:50PM +0200, Jan Kiszka wrote:
Avi Kivity wrote:
On 04/12/2010 03:27 PM, Gleb Natapov wrote:
Currently both SVM and VMX have their own DR handling code. Move it to
x86.c.
The standard
On 04/08/2010 01:51 AM, Cam Macdonell wrote:
(sorry about the late review)
+
+Regular Interrupts
+--
+
+If regular interrupts are used (due to either a guest not supporting MSI or the
+user specifying not to use them on startup) then the value written to the lower
+16-bits of
On 04/08/2010 01:51 AM, Cam Macdonell wrote:
This avoids the need to use qemu_ram_alloc and mmap with MAP_FIXED to map a
host file into guest RAM. This function mmaps the opened file anywhere and adds
the memory to the ram blocks.
Usage is
qemu_ram_mmap(fd, size, MAP_SHARED, offset);
---
On 04/08/2010 01:52 AM, Cam Macdonell wrote:
Support an inter-vm shared memory device that maps a shared-memory object as a
PCI device in the guest. This patch also supports interrupts between guests by
communicating over a unix domain socket. This patch applies to the qemu-kvm
repository.
On 04/08/2010 02:00 AM, Cam Macdonell wrote:
This patch adds a driver for my shared memory PCI device using the uio_pci
interface. The driver has three memory regions. The first memory region is for
device registers for sending interrupts. The second BAR is for receiving MSI-X
interrupts and
On 04/12/2010 11:55 PM, Fernando Luis Vazquez Cao wrote:
Sadly we don't have that for 32bit. We have to implement it ourselves.
I tested two temporary implementations for 32bit:
1. This version using copy_from_user() and copy_to_user(), with
a not-so-nice vmalloc().
2. Loop with __get_user()
On 04/13/2010 06:07 AM, Xiao Guangrong wrote:
And i found the commit 87778d60ee:
|KVM: MMU: Segregate mmu pages created with different cr4.pge settings
|
|Don't allow a vcpu with cr4.pge cleared to use a shadow page created with
|cr4.pge set; this might cause a cr3 switch not to
On 04/13/2010 03:50 AM, Zhang, Xiantao wrote:
Avi Kivity wrote:
On 04/12/2010 05:04 AM, Zhang, Xiantao wrote:
What was the performance hit? What was your I/O setup (image
format, using aio?)
The issue only happens when vcpu number is over-committed(e.g.
vcpu
On 04/13/2010 10:03 AM, Takuya Yoshikawa wrote:
It's better to limit memory slots to something that can be handled by
everything, then. 2^31 pages is plenty. Return -EINVAL if the slot is
too large.
I agree with that, so shall we hold this patch pending a fix like that?
-- or should we make a new
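The suggested check is small; a sketch follows. The constant name is hypothetical (KVM's real validation lives in the KVM_SET_USER_MEMORY_REGION path), but the idea is exactly the one above: cap slots at 2^31 pages so bitmap byte counts stay within 32-bit arithmetic everywhere.

```c
#include <assert.h>
#include <errno.h>

/* 2^31 pages, the ceiling suggested above: plenty for any sane slot,
 * small enough that derived byte counts fit in 32-bit math. */
#define KVM_MEMSLOT_PAGES_MAX (1ULL << 31)

/* Sketch of the suggested validation on slot creation. */
static int check_memslot_size(unsigned long long npages)
{
    if (npages >= KVM_MEMSLOT_PAGES_MAX)
        return -EINVAL;
    return 0;
}
```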
On 04/13/2010 10:21 AM, Gleb Natapov wrote:
Maybe I am missing something here, but it seems we can call
kvm_mmu_pte_write() directly from emulator_cmpxchg_emulated()
instead of passing mmu_only down to emulator_write_emulated_onepage()
and calling it there.
@@ -3460,7 +3444,9 @@ static int
On 04/13/2010 10:26 AM, Gleb Natapov wrote:
On Tue, Apr 13, 2010 at 10:24:40AM +0300, Avi Kivity wrote:
On 04/13/2010 10:21 AM, Gleb Natapov wrote:
Maybe I am missing something here, but it seems we can call
kvm_mmu_pte_write() directly from emulator_cmpxchg_emulated()
instead
On 04/13/2010 11:01 AM, Yoshiaki Tamura wrote:
Avi Kivity wrote:
On 04/12/2010 01:58 PM, Yoshiaki Tamura wrote:
Is it necessary to update migration and vga bitmaps?
We can simply update the master bitmap, and update the migration
and vga
bitmaps only when they need it. That can be done
On 04/14/2010 06:24 AM, Zhang, Xiantao wrote:
Spin loops need to be addressed first, they are known to kill
performance in overcommit situations.
Even in overcommit case, if vcpu threads of one qemu are not
scheduled or pulled to the same logical processor, the performance
drop is
On 04/14/2030 12:05 PM, Zhang, Yanmin wrote:
Here is the new patch of V3 against tip/master of April 13th
if anyone wants to try it.
Thanks for persisting despite the flames.
Can you please separate arch/x86/kvm part of the patch? That will make
for easier reviewing, and will need to
On 04/14/2010 12:43 PM, Sheng Yang wrote:
On Wednesday 14 April 2010 17:20:15 Avi Kivity wrote:
On 04/14/2030 12:05 PM, Zhang, Yanmin wrote:
Here is the new patch of V3 against tip/master of April 13th
if anyone wants to try it.
Thanks for persisting despite the flames.
Can
On 04/14/2010 01:14 PM, Sheng Yang wrote:
I wouldn't like to depend on model specific behaviour.
One option is to read all the information synchronously and store it in
a per-cpu area with atomic instructions, then queue the NMI. Another
option is to have another callback which tells us that
On 04/14/2010 01:43 PM, Ingo Molnar wrote:
Thanks for persisting despite the flames.
Can you please separate arch/x86/kvm part of the patch? That will make for
easier reviewing, and will need to go through separate trees.
Once it gets into a state that it can be applied could you
On 04/14/2010 03:11 PM, Jan Kiszka wrote:
When a fault triggers a task switch, the error code, if it exists, has
to be pushed on the new task's stack. Implement the missing bits.
@@ -2416,12 +2417,23 @@ static int emulator_do_task_switch(struct
x86_emulate_ctxt *ctxt,
On 04/14/2010 03:58 PM, Jan Kiszka wrote:
The TSS descriptor (gate doesn't have a size). But isn't it possible to
have a 32-bit TSS with a 16-bit CS/SS?
Might be possible, but will cause troubles as the spec says:
The error code is pushed on the stack as a doubleword or word
On 04/14/2010 04:07 PM, Avi Kivity wrote:
On 04/14/2010 03:58 PM, Jan Kiszka wrote:
The TSS descriptor (gate doesn't have a size). But isn't it
possible to
have a 32-bit TSS with a 16-bit CS/SS?
Might be possible, but will cause troubles as the spec says:
The error code is pushed
tables between pae and
longmode guest page tables at the same guest page.
Signed-off-by: Avi Kivity <a...@redhat.com>
---
arch/x86/include/asm/kvm_host.h |2 +-
arch/x86/kvm/mmu.c | 12 ++--
arch/x86/kvm/mmutrace.h |5 +++--
3 files changed, 10 insertions
On 04/14/2010 07:20 PM, Avi Kivity wrote:
There is no real distinction between glevels=3 and glevels=4; both have
exactly the same format and the code is treated exactly the same way. Drop
role.glevels and replace is with role.cr4_pae (which is meaningful). This
simplifies the code a bit
On 04/15/2030 04:04 AM, Zhang, Yanmin wrote:
An even more accurate way to determine this is to check whether the
interrupt frame points back at the 'int $2' instruction. However we
plan to switch to a self-IPI method to inject the NMI, and I'm not sure
whether APIC NMIs are accepted on an
On 04/15/2010 07:58 AM, Srivatsa Vaddagiri wrote:
On Sun, Apr 11, 2010 at 11:40 PM, Avi Kivity <a...@redhat.com> wrote:
The current handing of PLE is very suboptimal. With proper
directed yield we should be much better there.
Hi Avi,
By directed
On 04/15/2010 02:30 AM, Cam Macdonell wrote:
Sample programs, init scripts and the shared memory server are available in a
git repo here:
www.gitorious.org/nahanni
Please consider qemu.git/contrib.
Should the compilation be tied into Qemu's regular build with a switch
On 04/14/2010 04:19 PM, Jan Kiszka wrote:
Avi Kivity wrote:
On 04/14/2010 03:58 PM, Jan Kiszka wrote:
The TSS descriptor (gate doesn't have a size). But isn't it possible to
have a 32-bit TSS with a 16-bit CS/SS?
Might be possible, but will cause troubles as the spec
On 04/14/2010 09:29 PM, Marcelo Tosatti wrote:
On Wed, Apr 14, 2010 at 07:32:12PM +0300, Avi Kivity wrote:
On 04/14/2010 07:20 PM, Avi Kivity wrote:
There is no real distinction between glevels=3 and glevels=4; both have
exactly the same format and the code is treated exactly
On 04/15/2010 12:04 PM, Joerg Roedel wrote:
On Mon, Apr 15, 2030 at 04:57:38PM +0800, Zhang, Yanmin wrote:
I checked svm.c and it seems svm.c doesn't trigger a NMI to host if the NMI
happens in guest os. In addition, svm_complete_interrupts is called after
interrupt is enabled.
Yes.
On 04/15/2010 12:28 PM, Gleb Natapov wrote:
kvm_task_switch() never requires userspace exit, so no matter what the
function returns we should not exit to userspace.
Signed-off-by: Gleb Natapov <g...@redhat.com>
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index c773a46..1bd434b 100644
---
On 04/15/2010 12:44 PM, Joerg Roedel wrote:
So, we'd need something like the following:
if (exit == NMI)
        __get_cpu_var(nmi_vcpu) = vcpu;
stgi();
if (exit == NMI) {
        while (!nmi_handled())
                cpu_relax();
        __get_cpu_var(nmi_vcpu) = NULL;
}
On 04/15/2010 01:09 PM, Gleb Natapov wrote:
If kvm_task_switch() fails, code exits to userspace without specifying an
exit reason, so the previous exit reason is reused by userspace. Fix
this by specifying the exit reason correctly.
---
Changelog:
v1 -> v2:
- report emulation error to userspace
On 04/15/2010 01:40 PM, Joerg Roedel wrote:
That means an NMI that happens outside guest code (for example, in the
mmu, or during the exit itself) would be counted as if in guest code.
Hmm, true. The same is true for an NMI that happens between VMSAVE and
STGI but that window is
On 04/15/2010 05:08 PM, Sheng Yang wrote:
On Thursday 15 April 2010 18:44:15 Avi Kivity wrote:
On 04/15/2010 01:40 PM, Joerg Roedel wrote:
That means an NMI that happens outside guest code (for example, in the
mmu, or during the exit itself) would be counted as if in guest code
On 04/16/2010 10:34 AM, Zhang, Yanmin wrote:
Below is the kernel patch to enable perf to collect guest os statistics.
Joerg,
Would you like to add support on svm? I don't know the exact point to trigger
NMI to host with svm.
See below code with vmx:
+
On 04/15/2010 11:55 PM, Tom Lyon wrote:
This is the second of 2 related, but independent, patches. This is for
uio.c, the previous is for uio_pci_generic.c.
The 2 patches were previously one large patch. Changes for this version:
- uio_pci_generic.c just gets extensions so that a single fd can
On 04/17/2010 09:48 PM, Avi Kivity wrote:
+static u64 last_value = 0;
Needs to be atomic64_t.
+
cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src)
{
struct pvclock_shadow_time shadow;
unsigned version;
cycle_t ret, offset;
+u64 last;
+do
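The fix under review can be sketched in userspace C. This is a simplified model, not the pvclock patch itself: GCC/Clang `__atomic` builtins stand in for the kernel's atomic64 ops, and `raw` stands for the per-vcpu time just computed. The loop clamps the result against a global maximum so the clock never appears to run backwards across vcpus.

```c
#include <assert.h>
#include <stdint.h>

/* Global last-returned time; the real patch makes this an atomic64_t. */
static uint64_t last_value;

/* After computing the raw per-vcpu time, publish max(raw, last_value)
 * with a cmpxchg loop; concurrent readers therefore observe a
 * monotonically non-decreasing sequence. */
static uint64_t clocksource_read_monotonic(uint64_t raw)
{
    uint64_t last, ret;

    do {
        last = __atomic_load_n(&last_value, __ATOMIC_RELAXED);
        ret = raw > last ? raw : last;    /* never below what was returned */
    } while (!__atomic_compare_exchange_n(&last_value, &last, ret, 0,
                                          __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST));
    return ret;
}
```

Reading `last_value` through an atomic load also answers the later barrier() question: the compiler cannot re-read the variable between the comparison and the return.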
On 04/15/2010 09:37 PM, Glauber Costa wrote:
Avi pointed out a while ago that those MSRs fall into the pentium
PMU range. So the idea here is to add new ones, and after a while,
deprecate the old ones.
Signed-off-by: Glauber Costa <glom...@redhat.com>
---
arch/x86/include/asm/kvm_para.h |8
On 04/15/2010 09:37 PM, Glauber Costa wrote:
We now added a new set of clock-related msrs in replacement of the old
ones. In theory, we could just try to use them and get a return value
indicating they do not exist, due to our use of kvm_write_msr_save.
However, kvm clock registration happens
On 04/15/2010 09:37 PM, Glauber Costa wrote:
Since we're changing the msrs kvmclock uses, we have to communicate
that to the guest, through cpuid. We can add a new KVM_CAP to the
hypervisor, and then patch userspace to recognize it.
And if we ever add a new cpuid bit in the future, we have to
On 04/16/2010 05:27 AM, Zhang, Xiantao wrote:
When vcpus are pinned to pcpus, there is a 50% chance that a guest's
vcpus will be co-scheduled and spinlocks will perform well.
When vcpus are not pinned, but affine wakeups are disabled, there is a
33% chance that vcpus will be co-scheduled.
On 04/15/2010 04:33 PM, Peter Zijlstra wrote:
On Thu, 2010-04-15 at 11:18 +0300, Avi Kivity wrote:
Certainly that has even greater potential for Linux guests. Note that
we spin on mutexes now, so we need to prevent preemption while the lock
owner is running.
either that, or disable
On 04/13/2010 07:07 AM, Øyvind Sæther wrote:
The patch lets me run 1920x1080 resolution; some displays are only that and
not 1920x1200 these days. Gentoo's ebuild doesn't seem to make the vgabios and
only uses /pc-bios/vgabios.bin, so building kvm/vgabios and replacing vgabios.bin
with the
On 04/17/2010 09:12 PM, Avi Kivity wrote:
I think you were right the first time around.
Re-reading again (esp. the part about treatment of indirect NMI
vmexits), I think this was wrong, and that the code is correct. I am
now thoroughly confused.
On 04/19/2010 08:32 AM, Zhang, Yanmin wrote:
Below patch introduces perf_guest_info_callbacks and related register/unregister
functions. Add more PERF_RECORD_MISC_XXX bits meaning guest kernel and guest
user space.
This doesn't apply against upstream. What branch was this generated
On 04/19/2010 04:26 AM, Alexander Graf wrote:
Very true. In fact, I certainly remember putting a return and a
WARN_ON(true) because WARN() gave me a warning here. I wonder where that code
went ... hrm ...
Either way, thanks for looking over this patch!
Ugh - I messed up my patch
On 04/19/2010 11:55 AM, Zhang, Yanmin wrote:
On Mon, 2010-04-19 at 11:37 +0300, Avi Kivity wrote:
On 04/19/2010 08:32 AM, Zhang, Yanmin wrote:
Below patch introduces perf_guest_info_callbacks and related register/unregister
functions. Add more PERF_RECORD_MISC_XXX bits meaning guest
On 04/19/2010 11:59 AM, Avi Kivity wrote:
What branch was this generated
against?
It's against the latest tip/master. I checked out to 19b26586090 as the
latest tip/master has some updates on perf.
I don't want to merge tip/master... does tip/perf/core contain the
needed updates
On 04/18/2010 09:33 AM, Manish Regmi wrote:
Hi,
The following patch makes sure all code paths of failed emulation
run trace_kvm_emulate_insn_failed().
Please let me know if there is anything missing or wrong.
Thank you.
Signed-off-by: Manish Regmi <regmi.man...@gmail.com>
diff --git
On 04/18/2010 09:35 AM, Manish Regmi wrote:
Hi,
When the vm exit reason is a VM Entry failure, it has the leftmost bit set.
This patch
- clears the leftmost bit when copying to vmx->exit_reason. This will
make checks like if (vmx->exit_reason ==
EXIT_REASON_MCE_DURING_VMENTRY) valid in
On 04/19/2010 12:41 PM, Lai Jiangshan wrote:
The RCU/SRCU API has already changed for proving RCU usage.
I got the following dmesg when PROVE_RCU=y because we used the incorrect API.
This patch converts rcu_dereference() to srcu_dereference() or a family API.
On 04/19/2010 12:22 PM, Zhang, Yanmin wrote:
I don't want to merge tip/master... does tip/perf/core contain the
needed updates?
I think so. A moment ago, I checked out to b5a80b7e9 of tip/perf/core. All 3
patches could be applied cleanly and compilation is ok. A quick testing shows
On 04/19/2010 12:58 PM, Lai Jiangshan wrote:
Applied the patch I just sent and set CONFIG_PROVE_RCU=y;
we got the following dmesg. We found that it is
because some code in KVM dereferences an srcu-protected pointer without
srcu_read_lock() held or the update-side lock held.
It is not hard to
On 04/17/2010 01:22 AM, Alexander Graf wrote:
When we get a performance counter interrupt we need to route it on to the
Linux handler after we got out of the guest context. We also need to tell
our handling code that this particular interrupt doesn't need treatment.
So let's add those two bits
On 04/19/2010 01:43 PM, Peter Zijlstra wrote:
+
cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src)
{
struct pvclock_shadow_time shadow;
unsigned version;
cycle_t ret, offset;
+u64 last;
+do {
+last = last_value;
On 04/19/2010 01:46 PM, Peter Zijlstra wrote:
On Sat, 2010-04-17 at 21:48 +0300, Avi Kivity wrote:
After this patch is applied, I don't see a single warp in time during 5 days
of execution, in any of the machines I saw them before.
Please define a cpuid bit that makes
On 04/19/2010 01:39 PM, Peter Zijlstra wrote:
On Fri, 2010-04-16 at 13:36 -0700, Jeremy Fitzhardinge wrote:
+ do {
+ last = last_value;
Does this need a barrier() to prevent the compiler from re-reading
last_value for the subsequent lines? Otherwise (ret < last)
On 04/19/2010 01:49 PM, Peter Zijlstra wrote:
Right, so on x86 we have:
X86_FEATURE_CONSTANT_TSC, which only states that TSC is frequency
independent, not that it doesn't stop in C states and similar fun stuff.
X86_FEATURE_TSC_RELIABLE, which IIRC should indicate the TSC is constant
and
On 04/19/2010 01:51 PM, Peter Zijlstra wrote:
Right, so on x86 we have:
X86_FEATURE_CONSTANT_TSC, which only states that TSC is frequency
independent, not that it doesn't stop in C states and similar fun stuff.
X86_FEATURE_TSC_RELIABLE, which IIRC should indicate the TSC is constant
and
On 04/19/2010 02:05 PM, Peter Zijlstra wrote:
ACCESS_ONCE() is your friend.
I think it's implied with atomic64_read().
Yes it would be. I was merely trying to point out that
last = ACCESS_ONCE(last_value);
Is a narrower way of writing:
last = last_value;
barrier();
On 04/19/2010 01:56 PM, Peter Zijlstra wrote:
Right, do bear in mind that the x86 implementation of atomic64_read() is
terrifyingly expensive, it is better to not do that read and simply use
the result of the cmpxchg.
atomic64_read() _is_ cmpxchg64b. Are you thinking of some
On 04/19/2010 01:59 PM, Peter Zijlstra wrote:
So what do we need? test for both TSC_RELIABLE and NONSTOP_TSC? IMO
TSC_RELIABLE should imply NONSTOP_TSC.
Yeah, I think RELIABLE does imply NONSTOP and CONSTANT, but NONSTOP &&
CONSTANT does not make RELIABLE.
The manual says:
On 04/19/2010 02:19 PM, Peter Zijlstra wrote:
Still have two cmpxchgs in the common case. The first iteration will
fail, fetching last_value, the second will work.
It will be better when we have contention, though, so it's worthwhile.
Right, another option is to put the initial read
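The variant being suggested here, the initial read hoisted out of the loop, can be sketched the same way. Again a simplified userspace model with `__atomic` builtins standing in for atomic64 ops: on CAS failure the builtin refreshes `last` with the current value, so the retry path needs no extra read, and the fast path issues a single LOCK'ed instruction.

```c
#include <assert.h>
#include <stdint.h>

static uint64_t last_value;

/* Hoist the initial (plain) read out of the loop and reuse the value
 * cmpxchg hands back on failure, so the common case does one LOCK'ed op. */
static uint64_t read_monotonic_fast(uint64_t raw)
{
    uint64_t last = __atomic_load_n(&last_value, __ATOMIC_RELAXED);

    for (;;) {
        uint64_t ret = raw > last ? raw : last;

        /* On failure, 'last' is refreshed with the current last_value,
         * so no separate re-read is needed before retrying. */
        if (__atomic_compare_exchange_n(&last_value, &last, ret, 0,
                                        __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST))
            return ret;
    }
}
```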
Since commit bf47a760f66ad, we no longer handle ptes with the global bit
set specially, so there is no reason to distinguish between shadow pages
created with cr4.pge set and clear.
Such tracking is expensive when the guest toggles cr4.pge, so drop it.
Signed-off-by: Avi Kivity <a...@redhat.com>
On 04/19/2010 05:21 PM, Glauber Costa wrote:
Oh yes, just trying to avoid a patch with both atomic64_read() and
ACCESS_ONCE().
You're mixing the private version of the patch you saw with this one.
There aren't any atomic reads in here. I'll use a barrier then
This patch writes
On 04/19/2010 05:32 PM, Glauber Costa wrote:
Right, another option is to put the initial read outside of the loop,
that way you'll have the best of all cases, a single LOCK'ed op in the
loop, and only a single LOCK'ed op for the fast path on sensible
architectures ;-)
last =
On 04/19/2010 05:50 PM, Glauber Costa wrote:
On Sat, Apr 17, 2010 at 09:58:26PM +0300, Avi Kivity wrote:
On 04/15/2010 09:37 PM, Glauber Costa wrote:
Since we're changing the msrs kvmclock uses, we have to communicate
that to the guest, through cpuid. We can add a new KVM_CAP
On 04/19/2010 07:18 PM, Jeremy Fitzhardinge wrote:
On 04/19/2010 07:46 AM, Peter Zijlstra wrote:
What avi says! :-)
On a 32bit machine a 64bit read is two 32bit reads, so
last = last_value;
becomes:
last.high = last_value.high;
last.low = last_value.low;
(or the reverse of
On 04/20/2010 04:57 AM, Marcelo Tosatti wrote:
Marcelo can probably confirm it, but he has a nehalem with an apparently
very good tsc source. Even this machine warps.
It stops warping if we only write pvclock data structure once and forget it,
(which only updated tsc_timestamp once),
On 04/20/2010 06:32 AM, Sheng Yang wrote:
On Monday 19 April 2010 16:25:17 Avi Kivity wrote:
On 04/17/2010 09:12 PM, Avi Kivity wrote:
I think you were right the first time around.
Re-reading again (esp. the part about treatment of indirect NMI
vmexits), I think
On 04/19/2010 09:35 PM, Zachary Amsden wrote:
Sockets and boards too? (IOW, how reliable is TSC_RELIABLE)?
Not sure, IIRC we clear that when the TSC sync test fails, eg when we
mark the tsc clocksource unusable.
Worrying. By the time we detect this the guest may already have
gotten
On 04/20/2010 03:59 PM, Glauber Costa wrote:
Might be due to NMIs or SMIs interrupting the rdtsc(); ktime_get()
operation which establishes the timeline. We could limit it by
having a loop doing rdtsc(); ktime_get(); rdtsc(); and checking for
some bound, but it isn't worthwhile (and will
On 04/20/2010 09:23 PM, Jeremy Fitzhardinge wrote:
On 04/20/2010 02:31 AM, Avi Kivity wrote:
btw, do you want this code in pvclock.c, or shall we keep it kvmclock
specific?
I think it's a pvclock-level fix. I'd been hoping to avoid having
something like this, but I think its
On 04/21/2010 03:01 AM, Zachary Amsden wrote:
on this machine Glauber mentioned, or even on a multi-core Core 2 Duo),
but the delta calculation is very hard (if not impossible) to get
right.
The timewarps I've seen were in the 0-200ns range, and very rare (once
every 10 minutes or so).