Re: [PATCH RFC] paravirt_ops: refactor struct paravirt_ops into smaller pv_*_ops

2007-09-29 Thread Jeremy Fitzhardinge
Nakajima, Jun wrote:
> To me such atomicity is provided by the "sti" instruction (i.e. the
> processor begins responding to external, maskable interrupts _after_ the
> next instruction is executed), and there is nothing special with that
> combination "sti; hlt" (you can also have like "sti; ret", for example).
>   

Sure, but there's no particular value in "sti; ret".  While the sti mask
window works everywhere, its only cases like "sti; hlt" where it's
needed to avoid a race condition.

> So if you define a PV ops like STI(next_instruction), "safe_halt" for
> the native should be defined as STI("hlt"), and inlined as "sti; hlt". 
>   

That's only meaningful if the pv_op is implemented directly in x86
instructions - ie, the native (or almost native) case.

> If it's hard or we don't need to expose the semantics of "sti" other
> than that, I think it's okay to have a PV operation for safe_halt.
>   

Yeah, the general form would be hard to support for a hypervisor.  Xen,
for example, has an "atomically enable events and block" operation, but
no other "atomically enable events and do X" operations.

J
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH RFC] paravirt_ops: refactor struct paravirt_ops into smaller pv_*_ops

2007-09-29 Thread Nakajima, Jun
Jeremy Fitzhardinge wrote:
> Nakajima, Jun wrote:
> > Yes. For the native, "safe_halt" is "sti; hlt". The "native_halt" is
> > just "hlt". So the para_virt part of "hlt" could be moved to
pv_cpu_ops,
> > and the "sti" part stays in pv_irq_ops.
> > 
> 
> By "sti part", you mean the full "sti; hlt" sequence of safe_halt,
> right?  Since it needs to be an atomic sequence to avoid race
> conditions, so the native sequence has to be precisely "sti; hlt" to
> take advantage of the sti shadow, and other pv-backends will need
their
> own way to guarantee this atomicity.

To me such atomicity is provided by the "sti" instruction (i.e. the
processor begins responding to external, maskable interrupts _after_ the
next instruction is executed), and there is nothing special with that
combination "sti; hlt" (you can also have like "sti; ret", for example).
So if you define a PV ops like STI(next_instruction), "safe_halt" for
the native should be defined as STI("hlt"), and inlined as "sti; hlt". 

If it's hard or we don't need to expose the semantics of "sti" other
than that, I think it's okay to have a PV operation for safe_halt.

Jun
---
Intel Open Source Technology Center
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH RFC] paravirt_ops: refactor struct paravirt_ops into smaller pv_*_ops

2007-09-29 Thread Nakajima, Jun
Jeremy Fitzhardinge wrote:
 Nakajima, Jun wrote:
  Yes. For the native, safe_halt is sti; hlt. The native_halt is
  just hlt. So the para_virt part of hlt could be moved to
pv_cpu_ops,
  and the sti part stays in pv_irq_ops.
  
 
 By sti part, you mean the full sti; hlt sequence of safe_halt,
 right?  Since it needs to be an atomic sequence to avoid race
 conditions, so the native sequence has to be precisely sti; hlt to
 take advantage of the sti shadow, and other pv-backends will need
their
 own way to guarantee this atomicity.

To me such atomicity is provided by the sti instruction (i.e. the
processor begins responding to external, maskable interrupts _after_ the
next instruction is executed), and there is nothing special with that
combination sti; hlt (you can also have like sti; ret, for example).
So if you define a PV ops like STI(next_instruction), safe_halt for
the native should be defined as STI(hlt), and inlined as sti; hlt. 

If it's hard or we don't need to expose the semantics of sti other
than that, I think it's okay to have a PV operation for safe_halt.

Jun
---
Intel Open Source Technology Center
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RFC] paravirt_ops: refactor struct paravirt_ops into smaller pv_*_ops

2007-09-29 Thread Jeremy Fitzhardinge
Nakajima, Jun wrote:
 To me such atomicity is provided by the sti instruction (i.e. the
 processor begins responding to external, maskable interrupts _after_ the
 next instruction is executed), and there is nothing special with that
 combination sti; hlt (you can also have like sti; ret, for example).
   

Sure, but there's no particular value in sti; ret.  While the sti mask
window works everywhere, its only cases like sti; hlt where it's
needed to avoid a race condition.

 So if you define a PV ops like STI(next_instruction), safe_halt for
 the native should be defined as STI(hlt), and inlined as sti; hlt. 
   

That's only meaningful if the pv_op is implemented directly in x86
instructions - ie, the native (or almost native) case.

 If it's hard or we don't need to expose the semantics of sti other
 than that, I think it's okay to have a PV operation for safe_halt.
   

Yeah, the general form would be hard to support for a hypervisor.  Xen,
for example, has an atomically enable events and block operation, but
no other atomically enable events and do X operations.

J
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RFC] paravirt_ops: refactor struct paravirt_ops into smaller pv_*_ops

2007-09-28 Thread Jeremy Fitzhardinge
Nakajima, Jun wrote:
> Yes. For the native, "safe_halt" is "sti; hlt". The "native_halt" is
> just "hlt". So the para_virt part of "hlt" could be moved to pv_cpu_ops,
> and the "sti" part stays in pv_irq_ops.
>   

By "sti part", you mean the full "sti; hlt" sequence of safe_halt,
right?  Since it needs to be an atomic sequence to avoid race
conditions, so the native sequence has to be precisely "sti; hlt" to
take advantage of the sti shadow, and other pv-backends will need their
own way to guarantee this atomicity.

But I'm quite happy to put plain "hlt" into cpu_ops as halt_cpu() or
something (and perhaps rename safe_halt to something a bit more
descriptive).

> Actually my concern was that such misc ops might grow to include the
> things don't fit well anywhere else. To me, then pv_lazy_ops (with just
> .set_mode) might be better.
>   

The lazy interface has needed a rethink anyway.

J
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH RFC] paravirt_ops: refactor struct paravirt_ops into smaller pv_*_ops

2007-09-28 Thread Nakajima, Jun
Jeremy Fitzhardinge wrote:
> Nakajima, Jun wrote:
> > Jeremy Fitzhardinge wrote:
> > 
> > 
> > > + .pv_irq_ops = {
> > > +  .init_IRQ = native_init_IRQ,
> > > +  .save_fl = native_save_fl,
> > > +  .restore_fl = native_restore_fl,
> > > +  .irq_disable = native_irq_disable,
> > > +  .irq_enable = native_irq_enable,
> > > +  .safe_halt = native_safe_halt,
> > > +  .halt = native_halt,
> > > +  },
> > > 
> > 
> > I think the halt stuff should be moved to pv_cpu_ops?
> > 
> You mean halt's alternate "shutdown vcpu" meaning if you call it with
> interrupts disabled?  Yeah, I'd be happy to have an explicit op for
> that, rather than making it a secondary overloaded meaning.  And use
> "safe_halt" for all uses of "wait for next interrupt".

Yes. For the native, "safe_halt" is "sti; hlt". The "native_halt" is
just "hlt". So the para_virt part of "hlt" could be moved to pv_cpu_ops,
and the "sti" part stays in pv_irq_ops.

> 
> > > + .pv_misc_ops = {
> > > +  .set_lazy_mode = paravirt_nop,
> > > +  },
> > > 
> > 
> > Or you can split it to pv_cpu_ops and pv_mmu_ops, assuming that they
> > don't need to interact with each other in terms of the lazy
handling.
> > 
> 
> You mean have separate lazy_mmu and lazy_cpu (lazy_context_switch)
ops?
> Possible, but they're still exclusive.  (I think VMI, at least,
assumes
> that you can't have lazy_mmu and lazy_cpu active at the same time, and
> its nice to enforce this in the interface.)

Okay I understand what you are saying.

> 
> But having a whole misc structure for this interface is pretty warty,
I
> admit.
> 
> J

Actually my concern was that such misc ops might grow to include the
things don't fit well anywhere else. To me, then pv_lazy_ops (with just
.set_mode) might be better.

Jun
---
Intel Open Source Technology Center
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RFC] paravirt_ops: refactor struct paravirt_ops into smaller pv_*_ops

2007-09-28 Thread Jeremy Fitzhardinge
Nakajima, Jun wrote:
> Jeremy Fitzhardinge wrote:
>   
>> This patch refactors the paravirt_ops structure into groups of
>> functionally related ops:
>>
>> pv_info - random info, rather than function entrypoints
>> pv_init_ops - functions used at boot time (some for module_init too)
>> pv_misc_ops - lazy mode, which didn't fit well anywhere else
>> pv_time_ops - time-related functions
>> pv_cpu_ops - various privileged instruction ops
>> pv_irq_ops - operations for managing interrupt state
>> pv_apic_ops - APIC operations
>> pv_mmu_ops - operations for managing pagetables
>>
>> 
>
> Good. These make sense to me.
>
>   
>> +.pv_irq_ops = {
>> + .init_IRQ = native_init_IRQ,
>> + .save_fl = native_save_fl,
>> + .restore_fl = native_restore_fl,
>> + .irq_disable = native_irq_disable,
>> + .irq_enable = native_irq_enable,
>> + .safe_halt = native_safe_halt,
>> + .halt = native_halt,
>> + },
>> 
>
> I think the halt stuff should be moved to pv_cpu_ops?
>   
You mean halt's alternate "shutdown vcpu" meaning if you call it with
interrupts disabled?  Yeah, I'd be happy to have an explicit op for
that, rather than making it a secondary overloaded meaning.  And use
"safe_halt" for all uses of "wait for next interrupt".

>> +.pv_misc_ops = {
>> + .set_lazy_mode = paravirt_nop,
>> + },
>> 
>
> Or you can split it to pv_cpu_ops and pv_mmu_ops, assuming that they
> don't need to interact with each other in terms of the lazy handling.
>   

You mean have separate lazy_mmu and lazy_cpu (lazy_context_switch) ops? 
Possible, but they're still exclusive.  (I think VMI, at least, assumes
that you can't have lazy_mmu and lazy_cpu active at the same time, and
its nice to enforce this in the interface.)

But having a whole misc structure for this interface is pretty warty, I
admit.

J

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH RFC] paravirt_ops: refactor struct paravirt_ops into smaller pv_*_ops

2007-09-28 Thread Nakajima, Jun
Jeremy Fitzhardinge wrote:
> This patch refactors the paravirt_ops structure into groups of
> functionally related ops:
> 
> pv_info - random info, rather than function entrypoints
> pv_init_ops - functions used at boot time (some for module_init too)
> pv_misc_ops - lazy mode, which didn't fit well anywhere else
> pv_time_ops - time-related functions
> pv_cpu_ops - various privileged instruction ops
> pv_irq_ops - operations for managing interrupt state
> pv_apic_ops - APIC operations
> pv_mmu_ops - operations for managing pagetables
> 

Good. These make sense to me.

> + .pv_irq_ops = {
> +  .init_IRQ = native_init_IRQ,
> +  .save_fl = native_save_fl,
> +  .restore_fl = native_restore_fl,
> +  .irq_disable = native_irq_disable,
> +  .irq_enable = native_irq_enable,
> +  .safe_halt = native_safe_halt,
> +  .halt = native_halt,
> +  },

I think the halt stuff should be moved to pv_cpu_ops?

> + .pv_misc_ops = {
> +  .set_lazy_mode = paravirt_nop,
> +  },

Or you can split it to pv_cpu_ops and pv_mmu_ops, assuming that they
don't need to interact with each other in terms of the lazy handling.

Jun
---
Intel Open Source Technology Center
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RFC] paravirt_ops: refactor struct paravirt_ops into smaller pv_*_ops

2007-09-28 Thread Zachary Amsden
On Fri, 2007-09-28 at 11:49 -0700, Jeremy Fitzhardinge wrote:
> > We shouldn't need to export pv_init_ops.  
> 
> No.  The only ones I export are:
> 
> EXPORT_SYMBOL_GPL(pv_time_ops);
> EXPORT_SYMBOL_GPL(pv_cpu_ops);
> EXPORT_SYMBOL_GPL(pv_mmu_ops);
> EXPORT_SYMBOL_GPL(pv_apic_ops);
> EXPORT_SYMBOL(pv_irq_ops);

Nicely done.  I know of some out of tree modules which use part of the
pv_cpu_ops and pv_mmu_ops, but we should not worry about such things,
and it turns out those modules don't need to be virtualized anyway.

> 
> 
> > It is debatable whether
> > CR2/CR3 should be part of CPU or MMU ops.
> 
> Yeah, I was in two minds.  CR3, at least, should be grouped with the
> other tlb operations, wherever they go.  And while they're privileged
> CPU instructions (cpu_ops), they're more logically related to the rest
> of the mmu state.  On the other hand, we could have an ops structure
> specifically dedicated to pagetable manipulations, and put the cr3/tlb
> ops elsewhere.

I'm not against either approach.  I think the way you did it is fine.
If it were up to me, I would probably have driven myself crazy splitting
hairs on it until I was bald.

> >   Also, can we drop write_cr2?
> > It isn't used anywhere, so the only reason to keep it is symmetry.
> > Which was a fine argument when it was an inline, but now it just adds
> > unused junk to the code.
> >   
> 
> I think its used in some cpu state save/restore code, but its not
> relevant to pv-ops.

Ah yes, it is used there.  We actually exercise some of those paths, but
they don't need to be strictly virtualized.

Zach

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RFC] paravirt_ops: refactor struct paravirt_ops into smaller pv_*_ops

2007-09-28 Thread Jeremy Fitzhardinge
Zachary Amsden wrote:
> On Fri, 2007-09-28 at 11:10 -0700, Jeremy Fitzhardinge wrote:
>   
>> This patch refactors the paravirt_ops structure into groups of
>> functionally related ops:
>>
>> pv_info - random info, rather than function entrypoints
>> pv_init_ops - functions used at boot time (some for module_init too)
>> pv_misc_ops - lazy mode, which didn't fit well anywhere else
>> pv_time_ops - time-related functions
>> pv_cpu_ops - various privileged instruction ops
>> pv_irq_ops - operations for managing interrupt state
>> pv_apic_ops - APIC operations
>> pv_mmu_ops - operations for managing pagetables
>>
>> There are several motivations for this:
>>
>> 1. Some of these ops will be general to all x86, and some will be
>>i386/x86-64 specific.  This makes it easier to share common stuff
>>while allowing separate implementations where needed.
>>
>> 2. At the moment we must export all of paravirt_ops, but modules only
>>need selected parts of it.  This allows us to export on a case by case
>>basis (and also choose which export license we want to apply).
>> 
>
> We shouldn't need to export pv_init_ops.  

No.  The only ones I export are:

EXPORT_SYMBOL_GPL(pv_time_ops);
EXPORT_SYMBOL_GPL(pv_cpu_ops);
EXPORT_SYMBOL_GPL(pv_mmu_ops);
EXPORT_SYMBOL_GPL(pv_apic_ops);
EXPORT_SYMBOL(pv_irq_ops);


> It is debatable whether
> CR2/CR3 should be part of CPU or MMU ops.

Yeah, I was in two minds.  CR3, at least, should be grouped with the
other tlb operations, wherever they go.  And while they're privileged
CPU instructions (cpu_ops), they're more logically related to the rest
of the mmu state.  On the other hand, we could have an ops structure
specifically dedicated to pagetable manipulations, and put the cr3/tlb
ops elsewhere.

>   Also, can we drop write_cr2?
> It isn't used anywhere, so the only reason to keep it is symmetry.
> Which was a fine argument when it was an inline, but now it just adds
> unused junk to the code.
>   

I think its used in some cpu state save/restore code, but its not
relevant to pv-ops.

J
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RFC] paravirt_ops: refactor struct paravirt_ops into smaller pv_*_ops

2007-09-28 Thread Zachary Amsden
On Fri, 2007-09-28 at 11:10 -0700, Jeremy Fitzhardinge wrote:
> This patch refactors the paravirt_ops structure into groups of
> functionally related ops:
> 
> pv_info - random info, rather than function entrypoints
> pv_init_ops - functions used at boot time (some for module_init too)
> pv_misc_ops - lazy mode, which didn't fit well anywhere else
> pv_time_ops - time-related functions
> pv_cpu_ops - various privileged instruction ops
> pv_irq_ops - operations for managing interrupt state
> pv_apic_ops - APIC operations
> pv_mmu_ops - operations for managing pagetables
> 
> There are several motivations for this:
> 
> 1. Some of these ops will be general to all x86, and some will be
>i386/x86-64 specific.  This makes it easier to share common stuff
>while allowing separate implementations where needed.
> 
> 2. At the moment we must export all of paravirt_ops, but modules only
>need selected parts of it.  This allows us to export on a case by case
>basis (and also choose which export license we want to apply).

We shouldn't need to export pv_init_ops.  It is debatable whether
CR2/CR3 should be part of CPU or MMU ops.  Also, can we drop write_cr2?
It isn't used anywhere, so the only reason to keep it is symmetry.
Which was a fine argument when it was an inline, but now it just adds
unused junk to the code.

> This still needs to be reconciled with the x86 merge and glommer's
> paravirt_ops unification work.

Those issues aside, this looks great.  I'll give it a whirl today.

Zach

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RFC] paravirt_ops: refactor struct paravirt_ops into smaller pv_*_ops

2007-09-28 Thread Zachary Amsden
On Fri, 2007-09-28 at 11:10 -0700, Jeremy Fitzhardinge wrote:
 This patch refactors the paravirt_ops structure into groups of
 functionally related ops:
 
 pv_info - random info, rather than function entrypoints
 pv_init_ops - functions used at boot time (some for module_init too)
 pv_misc_ops - lazy mode, which didn't fit well anywhere else
 pv_time_ops - time-related functions
 pv_cpu_ops - various privileged instruction ops
 pv_irq_ops - operations for managing interrupt state
 pv_apic_ops - APIC operations
 pv_mmu_ops - operations for managing pagetables
 
 There are several motivations for this:
 
 1. Some of these ops will be general to all x86, and some will be
i386/x86-64 specific.  This makes it easier to share common stuff
while allowing separate implementations where needed.
 
 2. At the moment we must export all of paravirt_ops, but modules only
need selected parts of it.  This allows us to export on a case by case
basis (and also choose which export license we want to apply).

We shouldn't need to export pv_init_ops.  It is debatable whether
CR2/CR3 should be part of CPU or MMU ops.  Also, can we drop write_cr2?
It isn't used anywhere, so the only reason to keep it is symmetry.
Which was a fine argument when it was an inline, but now it just adds
unused junk to the code.

 This still needs to be reconciled with the x86 merge and glommer's
 paravirt_ops unification work.

Those issues aside, this looks great.  I'll give it a whirl today.

Zach

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RFC] paravirt_ops: refactor struct paravirt_ops into smaller pv_*_ops

2007-09-28 Thread Jeremy Fitzhardinge
Zachary Amsden wrote:
 On Fri, 2007-09-28 at 11:10 -0700, Jeremy Fitzhardinge wrote:
   
 This patch refactors the paravirt_ops structure into groups of
 functionally related ops:

 pv_info - random info, rather than function entrypoints
 pv_init_ops - functions used at boot time (some for module_init too)
 pv_misc_ops - lazy mode, which didn't fit well anywhere else
 pv_time_ops - time-related functions
 pv_cpu_ops - various privileged instruction ops
 pv_irq_ops - operations for managing interrupt state
 pv_apic_ops - APIC operations
 pv_mmu_ops - operations for managing pagetables

 There are several motivations for this:

 1. Some of these ops will be general to all x86, and some will be
i386/x86-64 specific.  This makes it easier to share common stuff
while allowing separate implementations where needed.

 2. At the moment we must export all of paravirt_ops, but modules only
need selected parts of it.  This allows us to export on a case by case
basis (and also choose which export license we want to apply).
 

 We shouldn't need to export pv_init_ops.  

No.  The only ones I export are:

EXPORT_SYMBOL_GPL(pv_time_ops);
EXPORT_SYMBOL_GPL(pv_cpu_ops);
EXPORT_SYMBOL_GPL(pv_mmu_ops);
EXPORT_SYMBOL_GPL(pv_apic_ops);
EXPORT_SYMBOL(pv_irq_ops);


 It is debatable whether
 CR2/CR3 should be part of CPU or MMU ops.

Yeah, I was in two minds.  CR3, at least, should be grouped with the
other tlb operations, wherever they go.  And while they're privileged
CPU instructions (cpu_ops), they're more logically related to the rest
of the mmu state.  On the other hand, we could have an ops structure
specifically dedicated to pagetable manipulations, and put the cr3/tlb
ops elsewhere.

   Also, can we drop write_cr2?
 It isn't used anywhere, so the only reason to keep it is symmetry.
 Which was a fine argument when it was an inline, but now it just adds
 unused junk to the code.
   

I think its used in some cpu state save/restore code, but its not
relevant to pv-ops.

J
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RFC] paravirt_ops: refactor struct paravirt_ops into smaller pv_*_ops

2007-09-28 Thread Zachary Amsden
On Fri, 2007-09-28 at 11:49 -0700, Jeremy Fitzhardinge wrote:
  We shouldn't need to export pv_init_ops.  
 
 No.  The only ones I export are:
 
 EXPORT_SYMBOL_GPL(pv_time_ops);
 EXPORT_SYMBOL_GPL(pv_cpu_ops);
 EXPORT_SYMBOL_GPL(pv_mmu_ops);
 EXPORT_SYMBOL_GPL(pv_apic_ops);
 EXPORT_SYMBOL(pv_irq_ops);

Nicely done.  I know of some out of tree modules which use part of the
pv_cpu_ops and pv_mmu_ops, but we should not worry about such things,
and it turns out those modules don't need to be virtualized anyway.

 
 
  It is debatable whether
  CR2/CR3 should be part of CPU or MMU ops.
 
 Yeah, I was in two minds.  CR3, at least, should be grouped with the
 other tlb operations, wherever they go.  And while they're privileged
 CPU instructions (cpu_ops), they're more logically related to the rest
 of the mmu state.  On the other hand, we could have an ops structure
 specifically dedicated to pagetable manipulations, and put the cr3/tlb
 ops elsewhere.

I'm not against either approach.  I think the way you did it is fine.
If it were up to me, I would probably have driven myself crazy splitting
hairs on it until I was bald.

Also, can we drop write_cr2?
  It isn't used anywhere, so the only reason to keep it is symmetry.
  Which was a fine argument when it was an inline, but now it just adds
  unused junk to the code.

 
 I think its used in some cpu state save/restore code, but its not
 relevant to pv-ops.

Ah yes, it is used there.  We actually exercise some of those paths, but
they don't need to be strictly virtualized.

Zach

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH RFC] paravirt_ops: refactor struct paravirt_ops into smaller pv_*_ops

2007-09-28 Thread Nakajima, Jun
Jeremy Fitzhardinge wrote:
 This patch refactors the paravirt_ops structure into groups of
 functionally related ops:
 
 pv_info - random info, rather than function entrypoints
 pv_init_ops - functions used at boot time (some for module_init too)
 pv_misc_ops - lazy mode, which didn't fit well anywhere else
 pv_time_ops - time-related functions
 pv_cpu_ops - various privileged instruction ops
 pv_irq_ops - operations for managing interrupt state
 pv_apic_ops - APIC operations
 pv_mmu_ops - operations for managing pagetables
 

Good. These make sense to me.

 + .pv_irq_ops = {
 +  .init_IRQ = native_init_IRQ,
 +  .save_fl = native_save_fl,
 +  .restore_fl = native_restore_fl,
 +  .irq_disable = native_irq_disable,
 +  .irq_enable = native_irq_enable,
 +  .safe_halt = native_safe_halt,
 +  .halt = native_halt,
 +  },

I think the halt stuff should be moved to pv_cpu_ops?

 + .pv_misc_ops = {
 +  .set_lazy_mode = paravirt_nop,
 +  },

Or you can split it to pv_cpu_ops and pv_mmu_ops, assuming that they
don't need to interact with each other in terms of the lazy handling.

Jun
---
Intel Open Source Technology Center
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RFC] paravirt_ops: refactor struct paravirt_ops into smaller pv_*_ops

2007-09-28 Thread Jeremy Fitzhardinge
Nakajima, Jun wrote:
 Jeremy Fitzhardinge wrote:
   
 This patch refactors the paravirt_ops structure into groups of
 functionally related ops:

 pv_info - random info, rather than function entrypoints
 pv_init_ops - functions used at boot time (some for module_init too)
 pv_misc_ops - lazy mode, which didn't fit well anywhere else
 pv_time_ops - time-related functions
 pv_cpu_ops - various privileged instruction ops
 pv_irq_ops - operations for managing interrupt state
 pv_apic_ops - APIC operations
 pv_mmu_ops - operations for managing pagetables

 

 Good. These make sense to me.

   
 +.pv_irq_ops = {
 + .init_IRQ = native_init_IRQ,
 + .save_fl = native_save_fl,
 + .restore_fl = native_restore_fl,
 + .irq_disable = native_irq_disable,
 + .irq_enable = native_irq_enable,
 + .safe_halt = native_safe_halt,
 + .halt = native_halt,
 + },
 

 I think the halt stuff should be moved to pv_cpu_ops?
   
You mean halt's alternate shutdown vcpu meaning if you call it with
interrupts disabled?  Yeah, I'd be happy to have an explicit op for
that, rather than making it a secondary overloaded meaning.  And use
safe_halt for all uses of wait for next interrupt.

 +.pv_misc_ops = {
 + .set_lazy_mode = paravirt_nop,
 + },
 

 Or you can split it to pv_cpu_ops and pv_mmu_ops, assuming that they
 don't need to interact with each other in terms of the lazy handling.
   

You mean have separate lazy_mmu and lazy_cpu (lazy_context_switch) ops? 
Possible, but they're still exclusive.  (I think VMI, at least, assumes
that you can't have lazy_mmu and lazy_cpu active at the same time, and
its nice to enforce this in the interface.)

But having a whole misc structure for this interface is pretty warty, I
admit.

J

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH RFC] paravirt_ops: refactor struct paravirt_ops into smaller pv_*_ops

2007-09-28 Thread Nakajima, Jun
Jeremy Fitzhardinge wrote:
 Nakajima, Jun wrote:
  Jeremy Fitzhardinge wrote:
  
  
   + .pv_irq_ops = {
   +  .init_IRQ = native_init_IRQ,
   +  .save_fl = native_save_fl,
   +  .restore_fl = native_restore_fl,
   +  .irq_disable = native_irq_disable,
   +  .irq_enable = native_irq_enable,
   +  .safe_halt = native_safe_halt,
   +  .halt = native_halt,
   +  },
   
  
  I think the halt stuff should be moved to pv_cpu_ops?
  
 You mean halt's alternate shutdown vcpu meaning if you call it with
 interrupts disabled?  Yeah, I'd be happy to have an explicit op for
 that, rather than making it a secondary overloaded meaning.  And use
 safe_halt for all uses of wait for next interrupt.

Yes. For the native, safe_halt is sti; hlt. The native_halt is
just hlt. So the para_virt part of hlt could be moved to pv_cpu_ops,
and the sti part stays in pv_irq_ops.

 
   + .pv_misc_ops = {
   +  .set_lazy_mode = paravirt_nop,
   +  },
   
  
  Or you can split it to pv_cpu_ops and pv_mmu_ops, assuming that they
  don't need to interact with each other in terms of the lazy
handling.
  
 
 You mean have separate lazy_mmu and lazy_cpu (lazy_context_switch)
ops?
 Possible, but they're still exclusive.  (I think VMI, at least,
assumes
 that you can't have lazy_mmu and lazy_cpu active at the same time, and
 its nice to enforce this in the interface.)

Okay I understand what you are saying.

 
 But having a whole misc structure for this interface is pretty warty,
I
 admit.
 
 J

Actually my concern was that such misc ops might grow to include the
things don't fit well anywhere else. To me, then pv_lazy_ops (with just
.set_mode) might be better.

Jun
---
Intel Open Source Technology Center
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RFC] paravirt_ops: refactor struct paravirt_ops into smaller pv_*_ops

2007-09-28 Thread Jeremy Fitzhardinge
Nakajima, Jun wrote:
 Yes. For the native, safe_halt is sti; hlt. The native_halt is
 just hlt. So the para_virt part of hlt could be moved to pv_cpu_ops,
 and the sti part stays in pv_irq_ops.
   

By sti part, you mean the full sti; hlt sequence of safe_halt,
right?  Since it needs to be an atomic sequence to avoid race
conditions, so the native sequence has to be precisely sti; hlt to
take advantage of the sti shadow, and other pv-backends will need their
own way to guarantee this atomicity.

But I'm quite happy to put plain hlt into cpu_ops as halt_cpu() or
something (and perhaps rename safe_halt to something a bit more
descriptive).

 Actually my concern was that such misc ops might grow to include the
 things don't fit well anywhere else. To me, then pv_lazy_ops (with just
 .set_mode) might be better.
   

The lazy interface has needed a rethink anyway.

J
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/