On Mon, 9 May 2011 16:21:15 +0200 Marko Zec wrote:

 MZ> On Monday 09 May 2011 14:48:25 Mikolaj Golub wrote:
 >> Hi,
 >> Trying ipfw_nat under a VIMAGE kernel, I got this panic on module load:

 MZ> Hi,

 MZ> I think the problem here is that the curvnet context is not set properly on 
 MZ> entry to ipfw_nat_modevent().  The canonical way to initialize VNET-enabled 
 MZ> subsystems is to trigger them using VNET_SYSINIT() macros (instead of 
 MZ> modevent mechanisms), which in turn ensure that:

 MZ> a) the initializer function gets invoked for each existing vnet
 MZ> b) curvnet context is set properly on entry to initializer functions and

Hm, sorry, but I don't see how the curvnet context might help here. To me this
does not look like a curvnet context problem, or else my understanding of how
it works is completely wrong.

Below is a kgdb session on a live VIMAGE system with ipfw.ko loaded.

Let's look at a virtualized variable from the base kernel:

(kgdb) p vnet_entry_ifnet
$1 = {tqh_first = 0x0, tqh_last = 0x0}
(kgdb) p &vnet_entry_ifnet
$2 = (struct ifnethead *) 0x8102d488

As expected, the address is in the kernel's 'set_vnet':

kopusha:/usr/src/sys% kldstat |grep kernel
 1   69 0x80400000 1092700  kernel
kopusha:/usr/src/sys% nm /boot/kernel/kernel |grep  __start_set_vnet 
8102d480 A __start_set_vnet

The default vnet:

(kgdb) p vnet0
$3 = (struct vnet *) 0x86d9b000

Calculate the location of ifnet in vnet0 (a la VNET_VNET(vnet0, ifnet)):

(kgdb) printf "0x%x\n", vnet0->vnet_data_base + (uintptr_t) & vnet_entry_ifnet 
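The printf above is plain pointer arithmetic. Below is a minimal user-space model of it in C, using the i386 values from this session (hence uint32_t). It assumes that vnet_data_base is the vnet's data area minus __start_set_vnet (my reading of sys/net/vnet.c, not verified here). The vnet0 data area address 0x86d9c000 is inferred from the address dereferenced next (0x86d9c008 minus ifnet's offset of 8 inside set_vnet); kgdb did not print it. The function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* i386 addresses, so everything is modelled as uint32_t. */

/* Assumption (sys/net/vnet.c): vnet_data_base = vnet_data_mem - VNET_START. */
static uint32_t
vnet_data_base(uint32_t vnet_data_mem, uint32_t start_set_vnet)
{
	return vnet_data_mem - start_set_vnet;
}

/* VNET_VNET()-style lookup: per-vnet base plus the master copy's address. */
static uint32_t
vnet_var_addr(uint32_t data_base, uint32_t master_addr)
{
	return data_base + master_addr;
}

/* Offset of a master copy inside the kernel's set_vnet linker set. */
static uint32_t
set_offset(uint32_t master_addr, uint32_t start_set_vnet)
{
	assert(master_addr >= start_set_vnet);
	return master_addr - start_set_vnet;
}
```

With these values, vnet_var_addr(vnet_data_base(0x86d9c000, 0x8102d480), 0x8102d488) yields 0x86d9c008: the variable lands at the same offset (8) inside vnet0's data area as its master copy has inside set_vnet. That only holds because &vnet_entry_ifnet really lies inside the kernel's own set_vnet.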

Access it:

(kgdb) p *((struct ifnethead *)0x86d9c008)
$4 = {tqh_first = 0x86da5c00, tqh_last = 0x89489c0c}
(kgdb) p (*((struct ifnethead *)0x86d9c008)).tqh_first->if_dname
$7 = 0x80e8b480 "usbus"
(kgdb) p (*((struct ifnethead *)0x86d9c008)).tqh_first->if_vnet 
$8 = (struct vnet *) 0x86d9b000

Everything looks good. Now let's try the same with the virtualized variable
layer3_chain from the ipfw module:

(kgdb) p vnet_entry_layer3_chain
$9 = {rules = 0x0, reap = 0x0, default_rule = 0x0, n_rules = 0, static_len = 0, map = 0x0,
  nat = {lh_first = 0x0}, tables = {0x0 <repeats 128 times>}, rwmtx = {lock_object = {
      lo_name = 0x0, lo_flags = 0, lo_data = 0, lo_witness = 0x0}, rw_lock = 0}, uh_lock = {
    lock_object = {lo_name = 0x0, lo_flags = 0, lo_data = 0, lo_witness = 0x0}, rw_lock = 0},
  id = 0, gencnt = 0}

The "master" variable looks good (initialized to zeros). What about its address?

(kgdb) p &vnet_entry_layer3_chain
$10 = (struct ip_fw_chain *) 0x894a5c00

It points into the 'set_vnet' of ipfw.ko:

kopusha# kldstat |grep ipfw.ko
13    2 0x89495000 11000    ipfw.ko
kopusha:/usr/src/sys% nm /boot/kernel/ipfw.ko |grep  __start_set_vnet  
00010be0 A __start_set_vnet
kopusha:/usr/src/sys% printf "0x%x\n" $((0x89495000 + 0x00010be0))

Calculate the location of layer3_chain in vnet0 (a la VNET_VNET(vnet0, layer3_chain)):

(kgdb) printf "0x%x\n", vnet0->vnet_data_base + (uintptr_t) & 

Try to read it:

(kgdb) p ((struct ip_fw_chain *)0x8f214780)->rwmtx
$13 = {lock_object = {lo_name = 0x0, lo_flags = 0, lo_data = 0, lo_witness = 0x0},
  rw_lock = 0}
(kgdb) p ((struct ip_fw_chain *)0x8f214780)->rules
$14 = (struct ip_fw *) 0x6

The data looks wrong, but this is exactly how the variable is accessed by
ipfw_nat. I see the same in the crash dump:

(kgdb) where
#11 0xc09a4882 in _rw_wlock (rw=0xc6d5e91c, 
    "/usr/src/sys/modules/ipfw_nat/../../netinet/ipfw/ip_fw_nat.c", line=547)
    at /usr/src/sys/kern/kern_rwlock.c:238
#12 0xca0ab841 in ipfw_nat_modevent (mod=0xc98a48c0, type=0, unused=0x0)
    at /usr/src/sys/modules/ipfw_nat/../../netinet/ipfw/ip_fw_nat.c:547

Note rw=0xc6d5e91c (this is where it crashed). I get the same address by doing
the same calculation as above:

(kgdb) VNET_VNET vnet0 vnet_entry_layer3_chain
at 0xc6d5e700 of type = struct ip_fw_chain
(kgdb) p &((struct ip_fw_chain *)0xc6d5e700)->rwmtx
$8 = (struct rwlock *) 0xc6d5e91c

So ipfw_nat was running in the vnet0 context at that point. I have seen crashes
(in other modules) when the context was not initialised, and they looked different.

The right location was 0x86d9c160 (I found it by adding a printf to the ipfw
module; I don't know an easier way):

(kgdb) p ((struct ip_fw_chain *)0x86d9c160)->rwmtx
$1 = {lock_object = {lo_name = 0x932ba4b3 "IPFW static rules", lo_flags = 
    lo_data = 0, lo_witness = 0x86d6ab30}, rw_lock = 1}
(kgdb) p ((struct ip_fw_chain *)0x86d9c160)->rules
$2 = (struct ip_fw *) 0x8f2d1c80

So I don't see how to reach a module's virtualized variable from outside the
module, even when you are in the right vnet context. When the linker loads the
module and allocates the variable on the vnet stacks in 'modspace', it has this
information and relocates the addresses inside the module, so the variables are
accessible from inside the module, but not from outside.

 MZ> Cheers,

 MZ> Marko

 >> Fatal trap 12: page fault while in kernel mode
 >> cpuid = 1; apic id = 01
 >> fault virtual address   = 0x4
 >> fault code              = supervisor read, page not present
 >> instruction pointer     = 0x20:0xc09f098e
 >> stack pointer           = 0x28:0xf563b944
 >> frame pointer           = 0x28:0xf563b998
 >> code segment            = base 0x0, limit 0xfffff, type 0x1b
 >>                         = DPL 0, pres 1, def32 1, gran 1
 >> processor eflags        = interrupt enabled, resume, IOPL = 0
 >> current process         = 4264 (kldload)
 >> witness_checkorder(c6d5e91c,9,ca0ac2e3,223,0,...) at witness_checkorder+0x6e
 >> _rw_wlock(c6d5e91c,ca0ac2e3,223,0,c0e8f795,...) at _rw_wlock+0x82
 >> ipfw_nat_modevent(c98a48c0,0,0,75,0,...) at ipfw_nat_modevent+0x41
 >> module_register_init(ca0ad508,0,c0e8d834,e6,0,...) at module_register_init+0xa7
 >> linker_load_module(0,f563bc18,c0e8d834,3fc,f563bc28,...) at linker_load_module+0xa05
 >> kern_kldload(c86835c0,c72d3400,f563bc40,0,c8d0d000,...) at kern_kldload+0x133
 >> kldload(c86835c0,f563bcec,c09e8940,c86835c0,0,...) at kldload+0x74
 >> syscallenter(c86835c0,f563bce4,c0ce05dd,c1022150,0,...) at syscallenter+0x263
 >> syscall(f563bd28) at syscall+0x34
 >> Xint0x80_syscall() at Xint0x80_syscall+0x21
 >> --- syscall (304, FreeBSD ELF32, kldload), eip = 0x280da00b, esp = 0xbfbfe79c, ebp = 0xbfbfec88 ---
 >>
 >> It crashed on accessing data from the virtualized global variable
 >> V_layer3_chain in ipfw_nat_modevent(). V_layer3_chain is defined in the
 >> ipfw module, and it turns out that &V_layer3_chain returns the wrong
 >> location from anywhere but ipfw.ko.
 >>
 >> Maybe this is a known issue, but I have not found any information about
 >> it, so below are the details of my investigation into why this happens.
 >>
 >> Virtualized global variables are defined using the VNET_DEFINE() macro,
 >> which places them in the 'set_vnet' linker set (in the base kernel or in
 >> a module). This set is used to
 >>
 >> 1) copy these "default" values to each virtual network stack instance when
 >>    it is created;
 >> 2) act as unique global names by which the variables can be referred to.
 >>
 >> The location of a per-instance variable is calculated at run time, as in
 >> the example below for the layer3_chain variable in the default vnet (vnet0):
 >>
 >>   vnet0->vnet_data_base + (uintptr_t) & vnet_entry_layer3_chain        (1)
 >>
 >> For modules things are more complicated. When a module is loaded, its
 >> global variables from the 'set_vnet' linker set are copied to the kernel
 >> 'set_vnet', and so that the module can access them, the linker relocates
 >> all references accordingly (kern/link_elf.c:elf_relocaddr()):
 >>
 >>         if (x >= ef->vnet_start && x < ef->vnet_stop)
 >>                 return ((x - ef->vnet_start) + ef->vnet_base);
 >>
 >> So from inside the module, access to its virtualized variables works,
 >> but from outside we get the wrong location using a calculation like (1)
 >> above, because &vnet_entry_layer3_chain returns the address of the
 >> variable in the module's own 'set_vnet'.
 >>
 >> The workaround is to compile such modules into the kernel, or to use a hack
 >> like the one I have done for ipfw_nat: add a function to the ipfw module
 >> that returns the location of the virtualized layer3_chain variable, and use
 >> this location instead of the V_layer3_chain macro (see the attached patch).
 >>
 >> But I suppose the problem is not new, and there might be a better approach
 >> already invented to deal with this?
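The accessor workaround described above can be modelled in user space to show why it works: only code compiled into the "ipfw" side knows the relocated location, so the "ipfw_nat" side asks for a pointer instead of computing &V_layer3_chain itself. This is a sketch under stated assumptions, not the attached patch: ipfw_layer3_chain() and nat_count_rules() are hypothetical names, struct vnet and struct ip_fw_chain are trimmed stand-ins, the vnet is passed explicitly instead of via curvnet, and the linker-assigned offset is a hypothetical constant (0x160).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Trimmed stand-in for struct ip_fw_chain (two of its real fields). */
struct ip_fw_chain {
	void *rules;
	int n_rules;
};

/* Stand-in for struct vnet: only the per-vnet data area matters here. */
struct vnet {
	unsigned char *data;
};

/* "Inside ipfw.ko": its references were relocated at load time, so it
 * knows where its variable lives. Modelled as a private offset
 * (hypothetical value). */
static const size_t layer3_chain_off = 0x160;

/* Hypothetical exported accessor; in the kernel it would simply
 * return &V_layer3_chain, evaluated inside the ipfw module. */
static struct ip_fw_chain *
ipfw_layer3_chain(struct vnet *vnet)
{
	return (struct ip_fw_chain *)(vnet->data + layer3_chain_off);
}

/* "Inside ipfw_nat.ko": use the accessor instead of &V_layer3_chain. */
static int
nat_count_rules(struct vnet *vnet)
{
	struct ip_fw_chain *chain = ipfw_layer3_chain(vnet);

	assert(chain != NULL);
	return chain->n_rules;
}
```

A value stored through ipfw_layer3_chain() on the "ipfw" side is what nat_count_rules() reads back, because both go through the one function that knows the relocated offset.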

Mikolaj Golub
freebsd-virtualization@freebsd.org mailing list