Re: What's going on with vnets and epairs w/ addresses?

2023-01-17 Thread Mark Johnston
On Tue, Dec 20, 2022 at 08:50:09PM +, Bjoern A. Zeeb wrote:
> On Tue, 20 Dec 2022, Mark Johnston wrote:
> 
> > On Sun, Dec 18, 2022 at 10:52:58AM -0600, Kyle Evans wrote:
> >> On Sat, Dec 17, 2022 at 11:22 AM Gleb Smirnoff  wrote:
> >>>
> >>>   Zhenlei,
> >>>
> >>> On Fri, Dec 16, 2022 at 06:30:57PM +0800, Zhenlei Huang wrote:
> >>> Z> I managed to repeat this issue on CURRENT/14 with this small snip:
> >>> Z>
> >>> Z> ---
> >>> Z> #!/bin/sh
> >>> Z>
> >>> Z> # test jail name
> >>> Z> n="test_ref_leak"
> >>> Z>
> >>> Z> jail -c name=$n path=/ vnet persist
> >>> Z> # The following line trigger jail pr_ref leak
> >>> Z> jexec $n ifconfig lo0 inet 127.0.0.1/8
> >>> Z>
> >>> Z> jail -R $n
> >>> Z>
> >>> Z> # wait a moment
> >>> Z> sleep 1
> >>> Z>
> >>> Z> jls -j $n
> >>> Z>
> >>> Z> After DDB debugging and tracing , it seems that is triggered by a 
> >>> combine of [1] and [2]
> >>> Z>
> >>> Z> [1] https://reviews.freebsd.org/rGfec8a8c7cbe4384c7e61d376f3aa5be5ac895915
> >>> Z> [2] https://reviews.freebsd.org/rGeb93b99d698674e3b1cc7139fda98e2b175b8c5b
> >>> Z>
> >>> Z>
> >>> Z> In [1] the per-VNET uma zone is shared with the global one.
> >>> Z> `pcbinfo->ipi_zone = pcbstor->ips_zone;`
> >>> Z>
> >>> Z> In [2] unref `inp->inp_cred` is deferred called in inpcb_dtor() by 
> >>> uma_zfree_smr() .
> >>> Z>
> >>> Z> Unfortunately inps freed by uma_zfree_smr() are cached and 
> >>> inpcb_dtor() is not called immediately ,
> >>> Z> thus leaking `inp->inp_cred` ref and hence `prison->pr_ref`.
> >>> Z>
> >>> Z> And it is also not possible to free up the cache by per-VNET SYSUNINIT 
> >>> tcp_destroy / udp_destroy / rip_destroy.
> >>>
> >>> This is known issue and I'd prefer not to call it a problem. The "leak" 
> >>> of a jail
> >>> happens only if machine is idle wrt the networking activity.
> >>>
> >>> Getting back to the problem that started this thread - the epair(4)s not 
> >>> immediately
> >>> popping back to prison0. IMHO, the problem again lies in the design of 
> >>> if_vmove and
> >>> epair(4) in particular. The if_vmove shall not exist, instead we should 
> >>> do a full
> >>> if_attach() and if_detach(). The state of an ifnet when it undergoes 
> >>> if_vmove doesn't
> >>> carry any useful information. With Alexander melifaro@ we discussed 
> >>> better options
> >>> for creating or attaching interfaces to jails that if_vmove. Until they 
> >>> are ready
> >>> the most easy workaround to deal with annoying epair(4) come back problem 
> >>> is to
> >>> remove it manually before destroying a jail, like I did in 80fc25025ff.
> >>>
> >>
> >> It still behaved much better prior to eb93b99d6986, which you and Mark
> >> were going to work on a solution for to allow the cred "leak" to close
> >> up much more quickly. CC markj@, since I think it's been six months
> >> since the last time I inquired about it, making this a good time to do
> >> it again...
> >
> > I spent some time trying to see if we could fix this in UMA/SMR and
> > talked to Jeff about it a bit.  At this point I don't think it's the
> > right approach, at least for now.  Really we have a composability
> > problem where different layers are using different techniques to signal
> > that they're done with a particular piece of memory, and they just
> > aren't compatible.
> >
> > One thing I tried is to implement a UMA function which walks over all
> > SMR zones and synchronizes all cached items (so that their destructors
> > are called).  This is really expensive, at minimum it has to bind to all
> 
> A semi-unrelated question -- do we have any documentation around SMR
> in the tree which is not in subr_smr.c?
> 
> (I have to admit I find it highly confusing that the acronym is more
> easily found as "Shingled Magnetic Recording (SMR)" in a different
> header file).

Sorry for the delayed reply, I was travelling for a few weeks and still
haven't caught up.  I did at least write a man page which notes the
multiple meanings of that acronym. :)

Comments and feedback are welcome: https://reviews.freebsd.org/D38108



Re: What's going on with vnets and epairs w/ addresses?

2023-01-02 Thread Zhenlei Huang
Hi,

Happy New Year 2023!

> On Dec 27, 2022, at 4:42 AM, Gleb Smirnoff  wrote:
> 
> Zhenlei, Bjoern, Mark,
> 
> sorry for the delayed response on this thread. Back when the problem
> was first introduced, I wrote code that forces a purge of SMR zones.
> However, I didn't push it in; instead, the change on the test suite side
> to remove interfaces from inside the jail before destroying it was
> sufficient to close all leaks associated with the test suite.
> 
> I just rebased the code to fresh main and put it here:
> 
> https://github.com/glebius/FreeBSD/tree/smr-purge
> 
> The proof of concept based on the test from Zhenlei looks like this:
> 
> #!/bin/sh
> n="test_ref_leak"
> 
> jail -c name=$n path=/ vnet persist
> # The following line trigger jail pr_ref leak
> jexec $n ifconfig lo0 inet 127.0.0.1/8
> 
> jail -R $n
> 
> for zone in tcp_inpcb udp_inpcb; do
>   sysctl vm.uma_zone_reclaim=${zone}
> done
> 
> jls -j $n
> 
> At the point of the call to jls(8) the jail no longer exists.
> 
> My opinion on the whole problem matches the one Mark expressed in his
> email on December 20.  I like the idea of doing the prison checks at a
> later stage of inpcb lookup, especially given Drew's new findings on
> the performance impact.  The proper fix may take a while.
> 
> In addition to that, I have a strong opinion against the way we move
> interfaces between jails. I claim that if we did it right (tm), the
> problem we are talking about would not exist even with all the existing
> layering violations between inpcb+smr and jails+epoch. I will write a
> longer email on what I believe is the right (tm) way to manage
> interfaces/devices within jails. We have already had discussions on that
> with Alexander melifaro@ and Warner imp@.  However, a proper
> implementation will take a while.
> 
> We may use code from my smr-purge branch as a temporary solution. Any
> thoughts on that?

The code in the smr-purge branch should also apply to non-VNET jails.
I think it is OK as a temporary solution.

> 
> -- 
> Gleb Smirnoff




Re: What's going on with vnets and epairs w/ addresses?

2022-12-26 Thread Gleb Smirnoff
  Zhenlei, Bjoern, Mark,

sorry for the delayed response on this thread. Back when the problem
was first introduced, I wrote code that forces a purge of SMR zones.
However, I didn't push it in; instead, the change on the test suite side
to remove interfaces from inside the jail before destroying it was
sufficient to close all leaks associated with the test suite.

I just rebased the code to fresh main and put it here:

https://github.com/glebius/FreeBSD/tree/smr-purge

The proof of concept based on the test from Zhenlei looks like this:

#!/bin/sh
n="test_ref_leak"

jail -c name=$n path=/ vnet persist
# The following line trigger jail pr_ref leak
jexec $n ifconfig lo0 inet 127.0.0.1/8

jail -R $n

for zone in tcp_inpcb udp_inpcb; do
sysctl vm.uma_zone_reclaim=${zone}
done

jls -j $n

At the point of the call to jls(8) the jail no longer exists.

My opinion on the whole problem matches the one Mark expressed in his
email on December 20.  I like the idea of doing the prison checks at a
later stage of inpcb lookup, especially given Drew's new findings on
the performance impact.  The proper fix may take a while.

In addition to that, I have a strong opinion against the way we move
interfaces between jails. I claim that if we did it right (tm), the
problem we are talking about would not exist even with all the existing
layering violations between inpcb+smr and jails+epoch. I will write a
longer email on what I believe is the right (tm) way to manage
interfaces/devices within jails. We have already had discussions on that
with Alexander melifaro@ and Warner imp@.  However, a proper
implementation will take a while.

We may use code from my smr-purge branch as a temporary solution. Any
thoughts on that?

-- 
Gleb Smirnoff



Re: What's going on with vnets and epairs w/ addresses?

2022-12-22 Thread Gleb Smirnoff
On Sun, Dec 18, 2022 at 10:52:58AM -0600, Kyle Evans wrote:
K> It still behaved much better prior to eb93b99d6986, which you and Mark
K> were going to work on a solution for to allow the cred "leak" to close
K> up much more quickly. CC markj@, since I think it's been six months
K> since the last time I inquired about it, making this a good time to do
K> it again...

I had a branch with a KPI to force a purge of an SMR zone. It could be
triggered by a sysctl or automatically from the jail code. I will look
for it and share it.

Sorry for the 4-day delay on this (and other topics); I've been mostly
offline. I will reply to all mails on this thread before Monday.

-- 
Gleb Smirnoff



Re: What's going on with vnets and epairs w/ addresses?

2022-12-22 Thread Zhenlei Huang
> 
> On Dec 21, 2022, at 12:12 AM, Mark Johnston wrote:
> 
> On Sun, Dec 18, 2022 at 10:52:58AM -0600, Kyle Evans wrote:
> >> On Sat, Dec 17, 2022 at 11:22 AM Gleb Smirnoff wrote:
>>> 
>>>  Zhenlei,
>>> 
>>> On Fri, Dec 16, 2022 at 06:30:57PM +0800, Zhenlei Huang wrote:
>>> Z> I managed to repeat this issue on CURRENT/14 with this small snip:
>>> Z>
>>> Z> ---
>>> Z> #!/bin/sh
>>> Z>
>>> Z> # test jail name
>>> Z> n="test_ref_leak"
>>> Z>
>>> Z> jail -c name=$n path=/ vnet persist
>>> Z> # The following line trigger jail pr_ref leak
>>> Z> jexec $n ifconfig lo0 inet 127.0.0.1/8
>>> Z>
>>> Z> jail -R $n
>>> Z>
>>> Z> # wait a moment
>>> Z> sleep 1
>>> Z>
>>> Z> jls -j $n
>>> Z>
>>> Z> After DDB debugging and tracing , it seems that is triggered by a 
>>> combine of [1] and [2]
>>> Z>
>>> Z> [1] https://reviews.freebsd.org/rGfec8a8c7cbe4384c7e61d376f3aa5be5ac895915
>>> Z> [2] https://reviews.freebsd.org/rGeb93b99d698674e3b1cc7139fda98e2b175b8c5b
>>> Z>
>>> Z>
>>> Z> In [1] the per-VNET uma zone is shared with the global one.
>>> Z> `pcbinfo->ipi_zone = pcbstor->ips_zone;`
>>> Z>
>>> Z> In [2] unref `inp->inp_cred` is deferred called in inpcb_dtor() by 
>>> uma_zfree_smr() .
>>> Z>
>>> Z> Unfortunately inps freed by uma_zfree_smr() are cached and inpcb_dtor() 
>>> is not called immediately ,
>>> Z> thus leaking `inp->inp_cred` ref and hence `prison->pr_ref`.
>>> Z>
>>> Z> And it is also not possible to free up the cache by per-VNET SYSUNINIT 
>>> tcp_destroy / udp_destroy / rip_destroy.
>>> 
>>> This is known issue and I'd prefer not to call it a problem. The "leak" of 
>>> a jail
>>> happens only if machine is idle wrt the networking activity.
>>> 
>>> Getting back to the problem that started this thread - the epair(4)s not 
>>> immediately
>>> popping back to prison0. IMHO, the problem again lies in the design of 
>>> if_vmove and
>>> epair(4) in particular. The if_vmove shall not exist, instead we should do 
>>> a full
>>> if_attach() and if_detach(). The state of an ifnet when it undergoes 
>>> if_vmove doesn't
>>> carry any useful information. With Alexander melifaro@ we discussed better 
>>> options
>>> for creating or attaching interfaces to jails that if_vmove. Until they are 
>>> ready
>>> the most easy workaround to deal with annoying epair(4) come back problem 
>>> is to
>>> remove it manually before destroying a jail, like I did in 80fc25025ff.
>>> 
>> 
>> It still behaved much better prior to eb93b99d6986, which you and Mark
>> were going to work on a solution for to allow the cred "leak" to close
>> up much more quickly. CC markj@, since I think it's been six months
>> since the last time I inquired about it, making this a good time to do
>> it again...
> 
> I spent some time trying to see if we could fix this in UMA/SMR and
> talked to Jeff about it a bit.  At this point I don't think it's the
> right approach, at least for now.  Really we have a composability
> problem where different layers are using different techniques to signal
> that they're done with a particular piece of memory, and they just
> aren't compatible.

I originally thought that `uma_zfree_smr()` is somewhat like `epoch_call()` with
an internal `epoch_callback_t`, but after digging into the source code that is
not true. `uma_zfree_smr()` puts the item into a cache, and the destructor does
not get a chance to run until the next allocation from that cache.

Can SMR provide some means, similar to `epoch_callback_t`, so that the
destructors are eventually invoked?
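The cache-then-destroy behavior described above can be illustrated with a small
userspace toy model (this is not the real UMA/kernel code; `SmrZone`, `Prison`,
and the other names are illustrative only). It shows why a freed inpcb keeps
the jail's reference pinned until something forces the cache to drain:

```python
# Toy model of the deferred-destructor behavior of uma_zfree_smr():
# freed items sit in a cache and their destructor runs only when the
# cache is reclaimed, so a reference held by the item (here, the jail
# cred ref) stays pinned in the meantime.

class Prison:
    def __init__(self):
        self.pr_ref = 1              # the jail's own reference

class Inpcb:
    def __init__(self, prison):
        self.inp_cred_prison = prison
        prison.pr_ref += 1           # crhold() on the cred pins the prison

class SmrZone:
    """Zone with a deferred destructor, loosely modeling an UMA SMR zone."""
    def __init__(self, dtor):
        self.cache = []              # stands in for per-CPU buckets
        self.dtor = dtor

    def free(self, item):
        self.cache.append(item)      # dtor is NOT called here

    def reclaim(self):
        while self.cache:            # models a forced purge/reclaim
            self.dtor(self.cache.pop())

def inpcb_dtor(inp):
    inp.inp_cred_prison.pr_ref -= 1  # the deferred prison_free()

jail = Prison()
zone = SmrZone(inpcb_dtor)
inp = Inpcb(jail)

zone.free(inp)                       # like uma_zfree_smr(): item cached
leaked_ref = jail.pr_ref             # still 2: the jail cannot finish dying
zone.reclaim()                       # forced purge runs the destructor
final_ref = jail.pr_ref              # back to 1
```

In this model, `zone.reclaim()` plays the role of the forced SMR purge from
Gleb's branch: until it runs, the prison's reference count never drops.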

> 
> One thing I tried is to implement a UMA function which walks over all
> SMR zones and synchronizes all cached items (so that their destructors
> are called).  This is really expensive, at minimum it has to bind to all
> CPUs in the system so that it can flush per-CPU buckets.  If
> jail_deref() calls that function, the bug goes away at least in my
> limited testing, but its use is really a layering violation.

I've proposed a `vnet_shutdown()` stage in another mail. Maybe we can introduce
a `vnet_cleanup()` stage, have the INPCB layer register a listener for the
`cleanup` event, and invoke the function that synchronizes cached items from
there. Is that still a layering violation?
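The event-registration idea can be sketched roughly as follows (a hypothetical
illustration only: `vnet_cleanup_register()` and `vnet_cleanup()` are not real
KPIs, and the real mechanism would presumably resemble EVENTHANDLER(9) or a
VNET_SYSUNINIT):

```python
# Sketch of a vnet "cleanup" event: subsystems register a callback, and
# vnet teardown invokes them all, so the INPCB layer could drain its SMR
# zone caches there without the jail code knowing about inpcbs.

cleanup_handlers = []

def vnet_cleanup_register(fn):
    cleanup_handlers.append(fn)

def vnet_cleanup():
    # Called during vnet teardown, before the prison ref count is checked.
    for fn in cleanup_handlers:
        fn()

drained = []

def inpcb_cleanup():
    # Stand-in for "synchronize and drain the inpcb SMR zone caches".
    drained.append("inpcb")

vnet_cleanup_register(inpcb_cleanup)
vnet_cleanup()
```

The inversion is the point: the jail/vnet layer only knows "run the registered
cleanups", and each consumer decides what cleanup means for it.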

> 
> We could, say, periodically scan cached UMA/SMR items and invoke their
> destructors, but for most SMR consumers this is unnecessary, and again
> there's a layering problem: the inpcb layer shouldn't "know" that it has
> to do that for its zones, 

Re: What's going on with vnets and epairs w/ addresses?

2022-12-20 Thread Bjoern A. Zeeb

On Tue, 20 Dec 2022, Mark Johnston wrote:


On Sun, Dec 18, 2022 at 10:52:58AM -0600, Kyle Evans wrote:

On Sat, Dec 17, 2022 at 11:22 AM Gleb Smirnoff  wrote:


  Zhenlei,

On Fri, Dec 16, 2022 at 06:30:57PM +0800, Zhenlei Huang wrote:
Z> I managed to repeat this issue on CURRENT/14 with this small snip:
Z>
Z> ---
Z> #!/bin/sh
Z>
Z> # test jail name
Z> n="test_ref_leak"
Z>
Z> jail -c name=$n path=/ vnet persist
Z> # The following line trigger jail pr_ref leak
Z> jexec $n ifconfig lo0 inet 127.0.0.1/8
Z>
Z> jail -R $n
Z>
Z> # wait a moment
Z> sleep 1
Z>
Z> jls -j $n
Z>
Z> After DDB debugging and tracing , it seems that is triggered by a combine of 
[1] and [2]
Z>
Z> [1] https://reviews.freebsd.org/rGfec8a8c7cbe4384c7e61d376f3aa5be5ac895915 

Z> [2] https://reviews.freebsd.org/rGeb93b99d698674e3b1cc7139fda98e2b175b8c5b 

Z>
Z>
Z> In [1] the per-VNET uma zone is shared with the global one.
Z> `pcbinfo->ipi_zone = pcbstor->ips_zone;`
Z>
Z> In [2] unref `inp->inp_cred` is deferred called in inpcb_dtor() by 
uma_zfree_smr() .
Z>
Z> Unfortunately inps freed by uma_zfree_smr() are cached and inpcb_dtor() is 
not called immediately ,
Z> thus leaking `inp->inp_cred` ref and hence `prison->pr_ref`.
Z>
Z> And it is also not possible to free up the cache by per-VNET SYSUNINIT 
tcp_destroy / udp_destroy / rip_destroy.

This is known issue and I'd prefer not to call it a problem. The "leak" of a 
jail
happens only if machine is idle wrt the networking activity.

Getting back to the problem that started this thread - the epair(4)s not 
immediately
popping back to prison0. IMHO, the problem again lies in the design of if_vmove 
and
epair(4) in particular. The if_vmove shall not exist, instead we should do a 
full
if_attach() and if_detach(). The state of an ifnet when it undergoes if_vmove 
doesn't
carry any useful information. With Alexander melifaro@ we discussed better 
options
for creating or attaching interfaces to jails that if_vmove. Until they are 
ready
the most easy workaround to deal with annoying epair(4) come back problem is to
remove it manually before destroying a jail, like I did in 80fc25025ff.



It still behaved much better prior to eb93b99d6986, which you and Mark
were going to work on a solution for to allow the cred "leak" to close
up much more quickly. CC markj@, since I think it's been six months
since the last time I inquired about it, making this a good time to do
it again...


I spent some time trying to see if we could fix this in UMA/SMR and
talked to Jeff about it a bit.  At this point I don't think it's the
right approach, at least for now.  Really we have a composability
problem where different layers are using different techniques to signal
that they're done with a particular piece of memory, and they just
aren't compatible.

One thing I tried is to implement a UMA function which walks over all
SMR zones and synchronizes all cached items (so that their destructors
are called).  This is really expensive, at minimum it has to bind to all


A semi-unrelated question -- do we have any documentation around SMR
in the tree which is not in subr_smr.c?

(I have to admit I find it highly confusing that the acronym is more
easily found as "Shingled Magnetic Recording (SMR)" in a different
header file).



CPUs in the system so that it can flush per-CPU buckets.  If
jail_deref() calls that function, the bug goes away at least in my
limited testing, but its use is really a layering violation.

We could, say, periodically scan cached UMA/SMR items and invoke their
destructors, but for most SMR consumers this is unnecessary, and again
there's a layering problem: the inpcb layer shouldn't "know" that it has
to do that for its zones, since it's the jail layer that actually cares.

It also seems kind of strange that dying jails still occupy a slot in
the jail namespace.  I don't really understand why the existence of a
dying jail prevents creation of a new jail with the same name, but
presumably there's a good reason for it?


You can create a new jail but if you have (physical) resources tied to
the old one which are not released, then you are stuck (physical
network interfaces for example).



Now my inclination is to try and fix this in the inpcb layer, by not
accessing the inp_cred at all in the lookup path until we hold the inpcb
lock, and then releasing the cred ref before freeing a PCB to its zone.
I think this is doable based on a few observations:
- When doing an SMR-protected lookup, we always lock the returned inpcb
 before handing it to the caller.  So we could in principle perform
 inp_cred checking after acquiring the lock but before returning.
- If there are no jailed PCBs in a hash chain in_pcblookup_hash_locked()
 always scans the whole chain.
- If we match only one PCB in a lookup, we can probably(?) 

Re: What's going on with vnets and epairs w/ addresses?

2022-12-20 Thread Mark Johnston
On Sun, Dec 18, 2022 at 10:52:58AM -0600, Kyle Evans wrote:
> On Sat, Dec 17, 2022 at 11:22 AM Gleb Smirnoff  wrote:
> >
> >   Zhenlei,
> >
> > On Fri, Dec 16, 2022 at 06:30:57PM +0800, Zhenlei Huang wrote:
> > Z> I managed to repeat this issue on CURRENT/14 with this small snip:
> > Z>
> > Z> ---
> > Z> #!/bin/sh
> > Z>
> > Z> # test jail name
> > Z> n="test_ref_leak"
> > Z>
> > Z> jail -c name=$n path=/ vnet persist
> > Z> # The following line trigger jail pr_ref leak
> > Z> jexec $n ifconfig lo0 inet 127.0.0.1/8
> > Z>
> > Z> jail -R $n
> > Z>
> > Z> # wait a moment
> > Z> sleep 1
> > Z>
> > Z> jls -j $n
> > Z>
> > Z> After DDB debugging and tracing , it seems that is triggered by a 
> > combine of [1] and [2]
> > Z>
> > Z> [1] https://reviews.freebsd.org/rGfec8a8c7cbe4384c7e61d376f3aa5be5ac895915
> > Z> [2] https://reviews.freebsd.org/rGeb93b99d698674e3b1cc7139fda98e2b175b8c5b
> > Z>
> > Z>
> > Z> In [1] the per-VNET uma zone is shared with the global one.
> > Z> `pcbinfo->ipi_zone = pcbstor->ips_zone;`
> > Z>
> > Z> In [2] unref `inp->inp_cred` is deferred called in inpcb_dtor() by 
> > uma_zfree_smr() .
> > Z>
> > Z> Unfortunately inps freed by uma_zfree_smr() are cached and inpcb_dtor() 
> > is not called immediately ,
> > Z> thus leaking `inp->inp_cred` ref and hence `prison->pr_ref`.
> > Z>
> > Z> And it is also not possible to free up the cache by per-VNET SYSUNINIT 
> > tcp_destroy / udp_destroy / rip_destroy.
> >
> > This is known issue and I'd prefer not to call it a problem. The "leak" of 
> > a jail
> > happens only if machine is idle wrt the networking activity.
> >
> > Getting back to the problem that started this thread - the epair(4)s not 
> > immediately
> > popping back to prison0. IMHO, the problem again lies in the design of 
> > if_vmove and
> > epair(4) in particular. The if_vmove shall not exist, instead we should do 
> > a full
> > if_attach() and if_detach(). The state of an ifnet when it undergoes 
> > if_vmove doesn't
> > carry any useful information. With Alexander melifaro@ we discussed better 
> > options
> > for creating or attaching interfaces to jails that if_vmove. Until they are 
> > ready
> > the most easy workaround to deal with annoying epair(4) come back problem 
> > is to
> > remove it manually before destroying a jail, like I did in 80fc25025ff.
> >
> 
> It still behaved much better prior to eb93b99d6986, which you and Mark
> were going to work on a solution for to allow the cred "leak" to close
> up much more quickly. CC markj@, since I think it's been six months
> since the last time I inquired about it, making this a good time to do
> it again...

I spent some time trying to see if we could fix this in UMA/SMR and
talked to Jeff about it a bit.  At this point I don't think it's the
right approach, at least for now.  Really we have a composability
problem where different layers are using different techniques to signal
that they're done with a particular piece of memory, and they just
aren't compatible.

One thing I tried is to implement a UMA function which walks over all
SMR zones and synchronizes all cached items (so that their destructors
are called).  This is really expensive, at minimum it has to bind to all
CPUs in the system so that it can flush per-CPU buckets.  If
jail_deref() calls that function, the bug goes away at least in my
limited testing, but its use is really a layering violation.

We could, say, periodically scan cached UMA/SMR items and invoke their
destructors, but for most SMR consumers this is unnecessary, and again
there's a layering problem: the inpcb layer shouldn't "know" that it has
to do that for its zones, since it's the jail layer that actually cares.

It also seems kind of strange that dying jails still occupy a slot in
the jail namespace.  I don't really understand why the existence of a
dying jail prevents creation of a new jail with the same name, but
presumably there's a good reason for it?

Now my inclination is to try and fix this in the inpcb layer, by not
accessing the inp_cred at all in the lookup path until we hold the inpcb
lock, and then releasing the cred ref before freeing a PCB to its zone.
I think this is doable based on a few observations:
- When doing an SMR-protected lookup, we always lock the returned inpcb
  before handing it to the caller.  So we could in principle perform
  inp_cred checking after acquiring the lock but before returning.
- If there are no jailed PCBs in a hash chain in_pcblookup_hash_locked()
  always scans the whole chain.
- If we match only one PCB in a lookup, we can probably(?) return that
  PCB without dereferencing the cred pointer at all.  If not, then the
  scan only has to keep track of a fixed number of PCBs before picking
  which one to return.  So it looks like we can perform 
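The proposed lookup ordering can be modeled in userspace like this (a sketch
of the idea only, not the real in_pcblookup code; the classes and field names
are illustrative). The key property is that the cred is never dereferenced
until the PCB is locked, and a single-match lookup never touches it at all:

```python
# Model of an SMR-protected lookup that defers all inp_cred checks until
# the inpcb lock is held, per the observations above.

class Inpcb:
    def __init__(self, laddr, jailed):
        self.laddr = laddr
        self.locked = False
        self._jailed = jailed      # only valid to read once locked

    def lock(self):
        self.locked = True

    def cred_jailed(self):
        assert self.locked         # cred dereferenced only under the lock
        return self._jailed

def in_pcblookup(chain, laddr):
    # "SMR section": scan the whole chain, collect candidates, and do
    # no cred checks at all while scanning.
    exact = [inp for inp in chain if inp.laddr == laddr]
    if not exact:
        return None
    if len(exact) == 1:
        inp = exact[0]             # single match: cred never consulted
        inp.lock()
        return inp
    # Multiple matches: lock each candidate and prefer a jailed PCB,
    # performing the cred check only after the lock is acquired.
    for inp in exact:
        inp.lock()
        if inp.cred_jailed():
            return inp
    return exact[0]
```

With this structure, PCBs could drop their cred reference before being freed
to the zone, since cached (freed) items are never consulted for cred checks.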

Re: What's going on with vnets and epairs w/ addresses?

2022-12-18 Thread Kyle Evans
On Sat, Dec 17, 2022 at 11:22 AM Gleb Smirnoff  wrote:
>
>   Zhenlei,
>
> On Fri, Dec 16, 2022 at 06:30:57PM +0800, Zhenlei Huang wrote:
> Z> I managed to repeat this issue on CURRENT/14 with this small snip:
> Z>
> Z> ---
> Z> #!/bin/sh
> Z>
> Z> # test jail name
> Z> n="test_ref_leak"
> Z>
> Z> jail -c name=$n path=/ vnet persist
> Z> # The following line trigger jail pr_ref leak
> Z> jexec $n ifconfig lo0 inet 127.0.0.1/8
> Z>
> Z> jail -R $n
> Z>
> Z> # wait a moment
> Z> sleep 1
> Z>
> Z> jls -j $n
> Z>
> Z> After DDB debugging and tracing , it seems that is triggered by a combine 
> of [1] and [2]
> Z>
> Z> [1] https://reviews.freebsd.org/rGfec8a8c7cbe4384c7e61d376f3aa5be5ac895915 
> 
> Z> [2] https://reviews.freebsd.org/rGeb93b99d698674e3b1cc7139fda98e2b175b8c5b 
> 
> Z>
> Z>
> Z> In [1] the per-VNET uma zone is shared with the global one.
> Z> `pcbinfo->ipi_zone = pcbstor->ips_zone;`
> Z>
> Z> In [2] unref `inp->inp_cred` is deferred called in inpcb_dtor() by 
> uma_zfree_smr() .
> Z>
> Z> Unfortunately inps freed by uma_zfree_smr() are cached and inpcb_dtor() is 
> not called immediately ,
> Z> thus leaking `inp->inp_cred` ref and hence `prison->pr_ref`.
> Z>
> Z> And it is also not possible to free up the cache by per-VNET SYSUNINIT 
> tcp_destroy / udp_destroy / rip_destroy.
>
> This is known issue and I'd prefer not to call it a problem. The "leak" of a 
> jail
> happens only if machine is idle wrt the networking activity.
>
> Getting back to the problem that started this thread - the epair(4)s not 
> immediately
> popping back to prison0. IMHO, the problem again lies in the design of 
> if_vmove and
> epair(4) in particular. The if_vmove shall not exist, instead we should do a 
> full
> if_attach() and if_detach(). The state of an ifnet when it undergoes if_vmove 
> doesn't
> carry any useful information. With Alexander melifaro@ we discussed better 
> options
> for creating or attaching interfaces to jails that if_vmove. Until they are 
> ready
> the most easy workaround to deal with annoying epair(4) come back problem is 
> to
> remove it manually before destroying a jail, like I did in 80fc25025ff.
>

It still behaved much better prior to eb93b99d6986, which you and Mark
were going to work on a solution for to allow the cred "leak" to close
up much more quickly. CC markj@, since I think it's been six months
since the last time I inquired about it, making this a good time to do
it again...

Thanks,

Kyle Evans



Re: What's going on with vnets and epairs w/ addresses?

2022-12-18 Thread Bjoern A. Zeeb

On Sun, 18 Dec 2022, Zhenlei Huang wrote:




On Dec 18, 2022, at 3:23 AM, Bjoern A. Zeeb  wrote:

On Sat, 17 Dec 2022, Gleb Smirnoff wrote:


Zhenlei,

On Fri, Dec 16, 2022 at 06:30:57PM +0800, Zhenlei Huang wrote:
Z> I managed to repeat this issue on CURRENT/14 with this small snip:
Z>
Z> ---
Z> #!/bin/sh
Z>
Z> # test jail name
Z> n="test_ref_leak"
Z>
Z> jail -c name=$n path=/ vnet persist
Z> # The following line trigger jail pr_ref leak
Z> jexec $n ifconfig lo0 inet 127.0.0.1/8
Z>
Z> jail -R $n
Z>
Z> # wait a moment
Z> sleep 1
Z>
Z> jls -j $n
Z>
Z> After DDB debugging and tracing , it seems that is triggered by a combine of 
[1] and [2]
Z>
Z> [1] https://reviews.freebsd.org/rGfec8a8c7cbe4384c7e61d376f3aa5be5ac895915 

Z> [2] https://reviews.freebsd.org/rGeb93b99d698674e3b1cc7139fda98e2b175b8c5b 



I can confirm [2] also affects non-VNET jails.
A prison pr_ref leak causes the jail to be stuck in the dying state.


Usually a TCP connection in TW would do this in the old days and things
would solve themselves after a while.  This was always the case even
long before vnet or multi-IP jails.



Z>
Z>
Z> In [1] the per-VNET uma zone is shared with the global one.
Z> `pcbinfo->ipi_zone = pcbstor->ips_zone;`
Z>
Z> In [2] unref `inp->inp_cred` is deferred called in inpcb_dtor() by 
uma_zfree_smr() .
Z>
Z> Unfortunately inps freed by uma_zfree_smr() are cached and inpcb_dtor() is 
not called immediately ,
Z> thus leaking `inp->inp_cred` ref and hence `prison->pr_ref`.
Z>
Z> And it is also not possible to free up the cache by per-VNET SYSUNINIT 
tcp_destroy / udp_destroy / rip_destroy.

This is known issue and I'd prefer not to call it a problem. The "leak" of a 
jail
happens only if machine is idle wrt the networking activity.

Getting back to the problem that started this thread - the epair(4)s not 
immediately
popping back to prison0. IMHO, the problem again lies in the design of if_vmove 
and
epair(4) in particular. The if_vmove shall not exist, instead we should do a 
full
if_attach() and if_detach(). The state of an ifnet when it undergoes if_vmove 
doesn't
carry any useful information. With Alexander melifaro@ we discussed better 
options
for creating or attaching interfaces to jails that if_vmove. Until they are 
ready
the most easy workaround to deal with annoying epair(4) come back problem is to
remove it manually before destroying a jail, like I did in 80fc25025ff.


Ok, move an em0 or cxl0 into the jail;  the problem will be the same I
bet and you need the physical interface to not disappear as then you
cannot re-create a new jail with it.


Re-reading sys/kern/kern_jail.c: if pr_ref leaks, vnet_destroy() has no chance
to be called, thus if_vmove() is not called and epair(4)s or em0/cxl0 are not
returned to their home vnet.

That can be confirmed by setting a breakpoint on vnet_destroy() in DDB and then
creating and destroying vnet jails.

So until the prison pr_ref leak is resolved, it will mask other potential
problems such as those @glebius pointed out.

I think the prison ref count leak should be resolved first.

I'm also reviewing the life cycles of prison / vnet, and it seems they could
still be improved.


But that's not the problem here, as your own test case pointed out.

The point is that if you start a plain vnet jail, put an interface in, and
destroy the jail, that works instantly.
The moment you put an address on any interface (incl. loopback, as your test
showed, which will not do ARP/NDP things compared to an ethernet interface),
the jail will no longer die immediately.

Simply putting an address on an interface should not defer things.
So indeed something holds onto things there and is not cleaned up
anymore.  Finding that "something" is the important bit, along with being
able to clean it up.

I always say, if you have a machine in shutdown -r you don't want it
hanging for hours either (now if you toggle the power switch you can do
a lot more without panicking the rest of the system, but with jails we
cannot do that).  And we did have vnet jails shutting down properly and
cleaning up for years.  People had spent a lot of time on that.  So it is
possible, and we need to get back to that state.

/bz

--
Bjoern A. Zeeb r15:7



Re: What's going on with vnets and epairs w/ addresses?

2022-12-17 Thread Zhenlei Huang


> On Dec 18, 2022, at 3:23 AM, Bjoern A. Zeeb  wrote:
> 
> On Sat, 17 Dec 2022, Gleb Smirnoff wrote:
> 
>> Zhenlei,
>> 
>> On Fri, Dec 16, 2022 at 06:30:57PM +0800, Zhenlei Huang wrote:
>> Z> I managed to repeat this issue on CURRENT/14 with this small snip:
>> Z>
>> Z> ---
>> Z> #!/bin/sh
>> Z>
>> Z> # test jail name
>> Z> n="test_ref_leak"
>> Z>
>> Z> jail -c name=$n path=/ vnet persist
>> Z> # The following line trigger jail pr_ref leak
>> Z> jexec $n ifconfig lo0 inet 127.0.0.1/8
>> Z>
>> Z> jail -R $n
>> Z>
>> Z> # wait a moment
>> Z> sleep 1
>> Z>
>> Z> jls -j $n
>> Z>
>> Z> After DDB debugging and tracing , it seems that is triggered by a combine 
>> of [1] and [2]
>> Z>
>> Z> [1] https://reviews.freebsd.org/rGfec8a8c7cbe4384c7e61d376f3aa5be5ac895915
>> Z> [2] https://reviews.freebsd.org/rGeb93b99d698674e3b1cc7139fda98e2b175b8c5b

I can confirm [2] also affects non-VNET jails.
A prison pr_ref leak causes the jail to be stuck in the dying state.

>> Z>
>> Z>
>> Z> In [1] the per-VNET uma zone is shared with the global one.
>> Z> `pcbinfo->ipi_zone = pcbstor->ips_zone;`
>> Z>
>> Z> In [2] unref `inp->inp_cred` is deferred called in inpcb_dtor() by 
>> uma_zfree_smr() .
>> Z>
>> Z> Unfortunately inps freed by uma_zfree_smr() are cached and inpcb_dtor() 
>> is not called immediately ,
>> Z> thus leaking `inp->inp_cred` ref and hence `prison->pr_ref`.
>> Z>
>> Z> And it is also not possible to free up the cache by per-VNET SYSUNINIT 
>> tcp_destroy / udp_destroy / rip_destroy.
>> 
>> This is known issue and I'd prefer not to call it a problem. The "leak" of a 
>> jail
>> happens only if machine is idle wrt the networking activity.
>> 
>> Getting back to the problem that started this thread - the epair(4)s not 
>> immediately
>> popping back to prison0. IMHO, the problem again lies in the design of 
>> if_vmove and
>> epair(4) in particular. The if_vmove shall not exist, instead we should do a 
>> full
>> if_attach() and if_detach(). The state of an ifnet when it undergoes 
>> if_vmove doesn't
>> carry any useful information. With Alexander melifaro@ we discussed better 
>> options
>> for creating or attaching interfaces to jails that if_vmove. Until they are 
>> ready
>> the most easy workaround to deal with annoying epair(4) come back problem is 
>> to
>> remove it manually before destroying a jail, like I did in 80fc25025ff.
> 
> Ok, move an em0 or cxl0 into the jail;  the problem will be the same I
> bet and you need the physical interface to not disappear as then you
> cannot re-create a new jail with it.

Re-reading sys/kern/kern_jail.c: if pr_ref leaks, vnet_destroy() never gets a
chance to be called, thus if_vmove is not called and the epair(4)s, or em0 and
cxl0, are not returned to their home vnet.

That can be confirmed by setting a breakpoint on vnet_destroy() in DDB and then
creating and destroying vnet jails.

So until the prison pr_ref leak is resolved, it will mask other potential
problems such as the one @glebius pointed out.

I think the prison ref count leak should be resolved first.

I'm also reviewing the life cycles of prison / vnet, and it seems they could
still be improved.
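To make the lingering visible without DDB, a small polling sketch can watch the
jail stay in the dying state via `jls -d` (which lists dying as well as active
jails). This is only a sketch: it assumes a FreeBSD host, root privileges, and
a hypothetical 30-second timeout.

```shell
#!/bin/sh
# Sketch: remove a vnet jail after triggering the pr_ref leak, then poll
# `jls -d` to see whether the jail ever finishes dying.
n="test_ref_leak"

watch_dying() {
    jail -c name=$n path=/ vnet persist
    jexec $n ifconfig lo0 inet 127.0.0.1/8   # triggers the pr_ref leak
    jail -R $n

    i=0
    # -d includes dying jails; loop until the jail is fully gone.
    while jls -d -j $n >/dev/null 2>&1; do
        i=$((i + 1))
        if [ $i -ge 30 ]; then
            echo "jail $n still dying after ${i}s"
            return 1
        fi
        sleep 1
    done
    echo "jail $n fully gone after ${i}s"
}

if command -v jail >/dev/null 2>&1 && [ "$(id -u)" -eq 0 ]; then
    watch_dying
else
    echo "skipping: requires FreeBSD jail(8) and root"
fi
```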

> 
> /bz
> 
> -- 
> Bjoern A. Zeeb r15:7




Re: What's going on with vnets and epairs w/ addresses?

2022-12-17 Thread Bjoern A. Zeeb

On Sat, 17 Dec 2022, Gleb Smirnoff wrote:


 Zhenlei,

On Fri, Dec 16, 2022 at 06:30:57PM +0800, Zhenlei Huang wrote:
Z> I managed to repeat this issue on CURRENT/14 with this small snip:
Z>
Z> ---
Z> #!/bin/sh
Z>
Z> # test jail name
Z> n="test_ref_leak"
Z>
Z> jail -c name=$n path=/ vnet persist
Z> # The following line trigger jail pr_ref leak
Z> jexec $n ifconfig lo0 inet 127.0.0.1/8
Z>
Z> jail -R $n
Z>
Z> # wait a moment
Z> sleep 1
Z>
Z> jls -j $n
Z>
Z> After DDB debugging and tracing , it seems that is triggered by a combine of 
[1] and [2]
Z>
Z> [1] https://reviews.freebsd.org/rGfec8a8c7cbe4384c7e61d376f3aa5be5ac895915 

Z> [2] https://reviews.freebsd.org/rGeb93b99d698674e3b1cc7139fda98e2b175b8c5b 

Z>
Z>
Z> In [1] the per-VNET uma zone is shared with the global one.
Z> `pcbinfo->ipi_zone = pcbstor->ips_zone;`
Z>
Z> In [2] unref `inp->inp_cred` is deferred called in inpcb_dtor() by 
uma_zfree_smr() .
Z>
Z> Unfortunately inps freed by uma_zfree_smr() are cached and inpcb_dtor() is 
not called immediately ,
Z> thus leaking `inp->inp_cred` ref and hence `prison->pr_ref`.
Z>
Z> And it is also not possible to free up the cache by per-VNET SYSUNINIT 
tcp_destroy / udp_destroy / rip_destroy.

This is a known issue and I'd prefer not to call it a problem. The "leak" of a
jail happens only if the machine is idle wrt the networking activity.

Getting back to the problem that started this thread - the epair(4)s not
immediately popping back to prison0. IMHO, the problem again lies in the design
of if_vmove and epair(4) in particular. if_vmove shall not exist; instead we
should do a full if_attach() and if_detach(). The state of an ifnet when it
undergoes if_vmove doesn't carry any useful information. With Alexander
melifaro@ we discussed better options than if_vmove for creating or attaching
interfaces to jails. Until they are ready, the easiest workaround for the
annoying epair(4) come-back problem is to remove the epair manually before
destroying the jail, like I did in 80fc25025ff.


Ok, move an em0 or cxl0 into the jail; the problem will be the same, I bet,
and you need the physical interface to not disappear, as then you cannot
re-create a new jail with it.
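As an illustration of the physical-interface case, a hedged sketch follows:
`em0` and the jail name `phystest` are placeholders, and FreeBSD plus root
privileges are assumed. The interface is moved into the jail's vnet with
`ifconfig vnet` and reclaimed with `-vnet` before the jail is removed, so it
stays available for reuse even if the jail lingers in the dying state.

```shell
#!/bin/sh
# Sketch: move a physical NIC into a vnet jail and reclaim it afterwards.
# "em0" and "phystest" are placeholders; requires FreeBSD and root.
ifname="em0"
jname="phystest"

move_and_reclaim() {
    jail -c name=$jname path=/ vnet persist
    # Move the physical interface into the jail's vnet.
    ifconfig $ifname vnet $jname
    jexec $jname ifconfig $ifname inet 192.0.2.1/24

    # Reclaim it before removing the jail; if the jail only lingers in
    # the dying state, the NIC would otherwise be unavailable for reuse.
    ifconfig $ifname -vnet $jname
    jail -R $jname
}

if command -v jail >/dev/null 2>&1 && [ "$(id -u)" -eq 0 ]; then
    move_and_reclaim
else
    echo "skipping: requires FreeBSD jail(8) and root"
fi
```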

/bz

--
Bjoern A. Zeeb r15:7



Re: What's going on with vnets and epairs w/ addresses?

2022-12-17 Thread Gleb Smirnoff
  Zhenlei,

On Fri, Dec 16, 2022 at 06:30:57PM +0800, Zhenlei Huang wrote:
Z> I managed to repeat this issue on CURRENT/14 with this small snip:
Z> 
Z> ---
Z> #!/bin/sh
Z> 
Z> # test jail name
Z> n="test_ref_leak"
Z> 
Z> jail -c name=$n path=/ vnet persist
Z> # The following line trigger jail pr_ref leak
Z> jexec $n ifconfig lo0 inet 127.0.0.1/8
Z> 
Z> jail -R $n
Z> 
Z> # wait a moment
Z> sleep 1
Z> 
Z> jls -j $n
Z> 
Z> After DDB debugging and tracing , it seems that is triggered by a combine of 
[1] and [2]
Z> 
Z> [1] https://reviews.freebsd.org/rGfec8a8c7cbe4384c7e61d376f3aa5be5ac895915 

Z> [2] https://reviews.freebsd.org/rGeb93b99d698674e3b1cc7139fda98e2b175b8c5b 

Z> 
Z> 
Z> In [1] the per-VNET uma zone is shared with the global one.
Z> `pcbinfo->ipi_zone = pcbstor->ips_zone;`
Z> 
Z> In [2] unref `inp->inp_cred` is deferred called in inpcb_dtor() by 
uma_zfree_smr() .
Z> 
Z> Unfortunately inps freed by uma_zfree_smr() are cached and inpcb_dtor() is 
not called immediately ,
Z> thus leaking `inp->inp_cred` ref and hence `prison->pr_ref`.
Z> 
Z> And it is also not possible to free up the cache by per-VNET SYSUNINIT 
tcp_destroy / udp_destroy / rip_destroy.

This is a known issue and I'd prefer not to call it a problem. The "leak" of a
jail happens only if the machine is idle wrt the networking activity.

Getting back to the problem that started this thread - the epair(4)s not
immediately popping back to prison0. IMHO, the problem again lies in the design
of if_vmove and epair(4) in particular. if_vmove shall not exist; instead we
should do a full if_attach() and if_detach(). The state of an ifnet when it
undergoes if_vmove doesn't carry any useful information. With Alexander
melifaro@ we discussed better options than if_vmove for creating or attaching
interfaces to jails. Until they are ready, the easiest workaround for the
annoying epair(4) come-back problem is to remove the epair manually before
destroying the jail, like I did in 80fc25025ff.

-- 
Gleb Smirnoff



Re: What's going on with vnets and epairs w/ addresses?

2022-12-17 Thread Zhenlei Huang

> On Dec 17, 2022, at 6:55 AM, Bjoern A. Zeeb wrote:
> 
> On Fri, 16 Dec 2022, Zhenlei Huang wrote:
> 
> Hi,
> 
>> I managed to repeat this issue on CURRENT/14 with this small snip:
>> 
>> ---
>> #!/bin/sh
>> 
>> # test jail name
>> n="test_ref_leak"
>> 
>> jail -c name=$n path=/ vnet persist
>> # The following line trigger jail pr_ref leak
>> jexec $n ifconfig lo0 inet 127.0.0.1/8
>> 
>> jail -R $n
>> 
>> # wait a moment
>> sleep 1
>> 
>> jls -j $n
>> 
>> 
>> ---
>> 
>> 
>> After DDB debugging and tracing , it seems that is triggered by a combine of 
>> [1] and [2]
>> 
>> [1] https://reviews.freebsd.org/rGfec8a8c7cbe4384c7e61d376f3aa5be5ac895915
>> [2] https://reviews.freebsd.org/rGeb93b99d698674e3b1cc7139fda98e2b175b8c5b
>> 
>> 
>> In [1] the per-VNET uma zone is shared with the global one.
>> `pcbinfo->ipi_zone = pcbstor->ips_zone;`
>> 
>> In [2] unref `inp->inp_cred` is deferred called in inpcb_dtor() by 
>> uma_zfree_smr() .
>> 
>> Unfortunately inps freed by uma_zfree_smr() are cached and inpcb_dtor() is 
>> not called immediately ,
>> thus leaking `inp->inp_cred` ref and hence `prison->pr_ref`.
>> 
>> And it is also not possible to free up the cache by per-VNET SYSUNINIT 
>> tcp_destroy / udp_destroy / rip_destroy.
> 
> Thanks a lot for tracking it down.
> 
> That seems to be a regression then that needs to be fixed before
> 14.0-RELEASE will happen as it'll break management utilities of people.
> 
> Could you open a bug report and flag it as such?

While I was trying to open a new bug report, Bugzilla prompted with an existing
PR https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264981 opened by olivier.
I think this issue is the same as that one.

> 
> /bz
> 
> 
>> 
>> 
>> Best regards,
>> Zhenlei
>> 
>>> On Dec 14, 2022, at 9:56 AM, Zhenlei Huang wrote:
>>> 
>>> 
>>> Hi,
>>> 
>>> I also encounter this problem while testing gif tunnel between jails.
>>> 
>>> My script is similar but with additional gif tunnels.
>>> 
>>> 
>>> There are reports in mailing list [1], [2], and another one in forum [3] .
>>> 
>>> Seem to be a long standing issue.
>>> 
>>> [1] https://lists.freebsd.org/pipermail/freebsd-stable/2016-October/086126.html
>>> [2] https://lists.freebsd.org/pipermail/freebsd-jail/2017-March/003357.html
>>> [3] https://forums.freebsd.org/threads/jails-stopping-prolonged-deaths-starting-networking-et-cetera.84200/
>>> 
>>> 
>>> Best regards,
>>> Zhenlei
>>> 
On Dec 14, 2022, at 7:03 AM, Bjoern A. Zeeb wrote:
 
 Hi,
 
 I have used scripts like the below for almost a decade and a half
 (obviously doing more than that in the middle).  I haven't used them
 much lately but given other questions I just wanted to fire up a test.
 
I have an end-November kernel; doing the below, my epairs do not come back
to be destroyed (immediately).
 I have to start polling for the jid to be no longer alive and not in
 dying state (hence added the jls/ifconfig -l lines and removed the
 error checking from ifconfig destroy).  That seems sometimes rather
 unreasonably long (to the point I give up).
 
 If I don't configure the addresses below this isn't a problem.
 
 Sorry I am confused by too many incarnations of the code; I know I once
 had a version with an async shutdown path but I believe that never made
 it into mainline, so why are we holding onto the epairs now and not
 nuking the addresses 

Re: What's going on with vnets and epairs w/ addresses?

2022-12-16 Thread Bjoern A. Zeeb

On Fri, 16 Dec 2022, Zhenlei Huang wrote:

Hi,


I managed to repeat this issue on CURRENT/14 with this small snip:

---
#!/bin/sh

# test jail name
n="test_ref_leak"

jail -c name=$n path=/ vnet persist
# The following line trigger jail pr_ref leak
jexec $n ifconfig lo0 inet 127.0.0.1/8

jail -R $n

# wait a moment
sleep 1

jls -j $n


---


After DDB debugging and tracing, it seems that it is triggered by a combination
of [1] and [2]

[1] https://reviews.freebsd.org/rGfec8a8c7cbe4384c7e61d376f3aa5be5ac895915 

[2] https://reviews.freebsd.org/rGeb93b99d698674e3b1cc7139fda98e2b175b8c5b 



In [1] the per-VNET uma zone is shared with the global one.
`pcbinfo->ipi_zone = pcbstor->ips_zone;`

In [2] the unref of `inp->inp_cred` is deferred to inpcb_dtor(), called via
uma_zfree_smr().

Unfortunately inps freed by uma_zfree_smr() are cached and inpcb_dtor() is not
called immediately, thus leaking the `inp->inp_cred` ref and hence
`prison->pr_ref`.

And it is also not possible to free up the cache by per-VNET SYSUNINIT 
tcp_destroy / udp_destroy / rip_destroy.


Thanks a lot for tracking it down.

That seems to be a regression then that needs to be fixed before
14.0-RELEASE happens, as it'll break people's management utilities.

Could you open a bug report and flag it as such?

/bz





Best regards,
Zhenlei


On Dec 14, 2022, at 9:56 AM, Zhenlei Huang  wrote:


Hi,

I also encounter this problem while testing gif tunnel between jails.

My script is similar but with additional gif tunnels.


There are reports in mailing list [1], [2], and another one in forum [3] .

Seem to be a long standing issue.

[1] https://lists.freebsd.org/pipermail/freebsd-stable/2016-October/086126.html
[2] https://lists.freebsd.org/pipermail/freebsd-jail/2017-March/003357.html
[3] https://forums.freebsd.org/threads/jails-stopping-prolonged-deaths-starting-networking-et-cetera.84200/



Best regards,
Zhenlei


On Dec 14, 2022, at 7:03 AM, Bjoern A. Zeeb wrote:

Hi,

I have used scripts like the below for almost a decade and a half
(obviously doing more than that in the middle).  I haven't used them
much lately but given other questions I just wanted to fire up a test.

I have an end-November kernel; doing the below, my epairs do not come back
to be destroyed (immediately).
I have to start polling for the jid to be no longer alive and not in
dying state (hence added the jls/ifconfig -l lines and removed the
error checking from ifconfig destroy).  That seems sometimes rather
unreasonably long (to the point I give up).

If I don't configure the addresses below this isn't a problem.

Sorry I am confused by too many incarnations of the code; I know I once
had a version with an async shutdown path but I believe that never made
it into mainline, so why are we holding onto the epairs now and not
nuking the addresses and returning them and are clean?

It's a bit more funny; I added a twiddle loop at the end and nothing
happened.  So I stop the script and start it again and suddenly another
jail or two have cleaned up and their epairs are back.  Something feels
very very wonky.  Play around with this and see ... and let me know if
you can reproduce this...  I quite wonder why some test cases haven't
gone crazy ...

/bz


#!/bin/sh

set -e
set -x

js=`jail -i -c -n jl host.hostname=left.example.net vnet persist`
jb=`jail -i -c -n jr host.hostname=right.example.net vnet persist`

# Create an epair connecting the two machines (vnet jails).
ep=`ifconfig epair create | sed -e 's/a$//'`

# Add one end to each vnet jail.
ifconfig ${ep}a vnet ${js}
ifconfig ${ep}b vnet ${jb}

# Add an IP address on the epairs in each vnet jail.
# XXX Leave these out and the cleanup seems to work fine.
jexec ${js}  ifconfig ${ep}a inet  192.0.2.1/24
jexec ${jb}  ifconfig ${ep}b inet  192.0.2.2/24

# Clean up.
jail -r ${jb}
jail -r ${js}

# You want to be able to remove this line ...
set +e

# No epairs to destroy with addresses configured; fine otherwise.
ifconfig ${ep}a destroy
# echo $?

# This is here only as things are funny ...
# jls -av jid dying
# ifconfig -l

# end


--
Bjoern A. Zeeb r15:7








--
Bjoern A. Zeeb r15:7



Re: What's going on with vnets and epairs w/ addresses?

2022-12-16 Thread Zhenlei Huang
Hi,

I managed to repeat this issue on CURRENT/14 with this small snip:

---
#!/bin/sh

# test jail name
n="test_ref_leak"

jail -c name=$n path=/ vnet persist
# The following line trigger jail pr_ref leak
jexec $n ifconfig lo0 inet 127.0.0.1/8

jail -R $n

# wait a moment
sleep 1

jls -j $n


---


After DDB debugging and tracing, it seems that it is triggered by a combination
of [1] and [2]

[1] https://reviews.freebsd.org/rGfec8a8c7cbe4384c7e61d376f3aa5be5ac895915 

[2] https://reviews.freebsd.org/rGeb93b99d698674e3b1cc7139fda98e2b175b8c5b 



In [1] the per-VNET uma zone is shared with the global one.
`pcbinfo->ipi_zone = pcbstor->ips_zone;`

In [2] the unref of `inp->inp_cred` is deferred to inpcb_dtor(), called via
uma_zfree_smr().

Unfortunately inps freed by uma_zfree_smr() are cached and inpcb_dtor() is not
called immediately, thus leaking the `inp->inp_cred` ref and hence
`prison->pr_ref`.

And it is also not possible to free up the cache by per-VNET SYSUNINIT 
tcp_destroy / udp_destroy / rip_destroy.
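The repro snippet above can be extended to report the jail's `dying` state
explicitly. This is a sketch assuming FreeBSD and root; it relies on `jls -d`
listing dying jails and on `jls` printing named parameters such as `jid` and
`dying` when given as arguments.

```shell
#!/bin/sh
# Sketch: same repro as above, but print the jid and dying flag so the
# stuck state is visible.  Requires FreeBSD and root.
n="test_ref_leak"

repro() {
    jail -c name=$n path=/ vnet persist
    jexec $n ifconfig lo0 inet 127.0.0.1/8   # triggers the pr_ref leak
    jail -R $n
    sleep 1
    # -d includes dying jails; print this jail's jid and dying status.
    jls -d -j $n jid dying
}

if command -v jail >/dev/null 2>&1 && [ "$(id -u)" -eq 0 ]; then
    repro
else
    echo "skipping: requires FreeBSD jail(8) and root"
fi
```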



Best regards,
Zhenlei

> On Dec 14, 2022, at 9:56 AM, Zhenlei Huang  wrote:
> 
> 
> Hi,
> 
> I also encounter this problem while testing gif tunnel between jails.
> 
> My script is similar but with additional gif tunnels.
> 
> 
> There are reports in mailing list [1], [2], and another one in forum [3] .
> 
> Seem to be a long standing issue.
> 
> [1] https://lists.freebsd.org/pipermail/freebsd-stable/2016-October/086126.html
> [2] https://lists.freebsd.org/pipermail/freebsd-jail/2017-March/003357.html
> [3] https://forums.freebsd.org/threads/jails-stopping-prolonged-deaths-starting-networking-et-cetera.84200/
> 
> 
> 
> Best regards,
> Zhenlei
> 
>> On Dec 14, 2022, at 7:03 AM, Bjoern A. Zeeb wrote:
>> 
>> Hi,
>> 
>> I have used scripts like the below for almost a decade and a half
>> (obviously doing more than that in the middle).  I haven't used them
>> much lately but given other questions I just wanted to fire up a test.
>> 
>> I have an end-November kernel; doing the below, my epairs do not come back
>> to be destroyed (immediately).
>> I have to start polling for the jid to be no longer alive and not in
>> dying state (hence added the jls/ifconfig -l lines and removed the
>> error checking from ifconfig destroy).  That seems sometimes rather
>> unreasonably long (to the point I give up).
>> 
>> If I don't configure the addresses below this isn't a problem.
>> 
>> Sorry I am confused by too many incarnations of the code; I know I once
>> had a version with an async shutdown path but I believe that never made
>> it into mainline, so why are we holding onto the epairs now and not
>> nuking the addresses and returning them and are clean?
>> 
>> It's a bit more funny; I added a twiddle loop at the end and nothing
>> happened.  So I stop the script and start it again and suddenly another
>> jail or two have cleaned up and their epairs are back.  Something feels
>> very very wonky.  Play around with this and see ... and let me know if
>> you can reproduce this...  I quite wonder why some test cases haven't
>> gone crazy ...
>> 
>> /bz
>> 
>> 
>> #!/bin/sh
>> 
>> set -e
>> set -x
>> 
>> js=`jail -i -c -n jl host.hostname=left.example.net vnet persist`
>> jb=`jail -i -c -n jr host.hostname=right.example.net vnet persist`
>> 
>> # Create an epair connecting the two machines (vnet jails).
>> ep=`ifconfig epair create | sed -e 's/a$//'`
>> 
>> # Add one end to each vnet jail.
>> ifconfig ${ep}a vnet ${js}
>> ifconfig ${ep}b vnet ${jb}
>> 
>> # Add an IP address on the epairs in each vnet jail.
>> # XXX Leave these out and the cleanup seems to work fine.
>> jexec ${js}  ifconfig ${ep}a inet  192.0.2.1/24
>> jexec ${jb}  ifconfig ${ep}b inet  192.0.2.2/24
>> 
>> # Clean up.
>> jail -r ${jb}
>> jail -r ${js}
>> 
>> # You want to be able to remove this line ...
>> set +e
>> 
>> # No epairs to destroy with addresses configured; fine otherwise.
>> ifconfig ${ep}a destroy
>> # echo $?
>> 
>> # This is here only as things are funny ...
>> # jls -av jid dying
>> # ifconfig -l
>> 
>> # end
>> 
>> 
>> -- 
>> Bjoern A. Zeeb r15:7
>> 
> 



Re: What's going on with vnets and epairs w/ addresses?

2022-12-14 Thread Kristof Provost



> On 14 Dec 2022, at 20:28, Alexander Leidinger  wrote:
> 
> 
> Quoting "Bjoern A. Zeeb"  (from Tue, 13 Dec 2022 23:03:42 
> + (UTC)):
> 
>> Hi,
>> 
>> I have used scripts like the below for almost a decade and a half
>> (obviously doing more than that in the middle).  I haven't used them
>> much lately but given other questions I just wanted to fire up a test.
>> 
>> I have an end-November kernel doing the below my eapirs do not come back
>> to be destroyed (immediately).
>> I have to start polling for the jid to be no longer alive and not in
>> dying state (hence added the jls/ifconfig -l lines and removed the
>> error checking from ifconfig destroy).  That seems sometimes rather
>> unreasonably long (to the point I give up).
>> 
>> If I don't configure the addresses below this isn't a problem.
>> 
>> Sorry I am confused by too many incarnations of the code; I know I once
>> had a version with an async shutdown path but I believe that never made
>> it into mainline, so why are we holding onto the epairs now and not
>> nuking the addresses and returning them and are clean?
> 
> Kristof, isn't this (epair destruction in jails) one of the issues you looked 
> at? Sorry if I remember incorrectly.
> 
I looked at panics around destroying interfaces and vnets. 

My speculative guess here is that the jail is hanging around for some reason, 
and that’s causing the epair and address to stick around too. 

jls -na might confirm or deny that. 

Br,
Kristof


Re: What's going on with vnets and epairs w/ addresses?

2022-12-13 Thread Alexander Leidinger


Quoting "Bjoern A. Zeeb"  (from Tue, 13 Dec 2022  
23:03:42 + (UTC)):



Hi,

I have used scripts like the below for almost a decade and a half
(obviously doing more than that in the middle).  I haven't used them
much lately but given other questions I just wanted to fire up a test.

I have an end-November kernel; doing the below, my epairs do not come back
to be destroyed (immediately).
I have to start polling for the jid to be no longer alive and not in
dying state (hence added the jls/ifconfig -l lines and removed the
error checking from ifconfig destroy).  That seems sometimes rather
unreasonably long (to the point I give up).

If I don't configure the addresses below this isn't a problem.

Sorry I am confused by too many incarnations of the code; I know I once
had a version with an async shutdown path but I believe that never made
it into mainline, so why are we holding onto the epairs now and not
nuking the addresses and returning them and are clean?


Kristof, isn't this (epair destruction in jails) one of the issues you  
looked at? Sorry if I remember incorrectly.


What I have in my jails shutdown is to do an "ifconfig $epair_in_jail
-vnet $jail; sleep 2; ifconfig $epair destroy". With this I don't see
any issues; everything is cleaned up when the stop finishes.
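That sequence can be sketched as a small helper. The names `demo` and
`epair0a` are placeholders, FreeBSD and root are assumed, and the sketch
assumes the epair's a-end is the one that was moved into the jail:

```shell
#!/bin/sh
# Sketch of the workaround above: reclaim the epair end from the jail
# before destroying the jail, then destroy the pair from the host side.
# "demo" and "epair0a" are placeholders; requires FreeBSD and root.
jname="demo"
epa="epair0a"

teardown() {
    # Pull the interface back from the jail's vnet into the host vnet.
    ifconfig "$epa" -vnet "$jname"
    sleep 2
    # Destroying one end of the pair removes both ends.
    ifconfig "$epa" destroy
    # Now the jail can be removed without the epair lingering.
    jail -r "$jname"
}

if command -v jail >/dev/null 2>&1 && [ "$(id -u)" -eq 0 ]; then
    teardown
else
    echo "skipping: requires FreeBSD jail(8) and root"
fi
```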


Bye,
Alexander.


It's a bit more funny; I added a twiddle loop at the end and nothing
happened.  So I stop the script and start it again and suddenly another
jail or two have cleaned up and their epairs are back.  Something feels
very very wonky.  Play around with this and see ... and let me know if
you can reproduce this...  I quite wonder why some test cases haven't
gone crazy ...

/bz


#!/bin/sh

set -e
set -x

js=`jail -i -c -n jl host.hostname=left.example.net vnet persist`
jb=`jail -i -c -n jr host.hostname=right.example.net vnet persist`

# Create an epair connecting the two machines (vnet jails).
ep=`ifconfig epair create | sed -e 's/a$//'`

# Add one end to each vnet jail.
ifconfig ${ep}a vnet ${js}
ifconfig ${ep}b vnet ${jb}

# Add an IP address on the epairs in each vnet jail.
# XXX Leave these out and the cleanup seems to work fine.
jexec ${js}  ifconfig ${ep}a inet  192.0.2.1/24
jexec ${jb}  ifconfig ${ep}b inet  192.0.2.2/24

# Clean up.
jail -r ${jb}
jail -r ${js}

# You want to be able to remove this line ...
set +e

# No epairs to destroy with addresses configured; fine otherwise.
ifconfig ${ep}a destroy
# echo $?

# This is here only as things are funny ...
# jls -av jid dying
# ifconfig -l

# end


--
Bjoern A. Zeeb r15:7



--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org: PGP 0x8F31830F9F2772BF




Re: What's going on with vnets and epairs w/ addresses?

2022-12-13 Thread Zhenlei Huang

Hi,

I also encounter this problem while testing gif tunnel between jails.

My script is similar but with additional gif tunnels.


There are reports on the mailing lists [1] and [2], and another one in the forum [3].

This seems to be a long-standing issue.

[1] https://lists.freebsd.org/pipermail/freebsd-stable/2016-October/086126.html
[2] https://lists.freebsd.org/pipermail/freebsd-jail/2017-March/003357.html
[3] https://forums.freebsd.org/threads/jails-stopping-prolonged-deaths-starting-networking-et-cetera.84200/



Best regards,
Zhenlei

> On Dec 14, 2022, at 7:03 AM, Bjoern A. Zeeb  wrote:
> 
> Hi,
> 
> I have used scripts like the below for almost a decade and a half
> (obviously doing more than that in the middle).  I haven't used them
> much lately but given other questions I just wanted to fire up a test.
> 
> I have an end-November kernel; doing the below, my epairs do not come back
> to be destroyed (immediately).
> I have to start polling for the jid to be no longer alive and not in
> dying state (hence added the jls/ifconfig -l lines and removed the
> error checking from ifconfig destroy).  That seems sometimes rather
> unreasonably long (to the point I give up).
> 
> If I don't configure the addresses below this isn't a problem.
> 
> Sorry I am confused by too many incarnations of the code; I know I once
> had a version with an async shutdown path but I believe that never made
> it into mainline, so why are we holding onto the epairs now and not
> nuking the addresses and returning them and are clean?
> 
> It's a bit more funny; I added a twiddle loop at the end and nothing
> happened.  So I stop the script and start it again and suddenly another
> jail or two have cleaned up and their epairs are back.  Something feels
> very very wonky.  Play around with this and see ... and let me know if
> you can reproduce this...  I quite wonder why some test cases haven't
> gone crazy ...
> 
> /bz
> 
> 
> #!/bin/sh
> 
> set -e
> set -x
> 
> js=`jail -i -c -n jl host.hostname=left.example.net vnet persist`
> jb=`jail -i -c -n jr host.hostname=right.example.net vnet persist`
> 
> # Create an epair connecting the two machines (vnet jails).
> ep=`ifconfig epair create | sed -e 's/a$//'`
> 
> # Add one end to each vnet jail.
> ifconfig ${ep}a vnet ${js}
> ifconfig ${ep}b vnet ${jb}
> 
> # Add an IP address on the epairs in each vnet jail.
> # XXX Leave these out and the cleanup seems to work fine.
> jexec ${js}  ifconfig ${ep}a inet  192.0.2.1/24
> jexec ${jb}  ifconfig ${ep}b inet  192.0.2.2/24
> 
> # Clean up.
> jail -r ${jb}
> jail -r ${js}
> 
> # You want to be able to remove this line ...
> set +e
> 
> # No epairs to destroy with addresses configured; fine otherwise.
> ifconfig ${ep}a destroy
> # echo $?
> 
> # This is here only as things are funny ...
> # jls -av jid dying
> # ifconfig -l
> 
> # end
> 
> 
> -- 
> Bjoern A. Zeeb r15:7
> 



What's going on with vnets and epairs w/ addresses?

2022-12-13 Thread Bjoern A. Zeeb

Hi,

I have used scripts like the below for almost a decade and a half
(obviously doing more than that in the middle).  I haven't used them
much lately but given other questions I just wanted to fire up a test.

I have an end-November kernel; doing the below, my epairs do not come back
to be destroyed (immediately).
I have to start polling for the jid to be no longer alive and not in
dying state (hence added the jls/ifconfig -l lines and removed the
error checking from ifconfig destroy).  That seems sometimes rather
unreasonably long (to the point I give up).

If I don't configure the addresses below this isn't a problem.

Sorry, I am confused by too many incarnations of the code; I know I once
had a version with an async shutdown path, but I believe that never made
it into mainline. So why are we holding onto the epairs now instead of
nuking the addresses, returning them, and being clean?

It's a bit more funny; I added a twiddle loop at the end and nothing
happened.  So I stop the script and start it again and suddenly another
jail or two have cleaned up and their epairs are back.  Something feels
very very wonky.  Play around with this and see ... and let me know if
you can reproduce this...  I quite wonder why some test cases haven't
gone crazy ...

/bz


#!/bin/sh

set -e
set -x

js=`jail -i -c -n jl host.hostname=left.example.net vnet persist`
jb=`jail -i -c -n jr host.hostname=right.example.net vnet persist`

# Create an epair connecting the two machines (vnet jails).
ep=`ifconfig epair create | sed -e 's/a$//'`

# Add one end to each vnet jail.
ifconfig ${ep}a vnet ${js}
ifconfig ${ep}b vnet ${jb}

# Add an IP address on the epairs in each vnet jail.
# XXX Leave these out and the cleanup seems to work fine.
jexec ${js}  ifconfig ${ep}a inet  192.0.2.1/24
jexec ${jb}  ifconfig ${ep}b inet  192.0.2.2/24

# Clean up.
jail -r ${jb}
jail -r ${js}

# You want to be able to remove this line ...
set +e

# No epairs to destroy with addresses configured; fine otherwise.
ifconfig ${ep}a destroy
# echo $?

# This is here only as things are funny ...
# jls -av jid dying
# ifconfig -l

# end


--
Bjoern A. Zeeb r15:7