Re: netfilter: nf_conntrack: there may be a bug in __nf_conntrack_confirm when it races against get_next_corpse

2014-11-03 Thread billbonaparte
(sorry for sending this e-mail again; the last one was rejected by the server due to
non-acceptable content)

Florian Westphal [mailto:f...@strlen.de] wrote:
>Correct.  This is broken since the central spin lock removal, since 
>nf_conntrack_lock no longer protects both get_next_corpse and 
>conntrack_confirm.
> 
>Please send a patch, moving dying check after removal of conntrack from 
>the percpu list,
Since an unconfirmed conntrack is stored on the unconfirmed list, which is a per-cpu
list protected by a per-cpu spin-lock, we can remove it from the unconfirmed list and
insert it into the conntrack hash table as two separate steps. That is to say, we can
remove it from the unconfirmed list without holding the corresponding hash lock, and
then check whether it is dying.
If it is dying, we add it to the dying list and bail out of __nf_conntrack_confirm.
We do this to follow the rule that a conntrack must be on either the unconfirmed list
or the dying list when it is about to be destroyed.
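
A minimal sketch of the reordered tail of __nf_conntrack_confirm described above
(illustrative only; it assumes the helpers nf_ct_del_from_dying_or_unconfirmed_list()
and nf_ct_add_to_dying_list() from nf_conntrack_core.c, and hash/reply_hash are the
bucket indices already computed earlier in that function):

	/* remove from the per-cpu unconfirmed list first; this takes only
	 * the per-cpu spin-lock, not the hash lock */
	nf_ct_del_from_dying_or_unconfirmed_list(ct);

	/* only now check whether get_next_corpse marked us dying; if so,
	 * keep the invariant that a conntrack about to be destroyed sits
	 * on either the unconfirmed list or the dying list */
	if (unlikely(nf_ct_is_dying(ct))) {
		nf_ct_add_to_dying_list(ct);
		nf_conntrack_double_unlock(hash, reply_hash);
		local_bh_enable();
		return NF_DROP;
	}

	/* not dying: proceed with timer setup and hash insertion as before */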

>>  2. operation on ct->status should be atomic, because it races against
>> get_next_corpse.
>
>Alternatively we could also get rid of the unconfirmed list handling in
>get_next_corpse, it looks to me as if it's simply not worth the trouble
>to also care about unconfirmed lists.

Yes, I think so.
If there is a race on updating ct->status (sketched below), one of two things
can happen:
1) the IPS_DYING bit set in get_next_corpse overwrites other bits (e.g.
IPS_SRC_NAT_DONE_BIT), or
2) other bits (e.g. IPS_SRC_NAT_DONE_BIT) set in nf_nat_setup_info overwrite
the IPS_DYING bit.
But either case seems to be okay.
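
For illustration, the lost-update mechanism behind these two cases can be shown
with a small stand-alone program (hypothetical, userspace-only; the bit values
are redefined here just so it builds outside the kernel):

	#include <stdio.h>

	/* local stand-ins for the kernel status bits, using the bit
	 * positions from include/uapi/linux/netfilter/nf_conntrack_common.h */
	#define IPS_CONFIRMED (1UL << 3)
	#define IPS_DYING     (1UL << 9)

	int main(void)
	{
		unsigned long status = 0;

		/* CPU 1 (__nf_conntrack_confirm) starts its non-atomic "|=":
		 * it reads ct->status before IPS_DYING is set ... */
		unsigned long old = status;

		/* ... CPU 2 (get_next_corpse) sets IPS_DYING in between ... */
		status |= IPS_DYING;

		/* ... and CPU 1's write-back overwrites it, so IPS_DYING is
		 * lost (case 2 above); a different interleaving leaves it
		 * intact instead */
		status = old | IPS_CONFIRMED;

		printf("IPS_DYING %s\n",
		       (status & IPS_DYING) ? "survived" : "was lost");
		return 0;
	}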


fix_conntrack_confirm_race.patch
Description: Binary data


netfilter: nf_conntrack: there may be a bug in __nf_conntrack_confirm when it races against get_next_corpse

2014-10-27 Thread billbonaparte
Hi all:
sorry for sending this mail again; the last mail didn't show the text clearly.
In __nf_conntrack_confirm we check whether the conntrack is already dead
before inserting it into the hash table.
We do this because if we insert an already 'dead' entry, it will block
further use of that particular connection.
But we don't do that check correctly.
Let's consider the following case:

cpu1                                              cpu2
__nf_conntrack_confirm                            get_next_corpse
   lock corresponding hash-list
   check nf_ct_is_dying(ct)
                                                  for_each_possible_cpu(cpu) {
                                                      ...
                                                      spin_lock_bh(&pcpu->lock);
                                                      ...
                                                      set_bit(IPS_DYING_BIT, &ct->status);
   nf_ct_del_from_dying_or_unconfirmed_list(ct);
                                                      spin_unlock_bh(&pcpu->lock);
   add_timer(&ct->timeout);                       }
   ct->status |= IPS_CONFIRMED;
   __nf_conntrack_hash_insert(ct);   /* the conntrack has already been
                                        marked as dying */

The above case reveals two problems:
1. We may insert a dead conntrack into the hash table, which will block
further use of that particular connection.
2. The operation on ct->status should be atomic, because it races against
get_next_corpse (a sketch of an atomic update follows at the end of this mail).
   For the same reason, the operation on ct->status in nf_nat_setup_info
should be atomic as well.

If we want to resolve the first problem, we must delete the unconfirmed
conntrack from the unconfirmed list first, and then check whether it is
already dead.
Am I right to do this?
I'd appreciate any comments and replies.
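
Regarding the second problem, a minimal sketch of what an atomic status update
in __nf_conntrack_confirm could look like (illustrative only; IPS_CONFIRMED_BIT
is the bit-number form of IPS_CONFIRMED used by the atomic bitops API):

-	ct->status |= IPS_CONFIRMED;
+	/* atomic bitop: a concurrent set_bit(IPS_DYING_BIT, &ct->status)
+	 * from get_next_corpse cannot be lost */
+	set_bit(IPS_CONFIRMED_BIT, &ct->status);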


netfilter: NAT: do the optimization for getting curr_tuple in function nf_nat_setup_info

2014-10-23 Thread billbonaparte
Hi all:
In nf_nat_setup_info, we need to get the current tuple, i.e. the tuple as the
packet is currently supposed to reach its destination.
If we haven't done any NAT (SNAT or DNAT) for the tuple yet, the current tuple
is equal to the original tuple; otherwise, we should get the current tuple by
invoking
nf_ct_invert_tuplepr(&curr_tuple, &ct->tuplehash[IP_CT_DIR_REPLY].tuple);
as the existing comment says:
/* What we've got will look like inverse of reply. Normally
 * this is what is in the conntrack, except for prior
 * manipulations (future optimization: if num_manips == 0,
 * orig_tp = ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple)
 */
nf_ct_invert_tuplepr(&curr_tuple,
		     &ct->tuplehash[IP_CT_DIR_REPLY].tuple);
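
For intuition, with made-up addresses (a hypothetical example, not from the
original mail):

	original tuple:             192.168.1.2:12345 -> 198.51.100.7:80
	reply tuple (no NAT yet):   198.51.100.7:80   -> 192.168.1.2:12345
	inverse of reply:           192.168.1.2:12345 -> 198.51.100.7:80   (== original)

	after DNAT to 10.0.0.5:80:
	reply tuple:                10.0.0.5:80       -> 192.168.1.2:12345
	inverse of reply:           192.168.1.2:12345 -> 10.0.0.5:80       (!= original)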

So, since that is the case, why don't we do this optimization when getting the
current tuple?

   As mentioned above, if we have not done DNAT for the tuple, then the current
tuple is equal to the original tuple.
   So I add the optimization as follows:

+	if (!(ct->status & IPS_DST_NAT))	/* we do the optimization, as mentioned above */
+		curr_tuple = &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple;
+	else
+		nf_ct_invert_tuplepr(curr_tuple,
+				     &ct->tuplehash[IP_CT_DIR_REPLY].tuple);

The attachment contains the detailed diff.
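
For context, a sketch of how the guarded version might sit inside
nf_nat_setup_info if curr_tuple is turned into a pointer (illustrative only and
based on the snippet above, not on the attached diff; inv_tuple is a name
introduced here, and range/ct/maniptype come from the function's parameters):

	struct nf_conntrack_tuple new_tuple, inv_tuple;
	const struct nf_conntrack_tuple *curr_tuple;

	if (!(ct->status & IPS_DST_NAT)) {
		/* no DNAT manip has altered the reply tuple yet, so the
		 * inverse of the reply is still the original tuple; skip
		 * the invert call and point at the original directly */
		curr_tuple = &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple;
	} else {
		nf_ct_invert_tuplepr(&inv_tuple,
				     &ct->tuplehash[IP_CT_DIR_REPLY].tuple);
		curr_tuple = &inv_tuple;
	}

	get_unique_tuple(&new_tuple, curr_tuple, range, ct, maniptype);

Later uses of &curr_tuple in the existing code would then become plain curr_tuple.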


do the optimization for getting curr_tuple in function nf_nat_setup_info.diff
Description: Binary data

