Re: netfilter: nf_conntrack: there may be a bug in __nf_conntrack_confirm when it races against get_next_corpse
(Sorry to send this e-mail again; the last mail was rejected by the server due to non-acceptable content.)

Florian Westphal [mailto:f...@strlen.de] wrote:
>Correct. This is broken since the central spin lock removal, since
>nf_conntrack_lock no longer protects both get_next_corpse and
>conntrack_confirm.
>
>Please send a patch, moving dying check after removal of conntrack from
>the percpu list,

Since an unconfirmed conntrack is stored on the unconfirmed list, which is a per-cpu list protected by a per-cpu spinlock, we can remove it from the unconfirmed list and insert it into the conntrack hash table as separate steps. That is to say, we can remove it from the unconfirmed list without holding the corresponding hash lock, and then check whether it is dying. If it is dying, we add it to the dying list and quit __nf_conntrack_confirm. We do this to follow the rule that a conntrack must be on either the unconfirmed list or the dying list when it is about to be destroyed.

>> 2. operation on ct->status should be atomic, because it races against
>> get_next_corpse.
>
>Alternatively we could also get rid of the unconfirmed list handling in
>get_next_corpse, it looks to me as if its simply not worth the trouble
>to also caring about unconfirmed lists.

Yes, I think so. If there is a race when operating on ct->status, one of two things can happen: 1) the IPS_DYING bit set in get_next_corpse overwrites other bits (e.g. IPS_SRC_NAT_DONE_BIT), or 2) other bits (e.g. IPS_SRC_NAT_DONE_BIT) set in nf_nat_setup_info overwrite the IPS_DYING bit. But either case seems to be okay.

fix_conntrack_confirm_race.patch
Description: Binary data
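The reordering described above could look roughly like the following. This is a non-compilable sketch of the proposed ordering only, not the attached patch: the helper names are the ones used in this thread, and locking details, error paths, and the hash-insert arguments are elided.

```c
/* Sketch of the proposed confirm ordering (not the actual patch). */
static int confirm_sketch(struct nf_conn *ct)
{
	/* 1. Remove from the per-cpu unconfirmed list first; this needs
	 *    only the per-cpu spinlock, not the hash lock. */
	nf_ct_del_from_dying_or_unconfirmed_list(ct);

	/* 2. Now check dying: get_next_corpse can no longer find ct on
	 *    the unconfirmed list, so a late IPS_DYING is caught here. */
	if (nf_ct_is_dying(ct)) {
		/* Keep the invariant: a conntrack about to be destroyed
		 * is on either the unconfirmed or the dying list. */
		nf_ct_add_to_dying_list(ct);
		return NF_DROP;
	}

	/* 3. Only a live conntrack reaches the hash table. */
	set_bit(IPS_CONFIRMED_BIT, &ct->status);
	__nf_conntrack_hash_insert(ct /* , hash, reply_hash elided */);
	return NF_ACCEPT;
}
```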
netfilter: nf_conntrack: there may be a bug in __nf_conntrack_confirm when it races against get_next_corpse
Hi, all:

(Sorry for sending this mail again; the last mail didn't show the text clearly.)

In __nf_conntrack_confirm, we check whether the conntrack is already dead before inserting it into the hash table. We do this because inserting an already 'dead' entry would block further use of that particular connection. But we don't do it correctly. Let's consider the following case:

    cpu1                                    cpu2
    __nf_conntrack_confirm                  get_next_corpse

    lock corresponding hash list
    check nf_ct_is_dying(ct)
                                            for_each_possible_cpu(cpu) {
                                                ...
                                                spin_lock_bh(&pcpu->lock);
                                                ...
                                                set_bit(IPS_DYING_BIT, &ct->status);
                                                spin_unlock_bh(&pcpu->lock);
                                            }
    nf_ct_del_from_dying_or_unconfirmed_list(ct);
    add_timer(&ct->timeout);
    ct->status |= IPS_CONFIRMED;
    __nf_conntrack_hash_insert(ct);
    /* the conntrack has been set as dying */

The above case reveals two problems:
1. We may insert a dead conntrack into the hash table, which will block further use of that particular connection.
2. Operations on ct->status should be atomic, because they race against get_next_corpse. For the same reason, the operations on ct->status in nf_nat_setup_info should be atomic as well.

If we want to resolve the first problem, we must delete the unconfirmed conntrack from the unconfirmed list first, and then check whether it is already dead.

Am I right to do this? Appreciate any comments and replies.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
netfilter: NAT: do the optimization for getting curr_tuple in function nf_nat_setup_info
Hi all:

In nf_nat_setup_info, we need to get the current tuple, i.e. the tuple that is supposed to be sent to the destination. If we haven't done any NAT (SNAT or DNAT) for the conntrack yet, the current tuple is equal to the original tuple; otherwise, we should get the current tuple by invoking:

    nf_ct_invert_tuplepr(curr_tuple, &ct->tuplehash[IP_CT_DIR_REPLY].tuple);

like the existing comment says:

    /* What we've got will look like inverse of reply. Normally
     * this is what is in the conntrack, except for prior
     * manipulations (future optimization: if num_manips == 0,
     * orig_tp = ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple)
     */
    nf_ct_invert_tuplepr(&curr_tuple, &ct->tuplehash[IP_CT_DIR_REPLY].tuple);

So, since that is so, why don't we do that optimization for getting the current tuple? As mentioned above, if we have not done DNAT for the tuple, the current tuple is equal to the original tuple. So I add the optimization as follows:

    +	if (!(ct->status & IPS_DST_NAT)) /* we do the optimization, as mentioned above */
    +		curr_tuple = &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple;
    +	else
    +		nf_ct_invert_tuplepr(curr_tuple, &ct->tuplehash[IP_CT_DIR_REPLY].tuple);

The attachment is the detailed diff.

do the optimization for getting curr_tuple in function nf_nat_setup_info.diff
Description: Binary data