On Wednesday 02 February 2011 18:37:18 Linus Lüssing wrote:
> From: Sven Eckelmann <[email protected]>
>
> Was:
> ---
> <TODO: write a long monologue about every problem we have or could have or
> maybe never had and would have when we not have it>
>
> Signed-off-by: Sven Eckelmann <[email protected]>
> ---
>
> So after some more discussions with Marek and Sven, it looks like we
> have to use the rcu protected macros rcu_dereference() and
> rcu_assign_pointer() for the bat_priv->curr_gw and curr_gw->orig_node.
>
> Changes here also include moving the kref_get() from unicast_send_skb()
> into gw_get_selected(). The orig_node could have been freed already at
> the time the kref_get() was called in unicast_send_skb().
>
> Some things that are still not that clear to me:
>
> gw_election():
> * can the if-block before gw_deselect() be omitted, we had a nullpointer
> check for curr_gw just a couple of lines before during the rcu-lock.
I thought that this if block should be moved to gw_select. And your gw_select
still has the bug that the bat_priv->curr_gw isn't set to NULL when
new_gw_node is NULL.
> gw_deselect():
> * is the refcount at this time always 1 for gw_node, can the null
> pointer check + a rcu_dereference be omitted? (at least that's what
> it looks like when comparing to the rcuref.txt example)
Why can't it be NULL? And _always_ use rcu_dereference(). Which example tells
you that it isn't needed? None of the examples has any kind of rcu pointer in
it (just el as a pointer which is stored in a struct, where the pointer inside
the struct is rcu protected).
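
To make the two points above concrete (clearing curr_gw when the new gateway
is NULL, and always loading the pointer through an accessor plus a NULL
check), here is a minimal user-space sketch. It is not the patch itself:
model_rcu_assign_pointer(), model_rcu_dereference() and
gw_node_get_unless_zero() are hypothetical stand-ins built on C11 atomics,
the structs are reduced to the fields needed here, and writers are assumed to
be serialized by some lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Stand-ins (assumption): kernel code would use rcu_assign_pointer() and
 * rcu_dereference() inside rcu_read_lock()/rcu_read_unlock(). */
#define model_rcu_assign_pointer(p, v) \
	atomic_store_explicit(&(p), (v), memory_order_release)
#define model_rcu_dereference(p) \
	atomic_load_explicit(&(p), memory_order_acquire)

struct gw_node {
	atomic_int refcount;
};

struct bat_priv {
	/* kernel equivalent: struct gw_node __rcu *curr_gw; */
	struct gw_node *_Atomic curr_gw;
};

/* Take a reference only while the count is still above zero. */
int gw_node_get_unless_zero(struct gw_node *gw)
{
	int old = atomic_load(&gw->refcount);

	while (old > 0) {
		if (atomic_compare_exchange_weak(&gw->refcount, &old, old + 1))
			return 1;
	}
	return 0;
}

void gw_node_put(struct gw_node *gw)
{
	int old = atomic_fetch_sub(&gw->refcount, 1);

	assert(old > 0); /* freeing at zero is left out of this sketch */
}

/* Publish new_gw (may be NULL) and drop the reference held for the old one.
 * Assumes callers are serialized against each other. */
void gw_select(struct bat_priv *bat_priv, struct gw_node *new_gw)
{
	struct gw_node *old_gw = model_rcu_dereference(bat_priv->curr_gw);

	if (new_gw && !gw_node_get_unless_zero(new_gw))
		new_gw = NULL;

	/* Unconditional store: a NULL new_gw clears curr_gw as well. */
	model_rcu_assign_pointer(bat_priv->curr_gw, new_gw);

	if (old_gw)
		gw_node_put(old_gw);
}

void gw_deselect(struct bat_priv *bat_priv)
{
	gw_select(bat_priv, NULL);
}

/* Reader side: load through the accessor, then check for NULL. */
struct gw_node *gw_get_selected(struct bat_priv *bat_priv)
{
	struct gw_node *gw = model_rcu_dereference(bat_priv->curr_gw);

	if (!gw || !gw_node_get_unless_zero(gw))
		return NULL;
	return gw;
}
```

With the NULL handling inside gw_select(), gw_deselect() shrinks to a single
call and the caller in gw_election() would not need its own if-block.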
> gw_get_selected():
> * Probably the orig_node's refcounting has to be made atomic, too?
This part is still a little bit ugly and I cannot give you an easy answer.
Just think about the following:
* The hash list is a bunch of rcu protected lists
* The pointer to the originator is stored inside a bucket (the list elements
  inside the hash)
* A hash bucket is about to be removed - call_rcu; the reference count of the
  originator is decremented immediately
* (!!!! lots of reordering of read and write commands inside the cpu !!!! -
  aren't we happy about the added complexity which tries to hide the memory
  latency?)
* The originator was removed; the bucket, which is only removed in the
  call_rcu callback, still points to the removed originator
* A parallel running operation tries to find an originator; the rcu list
  iterator gets the to-be-deleted bucket to the originator
* The pointer to the already removed originator inside the bucket is
  dereferenced, data is read/written -> Kernel Oops
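
The ordering problem in the steps above can be modelled in a few lines. This
is only a single-threaded user-space sketch under loud assumptions:
bucket_remove() and grace_period_end() are hypothetical stand-ins for
call_rcu() and the end of a grace period, the refcount is a plain int, and
"freed" is a flag instead of a real free(). It shows one possible ordering
(dropping the originator reference in the deferred callback instead of
immediately), which is not something proposed in this mail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Single-threaded model; a plain int stands in for an atomic refcount. */
struct orig_node {
	int refcount;
	int freed; /* set instead of calling free(), so it can be inspected */
};

struct bucket {
	struct orig_node *data;
	struct bucket *next_pending;
};

/* Stand-in for the callbacks queued with call_rcu(). */
static struct bucket *pending;

void orig_node_put(struct orig_node *orig)
{
	assert(orig->refcount > 0);
	if (--orig->refcount == 0)
		orig->freed = 1;
}

/* Stand-in for call_rcu(): the bucket is only queued, so a reader that
 * already found it may still follow bucket->data. */
void bucket_remove(struct bucket *b)
{
	/* unlinking from the rcu protected list would happen here */
	b->next_pending = pending;
	pending = b;
	/* deliberately NOT calling orig_node_put(b->data) at this point */
}

/* Stand-in for the end of the grace period: no reader holds a bucket. */
void grace_period_end(void)
{
	while (pending) {
		struct bucket *b = pending;

		pending = b->next_pending;
		orig_node_put(b->data); /* drop the bucket's reference last */
		free(b);
	}
}
```

Between bucket_remove() and grace_period_end() the originator stays alive, so
a late reader dereferencing the bucket would not hit freed memory in this
model.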
Does this sound scary? At least it could be used in some horror movies (and I
would watch them).
But that is the other problem I currently have with the state of batman-adv in
trunk - and I think I forgot to tell you about it after the release of
v2011.0.0.
So, a good idea would be to remove the buckets from the hash. Using
"struct hlist_node" inside the hash elements should be a good starting point.
But keep in mind that different hashes could contain the same element. So for
each distinct hash you need an extra "struct hlist_node" inside the element
which should be part of that hash. The hash_add (and related) functions don't
get the actual pointer to the element, but the pointer to the correct
"struct hlist_node" inside the element/struct. The comparison and hashing
functions would also receive a "struct hlist_node" as parameter and must get
the pointer to the element using the container_of macro.
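
A rough user-space sketch of that bucket-less layout, to show the idea and
not a finished design: struct hlist_node, struct hlist_head and container_of
are minimal re-implementations of the kernel helpers, while struct hashtable,
orig_hash_init(), the field names and the callback signatures are
hypothetical. Lookup takes a template element so that both callbacks only
ever see a "struct hlist_node".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Minimal stand-ins for the kernel's hlist types and container_of(). */
struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

#define HASH_SIZE 8

struct hashtable {
	struct hlist_head table[HASH_SIZE];
	int (*compare)(const struct hlist_node *a, const struct hlist_node *b);
	unsigned int (*choose)(const struct hlist_node *node);
};

/* One hlist_node per hash the element can be a member of. */
struct orig_node {
	unsigned char addr[6];
	struct hlist_node hash_entry;       /* link for the originator hash */
	struct hlist_node other_hash_entry; /* link for a second hash; it would
					     * need its own callbacks */
};

static int compare_orig(const struct hlist_node *a, const struct hlist_node *b)
{
	const struct orig_node *x = container_of(a, struct orig_node, hash_entry);
	const struct orig_node *y = container_of(b, struct orig_node, hash_entry);

	return memcmp(x->addr, y->addr, sizeof(x->addr)) == 0;
}

static unsigned int choose_orig(const struct hlist_node *node)
{
	const struct orig_node *o = container_of(node, struct orig_node, hash_entry);
	unsigned int sum = 0;
	size_t i;

	for (i = 0; i < sizeof(o->addr); i++)
		sum += o->addr[i];
	return sum;
}

void orig_hash_init(struct hashtable *hash)
{
	memset(hash, 0, sizeof(*hash));
	hash->compare = compare_orig;
	hash->choose = choose_orig;
}

/* Returns the matching node, or NULL; key is the node inside a template. */
struct hlist_node *hash_find(struct hashtable *hash, const struct hlist_node *key)
{
	struct hlist_node *pos = hash->table[hash->choose(key) % HASH_SIZE].first;

	for (; pos; pos = pos->next)
		if (hash->compare(pos, key))
			return pos;
	return NULL;
}

/* Links the element's own node into the table; no bucket is allocated. */
int hash_add(struct hashtable *hash, struct hlist_node *node)
{
	struct hlist_head *head = &hash->table[hash->choose(node) % HASH_SIZE];

	if (hash_find(hash, node))
		return -1;

	node->next = head->first;
	if (head->first)
		head->first->pprev = &node->next;
	head->first = node;
	node->pprev = &head->first;
	return 0;
}
```

Because the list node lives inside the element, there is no separate bucket
whose lifetime could differ from that of the originator it points to.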
> @@ -171,7 +172,7 @@ struct bat_priv {
> struct delayed_work hna_work;
> struct delayed_work orig_work;
> struct delayed_work vis_work;
> - struct gw_node *curr_gw;
> + struct gw_node *curr_gw; /* rcu protected pointer */
> struct vis_info *my_vis_info;
> };
Sorry, but I have to say that: FAIL ;)
I think it should look like this:
> - struct gw_node *curr_gw;
> + struct gw_node __rcu *curr_gw;
Best regards,
Sven
