Re: [Cake] lockup with multiple cake instances on 3.16.7

Pete Heist Thu, 31 Jan 2019 15:11:11 -0800

> On Jan 31, 2019, at 9:25 PM, Pete Heist <[email protected]> wrote:
> 
> 1) Why is nla_put_u32 suddenly failing for TARGET_US after adding five cake 
> instances?


nla_put_u32 is returning -EMSGSIZE, so the skb space in tailroom isn’t large 
enough (per nla_put doc).

After it fails, cake_dump_stats is called a second time right away, which 
succeeds. I _think_ what’s happening here is that after it sees -EMSGSIZE, the 
kernel allocates more tailroom and calls cake_dump_stats again. This doesn’t 
happen for kernel 4.9.0, it always succeeds, so presumably the initial size is 
larger.

> 2) Is calling sch_tree_unlock the right thing to do in the failure case, or 
> am I working around a kernel bug, and doing something that would fail in 
> other kernels?

I don’t think we’d want to do this after that same edb09eb17 commit. :) So at a 
minimum, to unlock after error:

 nla_put_failure:
        nla_nest_cancel(d->skb, stats);
+#if LINUX_VERSION_CODE < KERNEL_VERSION(4, 8, 0)
+       sch_tree_unlock(sch);
+#endif
        return -1;
 }

The question still is, would that break other kernel versions if an error 
occurs? If the tree is unlocked again elsewhere in some kernel versions after 
an error, that would end badly. It looks like in tc_fill_qdisc (sch_api.c) that 
gnet_stats_finish_copy, which unlocks, is not called if q->ops->dump_stats 
returns < 0. So I’m not sure how it ever unlocked properly in case of error. 
Hrm.

Do you think this is a correct patch?

_______________________________________________
Cake mailing list
[email protected]
https://lists.bufferbloat.net/listinfo/cake

Re: [Cake] lockup with multiple cake instances on 3.16.7

Reply via email to