Hi Dan.

On Thu, Sep 10, 2015 at 05:01:26PM -0400, Dan Streetman wrote:
> Hi Steffen,
> 
> I've been working with Jay on an IPsec issue, which I believe he
> discussed with you.  

Yes, we talked about this at the LPC.

> In this case the xfrm4_garbage_collect is
> returning an error because the number of xfrm4 dst entries has exceeded
> twice the gc_thresh, which causes new allocations of xfrm4 dst objects
> to fail, thus making the ipsec connection unusable (until dst objects
> are removed/freed).
> 
> The main reason the count gets to the limit is because the
> xfrm4_policy_afinfo.garbage_collect function - which points to
> flow_cache_flush (indirectly) - doesn't actually guarantee any xfrm4
> dst will get cleaned up, it only cleans up unused entries.
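A minimal userspace model of the check described above may help. It assumes, as the message states, that garbage collection first flushes only unused entries and then refuses new allocations once more than twice gc_thresh xfrm4 dst entries remain. The struct and function names are hypothetical; this is an illustration, not the kernel's code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one cached xfrm4 dst entry. */
struct model_entry {
	bool live;    /* slot currently holds an entry */
	bool in_use;  /* entry is still referenced by an active flow */
};

/* Models the flush: frees only entries that are no longer in use. */
static size_t model_flush_unused(struct model_entry *e, size_t n)
{
	size_t freed = 0;

	for (size_t i = 0; i < n; i++) {
		if (e[i].live && !e[i].in_use) {
			e[i].live = false;
			freed++;
		}
	}
	return freed;
}

/* Counts entries that survived the flush. */
static size_t model_count_live(const struct model_entry *e, size_t n)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++)
		if (e[i].live)
			count++;
	return count;
}

/* Returns 1 (new allocations must fail) if, after flushing unused
 * entries, more than 2 * gc_thresh entries are still live. */
static int model_garbage_collect(struct model_entry *e, size_t n,
				 size_t gc_thresh)
{
	model_flush_unused(e, n);
	return model_count_live(e, n) > gc_thresh * 2;
}
```

When every entry is still in use, the flush frees nothing and the check keeps failing, which is the stuck state described above.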
> 
> The flow cache hashtable size limit watermark does restrict how many
> flow cache entries exist (by shrinking the per-cpu hashtable once it
> has 4k entries), and therefore indirectly controls the total number of
> xfrm4 dst objects.  However, there's a mismatch between the default
> xfrm4 gc_thresh - of 32k objects (which sets a 64k max of xfrm4 dst
> objects) - and the flow cache hashtable limit of 4k objects per cpu.
> Any system with 16 or fewer cpus will have a total limit of 64k (or
> less) flow cache entries, so the 64k xfrm4 dst entry limit will never
> be reached.  However for any system with more than 16 cpus, the flow
> cache limit is greater than the xfrm4 dst limit, and so the xfrm4 dst
> allocation can fail, rendering the ipsec connection unusable.
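The limit arithmetic above can be checked in a few lines of C. This sketch uses the figures quoted in the message (4096 flow cache entries per CPU, a default gc_thresh of 32768) and assumes, as a simplification, that each flow cache entry pins at most one xfrm4 dst entry. The function names are hypothetical, and the multiplier parameter stands for the "gc_thresh * 2" factor.

```c
/* Per-CPU flow cache ceiling quoted in the thread (4k entries). */
#define FLOW_CACHE_ENTRIES_PER_CPU 4096UL

/* Number of xfrm4 dst entries at which allocation starts failing. */
static unsigned long xfrm_dst_limit(unsigned long gc_thresh,
				    unsigned long multiplier)
{
	return gc_thresh * multiplier;
}

/* Upper bound on flow cache entries summed over all CPUs. */
static unsigned long flow_cache_limit(unsigned long ncpus)
{
	return ncpus * FLOW_CACHE_ENTRIES_PER_CPU;
}

/* Returns 1 if the flow cache can hold more entries than the dst limit
 * allows, i.e. the dst limit can actually be hit. */
static int dst_limit_reachable(unsigned long ncpus, unsigned long gc_thresh,
			       unsigned long multiplier)
{
	return flow_cache_limit(ncpus) > xfrm_dst_limit(gc_thresh, multiplier);
}
```

With a multiplier of 2 and the default threshold, the dst limit first becomes reachable at 17 CPUs; raising the multiplier to 4 only moves that point to 33 CPUs.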
> 
> The most obvious solution is for the system admin to increase the
> xfrm4_gc_thresh value, although it's not really an obvious solution to
> the end-user what value they should set it to :-) 

Yes, a static gc threshold is always wrong for some workloads. So
the user needs to adjust it to his needs, even if the right value
is not obvious.

> Possibly the
> default value of xfrm4_gc_thresh could be set proportional to
> num_online_cpus(), but that doesn't help when cpus are onlined after
> boot.  

This could be an option, we could change the xfrm4_gc_thresh value with
a cpu notifier callback if more cpus come up after boot.
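One reading of the cpu notifier idea, sketched in userspace C: derive a threshold from the online CPU count so that 2 * gc_thresh covers 4096 flow cache entries per CPU, and raise (never lower) the current value when a CPU comes online. The helper names and the "never lower" policy are assumptions, not existing kernel behavior; the tunable itself is the net.ipv4.xfrm4_gc_thresh sysctl.

```c
#define FLOW_CACHE_ENTRIES_PER_CPU 4096UL
#define XFRM4_GC_THRESH_DEFAULT 32768UL

/* Smallest gc_thresh whose 2x limit covers the flow cache of ncpus CPUs,
 * never below the default. */
static unsigned long gc_thresh_for_cpus(unsigned long ncpus)
{
	unsigned long needed = ncpus * FLOW_CACHE_ENTRIES_PER_CPU / 2;

	return needed > XFRM4_GC_THRESH_DEFAULT ? needed
						: XFRM4_GC_THRESH_DEFAULT;
}

/* Hypothetical hotplug hook, called with the new online CPU count.
 * It only ever raises the threshold, so a larger value set by the
 * admin is preserved. */
static void on_cpu_online(unsigned long *gc_thresh, unsigned long online_cpus)
{
	unsigned long wanted = gc_thresh_for_cpus(online_cpus);

	if (wanted > *gc_thresh)
		*gc_thresh = wanted;
}
```

For 64 CPUs this yields 131072; for 16 or fewer CPUs it stays at the 32768 default.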

> Also, a warning message indicating the xfrm4_gc_thresh limit
> was reached, and a suggestion to increase the limit, may help anyone
> who hits the issue.
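A sketch of the one-time warning suggested above, in userspace C. The function name and message text are hypothetical; inside the kernel, a helper such as pr_warn_once() would be the natural fit.

```c
#include <stdio.h>

/* Prints a one-time hint when the dst limit rejects an allocation.
 * Returns 1 if the hint was printed on this call, 0 otherwise. */
static int warn_gc_thresh_once(unsigned long entries, unsigned long gc_thresh)
{
	static int warned;  /* zero-initialized; set after the first hint */

	if (entries <= gc_thresh * 2 || warned)
		return 0;
	warned = 1;
	fprintf(stderr,
		"xfrm4: %lu dst entries exceed 2 * xfrm4_gc_thresh (%lu); "
		"consider raising net.ipv4.xfrm4_gc_thresh\n",
		entries, gc_thresh);
	return 1;
}
```

Warning only once keeps a sustained overload from flooding the log, at the cost of not reporting later occurrences.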
> 
> I'm not sure if something more aggressive is appropriate, like
> removing active entries during garbage collection. 

It would not make too much sense to push an active flow out of the
fastpath just to add some other flow. If the number of active
entries is too high, there is no other option than increasing the
gc threshold.

You could try to reduce the number of active entries by shutting
down stale security associations frequently.

> Or, removing the
> failure condition from xfrm4_garbage_collect so xfrm4 dst_ops can
> always be allocated,

This would open doors for DOS attacks, we can't do this.

> or just increasing it from gc_thresh * 2 up to *
> 4 or more.

This would just defer the problem, so not a real solution.

That said, whatever we do, we just paper over the real problem,
that is the flowcache itself. Everything that needs this kind
of garbage collection is fundamentally broken. But as long as
nobody volunteers to work on a replacement, we have to live
with this situation somehow.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
