On Thu, Apr 23, 2015 at 11:55 AM, Jesse Gross <je...@nicira.com> wrote:
> On Wed, Apr 22, 2015 at 8:55 PM, Pravin Shelar <pshe...@nicira.com> wrote:
>> On Wed, Apr 22, 2015 at 8:50 PM, Jesse Gross <je...@nicira.com> wrote:
>>> On Wed, Apr 22, 2015 at 8:37 PM, Pravin Shelar <pshe...@nicira.com> wrote:
>>>> On Wed, Apr 22, 2015 at 8:29 PM, Jesse Gross <je...@nicira.com> wrote:
>>>>> On Wed, Apr 22, 2015 at 8:22 PM, Pravin Shelar <pshe...@nicira.com> wrote:
>>>>>> On Wed, Apr 22, 2015 at 7:48 PM, Jesse Gross <je...@nicira.com> wrote:
>>>>>>> On Wed, Apr 22, 2015 at 3:38 PM, Pravin Shelar <pshe...@nicira.com> 
>>>>>>> wrote:
>>>>>>>> On Wed, Apr 22, 2015 at 3:24 PM, Jesse Gross <je...@nicira.com> wrote:
>>>>>>>>> On Wed, Apr 22, 2015 at 3:19 PM, Pravin Shelar <pshe...@nicira.com> 
>>>>>>>>> wrote:
>>>>>>>>>> On Wed, Apr 22, 2015 at 11:31 AM, Jesse Gross <je...@nicira.com> 
>>>>>>>>>> wrote:
>>>>>>>>>>> On Wed, Apr 22, 2015 at 11:09 AM, Pravin Shelar 
>>>>>>>>>>> <pshe...@nicira.com> wrote:
>>>>>>>>>>>> On Wed, Apr 22, 2015 at 10:52 AM, Jesse Gross <je...@nicira.com> 
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>> On Wed, Apr 22, 2015 at 10:03 AM, Pravin Shelar 
>>>>>>>>>>>>> <pshe...@nicira.com> wrote:
>>>>>>>>>>>>>> On Mon, Apr 20, 2015 at 9:54 PM, Jesse Gross <je...@nicira.com> 
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> On Mon, Apr 20, 2015 at 7:33 PM, Pravin Shelar 
>>>>>>>>>>>>>>> <pshe...@nicira.com> wrote:
>>>>>>>>>>>>>>>> On Mon, Apr 20, 2015 at 5:56 PM, Jesse Gross 
>>>>>>>>>>>>>>>> <je...@nicira.com> wrote:
>>>>>>>>>>>>>>>>> On Mon, Apr 20, 2015 at 3:54 PM, Pravin Shelar 
>>>>>>>>>>>>>>>>> <pshe...@nicira.com> wrote:
>>>>>>>>>>>>>>>>>> On Mon, Apr 20, 2015 at 3:36 PM, Jesse Gross 
>>>>>>>>>>>>>>>>>> <je...@nicira.com> wrote:
>>>>>>>>>>>>>>>>>>> On Mon, Apr 20, 2015 at 1:29 PM, Pravin B Shelar 
>>>>>>>>>>>>>>>>>>> <pshe...@nicira.com> wrote:
>>>>>>>>>>>>>>>>>>>> diff --git a/datapath/linux/compat/stt.c 
>>>>>>>>>>>>>>>>>>>> b/datapath/linux/compat/stt.c
>>>>>>>>>>>>>>>>>>>> new file mode 100644
>>>>>>>>>>>>>>>>>>>> index 0000000..209bf1a
>>>>>>>>>>>>>>>>>>>> --- /dev/null
>>>>>>>>>>>>>>>>>>>> +++ b/datapath/linux/compat/stt.c
>>>>>>>>>>>>>>>>>>>> +static void update_headers(struct sk_buff *skb, bool head,
>>>>>>>>>>>>>>>>>>>> +                              unsigned int l4_offset, 
>>>>>>>>>>>>>>>>>>>> unsigned int hdr_len,
>>>>>>>>>>>>>>>>>>>> +                              bool ipv4, u32 tcp_seq)
>>>>>>>>>>>>>>>>>>> [...]
>>>>>>>>>>>>>>>>>>>> +       skb->truesize = SKB_TRUESIZE(skb_end_offset(skb)) 
>>>>>>>>>>>>>>>>>>>> + skb->data_len;
>>>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I wonder if there are any possible edge cases with 
>>>>>>>>>>>>>>>>>>> resetting truesize
>>>>>>>>>>>>>>>>>>> where the packet is still in someone's transmit queue (such 
>>>>>>>>>>>>>>>>>>> as if we
>>>>>>>>>>>>>>>>>>> are looping back packet). Do we need to orphan it first?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> ok, I will orphan it in update_headers.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Just to clarify - I was mostly just thinking aloud on 
>>>>>>>>>>>>>>>>> orphaning it.
>>>>>>>>>>>>>>>>> I'm not totally sure if that is the right thing to do or if 
>>>>>>>>>>>>>>>>> this is
>>>>>>>>>>>>>>>>> the right place to do it. I'm not sure what the conceptual
>>>>>>>>>>>>>>>>> justification would be for it and it could potentially result 
>>>>>>>>>>>>>>>>> in the
>>>>>>>>>>>>>>>>> sender's buffers not being properly limited. Perhaps not 
>>>>>>>>>>>>>>>>> resetting the
>>>>>>>>>>>>>>>>> truesize is the right thing too...
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I have seen warning msg if we do no keep truesize update along 
>>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>> changes to skb.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hmm, interesting, what is the warning? I don't think that I 
>>>>>>>>>>>>>>> have seen
>>>>>>>>>>>>>>> that before.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Actually skb_try_coalesce() is updating it correctly. so there 
>>>>>>>>>>>>>> no need
>>>>>>>>>>>>>> to change truesize anymore. I will update patch accordingly.
>>>>>>>>>>>>>
>>>>>>>>>>>>> That's much nicer. I also checked and other receive side code 
>>>>>>>>>>>>> (like
>>>>>>>>>>>>> TCP input) doesn't worry about the case where a local sender may 
>>>>>>>>>>>>> still
>>>>>>>>>>>>> be accounting for the packet since any type of loopback device 
>>>>>>>>>>>>> does
>>>>>>>>>>>>> call skb_orphan() in some form.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I hate to bring this up but what about on transmit? In cases 
>>>>>>>>>>>>> where we
>>>>>>>>>>>>> merge or split skbs (skb_try_coalesce() and normalize_frag_list()
>>>>>>>>>>>>> respectively) we do track the truesize for correctly for the 
>>>>>>>>>>>>> result
>>>>>>>>>>>>> but the individual pieces might not have the right destructors or
>>>>>>>>>>>>> might not have their truesize updated for the destructor they do 
>>>>>>>>>>>>> have.
>>>>>>>>>>>>
>>>>>>>>>>>> How about about we update merged skb stats (len, data_len, 
>>>>>>>>>>>> truesize)
>>>>>>>>>>>> according to *delta_truesize we get from skb_try_coalesce() and 
>>>>>>>>>>>> then
>>>>>>>>>>>> free the skb?
>>>>>>>>>>>
>>>>>>>>>>> I think that would work for the skb_try_coalesce() case (although I
>>>>>>>>>>> would only worry about truesize, not the lengths). For
>>>>>>>>>>> normalize_frag_list() I think we would either have to add a 
>>>>>>>>>>> destructor
>>>>>>>>>>> or not update truesize. I'm not sure that the condition where we 
>>>>>>>>>>> have
>>>>>>>>>>> frag_lists and a destructor actually ever happens in practice 
>>>>>>>>>>> though.
>>>>>>>>>>
>>>>>>>>>> I am not sure why do we need a destructor for normalize_frag_list
>>>>>>>>>> changes. Even with the changes in the function truesize is consistent
>>>>>>>>>> for given skb memory usage.
>>>>>>>>>
>>>>>>>>> If you have a packet with a frag_list, all of the memory for the
>>>>>>>>> individual elements will be accounted for in the truesize in the top
>>>>>>>>> level skb. This skb could also be accounted to some socket and have a
>>>>>>>>> destructor. When we break apart the list, we remove the truesize from
>>>>>>>>> the head because the memory is now part of individual packets.
>>>>>>>>> However, the destructor is presumably only on the head and so only its
>>>>>>>>> memory will be removed from the socket accounting when it is freed but
>>>>>>>>> not each of the other skbs that came from it.
>>>>>>>>
>>>>>>>> ok. In that case we can not have our own destructor since there is one
>>>>>>>> already (I am not sure if we can use skb->cb to restore original). not
>>>>>>>> changing true size can complicate skb coalesce, since it does update
>>>>>>>> truesize. Easy option would be orphan skb if we are going to coalesce
>>>>>>>> its fragments.
>>>>>>>
>>>>>>> I'm not sure that we need our own destructor. What do you think about
>>>>>>> just replicating the original destructor onto each of the newly
>>>>>>> generated skbs?
>>>>>>>
>>>>>> In that case we assume there is no state associated with the skb, that
>>>>>> might not be always true.
>>>>>
>>>>> What state do you mean? If you mean a destructor on the individual
>>>>> skbs, I think that is already true because only the top level
>>>>> destructor will get called when the original skb is freed.
>>>>
>>>> If destructor is replicated on the individual skbs then it will be
>>>> called for each of those skbs when it is freed.
>>>
>>> Right; if we've correctly apportioned truesize among the individual
>>> skbs isn't that the goal?
>>
>> It works for destructors which only care for skb-truesize. But
>> destructor callback also takes argument, we also need to replicate
>> that and we do not have enough information to do it.
>
> I'm not quite sure what you mean by an argument (I don't see anything
> really in either tcp_wfree or sock_wfree) but I agree that this could
> be dangerous. There's no guarantee that total truesize on the head
> packet accurately accounts for all of the individual frag_list members
> (although for the main users like IP fragment reassembly this appears
> to be the case).
>
Here is example of destructor with argument: tpacket_destruct_skb().

> I'm a little nervous about orphaning the packets in this case since it
> can potentially affect socket accounting and from a high level,
> there's no logical reason to do so (i.e. we're not crossing a boundary
> or something like that).
>
> I guess that in many cases (particularly on the transmit side), moving
> the truesize around in normalize_frag_list() and skb_try_coalesce()
> will effectively cancel each other out if we can merge things together
> (although I guess that depends on the source of the frag_list, IP
> fragment reassembly first tries to coalesce and then falls back on
> creating frag_lists, in which case us trying to merge things is a
> waste of time). It would be nice if we could relegate this to a corner
> case and just linearize in those situations. Or better yet, more
> deeply understand the conditions where we get the frag_list on
> transmit and see if it's really worth trying to coalesce them at all.
> grepping drivers/net I don't see a lot of frag_list creation. Even on
> receive, I'm not totally sure that breaking up frag_lists makes sense
> if skb_segment() can handle it now...

Lat time I looked into it it could not handle uneven geometry of skb
that STT generate. I will recheck it.
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Reply via email to