On Thu, Apr 23, 2015 at 11:55 AM, Jesse Gross <je...@nicira.com> wrote: > On Wed, Apr 22, 2015 at 8:55 PM, Pravin Shelar <pshe...@nicira.com> wrote: >> On Wed, Apr 22, 2015 at 8:50 PM, Jesse Gross <je...@nicira.com> wrote: >>> On Wed, Apr 22, 2015 at 8:37 PM, Pravin Shelar <pshe...@nicira.com> wrote: >>>> On Wed, Apr 22, 2015 at 8:29 PM, Jesse Gross <je...@nicira.com> wrote: >>>>> On Wed, Apr 22, 2015 at 8:22 PM, Pravin Shelar <pshe...@nicira.com> wrote: >>>>>> On Wed, Apr 22, 2015 at 7:48 PM, Jesse Gross <je...@nicira.com> wrote: >>>>>>> On Wed, Apr 22, 2015 at 3:38 PM, Pravin Shelar <pshe...@nicira.com> >>>>>>> wrote: >>>>>>>> On Wed, Apr 22, 2015 at 3:24 PM, Jesse Gross <je...@nicira.com> wrote: >>>>>>>>> On Wed, Apr 22, 2015 at 3:19 PM, Pravin Shelar <pshe...@nicira.com> >>>>>>>>> wrote: >>>>>>>>>> On Wed, Apr 22, 2015 at 11:31 AM, Jesse Gross <je...@nicira.com> >>>>>>>>>> wrote: >>>>>>>>>>> On Wed, Apr 22, 2015 at 11:09 AM, Pravin Shelar >>>>>>>>>>> <pshe...@nicira.com> wrote: >>>>>>>>>>>> On Wed, Apr 22, 2015 at 10:52 AM, Jesse Gross <je...@nicira.com> >>>>>>>>>>>> wrote: >>>>>>>>>>>>> On Wed, Apr 22, 2015 at 10:03 AM, Pravin Shelar >>>>>>>>>>>>> <pshe...@nicira.com> wrote: >>>>>>>>>>>>>> On Mon, Apr 20, 2015 at 9:54 PM, Jesse Gross <je...@nicira.com> >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>> On Mon, Apr 20, 2015 at 7:33 PM, Pravin Shelar >>>>>>>>>>>>>>> <pshe...@nicira.com> wrote: >>>>>>>>>>>>>>>> On Mon, Apr 20, 2015 at 5:56 PM, Jesse Gross >>>>>>>>>>>>>>>> <je...@nicira.com> wrote: >>>>>>>>>>>>>>>>> On Mon, Apr 20, 2015 at 3:54 PM, Pravin Shelar >>>>>>>>>>>>>>>>> <pshe...@nicira.com> wrote: >>>>>>>>>>>>>>>>>> On Mon, Apr 20, 2015 at 3:36 PM, Jesse Gross >>>>>>>>>>>>>>>>>> <je...@nicira.com> wrote: >>>>>>>>>>>>>>>>>>> On Mon, Apr 20, 2015 at 1:29 PM, Pravin B Shelar >>>>>>>>>>>>>>>>>>> <pshe...@nicira.com> wrote: >>>>>>>>>>>>>>>>>>>> diff --git a/datapath/linux/compat/stt.c >>>>>>>>>>>>>>>>>>>> b/datapath/linux/compat/stt.c >>>>>>>>>>>>>>>>>>>> new file mode 100644 >>>>>>>>>>>>>>>>>>>> index 0000000..209bf1a >>>>>>>>>>>>>>>>>>>> --- /dev/null >>>>>>>>>>>>>>>>>>>> +++ b/datapath/linux/compat/stt.c >>>>>>>>>>>>>>>>>>>> +static void update_headers(struct sk_buff *skb, bool head, >>>>>>>>>>>>>>>>>>>> + unsigned int l4_offset, >>>>>>>>>>>>>>>>>>>> unsigned int hdr_len, >>>>>>>>>>>>>>>>>>>> + bool ipv4, u32 tcp_seq) >>>>>>>>>>>>>>>>>>> [...] >>>>>>>>>>>>>>>>>>>> + skb->truesize = SKB_TRUESIZE(skb_end_offset(skb)) >>>>>>>>>>>>>>>>>>>> + skb->data_len; >>>>>>>>>>>>>>>>>>>> +} >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I wonder if there are any possible edge cases with >>>>>>>>>>>>>>>>>>> resetting truesize >>>>>>>>>>>>>>>>>>> where the packet is still in someone's transmit queue (such >>>>>>>>>>>>>>>>>>> as if we >>>>>>>>>>>>>>>>>>> are looping back packet). Do we need to orphan it first? >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> ok, I will orphan it in update_headers. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Just to clarify - I was mostly just thinking aloud on >>>>>>>>>>>>>>>>> orphaning it. >>>>>>>>>>>>>>>>> I'm not totally sure if that is the right thing to do or if >>>>>>>>>>>>>>>>> this is >>>>>>>>>>>>>>>>> the right place to do it. I'm not sure what the conceptual >>>>>>>>>>>>>>>>> justification would be for it and it could potentially result >>>>>>>>>>>>>>>>> in the >>>>>>>>>>>>>>>>> sender's buffers not being properly limited. Perhaps not >>>>>>>>>>>>>>>>> resetting the >>>>>>>>>>>>>>>>> truesize is the right thing too... >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I have seen warning msg if we do no keep truesize update along >>>>>>>>>>>>>>>> with >>>>>>>>>>>>>>>> changes to skb. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hmm, interesting, what is the warning? I don't think that I >>>>>>>>>>>>>>> have seen >>>>>>>>>>>>>>> that before. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Actually skb_try_coalesce() is updating it correctly. so there >>>>>>>>>>>>>> no need >>>>>>>>>>>>>> to change truesize anymore. I will update patch accordingly. >>>>>>>>>>>>> >>>>>>>>>>>>> That's much nicer. I also checked and other receive side code >>>>>>>>>>>>> (like >>>>>>>>>>>>> TCP input) doesn't worry about the case where a local sender may >>>>>>>>>>>>> still >>>>>>>>>>>>> be accounting for the packet since any type of loopback device >>>>>>>>>>>>> does >>>>>>>>>>>>> call skb_orphan() in some form. >>>>>>>>>>>>> >>>>>>>>>>>>> I hate to bring this up but what about on transmit? In cases >>>>>>>>>>>>> where we >>>>>>>>>>>>> merge or split skbs (skb_try_coalesce() and normalize_frag_list() >>>>>>>>>>>>> respectively) we do track the truesize for correctly for the >>>>>>>>>>>>> result >>>>>>>>>>>>> but the individual pieces might not have the right destructors or >>>>>>>>>>>>> might not have their truesize updated for the destructor they do >>>>>>>>>>>>> have. >>>>>>>>>>>> >>>>>>>>>>>> How about about we update merged skb stats (len, data_len, >>>>>>>>>>>> truesize) >>>>>>>>>>>> according to *delta_truesize we get from skb_try_coalesce() and >>>>>>>>>>>> then >>>>>>>>>>>> free the skb? >>>>>>>>>>> >>>>>>>>>>> I think that would work for the skb_try_coalesce() case (although I >>>>>>>>>>> would only worry about truesize, not the lengths). For >>>>>>>>>>> normalize_frag_list() I think we would either have to add a >>>>>>>>>>> destructor >>>>>>>>>>> or not update truesize. I'm not sure that the condition where we >>>>>>>>>>> have >>>>>>>>>>> frag_lists and a destructor actually ever happens in practice >>>>>>>>>>> though. >>>>>>>>>> >>>>>>>>>> I am not sure why do we need a destructor for normalize_frag_list >>>>>>>>>> changes. Even with the changes in the function truesize is consistent >>>>>>>>>> for given skb memory usage. >>>>>>>>> >>>>>>>>> If you have a packet with a frag_list, all of the memory for the >>>>>>>>> individual elements will be accounted for in the truesize in the top >>>>>>>>> level skb. This skb could also be accounted to some socket and have a >>>>>>>>> destructor. When we break apart the list, we remove the truesize from >>>>>>>>> the head because the memory is now part of individual packets. >>>>>>>>> However, the destructor is presumably only on the head and so only its >>>>>>>>> memory will be removed from the socket accounting when it is freed but >>>>>>>>> not each of the other skbs that came from it. >>>>>>>> >>>>>>>> ok. In that case we can not have our own destructor since there is one >>>>>>>> already (I am not sure if we can use skb->cb to restore original). not >>>>>>>> changing true size can complicate skb coalesce, since it does update >>>>>>>> truesize. Easy option would be orphan skb if we are going to coalesce >>>>>>>> its fragments. >>>>>>> >>>>>>> I'm not sure that we need our own destructor. What do you think about >>>>>>> just replicating the original destructor onto each of the newly >>>>>>> generated skbs? >>>>>>> >>>>>> In that case we assume there is no state associated with the skb, that >>>>>> might not be always true. >>>>> >>>>> What state do you mean? If you mean a destructor on the individual >>>>> skbs, I think that is already true because only the top level >>>>> destructor will get called when the original skb is freed. >>>> >>>> If destructor is replicated on the individual skbs then it will be >>>> called for each of those skbs when it is freed. >>> >>> Right; if we've correctly apportioned truesize among the individual >>> skbs isn't that the goal? >> >> It works for destructors which only care for skb-truesize. But >> destructor callback also takes argument, we also need to replicate >> that and we do not have enough information to do it. > > I'm not quite sure what you mean by an argument (I don't see anything > really in either tcp_wfree or sock_wfree) but I agree that this could > be dangerous. There's no guarantee that total truesize on the head > packet accurately accounts for all of the individual frag_list members > (although for the main users like IP fragment reassembly this appears > to be the case). > Here is example of destructor with argument: tpacket_destruct_skb().
> I'm a little nervous about orphaning the packets in this case since it > can potentially affect socket accounting and from a high level, > there's no logical reason to do so (i.e. we're not crossing a boundary > or something like that). > > I guess that in many cases (particularly on the transmit side), moving > the truesize around in normalize_frag_list() and skb_try_coalesce() > will effectively cancel each other out if we can merge things together > (although I guess that depends on the source of the frag_list, IP > fragment reassembly first tries to coalesce and then falls back on > creating frag_lists, in which case us trying to merge things is a > waste of time). It would be nice if we could relegate this to a corner > case and just linearize in those situations. Or better yet, more > deeply understand the conditions where we get the frag_list on > transmit and see if it's really worth trying to coalesce them at all. > grepping drivers/net I don't see a lot of frag_list creation. Even on > receive, I'm not totally sure that breaking up frag_lists makes sense > if skb_segment() can handle it now... Lat time I looked into it it could not handle uneven geometry of skb that STT generate. I will recheck it. _______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev