Got it, thanks a lot.
At 2020-10-14 21:55:25, "Olivier Matz" <olivier.m...@6wind.com> wrote: >Hi, > >On Sat, Oct 10, 2020 at 09:49:35AM +0800, yang_y_yi wrote: >> Olivier, thank you so much for helping figure out this, it does work >> as the code you changed, so we can fix the issue without this patch >> series. My question is can it also work for normal mbuf or indirect >> mbuf? (I mean pkt is direct mbuf or indirect mbuf) > >Yes, it works for any mbuf type for pkt (direct, indirect, ext). > >Happy to see it solves your issue! > >Regards, >Olivier > > > >> At 2020-10-09 19:55:25, "Olivier Matz" <olivier.m...@6wind.com> wrote: >> >Hi, >> > >> >On Fri, Oct 09, 2020 at 05:51:23PM +0800, yang_y_yi wrote: >> >> Olivier, thank you so much for your reply, your patch post for vhost help >> >> me understand your concern better, I totally agree. For GSO case, let me >> >> show you a simple code to explain my issue. >> >> >> >> >> >> >> >> >> >> >> >> struct rte_mbuf *pkt = rte_pktmbuf_alloc(mp); >> >> virtio_dev_extbuf_alloc(pkt, data_len) >> >> struct rte_mbuf * pkt_seg1 = rte_pktmbuf_alloc(indirect_pool); >> >> >> >> >> >> rte_pktmbuf_attach(pkt_seg1, pkt); >> >> rte_mbuf_ext_refcnt_update(pkt, -1); >> >> >> >> struct rte_mbuf * pkt_seg2 = rte_pktmbuf_alloc(indirect_pool); >> >> rte_pktmbuf_attach(pkt_seg2, pkt); >> >> rte_mbuf_ext_refcnt_update(pkt, -1); >> >> struct rte_mbuf *pkt_segs[2] = {pkt_seg1, pkt_seg2}; >> >> >> >> rte_eth_tx_burst(dev->port_id, qid, pkt_segs, 2); >> >> >> >> >> >> Is it a simple test you expect? The issue here is nobody explicitly calls >> >> rte_pktmbuf_free(pkt), rte_pktmbuf_free(pkt_segX) in PMD driver won't free >> >> "pkt", Is it clear to you now? >> > >> > >> >Thank you for the small code. Yes, this is what I expected. >> > >> >The proper way to do this is something like this: >> > >> > /* create a mbuf, and attach it to an external buffer */ >> > struct rte_mbuf *pkt = rte_pktmbuf_alloc(mp); >> > virtio_dev_extbuf_alloc(pkt, data_len) >> > >> > /* create a new mbuf, attach it to the previous one: the resulting >> > * mbuf is also an "external mbuf" (is has the EXT_ATTACHED_MBUF >> > * flag, and its data is stored in the ext buffer. >> > * See an example here: https://www.droids-corp.org/~zer0/ext-mbuf.svg >> > */ >> > struct rte_mbuf *pkt_seg1 = rte_pktmbuf_alloc(indirect_pool); >> > rte_pktmbuf_attach(pkt_seg1, pkt); >> > >> > /* do the same another time */ >> > struct rte_mbuf *pkt_seg2 = rte_pktmbuf_alloc(indirect_pool); >> > rte_pktmbuf_attach(pkt_seg2, pkt); >> > >> > /* release the original pkt, we don't need it anymore */ >> > rte_pktmbuf_free(pkt); >> > >> > /* send the new segments, they will be freed by the driver once >> > * Tx is done. When the last packet referencing the external buffer >> > * is freed, the free callback of the buffer will be invoked. */ >> > struct rte_mbuf *pkt_segs[2] = {pkt_seg1, pkt_seg2}; >> > rte_eth_tx_burst(dev->port_id, qid, pkt_segs, 2); >> > >> >Hope this is clearer now. >> > >> >Regards, >> >Olivier >> > >> > >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> At 2020-10-07 17:48:21, "Olivier Matz" <olivier.m...@6wind.com> wrote: >> >> >Hi, >> >> > >> >> >On Sun, Sep 27, 2020 at 01:55:21PM +0800, yang_y_yi wrote: >> >> >> Per GSO requirement, this is a must-have change, Jiayu, can you help >> >> >> review >> >> >> this series? >> >> > >> >> >I can't ack this patch until I have a simple and clear test case (only >> >> >with mbuf functions, without GSO or vhost) showing the issue we have >> >> >today with current. >> >> > >> >> >> Olivier, if you used the old interface, maybe you need to change your >> >> >> code to >> >> >> adapt this, I don't think we can have a better way to handle GSO case. >> >> > >> >> >Sorry, I don't get your point. What do I need to change in which code? >> >> > >> >> >(some more comments below) >> >> > >> >> >> At 2020-08-04 09:31:19, "yang_y_yi" <yang_y...@163.com> wrote: >> >> >> >> >> >> At 2020-08-03 20:34:25, "Olivier Matz" <olivier.m...@6wind.com> wrote: >> >> >> >On Mon, Aug 03, 2020 at 05:42:13PM +0800, yang_y_yi wrote: >> >> >> >> At 2020-08-03 16:11:39, "Olivier Matz" <olivier.m...@6wind.com> >> >> >> >> wrote: >> >> >> >> >On Mon, Aug 03, 2020 at 09:26:40AM +0800, yang_y_yi wrote: >> >> >> >> >> At 2020-08-03 04:29:07, "Olivier Matz" <olivier.m...@6wind.com> >> >> >> >> >> wrote: >> >> >> >> >> >Hi, >> >> >> >> >> > >> >> >> >> >> >On Sun, Aug 02, 2020 at 07:12:36AM +0800, yang_y_yi wrote: >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> At 2020-07-31 23:15:43, "Olivier Matz" >> >> >> >> >> >> <olivier.m...@6wind.com> wrote: >> >> >> >> >> >> >Hi, >> >> >> >> >> >> > >> >> >> >> >> >> >On Thu, Jul 30, 2020 at 08:08:59PM +0800, yang_y...@163.com >> >> >> >> >> >> >wrote: >> >> >> >> >> >> >> From: Yi Yang <yangy...@inspur.com> >> >> >> >> >> >> >> >> >> >> >> >> >> >> In GSO case, segmented mbufs are attached to original >> >> >> >> >> >> >> mbuf which can't be freed when it is external. The issue >> >> >> >> >> >> >> is free_cb doesn't know original mbuf and doesn't free >> >> >> >> >> >> >> it when refcnt of shinfo is 0. >> >> >> >> >> >> >> >> >> >> >> >> >> >> Original mbuf can be freed by rte_pktmbuf_free segmented >> >> >> >> >> >> >> mbufs or by rte_pktmbuf_free original mbuf. Two kind of >> >> >> >> >> >> >> cases should have different behaviors. free_cb won't >> >> >> >> >> >> >> explicitly call rte_pktmbuf_free to free original mbuf >> >> >> >> >> >> >> if it is freed by rte_pktmbuf_free original mbuf, but it >> >> >> >> >> >> >> has to call rte_pktmbuf_free to free original mbuf if it >> >> >> >> >> >> >> is freed by rte_pktmbuf_free segmented mbufs. >> >> >> >> >> >> >> >> >> >> >> >> >> >> In order to fix this issue, free_cb interface has to been >> >> >> >> >> >> >> changed, __rte_pktmbuf_free_extbuf must deliver called >> >> >> >> >> >> >> mbuf pointer to free_cb, argument opaque can be defined >> >> >> >> >> >> >> as a custom struct by user, it can includes original mbuf >> >> >> >> >> >> >> pointer, user-defined free_cb can compare caller mbuf with >> >> >> >> >> >> >> mbuf in opaque struct, free_cb should free original mbuf >> >> >> >> >> >> >> if they are not same, this corresponds to rte_pktmbuf_free >> >> >> >> >> >> >> segmented mbufs case, otherwise, free_cb won't free >> >> >> >> >> >> >> original >> >> >> >> >> >> >> mbuf because the caller explicitly called rte_pktmbuf_free >> >> >> >> >> >> >> to free it. >> >> >> >> >> >> >> >> >> >> >> >> >> >> Here is pseduo code to show two kind of cases. >> >> >> >> >> >> >> >> >> >> >> >> >> >> case 1. rte_pktmbuf_free segmented mbufs >> >> >> >> >> >> >> >> >> >> >> >> >> >> nb_tx = rte_gso_segment(original_mbuf, /* original mbuf */ >> >> >> >> >> >> >> &gso_ctx, >> >> >> >> >> >> >> /* segmented mbuf */ >> >> >> >> >> >> >> (struct rte_mbuf **)&gso_mbufs, >> >> >> >> >> >> >> MAX_GSO_MBUFS); >> >> >> >> >> >> > >> >> >> >> >> >> >I'm sorry but it is not very clear to me what operations are >> >> >> >> >> >> >done by >> >> >> >> >> >> >rte_gso_segment(). >> >> >> >> >> >> > >> >> >> >> >> >> >In the current code, I only see calls to >> >> >> >> >> >> >rte_pktmbuf_attach(), >> >> >> >> >> >> >which do not deal with external buffers. Am I missing >> >> >> >> >> >> >something? >> >> >> >> >> >> > >> >> >> >> >> >> >Are you able to show the issue only with mbuf functions? It >> >> >> >> >> >> >would >> >> >> >> >> >> >be helpful to understand what does not work. >> >> >> >> >> >> > >> >> >> >> >> >> > >> >> >> >> >> >> >Thanks, >> >> >> >> >> >> >Olivier >> >> >> >> >> >> > >> >> >> >> >> >> Oliver, thank you for comment, let me show you why it doesn't >> >> >> >> >> >> work for my use case. In OVS DPDK, VM uses vhostuserclient >> >> >> >> >> >> to send large packets whose size is about 64K because we >> >> >> >> >> >> enabled TSO & UFO, these large packets use rte_mbufs >> >> >> >> >> >> allocated by DPDK virtio_net function >> >> >> >> >> >> virtio_dev_pktmbuf_alloc() (in file >> >> >> >> >> >> lib/librte_vhost/virtio_net.c. Please refer to [PATCH V1 >> >> >> >> >> >> 3/3], I changed free_cb as below, these packets use the same >> >> >> >> >> >> allocate function and the same free_cb no matter they are TCP >> >> >> >> >> >> packet or UDP packets, in case of VXLAN TSO, most NICs can't >> >> >> >> >> >> support inner UDP fragment offload, so OVS DPDK has to do it >> >> >> >> >> >> by software, for UDP case, the original rte_mbuf only can be >> >> >> >> >> >> freed by segmented rte_mbufs which are output packets of >> >> >> >> >> >> rte_gso_segment, i.e. the original rte_mbuf only can freed by >> >> >> >> >> >> free_cb, you can see, it explicitly called >> >> >> >> >> >> rte_pktmbuf_free(arg->mbuf), the condition statement "if >> >> >> >> >> >> (caller_m != arg->mbuf)" is true for this case, this has no >> >> >> >> >> >> problem, but for TCP case, the original mbuf is delivered to >> >> >> >> >> >> rte_eth_tx_burst() but not segmented rte_mbufs output by >> >> >> >> >> >> rte_gso_segment, PMD driver will call >> >> >> >> >> >> rte_pktmbuf_free(original_rte_mbuf) but not >> >> >> >> >> >> rte_pktmbuf_free(segmented_rte_mbufs), the same free_cb will >> >> >> >> >> >> be called, that means original_rte_mbuf will be freed twice, >> >> >> >> >> >> you know what will happen, this is just the issue I'm fixing. >> >> >> >> >> >> I bring in caller_m argument, it can help work around this >> >> >> >> >> >> because caller_m is arg->mbuf and the condition statement "if >> >> >> >> >> >> (caller_m != arg->mbuf)" is false, you can't fix it without >> >> >> >> >> >> the change this patch series did. >> >> >> >> >> > >> >> >> >> >> >I'm sill not sure to get your issue. Please, if you have a >> >> >> >> >> >simple test >> >> >> >> >> >case using only mbufs functions (without virtio, gso, ...), it >> >> >> >> >> >would be >> >> >> >> >> >very helpful because we will be sure that we are talking about >> >> >> >> >> >the same >> >> >> >> >> >thing. In case there is an issue, it can easily become a unit >> >> >> >> >> >test. >> >> >> >> >> >> >> >> >> >> Oliver, I think you don't get the point, free operation can't be >> >> >> >> >> controlled by the application itself, >> >> >> >> >> it is done by PMD driver and triggered by rte_eth_tx_burst, I >> >> >> >> >> have shown pseudo code, >> >> >> >> >> rte_gso_segment just segments a large mbuf to multiple mbufs, it >> >> >> >> >> won't send them, the application >> >> >> >> >> will call rte_eth_tx_burst to send them finally. >> >> >> >> >> >> >> >> >> >> > >> >> >> >> >> >That said, I looked at vhost mbuf allocation and gso >> >> >> >> >> >segmentation, and >> >> >> >> >> >I found some strange things: >> >> >> >> >> > >> >> >> >> >> >1/ In virtio_dev_extbuf_alloc(), and I there are 2 paths to >> >> >> >> >> >create the >> >> >> >> >> > ext mbuf. >> >> >> >> >> > >> >> >> >> >> > a/ The first one stores the shinfo struct in the mbuf, >> >> >> >> >> > basically >> >> >> >> >> > like this: >> >> >> >> >> > >> >> >> >> >> > pkt = rte_pktmbuf_alloc(mp); >> >> >> >> >> > shinfo = rte_pktmbuf_mtod(pkt, struct >> >> >> >> >> > rte_mbuf_ext_shared_info *); >> >> >> >> >> > buf = rte_malloc(NULL, buf_len, RTE_CACHE_LINE_SIZE); >> >> >> >> >> > shinfo->free_cb = virtio_dev_extbuf_free; >> >> >> >> >> > shinfo->fcb_opaque = buf; >> >> >> >> >> > rte_mbuf_ext_refcnt_set(shinfo, 1); >> >> >> >> >> > >> >> >> >> >> > I don't think it is a good idea, because there is no >> >> >> >> >> > guarantee that >> >> >> >> >> > the mbuf won't be freed before the buffer. For instance, >> >> >> >> >> > doing >> >> >> >> >> > this will probably fail: >> >> >> >> >> > >> >> >> >> >> > pkt2 = rte_pktmbuf_alloc(mp); >> >> >> >> >> > rte_pktmbuf_attach(pkt2, pkt); >> >> >> >> >> > rte_pktmbuf_free(pkt); /* pkt is freed, but it >> >> >> >> >> > contains shinfo ! */ >> >> >> >> >> >> >> >> >> >> pkt is created by the application I can control, so I can >> >> >> >> >> control it where it will be freed, right? >> >> >> >> > >> >> >> >> >This example shows that mbufs allocated like this by the vhost >> >> >> >> >driver are not constructed correctly. If an application attach a >> >> >> >> >new >> >> >> >> >packet (pkt2) to it and frees the original one (pkt), it may >> >> >> >> >result in a >> >> >> >> >memory corruption. >> >> >> >> > >> >> >> >> >Of course, to be tested and confirmed. >> >> >> >> >> >> >> >> No, attach will increase refcnt of shinfo, free_cb only is called >> >> >> >> when refcnt of shinfo is decreased to >> >> >> >> 0, isn't it? >> >> >> > >> >> >> >When pkt will be freed, it will decrement the shinfo refcnt, and >> >> >> >after it will be 1. So the buffer won't be freed. After that, the >> >> >> >mbuf pkt will be detached, and will return to the mbuf pool. It means >> >> >> >it can be reallocated, and the next user can overwrite shinfo which >> >> >> >is still stored in the mbuf data. >> >> >> >> >> >> I think this is an issue of DPDK itself, if external buffer in shinfo >> >> >> is freed, shinfo should be set to NULL, if user will >> >> >> overwrite it, he/she should use the same way as a new external buffer >> >> >> is attached. >> >> > >> >> >No, there is no issue in DPDK. The lifetime of shinfo should be at least >> >> >the same the lifetime of the buffer. If shinfo is stored in initial mbuf >> >> >(called "pkt" in the example above), the mbuf and shinfo can be freed >> >> >while >> >> >the buffer is still in use by another packet. >> >> > >> >> >> >I did a test to show it, see: >> >> >> >http://git.droids-corp.org/?p=dpdk.git;a=commitdiff;h=a617494eeb01ff >> >> >> > >> >> >> >If you run the mbuf autotest, it segfaults. >> >> >> >> >> >> I think your test is wrong, you're changing shinfo (which is being >> >> >> used) in wrong way, if free_bc is NULL, it will be invalid. >> >> > >> >> >I'm changing the data of a newly allocated mbuf, it is perfectly legal. >> >> >I happens that it points the the shinfo that is still in use. >> >> > >> >> > >> >> >> >> >> >> static inline void >> >> >> rte_pktmbuf_attach_extbuf(struct rte_mbuf *m, void *buf_addr, >> >> >> rte_iova_t buf_iova, uint16_t buf_len, >> >> >> struct rte_mbuf_ext_shared_info *shinfo) >> >> >> { >> >> >> /* mbuf should not be read-only */ >> >> >> RTE_ASSERT(RTE_MBUF_DIRECT(m) && rte_mbuf_refcnt_read(m) == 1); >> >> >> RTE_ASSERT(shinfo->free_cb != NULL); >> >> >> >> >> >> For any shinfo operation, you should do it by >> >> >> rte_pktmbuf_attach_extbuf, you can't change it at will after that. >> >> >> >> >> >> > >> >> >> > >> >> >> >> >> >> >> >> > >> >> >> >> >> >> >> >> >> >> > >> >> >> >> >> > To do this properly, the mbuf refcnt should be increased, >> >> >> >> >> > and >> >> >> >> >> > the mbuf should be freed in the callback. But I don't >> >> >> >> >> > think it's >> >> >> >> >> > worth doing it, given the second path (b/) looks good to >> >> >> >> >> > me. >> >> >> >> >> > >> >> >> >> >> > b/ The second path stores the shinfo struct at the end of the >> >> >> >> >> > allocated buffer, like this: >> >> >> >> >> > >> >> >> >> >> > pkt = rte_pktmbuf_alloc(mp); >> >> >> >> >> > buf_len += sizeof(*shinfo) + sizeof(uintptr_t); >> >> >> >> >> > buf_len = RTE_ALIGN_CEIL(total_len, sizeof(uintptr_t)); >> >> >> >> >> > buf = rte_malloc(NULL, buf_len, RTE_CACHE_LINE_SIZE); >> >> >> >> >> > shinfo = rte_pktmbuf_ext_shinfo_init_helper(buf, >> >> >> >> >> > &buf_len, >> >> >> >> >> > >> >> >> >> >> > virtio_dev_extbuf_free, buf); >> >> >> >> >> > >> >> >> >> >> > I think this is correct, because we have the guarantee >> >> >> >> >> > that shinfo >> >> >> >> >> > exists as long as the buffer exists. >> >> >> >> >> >> >> >> >> >> What buffer does the allocated buffer you're saying here? The >> >> >> >> >> issue we're discussing how we can >> >> >> >> >> free original mbuf which owns shinfo buffer. >> >> >> >> > >> >> >> >> >I don't get your question. >> >> >> >> > >> >> >> >> >I'm just saying that this code path looks correct, compared to >> >> >> >> >the previous one. >> >> >> >> >> >> >> >> I think you're challenging principle of external mbuf, that isn't >> >> >> >> the thing I address. >> >> >> > >> >> >> >I'm not challenging anything, I'm saying there is a bug in this code, >> >> >> >and the unit test above tends to confirm it. >> >> >> >> >> >> If it is bug, you can post a patch to fix it, it isn't related with >> >> >> my patches. But in my opinion, you don't >> >> >> use it in correct way, I don't think it is a bug. >> >> > >> >> >I'll submit a patch for this. >> >> > >> >> >The point is you are testing GSO on top of wrongly-constructed mbufs, so >> >> >it would be safer for you to fix this before doing more tests. >> >> > >> >> > >> >> >> >> >> >2/ in rte_gso_segment(), there is a loop like this: >> >> >> >> >> > >> >> >> >> >> > while (pkt_seg) { >> >> >> >> >> > rte_mbuf_refcnt_update(pkt_seg, -1); >> >> >> >> >> > pkt_seg = pkt_seg->next; >> >> >> >> >> > } >> >> >> >> >> > >> >> >> >> >> > You change it to take in account the refcnt for ext mbufs. >> >> >> >> >> > >> >> >> >> >> > I may have missed something but I wonder why it is not >> >> >> >> >> > simply: >> >> >> >> >> > >> >> >> >> >> > rte_pktmbuf_free(pkt_seg); >> >> >> >> >> > >> >> >> >> >> > It will decrease the proper refcnt, and free the mbufs if >> >> >> >> >> > they >> >> >> >> >> > are not used. >> >> >> >> >> >> >> >> >> >> Again, rte_gso_segment just decreases refcnt by one, this will >> >> >> >> >> ensure the last segmented >> >> >> >> >> mbuf free will trigger freeing original mbuf (only free_cb can >> >> >> >> >> do this). >> >> >> >> > >> >> >> >> >rte_pktmbuf_free() will also decerase the refcnt, and free the >> >> >> >> >resources >> >> >> >> >when the refcnt reaches 0. >> >> >> >> > >> >> >> >> >It has some advantages compared to decrease the reference counter >> >> >> >> >of all >> >> >> >> >segments: >> >> >> >> > >> >> >> >> >- no need to iterate the segments, there is only one function call >> >> >> >> >- no need to have a special case for ext mbufs like you did in >> >> >> >> >your patch >> >> >> >> >- it may be safer, in case some segments have a refcnt == 1, >> >> >> >> >because >> >> >> >> > resources will be freed. >> >> >> >> >> >> >> >> For external mbuf, attach only increases refcnt of shinfo, refcnt >> >> >> >> of mbuf won't be touched. For normal >> >> >> >> mbuf, attach only increase refcnt of mbuf, no shinfo there, no >> >> >> >> refcnt of shinfo increased. >> >> >> > >> >> >> >I suppose rte_gso_segment() can take any mbuf type as input: standard >> >> >> >mbuf, indirect mbuf, ext mbuf, or even a mbuf chaing containing >> >> >> >segments of >> >> >> >different types. >> >> >> > >> >> >> >For instance, if you pass a chain of 2 mbufs: >> >> >> >- the first one is a direct mbuf containing the IP/TCP headers >> >> >> >(orig_hdr) >> >> >> >- the second on is a mbuf pointing to an ext buffer (orig_payload) >> >> >> > >> >> >> >I expect that the resulting mbuf after calling gso contains a list of >> >> >> >mbufs >> >> >> >like this: >> >> >> >- a first segment containing the IP/TCP headers (new_hdr) >> >> >> >- a payload segment pointing on the same ext buffer >> >> >> > >> >> >> >In theory, there is no reason that orig_hdr should be referenced by >> >> >> >another new mbuf, because it only contains headers (no data). If >> >> >> >that's >> >> >> >the case, its refcnt is 1, and decreasing it to 0 without freeing it >> >> >> >is a bug. >> >> >> >> >> >> For this user scenario, orig_m is owner of external buffer, small >> >> >> segmented mbufs reference >> >> >> it, you shouldn't free orig_m before all attached segmented mbufs are >> >> >> freed, isn't it? >> >> > >> >> >In this case, orig_hdr has to be freed because it is a direct mbuf (not >> >> >shared). >> >> >The buffer pointed by orig_payload will be freed when all newly created >> >> >segments are freed. >> >> > >> >> > >> >> >> > >> >> >> >Anyway, there is maybe no issue in that case, but I was just >> >> >> >suggesting >> >> >> >that using rte_pktmbuf_free() is easier to read, and safer than >> >> >> >manually >> >> >> >decreasing the refcnt of each segment. >> >> >> > >> >> >> > >> >> >> >> >> >Again, sorry if this is not the issue your are referring to, but >> >> >> >> >> >in this case I really think that having a simple example code >> >> >> >> >> >that >> >> >> >> >> >shows the issue would help. >> >> >> >> >> >> >> >> >> >> Oliver, my statement in the patch I sent out has pseudo code to >> >> >> >> >> show this. I don't think a simple >> >> >> >> >> unit test can show it. >> >> >> >> > >> >> >> >> >I don't see why. The PMDs and the libraries use the mbuf >> >> >> >> >functions, why >> >> >> >> >a unit test couldn't call the same functions? >> >> >> >> > >> >> >> >> >> Let me summarize it here again. For original mbuf, there are two >> >> >> >> >> cases freeing >> >> >> >> >> it, case one is PMD driver calls free against segmented mbufs, >> >> >> >> >> last segmented mbuf free will trigger >> >> >> >> >> free_cb call which will free original large & extended mbuf. >> >> >> >> > >> >> >> >> >OK >> >> >> >> > >> >> >> >> >> Case two is PMD driver will call free against >> >> >> >> >> original mbuf, that also will call free_cb to free attached >> >> >> >> >> extended buffer. >> >> >> >> > >> >> >> >> >OK >> >> >> >> > >> >> >> >> >And what makes that case 1 or case 2 is executed? >> >> >> >> > >> >> >> >> >> In case one free_cb must call >> >> >> >> >> rte_pktmbuf_free otherwise nobody will free original large & >> >> >> >> >> extended mbuf, in case two free_cb can't >> >> >> >> >> call rte_pktmbuf_free because the caller calling it is just >> >> >> >> >> rte_pktmbuf_free we need. That is to say, you >> >> >> >> >> must use the same free_cb to handle these two cases, this is my >> >> >> >> >> issue and the point you don't get. >> >> >> >> > >> >> >> >> >I think there is no need to change the free_cb API. It should work >> >> >> >> >like >> >> >> >> >this: >> >> >> >> > >> >> >> >> >- virtio creates the original external mbuf (orig_m) >> >> >> >> >- gso will create a new mbuf referencing the external buffer >> >> >> >> >(new_m) >> >> >> >> > >> >> >> >> >At this point, the shinfo has a refcnt of 2. The large buffer will >> >> >> >> >be >> >> >> >> >freed as soon as rte_pktmbuf_free() is called on orig_m and new_m, >> >> >> >> >whatever the order. >> >> >> >> > >> >> >> >> >Regards, >> >> >> >> >Olivier >> >> >> >> >> >> >> >> Oliver, the reason it works is I changed free_cb API, case 1 >> >> >> >> doesn't know orig_m, how you make it free orig_m in free_cb. >> >> >> >> The intention I change free_cb is to let it know orig_m, I saw OVS >> >> >> >> DPDK ran out out buffers and orig_m isn't freed, that is why >> >> >> >> I want to bring in this to fix the issue. Again, in case 1, nobody >> >> >> >> explicitly calls ret_pktmbuf_free(orig_m) except free_cb I changed. >> >> >> > >> >> >> >If nobody calls ret_pktmbuf_free(orig_m), it is a problem. >> >> >> >The free_cb is to free the buffer, not the mbuf. >> >> >> > >> >> >> >To me, it should work like this: >> >> >> > >> >> >> >1- virtio creates a mbuf attached to the ext buffer (the shinfo >> >> >> >placement >> >> >> > bug should be fixed) >> >> >> >2- gso create mbufs that reference the the same ext buf (by attaching >> >> >> >the >> >> >> > new mbuf) >> >> >> >3- gso must free the original mbuf >> >> >> >> >> >> This is impossible, segmented mbufs are referencing external buffer in >> >> >> original mbuf, >> >> >> how do you free it? As I said rte_gso_segment has no way to free it, >> >> >> please tell me a way if >> >> >> you know how to do this. >> >> > >> >> >As I said above, calling rte_mbuf_free(orig_m) will decrement the >> >> >reference >> >> >counters on all segments. The segments will be returned to the pool if >> >> >the >> >> >refcnt reaches 0. >> >> > >> >> >> >> >> >> >4- the PMD transmits the new mbufs, and frees them >> >> >> > >> >> >> >Whatever 3- or 4- is executed first, at the end we are sure that: >> >> >> >- all mbufs will be returned to the pool >> >> >> >- the linear buffer will be freed when the refcnt reaches 0. >> >> >> > >> >> >> >If this is still unclear, please, write a unit test like I did >> >> >> >above to show your issue. >> >> >> > >> >> >> >Regards, >> >> >> >Olivier >> >> >> > >> >> >> >> >> >> The issue is in "3- gso must free the original mbuf", >> >> >> rte_pktmbuf_free(segmented_mbus) can't do it, >> >> >> rte_gso_segment is impossible to do it, only feasible point is >> >> >> free_cb, please let me know if you have >> >> >> a better way to free original mbuf and don't impact on segmented mbufs >> >> >> in PMD. My point is you must >> >> >> have a place to call rte_pktmbuf_free(rogin_m) explicitly, otherwise >> >> >> it is impossible to return it to memory >> >> >> pool, please point out where it can be called in my user scenario. I >> >> >> don't care how it is done, I just care it can >> >> >> fix my issue, please don't hesitate and send me a patch if you can, >> >> >> thanks a lot. >> >> > >> >> >Sorry, but I don't see how I can be clearer to what I explained >> >> >in my previous answer. >> >> > >> >> >Please, *provide a simple test example* without gso/vhost, and I can help >> >> >to make it work. >> >> > >> >> > >> >> >Regards, >> >> >Olivier >> >> > >> >> > >> >> >> > >> >> >> > >> >> >> >> free_cb must handle case 1 and case 2 in the same code, for case 1, >> >> >> >> caller_m is segmented new_m, for case 2, caller_m is orig_m. >> >> >> >> >> >> >> >> loop in rte_gso_segement is handling original mbuf (this mbuf is >> >> >> >> multi-mbuf and includes multiple mbufs which are linked by next >> >> >> >> pointer), it isn't a problem at all. >> >> >> >> >> >> >> >> Please show me code how you can fix my issue if you don't change >> >> >> >> free_cb, thank you. >> >> >> >> >> >> >> >> struct shinfo_arg { >> >> >> >> void *buf; >> >> >> >> struct rte_mbuf *mbuf; >> >> >> >> }; >> >> >> >> >> >> >> >> virtio_dev_extbuf_free(struct rte_mbuf *caller_m, void *opaque) >> >> >> >> { >> >> >> >> struct shinfo_arg *arg = (struct shinfo_arg *)opaque; >> >> >> >> >> >> >> >> rte_free(arg->buf); >> >> >> >> if (caller_m != arg->mbuf) >> >> >> >> rte_pktmbuf_free(arg->mbuf); >> >> >> >> rte_free(arg); >> >> >> >> }