On Thu, Apr 26, 2018 at 5:55 PM, Laine Stump <[email protected]> wrote:
> On 04/26/2018 04:38 AM, Daniel P. Berrangé wrote: > > On Thu, Apr 26, 2018 at 08:09:47AM +0200, Christian Ehrhardt wrote: > > On Wed, Apr 25, 2018 at 11:25 PM, Laine Stump <[email protected]> > <[email protected]> wrote: > > > When an nwfilter rule sets the parameter CTRL_IP_LEARNING to "dhcp", > this turns on the "dhcpsnoop" thread, which uses libpcap to monitor > traffic on the domain's tap device and extract the IP address from the > DHCP response. > > If libpcap on the host is built with TPACKET_V3 defined, the dhcpsnoop > code's initialization of the libpcap socket fails with the following > error: > > virNWFilterSnoopDHCPOpen:1134 : internal error: pcap_setfilter: can't > remove kernel filter: Bad file descriptor > > It turns out that this was because libpcap with TPACKET_V3 defined > requires a larger buffer size than libvirt was setting (we were > setting it to 128k). Changing the buffer size to 256k eliminates the > error, and the dhcpsnoop thread once again works properly. > > Thanks to Christian Ehrhardt <[email protected]> <[email protected]> for > discovering that > buffer size was the problem. > > Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1547237 > Signed-off-by: Laine Stump <[email protected]> <[email protected]> > --- > src/nwfilter/nwfilter_dhcpsnoop.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/src/nwfilter/nwfilter_dhcpsnoop.c b/src/nwfilter/nwfilter_ > dhcpsnoop.c > index 6069e70460..62eb617515 100644 > --- a/src/nwfilter/nwfilter_dhcpsnoop.c > +++ b/src/nwfilter/nwfilter_dhcpsnoop.c > @@ -259,7 +259,7 @@ struct _virNWFilterDHCPDecodeJob { > * libpcap 1.5 requires a 128kb buffer > * 128 kb is bigger than (DHCP_PKT_BURST * PCAP_PBUFSIZE / 2) > */ > > > I just started building with the change for a few tests on this - no > results yet. > > But we are all puzzled/unsure enough on the size that I'd already ask to > modify the comment above to explain the new size. > > Maybe we should explain: > - why 128 isn't enough > - why you chose "only" 256 > - why the default size might be too big > - your size considerations for many guest scenarios > > > Okay, so I looked into this in more detail. I don't know that I'm totally > at the bottom of it, but this is what I found in libpcap: > > * The function create_ring() calls setsockopt(fd, SOL_PACKET, > PACKET_RX_RING, &req ...) to setup a ring buffer for receiving packets. > > * one of the attributes of req is tp_frame_size, and another is > tp_frame_nr. tp_frame_nr is set to the user-supplied (or default) buffer > size (for us it was 128k) / tp_frame_nr. > > * if TPACKETV3 isn't used, tp_frame_size is set to (roughly, there is some > aligning done, and a bit additional added for one ethernet header) the > user-supplied snaplen (in our case 576). So tp_frame_nr is going to be > something > 200. > > * if TPACKET3 *is* used, tp_frame_size is set to a predefined constant > MAXIMUM_SNAPLEN, which is 262144. This means that tp_frame_nr will be > 131072/262144, which is 0. > > So the result will be that setsockopt() is called requesting a ring buffer > that can read 0 frames. I didn't trace through the call to see exactly what > happens, but it obviously can't be good. (Right after that, tp_frame_nr is > also used in a call to malloc a buffer to hold pointers to frame headers, > and that of course would also fail if we ever got that far. I'm assuming we > don't. > > Why are the frames in the ring buffer so large for TPACKETV3? The code > says this: > > /* The "frames" for this are actually buffers that > * contain multiple variable-sized frames. > * > * We pick a "frame" size of 128K to leave enough > * room for at least one reasonably-sized packet > * in the "frame". > */ > > This all explains why changing the buffer size to 256k works for us - that > gets tp_frame_nr to 1 (just barely!) so there is no multiplying by 0. And > because multiple packets are received into this single "frame", we still > operate correctly. > > However, this all seems very arbitrary - there's an API for setting the > size of a buffer, but it's not defined anywhere in the API what are the > units of measure (that's not an exactly accurate description, but I'm too > tired to think of the right way to say it). pcap-int.h itself even admits > this in the comment above its definition of MAXIMUM_SNAPLEN: > > /* > * Maximum snapshot length. > * > * Somewhat arbitrary, but chosen to be: > * > * [blah blah blah details details...] > */ > #define MAXIMUM_SNAPLEN 262144 > > > Also, note that the comment where MAXIMUM_SNAPLEN is used in pcap-linux.c > says that it is 128k, but the actual definition sets it at 256k. :-/ > > This gives me 0 confidence that any value we choose would continue to be > viable in the future. > > That will help the next one stumbling over this code. > > FWIW, having just checked libpcap git history, the current 2 MB default > size has been there since 2008 ! I'm guessing we don't use that as we > don't want to consume lots of RAM when many guests are running. > > > > Yeah, I'm guessing this is why Stefan chose to set the buffer size. > > In the case of static IP (the learnIPAddress codepath) we open the pcap > socket, get a single packet that has an IP address, and then close the pcap > socket, so we don't normally have more than a few around at any time, and > the size of the buffer is a non-issue so it's left at the default. > > For the case of dhcp snooping, though, the handle remains open for the > entire lifetime of the domain (I guess in case the address is released and > re-requested and a different address is handed out the next time). So if > we're using the default 2MB buffer, there would be 2MB for each client. On > one hand, for a system with 100 clients, that's 200MB of memory. On the > other hand, any system that's miniscule in relation to the total amount of > memory (it might reduce the capacity of a very large host by one guest at > most). > > (NB: I checked this out by putting VIR_WARNs in the source at strategic > places, and found that we are actually keeping open *TWO* pcap sockets for > each domain that has CTRL_IP_LEARNING=dhcp. We should probably look into > that...) > > So if everyone is okay with that extra memory usage, I'd suggest Christian > could send his original patch (deleting the pcap_set_buffer_size()) to the > list (but including the comments from my patch about the exact error > message, Resolves: blah.bugzilla.blah, etc). > I can do so if "everyone is okay with that extra memory usage" applies. So waiting for opinions on that for now. > (Or if you don't have time, I can send it and credit you in the commit > log). > > > I recently see more and more Resolves: instead of "Fixes:" did we > change the recommended format for some tools and I missed it? > > Who is "we"? :-) The overwhelming majority of these tags in libvirt commit > logs have always been Resolves: > > laine@vhost2 ~/devel/libvirt (master*)>git log | grep "Resolves: h" | wc > 508 1017 33340 > laine@vhost2 ~/devel/libvirt (master*)>git log | grep "Fixes: h" | wc > 9 18 654 > > I do notice that the few Fixes: are for bugs on launchpad (with one > exception), and the Resolves: are all bugzilla.redhat.com and a few > opensuse.org/suse.com > -- Christian Ehrhardt Software Engineer, Ubuntu Server Canonical Ltd
-- libvir-list mailing list [email protected] https://www.redhat.com/mailman/listinfo/libvir-list
