Re: [PATCH v2 net-next 2/4] net: add skeleton of bpfilter kernel module
Hi Alexei + netdev list, On Wed, May 02, 2018 at 09:36:02PM -0700, Alexei Starovoitov wrote: > Later bpfilter_process_sockopt() will be called from bpfilter hooks > in get/setsockopt() to pass iptable commands into umh via bpfilter.ko This is a part I'm quite heavily opposed to - at least at this point. Unless bpfilter offered something that is semantically compatible to what netfilter/iptables is currently implementing, I don't think bpfilter should be [allowed to] overriding the iptables {get,set}sockopt() calls. I appreciate that people are working on a different architecture packet filter than what we used to. I also understand that there is a need for backwards compatibility. I still think it's wrong to offer that compatibility on the {set,get}sockopt level, rather than on the "iptables command line utility replacement" level. But nevermind, you guys have a different opinion on that, on which we can agree to disagree. However, no matter what you do, the most important part from the user point of view is to make sure you don't break semantics. netfilter/iptables semantics have an intricate notion abut when which chain of which table is executed, in which order, at what particular point of the packet traversal during the network stack. The packet filtering rulesets that people have created over more than 18 years are based on those semantics. If you offer the same interface, but not that very same semantics, the packet filtering policies can an will break - and they will break so in a hidden way. To the user, it appears as if the ruleset is loaded with the assumed semantics, but in reality it isn't. Unless you can replicate those semantics 1:1, I think it is not only wrong to override the iptables sockopt interface, but it's outright dangerous. Having less matches/targets implemented than original iptables is something that I believe is acceptable (and inevitable, at least in the beginning). If somebody tries to load a related ruleset with bpfilter active, it will fail gracefully and the user can chose to not use that match/target in his ruleset, or to not use bpfilter. But if the ruleset loads but behaves different than before (because e.g. it's executed from a completely different place in the stack), that's IMHO an absolute no-go that must be avoided at all cost. If that's the case, you are actively breaking network security, rather than creating it. So I think there's only two ways to go: a) replicate the exact semantics/order of the filter/mangle/raw/... tables and their chains, both among themselves as well as in terms of ordering with other parts of the network stack, or b) not use the existing tables/chains with their pre-defined semantics but rather start new 'tables' which can then have different semantics as defined at the time of their implementation. My apologies if I misunderstood something about bpfilter. Feel free to correct me where I'm wrong. Thanks. Regards, Harald -- - Harald Welte <lafo...@gnumonks.org> http://laforge.gnumonks.org/ "Privacy in residential applications is a desirable marketing option." (ETSI EN 300 175-7 Ch. A6)
Re: [PATCH RFC 0/4] net: add bpfilter
Hi David, On Mon, Feb 19, 2018 at 12:29:08PM -0500, David Miller wrote: > People with an Android phone in their pocket is using iptables, and > the overhead and performance of those rules really does matter. It > determines how long your battery life is, etc. I am not the android expert. However, I just dumped the ruleset on my Galaxy Tab S2 (Android 7.1.2 / LineageOS), and it was a whooping 91 rules across all tables. The longest chain interation I could spot was 24 rules. That's not the kind of ruleset where I would expect performance worries. And if there was, nftables is around for quite some time and would be much faster. Sure, that was just one tablet, but I wonder how much Android packet filter performance issue there are. Would be interesting to hear about those (and on whether they benchmarked against nftables). > > I can just as well ask how many millions of users / devices are > > already using eBPF or XDP? > > Every time someone connects to a major provider, they are using it. I was speaking of actual *users* as in indiiduals running their own systems, companies running their own servers/datacenter. The fact that some ISP (or its supplier) decisdes that one of my IP packets is routed via a smartnic with XDP offloading somewhere is great, but still doesn't turn me into a "user" of that technology. Not in my linke of thinking, at least. > And by in large, for system tracing and analysis eBPF is basically > a hard requirement for people doing anything serious these days. That's great, but misses the point. I was referring to usage in the context of the kernel network stack. Sorry for not being explicit enough. Also, the entire point was about "new technologies need time to be adopted widely". Doesn't matter which new kernel feature that is. Sure, one data center / hosting / "cloud" provider can quickly roll out a change in their network. But I'm referring to significant, (Linux-)industry-wide adoption. That would first include major distributions to include/enable/support the feature, and then people actually building their systems/products/software on top of those. > Please see the wonderful work by Brendan Gregg and others which has > basically made the GPL'ing of DTrace by Oracle entirely irrelevant and > our Linux's tracing infrastructure has become must more powerful and > capable thanks to eBPF. Agreed. -- - Harald Welte <lafo...@gnumonks.org> http://laforge.gnumonks.org/ "Privacy in residential applications is a desirable marketing option." (ETSI EN 300 175-7 Ch. A6)
Re: [PATCH RFC 0/4] net: add bpfilter
Hi David, On Mon, Feb 19, 2018 at 10:31:39AM -0500, David Miller wrote: > > Why is it practical to replace your kernel but not practical to replace > > a small userspace tool running on top of it? > > The container is just userspace components. Those are really baked in > and are never changing. never until you have to apply a bug fix for any of the many components you bake into it. I am doing this on an (at least) weekly basis for my Docker containers. That's no different from a classic Linux distribution where you update your apt/rpm packages all the time. A container that is static and cannot continuously updated with new versions for security (and other) fixes is broken by design. If some people are doing this, they IMHO have no sense of IT security, and such usage pattersn are not what kernel development should cite as primary use case (again IMHO). > This is how cloud hosting environments work. Yes, *one* particular use case. By far not every use case of Linux, or Linux packet filtering. -- - Harald Welte <lafo...@gnumonks.org> http://laforge.gnumonks.org/ "Privacy in residential applications is a desirable marketing option." (ETSI EN 300 175-7 Ch. A6)
Re: [PATCH RFC 0/4] net: add bpfilter
Hi David, On Mon, Feb 19, 2018 at 10:36:51AM -0500, David Miller wrote: > nftables has been proported as "better" for years, yet large > institutions did not migrate to it. In fact, they explicitly > disabled NFTABLES in their kernel config. It's like with any migration. People were using ipchains for a long time even after iptables existed. Many people simply don't care about packet filter performance. It's only a small fraction of their entire CPU workload, so probably not worth optimzing. For dedicated firewall devices, that's of course a different story. How long did it take for the getrandom() system call to be actually used by applications [even glibc!]? Or many other things that get introduced in the kernel? I can just as well ask how many millions of users / devices are already using eBPF or XDP? How many major Linux distributions are enabling and/or supporting this yet? I'm not criticizing, I'm just attempting to illustrate that technologies always take time to establish themselves - and of course those people with the biggest benefit (and knowing about it) will be the early adopters, while many others have no motivation to migrate. > In my opinion, any resistence to integration with eBPF and XDP will > lead to even less adoption of netfilter as a technology. 1) I may not have made my point clear, sorry. I have not argued against any integration with eBPF, I have just made some specific arguments against specific aspects of the current RFC. 2) You have indicated repeatedly that there are millions and millions of netfilter/iptables users out there. So I fail to see the "even less adoption" part. "Even less" than those millions and millions? SCNR. Regards, Harald -- - Harald Welte <lafo...@gnumonks.org> http://laforge.gnumonks.org/ "Privacy in residential applications is a desirable marketing option." (ETSI EN 300 175-7 Ch. A6)
Re: [PATCH RFC 0/4] net: add bpfilter
Dear David, On Mon, Feb 19, 2018 at 10:27:27AM -0500, David Miller wrote: > > Would you be willing to merge nftables into kernel tools directory > > then? > > Did you miss the part where I explained that people explicitly disable > NFTABLES in their kernel configs in most if not all large datacenters? If people to chose to disable a certain feature, then that is their own decision to do so. We should respect that decision. Clearly they seem to have no interest in a better or more featureful packet filter, then. I mean, it's not like somebody proposes to implement NTFS inside the FAT filesystem kernel module because distributors (or data centers) tend to disable the NTFS module?! How is kernel development these days constrained by what some users may or may not put in their Kconfig? If they want a given feature, they must enable it. -- - Harald Welte <lafo...@gnumonks.org> http://laforge.gnumonks.org/ "Privacy in residential applications is a desirable marketing option." (ETSI EN 300 175-7 Ch. A6)
Re: [PATCH RFC 0/4] net: add bpfilter
Hi David, On Mon, Feb 19, 2018 at 10:13:35AM -0500, David Miller wrote: > Florian, first of all, the whole "change the iptables binary" idea is > a non-starter. For the many reasons I have described in the various > postings I have made today. > > It is entirely impractical. Why is it practical to replace your kernel but not practical to replace a small userspace tool running on top of it? -- - Harald Welte <lafo...@gnumonks.org> http://laforge.gnumonks.org/ "Privacy in residential applications is a desirable marketing option." (ETSI EN 300 175-7 Ch. A6)
Re: [PATCH RFC 0/4] net: add bpfilter
Hi David, On Mon, Feb 19, 2018 at 09:44:51AM -0500, David Miller wrote: > I see talk about "just replacing the iptables binary". > > A long time ago in a galaxy far far away, that would be a reasonable > scheme. But that kind of approach won't work in the realities of > today. > > You aren't going to be able to replace the iptables binary in the tens > of thousands of container images out there, nor the virtualization > installations either. I appear to have been under the impression that the entire movement to DevOps and automatic provisioning of containers/nodes/pods/Vms with puppet, ansible, Dockerfiles & Co is to be *more* agile in deployments, rather than less. If you cannot even rebuild your thousands of container images with updated binaries, then what is this all worth? You need to be able to update the iptables (or any other) binary in case there's an important (security or otherwise) bug that needs fixing. I don't see this any different. Also, as long as legacy ip_tables/x_tables is still in the kernel, you can still run your old userspace against that old implementation in the kernel. Nobody forces you to use anything else [for another decade or so]. Just if you want to take advantage of new more modular/performant/... things like nftables or an eBPF backend, then you would have to go that extra mile. I don't think the kernel (network) developers should burden themselves with too many things. There's sufficient on their plate as-is. So * if there's some new system (nftables, bpfilter, ...) * and some documented migration paths for the vast majority of the use cases (replacing iptables binaries with a compat wrapper) * and the old system continues to work as-is (x_tables kernel code stays for several more years) Then people who care about the new features or performance will migrate to the new system. And those who don't care stay with the old system - which is not a problem as they clearly wouldn't need the new system anyway. > Like it or not iptables ABI based filtering is going to be in the data > path for many years if not a decade or more to come. I beg to differ. For some people, yes. but then, as Florian points out, they can just as well use the existing x_tables kernel code. If they want something better, they can either replace their iptables program with xtables-compat from nftables, or whatever else might exist for eBPF support. > iptables is a victim of it's own success, like it or not :-) Yes, the > ABI is terrible, but obviously it was useful enough for lots of > people. and it continues to do so. I just don't think it is a great idea to kludge any new packet filter against such an arcane uapi. > Therefore it behooves us to accept this reality and align the data > path generated to match what the rest of the kernel is moving towards > and that is eBPF and XDP. This argument is unrelated to the question of the uapi. I'm not arguing against an eBPF backend/implementation for packet filtering. It's more a question of _how_. > Furthrmore, on a long term maintainence perspective, it means that > every data path used by the kernel for iptables will be fully verified > by the eBPF verifier. This means that the iptables data path will be > guaranteed to never get into a loop, access out of bounds data, etc. > > That to me is real power, and something we should pursue. Once again, both not related to the question of the uapi. > I know you can't see how offloading is possible, but I hope > are some further discussion you can see how that might work. I'm looking forward to that point. Regards, Harald -- - Harald Welte <lafo...@gnumonks.org> http://laforge.gnumonks.org/ "Privacy in residential applications is a desirable marketing option." (ETSI EN 300 175-7 Ch. A6)
Re: [PATCH RFC 0/4] net: add bpfilter
Hi Daniel, On Mon, Feb 19, 2018 at 01:03:17PM +0100, Daniel Borkmann wrote: > Hi Harald, > > On 02/17/2018 01:11 PM, Harald Welte wrote: > [...] > >> As rule translation can potentially become very complex, this is performed > >> entirely in user space. In order to ease deployment, request_module() code > >> is extended to allow user mode helpers to be invoked. Idea is that user > >> mode > >> helpers are built as part of the kernel build and installed as traditional > >> kernel modules with .ko file extension into distro specified location, > >> such that from a distribution point of view, they are no different than > >> regular kernel modules. > > > > That just blew my mind, sorry :) This goes much beyond > > netfilter/iptables, and adds some quiet singificant new piece of > > kernel/userspace infrastructure. To me, my apologies, it just sounds > > like a quite strange hack. But then, I may lack the vision of how this > > might be useful in other contexts. > > Thought was that it would be more suitable to push all the complexity of > such translation into user space [...] Sure, you have no complaints from my side about that goal. I'm just not sure if turning the kernel module loader into a new mechanism to start userspace processes is. I guess that's a question that the people involved with core kernel code and module loader have to answer. To me it seems like a very lng detour away from the actual topic (packet filtering). > Given normal user mode helpers make this rather painful since they > need to be shipped as extra package by the various distros, the idea > was that the module loader back end could treat umh similarly as > kernel modules and hook them in through request_module() approach > while still operating out of user space. In any case, I could image > this approach might be interesting and useful in general also for > other subsystems requiring umh in one way or another. I completely agree this approach has some logic to it. I just think the approach taken is *very* different from what has been traditionally done in the Linux world. All sorts of userspace programs to configure kernel features (iptables being one of them iproute2, etc.) have always been distributed as separate/independent application programs, which are packaged separately, etc. Making the kernel source tree build such userspace utilities and executing them in a new fashion via the kernel module loaders are to me two quite large conceptual changes on how "Linux works", and I believe you will have to "sell" this idea to many people outside the kernel networking communit, i.e. core kernel developers, people who do packaging, etc. I'm not saying I'm fundamentally opposed to it. Will be curious to see how the wider kernel community thinks of that architecture. > Right, having a custom iptables, libiptc or LD_PRELOAD approach would work > as well of course, but it still wouldn't address applications that have > their own custom libs programmed against iptables uapi directly or those > that reused a builtin or modified libiptc directly in their application. How many of those wide-spread applications are you aware of? The two projects you have pointed out (docker and kubernetes) don't. As the assumption that many such tools would need to be supported drives a lot of the design decisions, I would argue one needs a solid empircal basis. Also, the LD_PRELOAD wrapper *would* work with all those programs. Only the iptables command line replacement wouldn't catch those. > Such requests could only be covered transparently by having a small shim > layer in kernel and it also wouldn't require any extra packages from distro > side. What is wrong with extra packages in distributions? Distributions also will have to update the kernel to include your new code, so they could at the same time use a new iptables (or $whatever) package. This is true for virtually all new kernel features. Your userland needs to go along with it, if it wants to use those new features. > > Some of those can be implemented easily in BPF (like recomputing the > > checksum or the like). Some others I would find much more difficult - > > particularly if you want to off-load it to the NIC. They require access > > to state that only the kernel has (like 'cgroup' or 'owner' matching). > > Yeah, when it comes to offloading, the latter two examples are heavily tied > to upper layers of the (local) stack, so for cases like those it wouldn't > make much sense, but e.g. matches, mangling or forwarding based on packet > data are obvious candidates that can already be offloaded today in a > flexible and programmable manner all with existing BPF infra, so for those > it could definitely be highly interesting to m
Re: [PATCH RFC 0/4] net: add bpfilter
ctionality traditionally done > by kernel modules into user space processes (either root or !root) and > reduces security attack surface of such new code, meaning in case of > potential bugs only the umh would crash but not the kernel. Another > advantage coming with that would be that bpfilter.ko can be debugged and > tested out of user space as well (e.g. opening the possibility to run > all clang sanitizers, fuzzers or test suites for checking translation). > Also, such architecture makes the kernel/user boundary very precise, > meaning requests can be handled and BPF translated in control plane part > in user space with its own user memory etc, while minimal data plane > bits are in kernel. I understand that it has advantages to have the compiler in userspace. But then, why first send your rules into the kernel and back? > In the implemented proof of concept we show that simple /32 src/dst IPs > are translated in such manner. Of course this is the first that one starts with. However, as we all know, iptables was never very good or efficient about 5-tuple matching. If you want a fast implementation of this, you don't use iptables which does linear list iteration. The reason/rationale/use-case of iptables is its many (I believe more than 100 now?) extensions both on the area of matches and targets. Some of those can be implemented easily in BPF (like recomputing the checksum or the like). Some others I would find much more difficult - particularly if you want to off-load it to the NIC. They require access to state that only the kernel has (like 'cgroup' or 'owner' matching). > In the below example, we show that dumping, loading and offloading of > one or multiple simple rules work, we show the bpftool XDP dump of the > generated BPF instruction sequence as well as a simple functional ping > test to enforce policy in such way. Could you please clarify why the 'filter' table INPUT chain was used if you're using XDP? AFAICT they have completely different semantics. There is a well-conceived and generally understood notion of where exactly the filter/INPUT table processing happens. And that's not as early as in the NIC, but it's much later in the processing of the packet. I believe _if_ one wants to use the approach of "hiding" eBPF behind iptables, then either a) the eBPF programs must be executed at the exact same points in the stack as the existing hooks of the built-in chains of the filter/nat/mangle/raw tables, or b) you must introduce new 'tables', like an 'xdp' table which then has the notion of processing very early in processing, way before the normal filter table INPUT processing happens. > Feedback very welcome! Thanks. Despite being a former netfilter core team member, I'm trying to look at this as neutral as possible. So please don't perceive my comments as overly defensive or the like. My main points are: 1) What is the goal of this? 2) Why iptables and not nftables? 3) If something looks like existing iptables, it must behave *exactly* like existing iptables, otherwise it is prone to break users security in subtle and very dangerous ways. Looking forward to the following discussion and on other points of view. -- - Harald Welte <lafo...@gnumonks.org> http://laforge.gnumonks.org/ "Privacy in residential applications is a desirable marketing option." (ETSI EN 300 175-7 Ch. A6)
Re: [PATCH RFC 0/4] net: add bpfilter
Hi Daniel, On Fri, Feb 16, 2018 at 09:44:01PM +0100, Daniel Borkmann wrote: > We started out with the > iptables part in the demo as the majority of bigger infrastructure projects > all still rely heavily on it (e.g. docker, k8s to just name two big ones). docker is exec'ing the iptables command line program. So one could simply offer a syntactically compatible userspace replacement that does the compilation in userspce and avoid the iptables->libiptc->setsockopt->userspace roundtrip and the associated changes to the kernel module loader you introduced. kubernetes is using iptables-restore, which is part of iptables and again has the same syntax. However, it aovids the per-rule fork+exec overhead, which is why the netfilter project has been recommending it to be used in such situations. Do you have a list of known projects that use the legacy sockopt-based iptables uapi directly, without using code from the iptables.git codebase (e.g. libiptc, iptables or iptables-restore)? IMHO only those projects would benefit from the approach you have taken vs. an approach that simply offers a compatible commandline syntax. > Usually they have their requests to iptables baked into their code directly > which probably won't change any time soon, so thought was that they could > benefit initially from it once there would be sufficient coverage. If the binary offeers the same syntax (it could even be a fork/version of the iptables codebase, only using the parsing without the existing backend generating the ruleS), the same goal could be achieved. The above of course assumes that you have a 100% functional replacement (for 100% of the features that your use cases use) underneath the "iptables command syntax" compatibility. But you need that in both cases, whether you use the existing userspace api or not. Regards, Harald -- - Harald Welte <lafo...@gnumonks.org> http://laforge.gnumonks.org/ "Privacy in residential applications is a desirable marketing option." (ETSI EN 300 175-7 Ch. A6)
Re: [PATCH RFC 0/4] net: add bpfilter
Hi David, On Fri, Feb 16, 2018 at 05:33:54PM -0500, David Miller wrote: > From: Florian Westphal <f...@strlen.de> > > > Any particular reason why translating iptables rather than nftables > > (it should be possible to monitor the nftables changes that are > > announced by kernel and act on those)? > > As Daniel said, iptables is by far the most deployed of the two > technologies. Therefore it provides the largest environment for > testing and coverage. As I outlined earlier, this way you are perpetuating the architectural mistakes and constraints that were created ~ 18 years ago without any benefit from the lessons learned ever since. In netfilter, we already wanted to replace it as early as 2006 (AFAIR) with nfnetlink based pkttables (which never materialized). I would strongly suggest to focus on nftables (or even some other way of configuration / userspace interaction) to ensure that the iptables userspace interface can at some point be phased out eventually. Like we did with ipchains before, and before that with ipfwadm. By making a new implementation dependant on the oldest interface you are perpetuating it. Sure, one can go that way, but I would suggest this to be a *very* carefully weighed decision after a detailed analysis/discusison. -- - Harald Welte <lafo...@gnumonks.org> http://laforge.gnumonks.org/ "Privacy in residential applications is a desirable marketing option." (ETSI EN 300 175-7 Ch. A6)
Re: [PATCH v7 net-next 00/13] gtp: Additional feature support - Part I
Hi Tom, sorry for the delayed response. But I remain committed in pushing the non-controversial part of your GTP patches forward. On Sat, Oct 28, 2017 at 06:47:59PM +0200, Harald Welte wrote: > Thanks. As indicated, I'm planning some testing later this weekend on > the non-IPv6 patches, and am happy to add my Acked-by and/or re-submit > those to Dave after that. After some more delays and returning from netdev 2.2, I've finally put together a testing setup and successfully (manually) tested with the following patches: 01/13 vxlan: Move gro_cells_init to ndo_init 02/13 iptunnel: Add common functions to get a tunnel route 04/13 gtp: Call common functions to get tunnel routes and add dst_cache 05/13 iptunnel: Generalize tunnel update pmtu 06/13 gtp: Change to use gro_cells 07/13 gtp: Use goto for exceptions in gtp_udp_encap_recv funcs 08/13 gtp: udp recv clean up 09/13 gtp: Call function to update path mtu 10/13 gtp: Eliminate pktinfo and add port configuration I hereby acknowledge those patches. How should we proceed? Should I a) do nothing, you will add Acked-By and re-submit? b) send an individual Acked-By in a reply to each related patch here on netdev and you will re-submit those patches? c) simply create a rebased set from those patches and re-submit them to the list for net-next myself, with the Acked-by? d) be preposterous and provide a gtp git tree for DaveM to pull from? As discussed before, I will not merge/ack IPv6 will until we have an implementation that is interoperable. I have a TODO list of other bugfixes and improvements for Kernel GTP, but I'm hopeful that IPv6 can still be addressed before the end of 2017. Regards, Harald -- - Harald Welte <lafo...@gnumonks.org> http://laforge.gnumonks.org/ "Privacy in residential applications is a desirable marketing option." (ETSI EN 300 175-7 Ch. A6)
Re: [PATCH v7 net-next 00/13] gtp: Additional feature support - Part I
Hi Tom, On Sat, Oct 28, 2017 at 09:16:01AM -0700, Tom Herbert wrote: > Here is what the Kconfig for the EXPERIMENTAL option says: > > "This is an experimental implementation that allows encapsulating IPv6 > over GTP and using GTP over IPv6 for testing and development purposes. > This is not a standards conformant implementation for IPv6 and GTP. > More work is needed to reach that level." > > I don't see any ambiguity here about it not being standards complete. > Nor is there any ambiguity about the its purpose to enable further > development and the fact that more work is needed. As stated repeatedly: I have no issue with an *incomplete* implementation, but I have a problem with an *incompatible* one that takes left turns where the spec takes right turns. An *incomplete* implementation could still interoperate with other implementations but is e.g. missing some optional bits. It can later extended with those missing bits and wills stay compatible to users of the incomplete as well as to any later complete implementation. An *incompatible* implementation is what I have issues with. > This a foundation for an IPv6 datapath and is sufficient to do > benchmarking and performance to determine the prospects of replacing > proprietary HW with commodity servers running Linux kernel. I completely agree with that. > This is a forward step to get IPv6 into GTP, and frankly the _only_ code that > has been proposed. There is no reason why someone can't build upon > this to make a first rate conformant implementation. I also agree with this. However, I don't think it makes sense to have it in the kernel given it implements something that's actually not GTP as per the relevant specs. No matter how many disclaimers you put at it, people will still assume it's GTP if it's called GTP. And if it's only useful for benchmarking the poential of a later proper IPv6 implementation, I don't think it should go in. > In any case, I've invested as much time in this as I can for now. I'll > leave it up to DaveM to decide if we wants to take all, none, or some > subset of these patches. Thanks. As indicated, I'm planning some testing later this weekend on the non-IPv6 patches, and am happy to add my Acked-by and/or re-submit those to Dave after that. For sure, the kernel networking maintainer can merge any patches, including the proposed IPv6 patches as-is, and I will accept that. But my vote as the original author and co-maintainer of the kernel GTP code goes politely and respectfully against that - as I have made quite clear by now. Thanks + Regards, Harald -- - Harald Welte <lafo...@gnumonks.org> http://laforge.gnumonks.org/ "Privacy in residential applications is a desirable marketing option." (ETSI EN 300 175-7 Ch. A6)
Re: [PATCH v7 net-next 00/13] gtp: Additional feature support - Part I
Hi Tom, my apologies for only getting back to reviews now, after return from holidays I was too overloaded with plenty of other tasks. On Fri, Oct 27, 2017 at 05:09:24PM -0700, Tom Herbert wrote: > - Experimental IPv6 support As far as I can tell, my previous comments/concerns regarding an IPv6 support that is in clear violation of how it is specified is not acceptable to me, sorry. The question is - as illustrated quite verbosely before - not one of missing certain bits, but it is simply *different* from what the protocol specification says. This makes absolutely no sense to me. All it will do, is it will raise the impression that IPv6 is supported in a spec-compliant way. Furthermore, it will mean that once somebody does convert this into proper IPv6 support later, they will break the existing use cases of the non-compliant implementation that you're adding in this patch series. To me, I simply don't understand how that makes any sense. There are no known users of GTP outside of 3GPP networks, and particularly none of a different GTP flavour than the one specified in 3GPP. If there are, I would want to hear of it. And if there really are, I would strongly recommend them to use some other tunneling protocol, one that's more reasonable for normal use cases and with better protocol-level security. I wouldn't mind merging *incomplete* IPv6 support. However, I do mind merging *incompatible* or *incompliant* IPv6 support. > - Configurable networking interfaces so that GTP kernel can be > used and tested without needing GSN network emulation (i.e. no user > space daemon needed). > - Addition of a dst_cache in the GTP structure and other cleanup No issues with this from my side. I plan on setting up a test system with your patches over the weekend and give it some more testing from the use cases I know to avoid regressions. > - Common functions to get a route fo for an IP tunnel > - Fix VXLAN gro cells initialization Also no issues from my side, but that's for core networking folks to decide. > For IPv6 support, the mobile subscriber needs to allow IPv6 addresses, > and the remote endpoint can be IPv6. You are raising the impression - again - that this implementation will work with any mobile subscriber. I would bet at lot on the fact that the current IPv6 implementation will in fact *not* work with any existing equipment/devices out there. > Tested: > > Configured the matrix of IPv4/IPv6 mobile subscriber, Please indicate how that testing was done, see my comment above. Regards, Harald -- - Harald Welte <lafo...@gnumonks.org> http://laforge.gnumonks.org/ "Privacy in residential applications is a desirable marketing option." (ETSI EN 300 175-7 Ch. A6)
Re: [PATCH v4 net-next 00/12] gtp: Additional feature support - Part I
Hi Tom, thanks for your updated series! I'll revie as soon as I a, but that may likely not be before next week. As indicated before, I'm on a motorbike roadtrip on vacation, with very limited connectivity and even more limited time for any technical work. Thanks for your understanding. -- - Harald Welte <lafo...@gnumonks.org> http://laforge.gnumonks.org/ "Privacy in residential applications is a desirable marketing option." (ETSI EN 300 175-7 Ch. A6)
Re: [PATCH net-next 09/14] gtp: Allow configuring GTP interface as standalone
Hi Tom, On Sun, Sep 24, 2017 at 08:55:49AM -0700, Tom Herbert wrote: > Do you believe that these patches are not at all on the right track, > that they can't be built upon to get to a standards-compliant > implementation, and that we are going to have to throw all of this and > start from scratch to provide IPv6 support? I believe I have pointed out where the problem areas are, several times by now. I see no reason why things would have to be started from scratch. However, the issues pointed out in the IPv6 support patch[es] have to be resolved *before* any merge to mainline. I don't mind merging "incomplete" code that doesn't cover all parts of a spec but provides basic interoperability. I also am not arguing that code must be bug-free at the time it is merged (which is impossible anyway). But I am arguing that we cannot merge something that is a wrong implementation as per the spec, and hence it must be brought in-line with the spec before it can be merged. > > There's no use in merging an IPv6 support patch if already by code > > review it can be shown that it's impossible to create a spec-compliant > > implementation using that patch. To me, that would be "merging IPv6 > > support so we can check off a box on a management form or marketing > > sheet", but not for any practical value. > > To be clear, these patches are not done because to be a bullet point > on a marketing sheet. Great. > IPv6 is becoming _the_ Internet protocol. I'm all aware of that, and I've been a very early adopter, since the 1990ies with 6bone. My argument is not against IPv6 support. My argument is against merging something that introdues IPv6 in a way that's not in-line with the GTP protocol specifications, as such a way is of no use to anyone (except marketing sheets). > We should be far past the days of vendors only providing IPv4 in the > kernel support because "that's what our customers use" and they'll get > to IPv6 support at their leisure. I'm not sure where a "vendor" is involved with the GTP patches so far. I think we have to draw a distinction between what you expect from professional, corporate "vendors" with a commercial interest in mind (such as supporting their hardware) and what you can expect from people doing things in their spare time, out of enthusiasm to finally bring some Free Software into the closed world of telecommunications. The Telecom world should have implemented something like a GTP kernel module a decade to 15 years ago. They could have saved significant investments in proprietary hardware by running open source GGSNs with an accelerated user plane in the kernel. Nobody seemed to have an interest in that, until today - as you can see from Pablo and me working on this in our spare time, whenever we have a couple of spare cycles next to many other projects. You can see from the osmo-gtp-kernel commit log it took years of being a ultra-low-priority on-and-off project to ever get to a point where we thought it was worth submitting it mainline. Andreas deserves the praise for finally pushing it ahead. I'm looking forward to reviewing the next version of the patch series. -- - Harald Welte <lafo...@gnumonks.org> http://laforge.gnumonks.org/ "Privacy in residential applications is a desirable marketing option." (ETSI EN 300 175-7 Ch. A6)
Re: [PATCH net-next 09/14] gtp: Allow configuring GTP interface as standalone
osmocom.org/osmo-gtp-kernel/commit/?id=3bc7019c7afd06b5c7d94e5621728d092b82bb85 it was actually intended to make IPv6 support easier, but the partial IPv6 support was removed before mainline submission. * 07/14 Support encapsulation of IPv6 packets Not acceptable in its current form, see extensive review * 08/14 Support encpasulating over IPv6 No concerns in principle. Pending you making it dependent on AF of socket * 09/14 Allow configuring GTP interface as standalone Can be merged unless strong objection from Pablo/Andreas (see above) * 10/14 Add support for devnet No concerns from my side * 12/14 Configuration for zero UDP checksum Up to Dave, he raised a question on it * 13/14 Support for GRO No concerns from my side * 14/14 GSO support No concerns from my side BTW: Where have the iproute2/ip patches been posted, which you mention in your cover page of the patch series? -- - Harald Welte <lafo...@gnumonks.org> http://laforge.gnumonks.org/ "Privacy in residential applications is a desirable marketing option." (ETSI EN 300 175-7 Ch. A6)
Re: [PATCH net-next 03/14] gtp: Call common functions to get tunnel routes and add dst_cache
Hi Andreas, On Wed, Sep 20, 2017 at 05:37:52PM +0200, Andreas Schultz wrote: > I think we had this discussion before. The sending IP and port are not part > of the identity of the PDP context. So IMHO the sender is permitted > to change the source IP at random. Thanks for the reminder: You are correct, at least in the uplink case (MS->GGSN) where there is mobility of the MS. In the downlink case (GGSN->MS), which is the "sending" part for the kernel GTP code used at a GGSN, I'm not sure if that theory holds true in reality. Do you agree that the current behavior of not using automatic source address selection for encapsulated GTP packets but rather using the source address of the socket is intended? Do you further agree that the dst_cache support patch by Tom retains that intended behavior and it should be merged? -- - Harald Welte <lafo...@gnumonks.org> http://laforge.gnumonks.org/ "Privacy in residential applications is a desirable marketing option." (ETSI EN 300 175-7 Ch. A6)
Re: [PATCH net-next 09/14] gtp: Allow configuring GTP interface as standalone
he test itself (GGSN_Test) executing against the IUT The code so far implements PDP context activation + address allocation for IPv4-only and IPv6-only cases and can be run against a GGSN implementing those. The IPv6-only PDP context unit tests include the convoluted two-phase address assignment including sending the router solicitation from the simulated phone as well as verifying the router advertisement sent in response from the GGSN. Yes, I know they're written in an unknown niche programming language called TTCN-3, but this was the best tool at hand for the job I could find, and the tests are open source as is the Eclipse Titan toolchain for compiling it. We even have a Dockerfile that will build you a docker container containing the compiled GGSN_test at http://git.osmocom.org/docker-playground/tree/ggsn-test That docker container is used by a jenkins build test job to test current OsmoGGSN master every night against the test suite. I was stupid enough to break the testsuite with an accidential commit, so tests between August 27th and today failed. I've just reverted that accident - don't let that mistake of mine mislead you. I'm happy to contribute further to this by adding actual user-IP GTP-U functional testing beyond the router soliciatation/advertisement to it. So my suggestion in terms of a "realistic testbed without having to configure + run dozens of programs and using real RF + phones" is to use that testsuite. What one cannot get around is having to implement support for new features added on the kernel side such as IPv6 in libgtpnl and at least one of the GGSN's using it, sorry. Without that, there is no way to know if that code would do anything useful. You simply cannot realistically test GTP-U alone without GTP-C. I'd love to offer help on this, but it's really impossible right now. I'm on holidays on a motorbike tour through rural Taiwan's mountains, had to deal with a flat tire today, have limited connectivity and am already cutting down hard on sleep every night to be able to respond to the absolute minimally required work e-mails. And review/follow-up to your much appreciated patch series the last couple of days has also used a lot of (unexpected not scheduled for) time. I'm not complaining, I'm just saying I am really not able to contribute more to this effort right now beyond my review, the offer of free hardware for a real cellular network, and the extension of the test cases for GTP-U beyond the already implemented very important IPv6 address allocation/assignment which I believe your current code would not pass. Regards, Harald -- - Harald Welte <lafo...@gnumonks.org> http://laforge.gnumonks.org/ "Privacy in residential applications is a desirable marketing option." (ETSI EN 300 175-7 Ch. A6)
Re: [PATCH net-next 00/14] gtp: Additional feature support
Hi Tom, On Tue, Sep 19, 2017 at 04:47:11PM -0700, Tom Herbert wrote: > On Tue, Sep 19, 2017 at 4:19 PM, Harald Welte <lafo...@gnumonks.org> wrote: > > > I think there has to be a clear plan/architecture on how to implement > > those bits in terms of the kernel/userspace split, and at least a proof > > of concept implementation that we can show works with some real phones > > out there - otherwise there's no point in having IPv6 support that works > > well with some custom tools. > > > OTOH, I will argue that the GTP patches should never have been allowed > in the kernel in the first place without IPv6 support! ;-) Well, it could be shown that the code works with integration to OpenGGSN (and later ergw, and now OsmoGGSN) and works within a complete 3GPP network. So we were not merging something that we hypothesized it would work once the rest would be implemented, but we could actually show it was working before it got merged. > I think the best plan forward is to get the IPv6 data path running > that so can demonstrate a functional GTP/IPv6 datapath Yes, but a functional "datapath" (= GTP-U) unfortunately includes the way how address allocation/assignment is done. Putting something into the mainline kernel that we know will for sure not work/interop in a real scenario is not a good idea. I think what's worse than not supporting a feature is to implying support for it while actually doing it in an incomplete/incompliant way. So from my point of view, what's needed is * making sure router advertisement/solicitation are covered in some way, either by doing it in the kernel or having a clear strategy how it could be done from userspace while not a) introducing races regarding who owns the sequence numbers * making sure the implementation covers entire /64 prefixes for each PDP context and not single addresses. That is non-negotiable and mandatory by 3GPP specs. > Since "real" configuration path doesn't use the path to set up a > standalone interface, I would presume that that will be fleshed when > someone has cycles and expertise to work on both sides of the problem. The configuration will be different, yes. but we need to ensure that the actual *implementation* of the data path does what it is expected to do, no matter who configures it via which interface. > Even if this requires structural changes to how IPv6 is managed in > GTP, I doubt that the fundamental TX/RX, GRO/GSO data paths will > change much. In other words, please consider this to be a step on an > evolutionary path. More work is required to reach the ultimate > deployable solution. Agreed. But then I'm still against merging something that we know for sure will not be compatible with real-world use case. It should be kept out of mainline until we are sure of that, at the very least theoretically, but even better which we can prove in practise will do what it claims to do. > As for testing on real phones, that is cannot be a requirement for a > kernel feature. GTP is not implemented on phones. GTP is implemented only inside the fixed (land-side) of the cellular network. However, the inner IP data originates from phones, and large parts of what you see on GTP originates from phones in a different format. The inner IP data inside GTP-U originates from phones. And that's where address configuration for IPv6 works. > If you expect Linux community to support this, then we > need a way to be develop and test on commodity PC hardware. I'm not arguing you need to run any of the code on a different architecture such as a phone. I'm arguing for you to run a cellular network including the kernel GTP code, and then use that cellular network from a real phone. This is the only way to know for sure you interoperate. See my other mail related to the Open Soruce based configurations for both 2G and 3G that can be used for this. As a replacement, one can e.g. look at protocol traces of real phones and then simulate the behavior of one or several different phones and implement that as test caess. This is what I did in the GGSN_Test pointed out in my other mail in this thread. And btw, all I've asked is for showing it works with *one* phone model at all. I'm not talking about the various different implementation specifics, such as whether or not the phone will insist on using neighbor solicitation and mandate neighbor advertisement (on a point-to-point link, how absurd!) after the (mandatory) router discovery. > That is one of the major values of creating a standalone interface > configuration-- we can test the datapath just like any other > encapsulation supported by the kernel. Well, you cannot. You might be able to do some benchmarking to compare if an old version of the kernel gtp driver will perform better or worse than some optimizations introduced. But you can *not* have a rea
Re: [PATCH net-next 12/14] gtp: Configuration for zero UDP checksum
Hi Tom, On Wed, Sep 20, 2017 at 11:09:29AM -0700, Tom Herbert wrote: > On Mon, Sep 18, 2017 at 9:24 PM, David Miller <da...@davemloft.net> wrote: > > From: Tom Herbert <t...@quantonium.net> > >> Add configuration to control use of zero checksums on transmit for both > >> IPv4 and IPv6, and control over accepting zero IPv6 checksums on > >> receive. > > > > I thought we were trying to move away from this special case of allowing > > zero UDP checksums with tunnels, especially for ipv6. > > I don't have a strong preference either way. I like consistency with > VXLAN and foo/UDP, but I guess it's not required. Interestingly, since > GTP only carries IP, IPv6 zero checksums are actually safer here than > VXLAN or GRE/UDP. Just for the record: I don't care either way and I defer to the kernel networking developers to decide if they want to have zero UDP checksum in GTP or not. The 3GPP specs don't say anything about UDP checksums. So there's no requirement to use them, and hence operation without UDP checksums should be compliant. Cisco GTP implementation has udp checksumming configurable, so other implementations also seem to provide both ways. In general, I would argue one wants UDP checksumming of GTP in all setups, as while the inner IP packet might be protected, the GTP header itself is not, and that's what contains important data suhc as the TEID (Tunnel Endpoint ID). But that's of course just my personal opinion, and I'm not saying we should prevent people from using lower protection if that's what they want. -- - Harald Welte <lafo...@gnumonks.org> http://laforge.gnumonks.org/ "Privacy in residential applications is a desirable marketing option." (ETSI EN 300 175-7 Ch. A6)
Re: [PATCH net-next 08/14] gtp: Support encpasulating over IPv6
Hi Tom, On Wed, Sep 20, 2017 at 01:40:54PM -0700, Tom Herbert wrote: > On Wed, Sep 20, 2017 at 12:45 PM, David Miller <da...@davemloft.net> wrote: > > There is a socket associated with the tunnel to do the encapsulation > > and it has an address family, right? > > If fd's are set from userspace for the sockets then we could derive > the address family from them. I'll change that. Although, looking at > now I am wondering why were passing fds into GTP instead of just > having the kernel create the UDP port like is done for other encaps. because the userspace process has to take care of those bits of GTP-U that the kernel doesn't, such as responding to GTP ECHO requests with GTP echo responses. Only the "GTP Message type G-PDU" is handled in the kernel, as only those frames contain user plane. See table 1 of Section 7.1 of 3GPP TS 29.060. If you create the socket in the kernel, how would you hand the socket to the userspace process later on? IMHO, it feels more natural to simply create it in userspace (like you would do in the non-kernel-accelerated case) and then simply handle the G-PDU messages in the kernel while doing the rest in userspace. But if there's another method that feels more usual to the kernel community, I'm not against any changes - but given kernel policies, we'd have to keep userspace compatbility, right? Regards, Harald -- - Harald Welte <lafo...@gnumonks.org> http://laforge.gnumonks.org/ "Privacy in residential applications is a desirable marketing option." (ETSI EN 300 175-7 Ch. A6)
Re: [PATCH net-next 09/14] gtp: Allow configuring GTP interface as standalone
Hi Tom, On Wed, Sep 20, 2017 at 09:24:07AM -0700, Tom Herbert wrote: > On Wed, Sep 20, 2017 at 9:07 AM, Andreas Schultz <aschu...@tpip.net> wrote: > > GTP isn't special, I just don't like to have testing only features in there > > when the same goal can be reached without having to add extra stuff. Adding > > code that is not going to be useful in real production setups (or in this > > case would even break production setups when enabled accidentally) makes the > > implementation more complex than it needs to be. > > Well, you could make the same argument that allowing GTP to configured > as standalone interface is a problem since GTP is only allowed to be > with used with GTP-C. But, then we have something in the kernel that > the community is expected to support, but requires jumping through a > whole bunch of hoops just to run a simple netperf. "A whole bunch of hoops" without your new interface would consist of running a single command-line program that is supplied with libgtpnl. This is not a complete 3GPP network, but a simple libmnl-based helper library with no other depenencies. I'm not neccessarily against introducing features like the 'standalone interface configuration'. However, we must make sure that any significant new feature contributions like IPv6 are tested in a "realistic setup" and not just using those 'interfaces added for easy development'. Also, I would argue those 'interfaces added for easy deveopment/benchmarking' should probably be clearly marked as such to avoid raising the impression that this is what leads to a standard-conforming / production-type setup. -- - Harald Welte <lafo...@gnumonks.org> http://laforge.gnumonks.org/ "Privacy in residential applications is a desirable marketing option." (ETSI EN 300 175-7 Ch. A6)
Re: [PATCH net-next 00/14] gtp: Additional feature support
Hi Tom, On Tue, Sep 19, 2017 at 08:59:28AM -0700, Tom Herbert wrote: > On Tue, Sep 19, 2017 at 5:43 AM, Harald Welte <lafo...@gnumonks.org> > wrote: > > On Mon, Sep 18, 2017 at 05:38:50PM -0700, Tom Herbert wrote: > >> - IPv6 support > > > > see my detailed comments in other mails. It's unfortunately only > > support for the already "deprecated" IPv6-only PDP contexts, not the > > more modern v4v6 type. In order to interoperate with old and new > > approach, all three cases (v4, v6 and v4v6) should be supported from > > one code base. > > > It sounds like something that can be subsequently added. Not entirely, at least on the netlink (and any other configuration interface) you will have to reflect this from the very beginning. You have to have an explicit PDP type and cannot rely on the address type to specify the type of PDP context. Whatever interfaces are introduced now will have to remain compatible to any future change. My strategy to avoid any such possible 'road blocks' from being introduced would be to simply add v4v6 and v6 support in one go. The differences are marginal (having both an IPv6 prefix and a v4 address in parallel, rather than mutually exclusive only). > Do you have a reference to the spec? See http://osmocom.org/issues/2418#note-7 which lists Section 11.2.1.3.2 of 3GPP TS 29.061 in combination with RFC3314, RFC7066, RFC6459 and 3GPP TS 23.060 9.2.1 as well as a summary of my understanding of it some months ago. > >> - Configurable networking interfaces so that GTP kernel can be > >> used and tested without needing GSN network emulation (i.e. no > >> user space daemon needed). > > > > We have some pretty decent userspace utilities for configuring the > > GTP interfaces and tunnels in the libgtpnl repository, but if it > > helps people to have another way of configuration, I won't be > > against it. > > > AFAIK those userspace utilities don't support IPv6. Of course not [yet]. libgtpnl and the command line tools have been implemented specifically for the in-kernel GTP driver, and you have to make sure to add related support on both the kernel and the userspace side (libgtpnl). So there's little point in adding features on either side before the other side. There would be no way to test... > Being able to configure GTP like any other encapsulation will > facilitate development of IPv6 and other features. That may very well be the case, but adding "IPv6 support" to kernel GTP in a way that is not in line with the existing userspace libraries and control-plane implementations means that you're developing those features in an artificial environment that doesn't resemble real 3GPP interoperable networks out there. As indicated, I'm not against adding additional interfaces, but we have to make sure that we add IPv6 support (or any new feature support) to at least libgtpnl, and to make sure we test interoperability with existing 3GPP network equipment such as real IPv6 capable phones and SGSNs. > > I'm not sure if this is a useful feature. GTP is used only in > > operator-controlled networks and only on standard ports. It's not > > possible to negotiate any non-standard ports on the signaling plane > > either. > > > Bear in mind that we're not required to do everything the GTP spec > says. Yes, we are, at least as long as it affects interoperability with other implemetations out there. GTP uses well-known port numbers on *both* sides of the tunnel, and you cannot deviate from that. There's no point in having all kinds of feetures in the GTP user plane which are not interoperable with other implementations, and which are completely outside of the information model / architecture of GTP. In the real world, GTP-U is only used in combination with GTP-C. And in GTP-C you can only negotiate the IP address of both sides of GTP-U, and not the port number information. As a result, the port numbers are static on both sides. > My impression is GTP designers probably didn't think in terms of > getting best performance. But we can ;-) I think it's wasted efforts if it's about "random udp ports" as no standards-compliant implementation out there with which you will have to interoperate will be able to support it. GTP is used between home and roaming operator. If you want to introduce changes to how it works, you will have to have control over both sides of the implementation of both the GTP-C and the GTP-u plane, which is very unlikely and rather the exception in the hundreds of operators you interoperate with. Also keep in mind that there often are various "middleboxes" that will suddenly have to reflect your changes. That starts from packet filters at various locations in the operator networks and/or roaming hubs, down to GTP hubs a
Re: [PATCH net-next 00/14] gtp: Additional feature support
of v4v6 it works differently: * MS/UE requests dynamic or fixed IPv4 address in EUA IE of PDP context activation * GGSN responds with an IPv6 address, but that address is *not* used for communication, but simply used as an "interface identifier" to build a link-local address. * MS then uses router solicitation using that link-local address * GGSN responds with router advertisement, allocating a single /64 prefix, from which the MS then generates a fully-qualified IPv6 source address for communication. How did you envision this to be done with the v6 support you just added? At the very least, the /64 prefix matching would have to be implemented so that in fact all addresses within that /64 prefix are matched + encapsulated for a given PDP context in the downlink (to phone) direction. Also, I think the responsibility for the router advertisements would be in the kernel, too. Otherwise, a GTP-C userspace implementation would have to inject packets into the user plane (which is otherwise handled completely inside the kernel). Injecting packets would mean that in caes GTP sequence numbers are used, that userspace implementation would have to alter the sequence numbers of the kernel gtp.ko code using netlink, but therre would be race conditions, ... The router advertisements and neighbor advertisements basically have the semantics of one link per PDP context. Each of them is a point-to-point link, and it's not one router advertisement that's sent to all of the PDP contexts on that gtp-device. I know it all sucks. I'm still happy to see somebody tackling v6 support in gtp.c :) Regards, Harald -- - Harald Welte <lafo...@gnumonks.org> http://laforge.gnumonks.org/ "Privacy in residential applications is a desirable marketing option." (ETSI EN 300 175-7 Ch. A6)
Re: [PATCH net-next 08/14] gtp: Support encpasulating over IPv6
Hi Tom, On Mon, Sep 18, 2017 at 05:38:58PM -0700, Tom Herbert wrote: > Allow peers to be specified by IPv6 addresses. > + u16 peer_af; > + union { > + struct in_addr peer_addr_ip4; > + struct in6_addr peer_addr_ip6; > + }; this will not really work, as an union means that a PDP context will be either IPv4-only or IPV6-only, while in reality there are three types, see my other mail. So you have to deal with v4-only, v6-only or v4v6. The v6-only is legacy by now, and all modern phones I've tested in recent years can do v4v6 rather than having a v4-only and a v6-only PDP context in parallel. >From the operator point of view, v4v6 is very desirable, as it basically halves the amount of PDP contexts compared to the old approach, which significantly reduces signalling load across your network, as well as the amount of memory (and thus capacity) in your core network elements. I've recently implemented v6 + v4v6 support in osmo-ggsn (see http://git.osmocom.org/osmo-ggsn/) in case you would like to see another FOSS implementation for v6 + v4v6 - though in userspace, of course. -- - Harald Welte <lafo...@gnumonks.org> http://laforge.gnumonks.org/ "Privacy in residential applications is a desirable marketing option." (ETSI EN 300 175-7 Ch. A6)
Re: [PATCH net-next 13/14] gtp: Support for GRO
On Mon, Sep 18, 2017 at 05:39:03PM -0700, Tom Herbert wrote: > Populate GRO receive and GRO complete functions for GTP-Uv0 and v1. looks fine to me, though I'm not the GRO expert here. Let's say what the netdev gurus have to say in their review. If you say it is tested with GRO-capable and non-GRO capable device drivers, I'm fine with the patch. -- - Harald Welte <lafo...@gnumonks.org> http://laforge.gnumonks.org/ "Privacy in residential applications is a desirable marketing option." (ETSI EN 300 175-7 Ch. A6)
Re: [PATCH net-next 07/14] gtp: Support encapsulation of IPv6 packets
Hi Dave, On Mon, Sep 18, 2017 at 09:19:08PM -0700, David Miller wrote: > > +static inline u32 ipv6_hashfn(const struct in6_addr *a) > > +{ > > + return __ipv6_addr_jhash(a, gtp_h_initval); > > +} > > I know you are just following the pattern of the existing "ipv4_hashfn()" here > but this kind of stuff is not very global namespace friendly. Even simply > adding a "gtp_" prefix to these hash functions would be a lot better. I would agree if this was an inline function defined in a header file or a non-static function. But where is the global namespace concern in case of static inline functions defined and used in the same .c file? If it makes you happy, I'm all for adding the prefix - I just would like to understand the rationale better, thanks :) Regards, Harald -- - Harald Welte <lafo...@gnumonks.org> http://laforge.gnumonks.org/ "Privacy in residential applications is a desirable marketing option." (ETSI EN 300 175-7 Ch. A6)
Re: [PATCH net-next 04/14] gtp: udp recv clean up
Hi Tom, I think this patch does too many things at once: * introduce separate rx functions * convert from netif_rx to gro_cells_receive * cosmetic changes like "return -1" to "goto drop" In the context of reviewability and the "one patch per topic", I would prefer to see those separated, thanks. -- - Harald Welte <lafo...@gnumonks.org> http://laforge.gnumonks.org/ "Privacy in residential applications is a desirable marketing option." (ETSI EN 300 175-7 Ch. A6)
Re: [PATCH net-next 03/14] gtp: Call common functions to get tunnel routes and add dst_cache
Hi Dave, On Mon, Sep 18, 2017 at 09:17:51PM -0700, David Miller wrote: > This and the new dst caching code ignores any source address selection > done by ip_route_output_key() or the new tunnel route lookup helpers. > > Either source address selection should be respected, or if saddr will > never be modified by a route lookup for some specific reason here, > that should be documented. The IP source address is fixed by signaling on the GTP-C control plane and nothing that the kernel can unilaterally decide to change. Such a change of address would have to be decided by and first be signaled on GTP-C to the peer by the userspace daemon, which would then update the PDP context in the kernel. So I guess you're asking us to document that rationale as form of a source code comment ? -- - Harald Welte <lafo...@gnumonks.org> http://laforge.gnumonks.org/ "Privacy in residential applications is a desirable marketing option." (ETSI EN 300 175-7 Ch. A6)
Re: [PATCH net-next 07/14] gtp: Support encapsulation of IPv6 packets
On Mon, Sep 18, 2017 at 05:38:57PM -0700, Tom Herbert wrote: > Allow IPv6 mobile subscriber packets. This entails adding an IPv6 mobile > subscriber address to pdp context and IPv6 specific variants to find pdp > contexts by address. Please note that there are three different PDP contexts for IP: * IPv4 only (what gtp.c implements so far) * IPv6 only * dual IPv4+IPv6 (called IPv46) This information will have to be provisioned by the control plane via netlink for each PDP context. The kernel module then needs to make sure that on a v4-only context no IPv6 packets are accepted and vice-versa. Your proposed patch is missing this kind of screening function and I would imagine it could introduce all kinds of security problems :/ -- - Harald Welte <lafo...@gnumonks.org> http://laforge.gnumonks.org/ "Privacy in residential applications is a desirable marketing option." (ETSI EN 300 175-7 Ch. A6)
Re: [PATCH net-next 05/14] gtp: Remove special mtu handling
Hi Tom, On Mon, Sep 18, 2017 at 05:38:55PM -0700, Tom Herbert wrote: > Removes MTU handling in gtp_build_skb_ip4. This is non standard relative > to how other tunneling protocols handle MTU. The model espoused is that > the inner interface should set it's MTU to be less than the expected > path MTU on the overlay network. Path MTU discovery is not typically > used for modifying tunnel MTUs. The point of the kernel GTP module is to interoperate with existing other GTP implementations and the practises established by cellular operators when operating GTP in their networks. While what you describe (chose interface MTU to be less than the expected path MTU) is generally best practise in the Linux IP/networking world, this is not generally reflected in the cellular universe. You see quite a bit of GTP fragmentation due to the fact that the transport network simply has to deal with the MTU that has been established via the control plane between SGSN and MS/UE, without the GGSN even being part of that negotiation. Also, you may very well have one "gtp0" tunnel device at the GGSN, but you are establishing individual GTP tunnels to dozesn to hundreds of different SGSNs at operators all over the world. You cannot reliably set the "gtp0" interface MTU to "the path MTU of the overlay network", as the overlay network is in fact different for each of the SGSNs you're talking to - and each may have a different path MTU. So unless I'm missing something, I would currently vote for staying with the current code, which uses the path MTU to the specific destination IP address (the SGSN). Regards, Harald -- - Harald Welte <lafo...@gnumonks.org> http://laforge.gnumonks.org/ "Privacy in residential applications is a desirable marketing option." (ETSI EN 300 175-7 Ch. A6)
Re: [PATCH net-next v1 1/3] gtp: refactor to support flow-based gtp encap and decap
Hi Jiannan, > > > gtp = netdev_priv(dev); > > > + gtp->net = src_net; > >· > > Isn't this a generic change that's independent of your work on OVS GTP? > > It is meant to be OVS independent. What makes it not? Should I leave > this field un-initialized? In general, in all FOSS projects I have worked (and particularly the Linux kernel), it is a strict rule that any given patch adresses only one logical change. So if your change is for flow-based "OVS" support in the GTP code, and the "gtp->net = src_net" is a generic change (and not something specifically required by flow/OVS) then it should be a separate patch. Similarly to the cosmetic changes which should be a separate patch. -- - Harald Welte <lafo...@gnumonks.org> http://laforge.gnumonks.org/ "Privacy in residential applications is a desirable marketing option." (ETSI EN 300 175-7 Ch. A6)
Re: [PATCH net-next v1 2/3] gtp: Support creating flow-based gtp net_device
Hi Jiannan, On Fri, Jul 14, 2017 at 01:01:34AM +, Jiannan Ouyang wrote: > > you're unconditionally binding to both GTP0 and GTP1 UDP ports. This is > > done selectively based on netlink attributes in the existing "normal" > > non-OVS kernel code, i.e. the control is left to the user. > > > > Is this function is only called/used in the context of OVS? If so, > > since you explicitly implement only GTPv1, why bind to GTPv0 port? > > > > I had doubts on how this flow-based GTPv1 code path should fit in, which is > why > the GTPv0 and the GTPv1 code pieces are mixed in my changes. Well, I know nothing about flow-based paths and OVS, so if you are the one proposing related changes to me as the maintainer, you need to make sure that your changes are consistend and useful within your use case/scenario while making sure that the existing features don't break. If you refactor generic code (used by "classic" GTP tunneling + your new flow based tunneling), and that old code worked with GTPv0 and GTPv1, then your modifications must make sure that they continue to support GTPv0 and v1 in the "classic tunnel" use case. If your new code for flow-based tunneling simply only implements v0, it is fine to me - but then those restrictions must be in the flow-based part only, and things must be consistent. I.e. in this case, for the flow-based tunnel approach you must not bind the v0 port, if you don't handle related packets. > Should I explicitly claim that the flow-based change is for GTPv1 > only? That's definitely important, too - but is not the point I raised (see above). > > > + setup_udp_tunnel_sock(net, sock1u, _cfg); > > > > even here, you're only setting up the v1 and not v0. > > same reason as above. yes, but if I read your code correctly, this is generic/shared code that will break the existing GTPv0 support in the "classic tunnel" case! > > > + /* Assume largest header, ie. GTPv0. */ > > > + dev->needed_headroom= LL_MAX_HEADER + > > > + sizeof(struct iphdr) + > > > + sizeof(struct udphdr) + > > > + sizeof(struct gtp0_header); > > > > ... and here you're using headroom for a GTPv0 header, despite (I think) > > only supporting GTPv1 from this configuration? > > Yes, only GTPv1 is supported. well, then I suggest you don't generate headroom for a v0 header (which is larger) in a v1-only code path :) > > > + err = gtp_hashtable_new(gtp, GTP_PDP_HASHSIZE); // JO: when to free?? > > > > I think that question about when to free needs to be resolved before any > > merge. Did you check that it persists even after the device is > > closed/removed? > > I didn't investigate it. What do you mean by persist? "persist" means "remains allocated after the release of the network device". Whatever you allocate during device creation you must de-allocate on device release. I cannot tell you when exactly (as I'm not familiar with OVS or flow-based tunneling, as indicateD). However, I know for sure we cannot introduce code that looks like it introduces memory leaks to the kernel :) Regards, Harald -- - Harald Welte <lafo...@gnumonks.org> http://laforge.gnumonks.org/ "Privacy in residential applications is a desirable marketing option." (ETSI EN 300 175-7 Ch. A6)
Re: [PATCH net-next v1 2/3] gtp: Support creating flow-based gtp net_device
Hi Jiannan, please keep in mind I have zero clue about OVS, so I cannot really comment on what is customary in that context and what not. However, some general review from the GTP point of view below: On Wed, Jul 12, 2017 at 05:44:54PM -0700, Jiannan Ouyang wrote: > Add the gtp_create_flow_based_dev() interface to create flow-based gtp > net_device, which sets gtp->collect_md. Under flow-based mode, UDP sockets are > created and maintained in kernel. > > Signed-off-by: Jiannan Ouyang <ouya...@fb.com> > --- > drivers/net/gtp.c | 213 > +- > include/net/gtp.h | 8 ++ > 2 files changed, 217 insertions(+), 4 deletions(-) > > diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c > index 5a7b504..09712c9 100644 > --- a/drivers/net/gtp.c > +++ b/drivers/net/gtp.c > @@ -642,9 +642,94 @@ static netdev_tx_t gtp_dev_xmit(struct sk_buff *skb, > struct net_device *dev) > return NETDEV_TX_OK; > } > > +static int gtp_hashtable_new(struct gtp_dev *gtp, int hsize); > +static void gtp_hashtable_free(struct gtp_dev *gtp); > +static int gtp_encap_enable(struct gtp_dev *gtp, struct nlattr *data[]); > + > +static int gtp_change_mtu(struct net_device *dev, int new_mtu, bool strict) > +{ > + int max_mtu = IP_MAX_MTU - dev->hard_header_len - sizeof(struct iphdr) > + - sizeof(struct udphdr) - sizeof(struct gtp1_header); > + > + if (new_mtu < ETH_MIN_MTU) > + return -EINVAL; > + > + if (new_mtu > max_mtu) { > + if (strict) > + return -EINVAL; > + > + new_mtu = max_mtu; > + } > + > + dev->mtu = new_mtu; > + return 0; > +} > + > +static int gtp_dev_open(struct net_device *dev) > +{ > + struct gtp_dev *gtp = netdev_priv(dev); > + struct net *net = gtp->net; > + struct socket *sock1u; > + struct socket *sock0; > + struct udp_tunnel_sock_cfg tunnel_cfg; > + struct udp_port_cfg udp_conf; > + int err; > + > + memset(_conf, 0, sizeof(udp_conf)); > + > + udp_conf.family = AF_INET; > + udp_conf.local_ip.s_addr = htonl(INADDR_ANY); > + udp_conf.local_udp_port = htons(GTP1U_PORT); > + > + err = udp_sock_create(gtp->net, _conf, ); > + if (err < 0) > + return err; > + > + udp_conf.local_udp_port = htons(GTP0_PORT); > + err = udp_sock_create(gtp->net, _conf, ); > + if (err < 0) > + return err; you're unconditionally binding to both GTP0 and GTP1 UDP ports. This is done selectively based on netlink attributes in the existing "normal" non-OVS kernel code, i.e. the control is left to the user. Is this function is only called/used in the context of OVS? If so, since you explicitly implement only GTPv1, why bind to GTPv0 port? > + setup_udp_tunnel_sock(net, sock1u, _cfg); even here, you're only setting up the v1 and not v0. > + /* Assume largest header, ie. GTPv0. */ > + dev->needed_headroom= LL_MAX_HEADER + > + sizeof(struct iphdr) + > + sizeof(struct udphdr) + > + sizeof(struct gtp0_header); ... and here you're using headroom for a GTPv0 header, despite (I think) only supporting GTPv1 from this configuration? > + err = gtp_hashtable_new(gtp, GTP_PDP_HASHSIZE); // JO: when to free?? I think that question about when to free needs to be resolved before any merge. Did you check that it persists even after the device is closed/removed? -- - Harald Welte <lafo...@gnumonks.org> http://laforge.gnumonks.org/ "Privacy in residential applications is a desirable marketing option." (ETSI EN 300 175-7 Ch. A6)
Re: [PATCH net-next v1 1/3] gtp: refactor to support flow-based gtp encap and decap
+static inline void gtp1_push_header(struct sk_buff *skb, __be32 tid) > { > int payload_len = skb->len; > struct gtp1_header *gtp1; > @@ -426,7 +452,7 @@ static inline void gtp1_push_header(struct sk_buff *skb, > struct pdp_ctx *pctx) > gtp1->flags = 0x30; /* v1, GTP-non-prime. */ > gtp1->type = GTP_TPDU; > gtp1->length= htons(payload_len); > - gtp1->tid = htonl(pctx->u.v1.o_tei); > + gtp1->tid = tid; > > /* TODO: Suppport for extension header, sequence number and N-PDU. >* Update the length field if any of them is available. > @@ -438,34 +464,17 @@ struct gtp_pktinfo { > struct iphdr*iph; > struct flowi4 fl4; > struct rtable *rt; > - struct pdp_ctx *pctx; > struct net_device *dev; > __be16 gtph_port; > }; > > -static void gtp_push_header(struct sk_buff *skb, struct gtp_pktinfo *pktinfo) > -{ > - switch (pktinfo->pctx->gtp_version) { > - case GTP_V0: > - pktinfo->gtph_port = htons(GTP0_PORT); > - gtp0_push_header(skb, pktinfo->pctx); > - break; > - case GTP_V1: > - pktinfo->gtph_port = htons(GTP1U_PORT); > - gtp1_push_header(skb, pktinfo->pctx); > - break; > - } > -} > - > static inline void gtp_set_pktinfo_ipv4(struct gtp_pktinfo *pktinfo, > struct sock *sk, struct iphdr *iph, > - struct pdp_ctx *pctx, struct rtable *rt, > - struct flowi4 *fl4, > + struct rtable *rt, struct flowi4 *fl4, > struct net_device *dev) > { > [...] > + __be32 tun_id; you are breaking GTPv0 functionality here. GTPv0 has 64 bit tunnel identifiers, and this function is called both from GTPv1 and GTPv0 context. This makes me wonder how you did verify that your changes do not break the existing operation with both GTPv0 and GTPv1? > + // flow-based GTP1U encap > + info = skb_tunnel_info(skb); > + if (gtp->collect_md && info && ntohs(info->key.tp_dst) == GTP1U_PORT) { I think it's typically safe to assume that GTP is only operated on standard ports, but it is something you chould/should think about, i.e. whether you want that kind of restriction. In the existing use case, we have the v0/v1 information stored in the per-pdp context structure. > + tun_id = htonl(pctx->u.v1.o_tei); here is where you're assuming GTPv1 in two ways from code that is called from both v0 and v1. * you're dereferencing a v1 specific element in the pctx union * you're storing the result in a 32bit variable > gtp = netdev_priv(dev); > + gtp->net = src_net; Isn't this a generic change that's independent of your work on OVS GTP? Regards, Harald -- - Harald Welte <lafo...@gnumonks.org> http://laforge.gnumonks.org/ "Privacy in residential applications is a desirable marketing option." (ETSI EN 300 175-7 Ch. A6)
Re: [PATCH net-next v1 0/3] Flow Based GTP Tunneling
hi Jiannan, net-next si closed, as it has been pointed out already by Joe. On Wed, Jul 12, 2017 at 05:44:52PM -0700, Jiannan Ouyang wrote: > ovs-ofctl add-flow br0 > "in_port=2,tun_src=192.168.60.141,tun_id=123, \ > actions=set_field:02:00:00:00:00:00->eth_src, \ > set_field:ff:ff:ff:ff:ff:ff->eth_dst,LOCAL" I'm not familiar with the details here, but does this imply that you are matching on the outer (transport layer) source IP address? If so, please note this is in violation of the 3GPP specifications for GTP, which require explicitly that the TEID is the *only* criteria for matching an encapsulated packet to the tunnel. Basically anyone can send you an encapsulated packet from any random source IP, just as long as the TEID matches a tunnel, it will be decapsulated. This is [presumably] in order to take care of mobility, as the subscribers' phone moves around different MME/S-GW/SGSN, each having different source IP addresses. -- - Harald Welte <lafo...@gnumonks.org> http://laforge.gnumonks.org/ "Privacy in residential applications is a desirable marketing option." (ETSI EN 300 175-7 Ch. A6)
Re: loosing netdevices with namespaces and unshare?
Hi Eric, On Thu, Jun 01, 2017 at 01:32:49AM -0500, Eric W. Biederman wrote: > If a network device does not implement rntl_link_ops it is returned to > the initial network namespace. Anything else will loose physical > devices. Thanks a lot for your statement. This is a big relief, my line of thinking thus is confirmed: We shall not loose physical devices. > Only for pure software based devices do we delete them. Perhaps your > sub interface implements rtnl_link_ops? Either that or something is > still holding a reference to your network namespace, which would prevent > the network device from being returned. My question is how to debug this further? Monitoring /proc/*/ns/net* showed that the ID of the namespace is gone after terminating my processes in the namespace. Short of adding printk() or playing with kprobes: to the related kernel code, how can I track the reference count or get an idea who might hold references? Regards, Harald -- - Harald Welte <lafo...@gnumonks.org> http://laforge.gnumonks.org/ "Privacy in residential applications is a desirable marketing option." (ETSI EN 300 175-7 Ch. A6)
Re: loosing netdevices with namespaces and unshare?
Hi Cong, On Wed, May 31, 2017 at 03:40:33PM -0700, Cong Wang wrote: > But you have other choices than using the physical interface > directly in non-root ns, for example, creating a virtual pair and > connect it with the physical one with a bridge. There are various > ways to achieve this. Yes, but then those "workarounds" expose the given physical device to the root namespace, which is exactly what I try to avoid here. The interface has no purpose outside of the specific target namespace, and under no circumstances should the various applications on a normal Linux system (whether it's network manager or whatever else) start to use the device. The same also applies to the kernel itself. It is not desirable to have the "root netns" start to do things like ipv6 stateless autoconfiguration, etc. I of course know that all of those things can be individually disabled. I just think having a physical netdev inside "single application" namespaces is more complicated than it could be. However, I have sufficiently made my argument clear, and I understand that you don't share my concern. This is perfectly fine. We agree to disagree :) I simply have to find the least intrusive work-around to my liking for the intentional but so far undocumented behavior of netdevices vanishing into thin air. I'll manage. -- - Harald Welte <lafo...@gnumonks.org> http://laforge.gnumonks.org/ "Privacy in residential applications is a desirable marketing option." (ETSI EN 300 175-7 Ch. A6)
Re: loosing netdevices with namespaces and unshare?
Hi Cong, On Wed, May 31, 2017 at 10:44:53AM -0700, Cong Wang wrote: > >> Net namespace simply unregisters all netdevices inside when it is > >> gone, no matter where they are from. > > > > ah, ok. I missed that part. Is there a good piece of documentation on > > netwokr namespaces that I should read? > > I don't know any doc mentioning this. That's of course a pity. I'll see what can be done about amending the netns related manpage or the like. > >> > Of course I know I could simply do something like "ip link set eth0 > >> > netns 1" from within the namespace before leaving. But what if the > >> > process is not bash and the process exits abnormally? I'd consider > >> > that explicit reassignment more like a hack than a proper solution... > >> > >> It doesn't make sense to move it back to where it is from, for example, > >> what if you move a veth0 from netns1 to netns2 and netns1 is gone > >> before netns2? > > > > for virtual devices, I would agree. For physical devices, I think the > > default behavior to unregister them is - from my of course very > > subjective point of view - quite questionable. > > Network namespace does not special-case the physical devices, > it treats them all equally as abstract net devices. I hear you, and I understand that of course from a developer point of view it makes sense to treat all devices the same. I just wonder if from an usability point of view this is the best choice. Virtual devices can be (re)created at any time, physical not. I mean, what is the *use case* for loosing any refrence to a physical network device and unregistering it from the stack? Is there any API by which a new netdevice structure can be instantiated on the actual hardware? Registering the netdev is what the driver does during discovering the system hardware. If there's a method to "automagically" loose devices, at the very least I wold expect some reasonable method to resurrect them. Unloading the kernel module and reloading it is for sure not elegant, particularly not if you have multiple Ethernet devices/ports sharing the same driver. One could e.g. also think of something like a special namespace that collects all the "orphan" netdevices. Something analogous to the old Unix tradition of "pid 1" collecting all the orphan tasks whose parents died. Transferring them into that "netdev orphanage" could automatically set the link down so that no accidential routing/forwarding of traffic between the devices is possible. This is just my two cents. Given my past involvement in Linux networking I allow myself having an opinion on such matters. But if the kernel networking community thinks it is ok to loose all references to a physical network device due to processes terminating irregularly (which will happen, as indicated in OOM or software bug cases), then I will of course have to accept that. Regards, Harald -- - Harald Welte <lafo...@gnumonks.org> http://laforge.gnumonks.org/ "Privacy in residential applications is a desirable marketing option." (ETSI EN 300 175-7 Ch. A6)
Re: loosing netdevices with namespaces and unshare?
Hi Cong, On Tue, May 30, 2017 at 04:18:17PM -0700, Cong Wang wrote: > On Tue, May 30, 2017 at 3:07 PM, Harald Welte <lafo...@gnumonks.org> wrote: > > But, to the contrary, this doesn't happen. The unshare-created netns is > > gone, but the netdevice did not get moved back to the root namespace > > either. The only hack to get back to the "eth0" device is to unload the > > driver and re-load it. > > > Net namespace simply unregisters all netdevices inside when it is > gone, no matter where they are from. ah, ok. I missed that part. Is there a good piece of documentation on netwokr namespaces that I should read? > I am pretty sure you can move it back to root-ns if you want, Yes, I can explicitly do that, but this of course doesn't work if e.g. my [single] process in that namespace crashes due to some bug, OOM or the like. > it is a little tricky because you have to give the root-ns a name > first. It's actually not, as you can just identify the root-ns by pid 1, so "ip link set $DEV netns 1" will move it back. As indicated, I'm worried about the error paths. > > What am I missing here? Is this the intended behavior? > > Yes it is. thanks for your confirmation. Guess I have to get used to it. > > Of course I know I could simply do something like "ip link set eth0 > > netns 1" from within the namespace before leaving. But what if the > > process is not bash and the process exits abnormally? I'd consider > > that explicit reassignment more like a hack than a proper solution... > > It doesn't make sense to move it back to where it is from, for example, > what if you move a veth0 from netns1 to netns2 and netns1 is gone > before netns2? for virtual devices, I would agree. For physical devices, I think the default behavior to unregister them is - from my of course very subjective point of view - quite questionable. Regards, Harald -- - Harald Welte <lafo...@gnumonks.org> http://laforge.gnumonks.org/ "Privacy in residential applications is a desirable marketing option." (ETSI EN 300 175-7 Ch. A6)
loosing netdevices with namespaces and unshare?
Hi all, I know I'm kind of late to the party in terms of deeper exploration of Linux network namespaces. Also, I'm not sure if the netdev list is the riight place to ask, but a moderate amount of web searching didn't bring up a solution in multiple hours, and it seems like I could trigger the kernel (4.11.0) to loose netdevices, which I think is a serious issue. What I'm doing: * start a process using the 'unshare' command line tool provided with util-linux, e.g. "unshare -nUr bash". I do this as a non-privileged user but now that is mapped to uid '0' inside the new process/namespace, so I can adjust interface configuration. * I use "echo $$" to get the PID of that bash process. * On another terminal in a root shell, I use "ip link set eth0 netns $PID" in order to move a given physical device into that namespace. * I then "exit" that bash, which should - to my knowledge - return the "eth0" netdev back to the root namespace, as the bash process was the only one using that network namespace But, to the contrary, this doesn't happen. The unshare-created netns is gone, but the netdevice did not get moved back to the root namespace either. The only hack to get back to the "eth0" device is to unload the driver and re-load it. I can reproduce the above without starting any other process inside that namespace. I have verified that there are no /proc/*/ns/net symlinks left pointing to the ID of that namespace. What am I missing here? Is this the intended behavior? Of course I know I could simply do something like "ip link set eth0 netns 1" from within the namespace before leaving. But what if the process is not bash and the process exits abnormally? I'd consider that explicit reassignment more like a hack than a proper solution... Regards, Harald p.s.: In case you're wondering what I'm actually trying to achieve: Find an easy way to run a single program in an isolated namespace that only has one physical (usb) ethernet device. I would like to execute that program as unprivileged user but still be able to bind to privileged ports. And I want to do this using simple command-line tools without all the bloat and overhead of "container" solutions that have 99% of features I don't need. But let that not distract you, I think the mysteriously disappearing netdevices are a more general and important issue. -- - Harald Welte <lafo...@gnumonks.org> http://laforge.gnumonks.org/ "Privacy in residential applications is a desirable marketing option." (ETSI EN 300 175-7 Ch. A6)
Re: [PATCH net-next v3 0/2] GTP SGSN-side tunnels
Hi Jonas, looks fine to me, but I haven't tested it. Did you manually test it using the extended libgtpnl + tools? Also, in code like this: + if (gtp->role == GTP_ROLE_SGSN) { + pctx = ipv4_pdp_find(gtp, iph->saddr); + } else { I think general Linux kernel coding style is to not have curly-brackets around single-line blocks. See "Do not unnecessarily use braces where a single statement will do." in line 169 of Documentation/process/coding-style.rst I won't mind your current style, and it is not a blocker issue to me, but still it would be nice for general consistency. Acked-by: Harald Welte <lafo...@gnumonks.org> -- - Harald Welte <lafo...@gnumonks.org> http://laforge.gnumonks.org/ "Privacy in residential applications is a desirable marketing option." (ETSI EN 300 175-7 Ch. A6)
Re: [PATCH 1/1] gtp: support SGSN-side tunnels
On Wed, Mar 15, 2017 at 08:10:38PM +0100, Harald Welte wrote: > I've modified the patch slightly, see below (compile-tested, but not > otherwise tested yet). Basically rename the flags attribute to 'role', > expand the commit log and removed unrelated cosmetic changes. I also have a version against current net-next/master, in case anyone is interested.. >From 3274a3303d1ec997392a07a92666d57b13997658 Mon Sep 17 00:00:00 2001 From: Jonas Bonn <jo...@southpole.se> Date: Wed, 15 Mar 2017 20:24:28 +0100 Subject: [PATCH] gtp: support SGSN-side tunnels The GTP-tunnel driver is explicitly GGSN-side as it searches for PDP contexts based on the incoming packets _destination_ address. For real-world use cases, this is sufficient, as the other side of a GTP tunnel is not in fact implemented by GTP, but by the protocol stacking of a mobile station / user equipment on the radio interface (like PDCP, SNDCP). However, if we want to simulate the mobile station, radio access network and SGSN (for example to test the GGSN side implementation), then we want to be identifying PDP contexts based on _source_ address. This patch adds a "role" attribute at GTP-link creation time to specify whether we behave like the GGSN or SGSN role of the tunnel; this attribute is then used to determine which part of the IP packet to use in determining the PDP context. Signed-off-by: Jonas Bonn <jo...@southpole.se> Signed-off-by: Harald Welte <lafo...@gnumonks.org> --- drivers/net/gtp.c| 46 +--- include/uapi/linux/gtp.h | 2 +- include/uapi/linux/if_link.h | 7 +++ 3 files changed, 43 insertions(+), 12 deletions(-) diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c index 3e1854f34420..3ab593b9be85 100644 --- a/drivers/net/gtp.c +++ b/drivers/net/gtp.c @@ -74,6 +74,7 @@ struct gtp_dev { struct net_device *dev; + enum ifla_gtp_role role; unsigned inthash_size; struct hlist_head *tid_hash; struct hlist_head *addr_hash; @@ -154,8 +155,8 @@ static struct pdp_ctx *ipv4_pdp_find(struct gtp_dev *gtp, __be32 ms_addr) return NULL; } -static bool gtp_check_src_ms_ipv4(struct sk_buff *skb, struct pdp_ctx *pctx, - unsigned int hdrlen) +static bool gtp_check_ms_ipv4(struct sk_buff *skb, struct pdp_ctx *pctx, + unsigned int hdrlen, enum ifla_gtp_role role) { struct iphdr *iph; @@ -164,27 +165,31 @@ static bool gtp_check_src_ms_ipv4(struct sk_buff *skb, struct pdp_ctx *pctx, iph = (struct iphdr *)(skb->data + hdrlen); - return iph->saddr == pctx->ms_addr_ip4.s_addr; + if (role == GTP_ROLE_SGSN) + return iph->daddr == pctx->ms_addr_ip4.s_addr; + else + return iph->saddr == pctx->ms_addr_ip4.s_addr; } /* Check if the inner IP source address in this packet is assigned to any * existing mobile subscriber. */ -static bool gtp_check_src_ms(struct sk_buff *skb, struct pdp_ctx *pctx, -unsigned int hdrlen) +static bool gtp_check_ms(struct sk_buff *skb, struct pdp_ctx *pctx, +unsigned int hdrlen, enum ifla_gtp_role role) { switch (ntohs(skb->protocol)) { case ETH_P_IP: - return gtp_check_src_ms_ipv4(skb, pctx, hdrlen); + return gtp_check_ms_ipv4(skb, pctx, hdrlen, role); } return false; } -static int gtp_rx(struct pdp_ctx *pctx, struct sk_buff *skb, unsigned int hdrlen) +static int gtp_rx(struct pdp_ctx *pctx, struct sk_buff *skb, unsigned int hdrlen, + enum ifla_gtp_role role) { struct pcpu_sw_netstats *stats; - if (!gtp_check_src_ms(skb, pctx, hdrlen)) { + if (!gtp_check_ms(skb, pctx, hdrlen, role)) { netdev_dbg(pctx->dev, "No PDP ctx for this MS\n"); return 1; } @@ -239,7 +244,7 @@ static int gtp0_udp_encap_recv(struct gtp_dev *gtp, struct sk_buff *skb) return 1; } - return gtp_rx(pctx, skb, hdrlen); + return gtp_rx(pctx, skb, hdrlen, gtp->role); } static int gtp1u_udp_encap_recv(struct gtp_dev *gtp, struct sk_buff *skb) @@ -281,7 +286,7 @@ static int gtp1u_udp_encap_recv(struct gtp_dev *gtp, struct sk_buff *skb) return 1; } - return gtp_rx(pctx, skb, hdrlen); + return gtp_rx(pctx, skb, hdrlen, gtp->role); } static void gtp_encap_destroy(struct sock *sk) @@ -481,7 +486,11 @@ static int gtp_build_skb_ip4(struct sk_buff *skb, struct net_device *dev, * Prepend PDP header with TEI/TID from PDP ctx. */ iph = ip_hdr(skb); - pctx = ipv4_pdp_find(gtp, iph->daddr); + if (gtp->role == GTP_ROLE_SGSN) + pctx = ipv4_pdp_find(gtp, iph->sad
Re: [PATCH 1/1] gtp: support SGSN-side tunnels
Hi Pablo, On Wed, Mar 15, 2017 at 06:23:48PM +0100, Pablo Neira Ayuso wrote: > On Wed, Mar 15, 2017 at 05:39:16PM +0100, Harald Welte wrote: > > > > I would definitely like to see this move forward, particularly in order > > to test the GGSN-side code. > > Agreed. I've modified the patch slightly, see below (compile-tested, but not otherwise tested yet). Basically rename the flags attribute to 'role', expand the commit log and removed unrelated cosmetic changes. I've also prepared a corresponding change to libgtpnl into the laforge/sgsn-rol branch, see http://git.osmocom.org/libgtpnl/commit/?h=laforge/sgsn-role This is not yet tested in any way, but I'm planning to add some associated support to the command line tools and then give it some testing (both against the kernel GTP in GGSN mode, as well as an independent userspace GTP implementation). > It would be good if we provide a way to configure GTP via iproute2 for > testing purposes. I don't really care about which tool is used, as long as it is easily available [and FOSS, of course]. > We would need to create some dummy socket from > kernel too though so we don't need any userspace daemon for this > testing mode. I don't really like that latter idea. It sounds too much like a hack to me. But then, I don't have enough phantasy right now ti imagine how an actual implementation would look like. To me, it is perfectly fine to run a simple, small utility in userspace even for testing. Regards, Harald >From 63920950f9498069993def78e178bde85c174e0c Mon Sep 17 00:00:00 2001 From: Jonas Bonn <jo...@southpole.se> Date: Wed, 15 Mar 2017 17:52:28 +0100 Subject: [PATCH] gtp: support SGSN-side tunnels The GTP-tunnel driver is explicitly GGSN-side as it searches for PDP contexts based on the incoming packets _destination_ address. For real-world use cases, this is sufficient, as the other side of a GTP tunnel is not in fact implemented by GTP, but by the protocol stacking of a mobile station / user equipment on the radio interface (like PDCP, SNDCP). However, if we want to simulate the mobile station, radio access network and SGSN (for example to test the GGSN side implementation), then we want to be identifying PDP contexts based on _source_ address. This patch adds a "role" attribute at GTP-link creation time to specify whether we behave like the GGSN or SGSN role of the tunnel; this attribute is then used to determine which part of the IP packet to use in determining the PDP context. Signed-off-by: Jonas Bonn <jo...@southpole.se> Signed-off-by: Harald Welte <lafo...@gnumonks.org> diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c index 99d3df788ce8..9aef4217f6e1 100644 --- a/drivers/net/gtp.c +++ b/drivers/net/gtp.c @@ -71,6 +71,7 @@ struct gtp_dev { struct net_device *dev; + enum ifla_gtp_role role; unsigned inthash_size; struct hlist_head *tid_hash; struct hlist_head *addr_hash; @@ -149,8 +150,8 @@ static struct pdp_ctx *ipv4_pdp_find(struct gtp_dev *gtp, __be32 ms_addr) return NULL; } -static bool gtp_check_src_ms_ipv4(struct sk_buff *skb, struct pdp_ctx *pctx, - unsigned int hdrlen) +static bool gtp_check_ms_ipv4(struct sk_buff *skb, struct pdp_ctx *pctx, + unsigned int hdrlen, enum ifla_gtp_role role) { struct iphdr *iph; @@ -159,18 +160,21 @@ static bool gtp_check_src_ms_ipv4(struct sk_buff *skb, struct pdp_ctx *pctx, iph = (struct iphdr *)(skb->data + hdrlen); - return iph->saddr == pctx->ms_addr_ip4.s_addr; + if (role == GTP_ROLE_SGSN) + return iph->daddr == pctx->ms_addr_ip4.s_addr; + else + return iph->saddr == pctx->ms_addr_ip4.s_addr; } /* Check if the inner IP source address in this packet is assigned to any * existing mobile subscriber. */ -static bool gtp_check_src_ms(struct sk_buff *skb, struct pdp_ctx *pctx, -unsigned int hdrlen) +static bool gtp_check_ms(struct sk_buff *skb, struct pdp_ctx *pctx, +unsigned int hdrlen, enum ifla_gtp_role role) { switch (ntohs(skb->protocol)) { case ETH_P_IP: - return gtp_check_src_ms_ipv4(skb, pctx, hdrlen); + return gtp_check_ms_ipv4(skb, pctx, hdrlen, role); } return false; } @@ -204,7 +208,7 @@ static int gtp0_udp_encap_recv(struct gtp_dev *gtp, struct sk_buff *skb, goto out_rcu; } - if (!gtp_check_src_ms(skb, pctx, hdrlen)) { + if (!gtp_check_ms(skb, pctx, hdrlen, gtp->role)) { netdev_dbg(gtp->dev, "No PDP ctx for this MS\n"); ret = -1; goto out_rcu; @@ -261,7 +265,7 @@ static int gtp1u_udp_encap_recv(struct gtp_dev *gtp
Re: [PATCH 1/1] gtp: support SGSN-side tunnels
Hi Jonas, are you working on the review feedback that was provided back in early February? I think there were some comments like * remove unrelated cosmetic change in comment * change from FLAGS to a dedicated MODE netlink attribute * add libgtpnl code and some usage information or even sample scripts I would definitely like to see this move forward, particularly in order to test the GGSN-side code. Regards, Harald -- - Harald Welte <lafo...@gnumonks.org> http://laforge.gnumonks.org/ "Privacy in residential applications is a desirable marketing option." (ETSI EN 300 175-7 Ch. A6)
Re: [PATCH] netfilter: logging copyrights is useless
Hi Corentin, On Wed, Mar 15, 2017 at 02:17:39PM +0100, Corentin Labbe wrote: > Logging copyrights does not add any useful information in logs. > This patch remove such logging Historically, there were plenty of more copyright notices for certain drivers or sections of the code being printed while booting. I still remember fondly the many ethernet driver notices of a Donald Becker, for example. I understand that it is questionable whether or not such statements are "useful". You might argue, their use is in * stating a legal formality to the user * making it simpler to determine if a given part of code is used in a given device (e.g. as part of GPL enforcement) while just logging the serial console and no requirement to (find a way to) dump the internal flash Besides such practical arguments (it is of what use to whom), there are legal concerns regarding the removal of copyright statements. This holds true on whether or not it is Free Software, or whether or not it is GPL licensed. If an author puts a copyright statement somehwere, he exercises his right to be regarded as the author of the work. It is typically not permitted to remove such notices, as that would be a copyright infringement in itself. Also, beyond general legal concerns, the GPLv2 states explicitly: > c) If the modified program normally reads commands interactively > when run, you must cause it, when started running for such > interactive use in the most ordinary way, to print or display an > announcement including an appropriate copyright notice and a notice > that there is no warranty (or else, saying that you provide a > warranty) and that users may redistribute the program under these > conditions, and telling the user how to view a copy of this License. > (Exception: if the Program itself is interactive but does not normally > print such an announcement, your work based on the Program is not > required to print an announcement.) Now you can argue whether the kernel is a an interactive program, but at least you can see some intent to not remove any notices/messages that were originally present in the program. So I think your patch could only applied if the respective copyright holders agree to remove their respective notices. I personally would argue to keep them. Nobody has complained about them so far, and they have probably saved many weeks of my work time in GPL compliance / enforcement work. I understand this is a "niche use case", though ;) -- - Harald Welte <lafo...@netfilter.org> http://netfilter.org/ "Fragmentation is like classful addressing -- an interesting early architectural error that shows how much experimentation was going on while IP was being designed."-- Paul Vixie
Re: [PATCH net-next 0/4] gtp: support multiple APN's per GTP endpoint
Hi Andreas, On Tue, Mar 14, 2017 at 02:42:16PM +0100, Pablo Neira Ayuso wrote: > On Tue, Mar 14, 2017 at 01:42:44PM +0100, Andreas Schultz wrote: > > The only supported and documented API for the GTP module is libgtpnl. > > No, the netlink interface itself if the API. > > Stopping trying to find a reason to break API, that is a no-go. As much as one might dislike it as a developer in this particular case, the Linux kernel has the very well communicated rule: All userspace visible interfaces must not change in an incompatible way. This includes of course all the syscalls, the ioctl() parameters but also the netlink interfaces of the networking stack. The statement "nobody ever used it" is a statement you can never make in FOSS software, as you don't know of 99.999% of all the users of your software. The fact that none of the FOSS projects that any of us was involved in may not have used a certain feature doesn't mean nobody else has been using it privately, quietly. Keep in mind that several Linux distributions have already been shipping the gtp module as part of their stable releases meanwhile. Also, no matter what Pablo or I may think about, there are general rules about how Linux kernel development is done (from coding style to merge windows, and also userspace compatibility), and we all have to obey them. There's little point in discussing about them, we all just have to live with them. Regards, Harald -- - Harald Welte <lafo...@gnumonks.org> http://laforge.gnumonks.org/ "Privacy in residential applications is a desirable marketing option." (ETSI EN 300 175-7 Ch. A6)
Re: [PATCH net-next v5 0/7] gtp: misc improvements
Hi Pablo, On Mon, Mar 13, 2017 at 05:48:12PM +0100, Pablo Neira Ayuso wrote: > I would like you take the time to develop the VRF idea that you want > to introduce, I just would like that we avoid bloating the GTP tunnel > with features that we can just achieve via composite of different > subsystems. I presume you're talking about the "several APN behind one GGSN IP" which requires the de-coupling of the UDP socket from the gtp net-device? I actually think it is a very straight forward idea what Andreas is proposing, and I see no bloat associated. It's rather like splitting existing combined functionality in two parts, which can still be used together, but also be used separately. Or are you referring to something else? In any case, I'm looking forward to the related technical discussion on this mailing list[s] :) Regards, Harald -- - Harald Welte <lafo...@gnumonks.org> http://laforge.gnumonks.org/ "Privacy in residential applications is a desirable marketing option." (ETSI EN 300 175-7 Ch. A6)
Re: [PATCH net-next v5 0/7] gtp: misc improvements
Hi Andreas, > Andreas Schultz (7): > gtp: switch from struct socket to struct sock for the GTP sockets > gtp: make GTP sockets in gtp_newlink optional > gtp: merge gtp_get_net and gtp_genl_find_dev > gtp: consolidate gtp socket rx path > gtp: unify genl_find_pdp and prepare for per socket lookup > gtp: consolidate pdp context destruction into helper > gtp: add socket to pdp context I agree with the conceptual and architectural direction that you're taking the code, and I also think your current patchset is good to go ahead, so feel free to add my "Acked-By: Harald Welte <lafo...@gnumonks.org>". However, the usual disclaimer: I've been out of kernel networking land for a long time, so my review skills are clearly not the best anymore. I'm happy for any input others can give from that point of view. There are some minor typos in the commit logs, but at the rate my typos increase over the years, I am the last one to complain about wasting time on fixing these ;) Regards, Harald -- - Harald Welte <lafo...@gnumonks.org> http://laforge.gnumonks.org/ "Privacy in residential applications is a desirable marketing option." (ETSI EN 300 175-7 Ch. A6)
Basic test setup for testing of Kernel GTP-U
Hi all, [intentionally breaking the thread here] On Thu, Feb 23, 2017 at 05:46:57PM +0100, Harald Welte wrote: > I'll try to cook up some instructions extending > https://osmocom.org/projects/openggsn/wiki/OpenGGSN to cover also > sgsnemu for a basic use case of establishing one single tunnel. That's > of course like a manual "HOWTO" and not yet anything that can be tested > automatically. I've documented the instructions at https://osmocom.org/projects/linux-kernel-gtp-u/wiki/Basic_Testing Please let me know any updates/corrections/questions if you try to reproduce this. The above instructions were working for me yesterday. Please find an ASCII export of this below (much less readable than the wiki). Regards, Harald h1. Basic Testing This page documents some basic testing setup for the Kenrel GTP-U code. It follows the below rationale: * focus on testing the kernel GTP-U module without too much external dependencies * test GTP-U interoperability of the kernel with at least one other implementation, not just kernel-to-kernel (which currently is not supported in the kernel, as it only implements the GGSN/P-GW role) * limit testing to SGSN/S-GW and GGSN/P-GW, without a real cellular network (which is possible e.g. using [[OsmoSGSN:]] and [[OsmoPCU:]]) h2. Building / Installing dependencies In order to follow below test instructions, you will need * A Linux kernel including the GTP-U driver (@drivers/net/gtp.c@) either compiled-in or as kernel module * [[libgtpnl]] - the userspace library providing an API around the kernel GTP-U netlink interface * [[OpenGGSN:]] - a minimal C-language implementation of a 3GPP GGSN, also contains a SGSN-side emulator called [[OpenGGSN:sgsnemu]] ** make sure you use Version 0.93 or later ** make sure you compile it with @--enable-gtp-linux@ enable during the @./configure@ step You can find some instructions on how to build [[OpenGGSN:]] with support for [[libgtpnl]] and kernel GTP-U at this wiki page: [[OpenGGSN:Kernel_GTP]] h2. Test setup description We will run the GGSN natively on the host, and put the emulated SGSN inside a separate network namespace. The two namespaces are interconnected by a virtual ethernet device using the transfer network 172.31.1.0/24 The GGSN is configured to provide a pool of IP addresses from the 192.168.71.0/24 range. Each PDP context will be allocated one dynamic address from that pool h2. Test instructions h3. create the network namespace for the SGSN ip netns add sgsn h3. add veth to be used between SGSN and GGSN ip link add veth0 type veth peer name veth1 h3. remote (SGSN) side of veth device ip link set veth1 netns sgsn ip netns exec sgsn ip addr add 172.31.1.2/24 dev veth1 ip netns exec sgsn ip link set veth1 up h3. local (GGSN) side of veth device ip addr add 172.31.1.1/24 dev veth0 ip link set veth0 up h3. execute the GGSN on the host ggsn -g -c ./ggsn.conf.test (use the file attached to this wiki page) The @-g@ option is responsible for activating kernel-GTP support. If you cannot get the example described in this document to work, try the pure GTP-U userspace implementation by removing @-g@ from the above command line. If it works then, there's something related to Kernel GTP-U that breaks the setup. h3. execute the emulated SGSN inside the sgsn namespace ip netns exec sgsn sgsnemu -d -r 172.31.1.1 -l 172.31.1.2 --defaultroute --createif h3. verify the existnace of the GTP tunnel ggsn:~# gtp-tunnel list version 1 tei 1/1 ms_addr 192.168.71.2 sgsn_addr 172.31.1.2 h3. further testing in the @sgsn@ namespace, there's now a default-route that points into the GTP tunnel. You can use this to ping any network address that's reachable to the GGSN host. If that host is connected to the internet, you can e.g. run a ping command from within the namespace using ip netns exec sgsn ping -c 10 8.8.8.8 which will send some IP packets to 8.8.8.8 via the tun0 device (created by [[OpenGGSN:sgsnemu]]). It will be encapsulated by the userspace GTP-U implementation of sgsnemu, sent via the veth device to the host, where it ends up inthe GTP-U kernel module, decapsulating the package and passing in on to the gtp0 device there. Anything beyond that point depends on your local routing configuration. == Building OpenGGSN == h1. OpenGGSN and Linux Kernel accelerated GTP-U OpenGGSN has support to use the Linux kernel GTP-U code to accelerate the data/user plane while still implementing the control plane (GTP-C) in userspace in OpenGGSN. For more information about the Linux kernel GTP-U code, please see [[linux-kernel-gtp-u:]] h2. Building OpenGGSN with kernel which has GTP-U support At the time of writing (2017-01-09) of this wiki, below listed distributions have support of GTP kernels : * Debian 4.9.1-1~exp1 - Debian's latest experimental build * Ubuntu 16.10 kernel 4.8 * OpenSUSE Tumbleweed (stable) 4.8 (.12) * Fedora 25 kernel 4.8 Fol
Re: [PATCH net-next v4 4/7] gtp: consolidate gtp socket rx path
Hi Tom, On Thu, Feb 23, 2017 at 10:07:03AM -0800, Tom Herbert wrote: > I'm looking at the GTP encapsulation, it's not particularly complex. > Is there any real reason why we can't just implement a netdev > interface with static tunnels and configuration to do performance > testing and development? For instance, if we want to add GSO/GRO > support to GTP that's all we really need, the control plane should be > inconsequential in that case. As outlined several times in this thread, GTP tunneling is not symmetric. The current mainline code can only act as one of the two roles (GGSN/G-GW), not as the other one. The rationale is that in 3GPP networks, that is the only point in the network that takes native IP data (e.g.from the internet) and puts it into GTP. At all other places in the network, you don't have this combination. By the time the inner IP data arrives at the mobile phone, it is no longer encapsulated in GTP, as the lower layers have been adapted one or multiple times to other protocols by other network elements. There's a patch that has recently been submitted which adds the capability for the SGSN/S-GW side of a GTP-U tunnel, but that patch has so far not been merged (due to concersn about its netlink interface), and the patch *only* exists for testing, it has no real-world application in 3GPP networks. So as of now, it is not possible to run both endpoints of a tunnel in Linux. > > For performance testing, you would need a SGSN-side implementation of > > I'll try to cook up some instructions extending > > https://osmocom.org/projects/openggsn/wiki/OpenGGSN to cover also > > sgsnemu for a basic use case of establishing one single tunnel. That's > > of course like a manual "HOWTO" and not yet anything that can be tested > > automatically. > > > That would be good. Thanks! I've spent some hours earlier today on this, I expect the document to be ready at some point over the weekend. -- - Harald Welte <lafo...@gnumonks.org> http://laforge.gnumonks.org/ "Privacy in residential applications is a desirable marketing option." (ETSI EN 300 175-7 Ch. A6)
Re: [PATCH net-next] net/gtp: Add udp source port generation according to flow hash
Hi Tom, On Thu, Feb 23, 2017 at 08:35:13AM -0800, Tom Herbert wrote: > On Thu, Feb 23, 2017 at 6:00 AM, Pablo Neira Ayuso <pa...@netfilter.org> > wrote: > > On Thu, Feb 23, 2017 at 10:35:56AM +0100, Andreas Schultz wrote: > >> - On Feb 22, 2017, at 10:47 PM, Tom Herbert t...@herbertland.com wrote: > >> So in a complete 3GPP node (GGSN, P-GW) that uses the GTP tunnel > >> implementation, malicious traffic should be blocked before it can reach > >> the tunnel. > >> > >> And as I stated before, the GTP tunnel module is not supposed to be > >> use without any of those components. So the DDOS concern should not > >> be handled at the tunnel level. > > > > I think that Tom's point is that this tunnel driver will have to deal > > with DDOS scenarios anyway, because reality is that you can't always > > block it before it can reach the tunnel. > > > Right, if only we had a dime for every time someone thought their > perimeter security was sufficient only to be proven wrong! Yes and no. Welcome to the cellular world. Everything is designed in a way that it assumes everyone can be trusted and that none of those interfaces (except the radio interface) are ever exposed to the public. There is really very little one can do to change that world. It's like the Internet in the early 1990ies, and they are reluctant to learn. And whenever a new system is designed (like the step from MAP to DIAMETER), they make damn sure that all the security issues are inherited from the previous standards to ensure interoperability ;) I understand and support the motivation to design robust systsems even in the presence of broken/ignorant specs, but I think this is one of the situations where it is useless (and IMHO impossible) to do anything about it. -- - Harald Welte <lafo...@gnumonks.org> http://laforge.gnumonks.org/ "Privacy in residential applications is a desirable marketing option." (ETSI EN 300 175-7 Ch. A6)
Re: [PATCH net-next v4 4/7] gtp: consolidate gtp socket rx path
Hi Tom, On Thu, Feb 23, 2017 at 08:28:47AM -0800, Tom Herbert wrote: > If there is a way for us to test this without setting up a full mobile > network please provide details on how to do that in the commit log. There are different ways how to do this. With the existing in-kernel code (that lacks the "SGSN role" patch which would be for testing only), you can e.g. use two relatively small C-language programs from the openggnsn package [http://git.osmocom.org/openggsn/]: * OpenGGSN with support for libgtpnl and the kernel GTP-U module * sgsnemu (included in openggsn source tree) which implements a minimal SGSN-side of the tunnel. It will * perform the GTP-C signaling required with OpenGGSN (which in turn will then instruct the kernel to open a tunnel via the netlink interface). * start a tun/tap device for the "mobile station end" of the tunnel perform en/decapsulation of data between that tun/tap device and GTP in userspace The wiki at https://osmocom.org/projects/openggsn/wiki/OpenGGSN contains step-by-step instructions how to build and configure OpenGGSN with support for kernel GTP-U. It's nothing more than autotools based compile+install of libgtpnl followed by "./configure --enable-gtp-linux" and "make" for OpenGGSN What is not described on the above page is how to use sgsnemu to simulate the SGSN side, as within Osmocom we typically run the open source OsmoSGSN (a more "real" SGSN). So using two small binaries that can be compiled without much external dependencies (and very few lines of configuration), it is possible to do some functional testing of the kernel GTP module. For performance testing this of course won't work, as sgsnemu is running entirely in userspace. For performance testing, you would need a SGSN-side implementation of GTP-U that performs equally well or better than the GGSN-side implementation in the kernel. One option is the patch that has recently been submitted to netdev and which is under discussion. However, then you simply test one implementation against itself, which provides no interoperability guarantees with other implementations, and thus also limited in different regards. > > Adding static tunnel support into the kernel module in any form > > makes IMHO no sense. GTP as defined by 3GPP always need a control > > instance and there are much better options for static tunnel > > encapsulations. > > > That misses the point. From the kernel point of view GTP is just > another encapsulation protocol and we now have a lot of experience > with those. The problem is when you post patches to improve it or fix > issues, if there is no practical way to perform independent analysis > or investigation. This makes GTP no different than a proprietary > protocol to us and that severely limits the value of trying to work > with the community. I wholeheartedly agree. That's one of the reasons why I recently posted a RFC about what to do for (regression and other) testing of the kernel GTP-U module. I'll try to cook up some instructions extending https://osmocom.org/projects/openggsn/wiki/OpenGGSN to cover also sgsnemu for a basic use case of establishing one single tunnel. That's of course like a manual "HOWTO" and not yet anything that can be tested automatically. Regards, Harald -- - Harald Welte <lafo...@gnumonks.org> http://laforge.gnumonks.org/ "Privacy in residential applications is a desirable marketing option." (ETSI EN 300 175-7 Ch. A6)
Re: [PATCH net-next v3 1/8] gtp: add documentation
ever I believe there > is nothing that would stop a SGSN/S-GW from switching GPT-U tunnel source > transparently across IP's (for example a system with multiple shelves might > use different shelve in DL/UL direction and have therefore multiple IP's. yes, it might. But then, on the other hand - at least for what I know about 2G/3G - it is typically very cumbersome to announce IP addresses to roaming partners. It involves manual updating of related GSMA [paper?] forms, exchanging them with all roaming partners, and coordinating a change with them. AFAIK, it is not possible to announce 'this is my range of S-GW IP addreses', but you have to individually manually announce each of them in the related documents. There are projects out there where people are building additional layers of address translation just to work around those long and error-prone manual procedures, making sure that the externally visible IP addresses of a given SGSN/GGSN are always the same ones. So yes, all of the above are practical/procedural concerns, but they would be explanations why this scenario - even if it might be supported by the specs - is unlikely in practise, at least within current/past procedures and practises. Finally, from the point of view of packet filtering (which operators typically implement, otherwise the manual announcement of SGSN IP addresses by the above procedures wouldn't be required), it would be impossible to do state based filtering if there were use cases where an random IP address colud send valid traffic for a given tunnel. > > I find it somewhat surprising, given how much this opens the door for > > arbitrary spoofing from anyone (with access to the respective private > > network such as GRX). > > Yes it is, but it is mandated without a doubt by the specification for > GTP-C. For the reasons outline before, I think that this applies to > GTP-U as well. For GTP-U it may be the case. I'm not entirely convinced, though. Also, even if the specifications wanted to support such scenarios, I think it is doubtful that this is actually implemented in practise. If it was my choice, and I had to support the "loose matching", I would make it a configuration option (sysctl? netlink attribute?) and default to the more strict matching, including the source address. It just seems to make much more sense and be more safe/sane. For those people who really need the loose matching (and who are willing to pay the price for it in terms of opening up to spoofing and making packet filtering harder), they can set the option. But then, I'm not the one doing that implementation, so it is up to you. My opinion is just an opinion. Regards, Harald -- - Harald Welte <lafo...@gnumonks.org> http://laforge.gnumonks.org/ "Privacy in residential applications is a desirable marketing option." (ETSI EN 300 175-7 Ch. A6)
Re: [PATCH net-next v3 1/8] gtp: add documentation
Hi Andreas, this is not really about the documentation, but it still makes sense to discuss here as the issue is best described: On Mon, Feb 13, 2017 at 04:36:17PM +0100, Andreas Schultz wrote: > +Local GTP-U entity and tunnel identification > + > + > +GTP-U uses UDP for transporting PDU's. The receiving UDP port is 2152 for > +GTPv1-U and 3386 for GTPv0-U. > + > +There is only one GTP-U entity (and therefor SGSN/GGSN/S-GW/PDN-GW instance) > +per IP address. Tunnel Endpoint Identifier (TEID) are unique per GTP-U > entity. > + > +A specific tunnel is only defined by the destination entity. Since the > +destination port is constant, only the destination IP and TEID define > +a tunnel. The source IP and Port have no meaning for the tunnel. Are you absolutely sure about this? > +[3GPP TS 29.281] Section 4.3.0 defines this so: > + > +> The TEID in the GTP-U header is used to de-multiplex traffic incoming from > +> remote tunnel endpoints so that it is delivered to the User plane entities > +> in a way that allows multiplexing of different users, different packet > +> protocols and different QoS levels. Therefore no two remote GTP-U endpoints > +> shall send traffic to a GTP-U protocol entity using the same TEID value > except > +> for data forwarding as part of mobility procedures. > + > +The definition above only defines that two remote GTP-U endpoints *should > not* > +send to the same TEID, it *does not* forbid or exclude such a scenario. In > +fact, the mentioned mobility procedures make it necessary that the GTP-U > entity > +accepts traffic for TEID's from multiple or unknown peers. I so far always assumed that you use GTP-C "MODIFY PDP CONTEXT" in case of such mobility situations, i.e. the control plane explicitly notifies the GTP-U entity of a change in the SGSN/S-GW address. At least for 3G this seems to be the case, and my assumption appears to be confirmed with sources such as e.g. page 15 of http://www.nwadmin.de/presentation_mobility_manag ement_in_UMTS.pdf Also, for LTE, it seems that there's a GTPv2-C "Modify Bearer Request/Response" involved in relocation from one S-GW to another S-GW: http://www.lteandbeyond.com/2012/03/x2-based-handover-with-sgw-relocation.html > Step 3. The target Serving GW assigns addresses and TEIDs (one per > bearer) for downlink traffic from the PDN GW. The Serving GW allocates > DL TEIDs on S5/S8 even for non-accepted bearers. It sends a Modify > Bearer Request (Serving GW addresses for user plane and TEID(s)) > message per PDN connection to the PDN GW(s). The SGW also includes > User Location Information IE and/or UE Time Zone IE if it is present > in step 2. The PDN GW updates its context field and returns a Modify > Bearer Response (Charging Id, MSISDN, etc.) message to the Serving GW. > The MSISDN is included if the PDN GW has it stored in its UE context. > The PDN GW starts sending downlink packets to the target GW using the > newly received address and TEIDs. These downlink packets will use the > new downlink path via the target Serving GW to the target eNodeB. The > Serving GW shall allocate TEIDs for the failed bearers and inform to > the MME. Section 5.5.1.1.3 of TS 23.401 agrees with the GTPv2C signalling during relocation, AFAICT. > +Therefor the receiving side only identifies tunnels based on TEID's, not > based > +on the source IP. I'm wondering if this really is neccesary. Do you have actual protocol traces, specs or any other literature confirming this? I find it somewhat surprising, given how much this opens the door for arbitrary spoofing from anyone (with access to the respective private network such as GRX). So in which situations specifically will thre be a S-GW side Address change without associated GTP-C signaling informing the P-GW about the new S-GW side Address + TEID? Regards, Harald -- - Harald Welte <lafo...@gnumonks.org> http://laforge.gnumonks.org/ "Privacy in residential applications is a desirable marketing option." (ETSI EN 300 175-7 Ch. A6)
[PATCH net-next] GTP: Add some basic documentation about drivers/net/gtp.c
In order to clarify what the module actually does, and how to use it, let's add some basic documentation to the kernel tree, together with pointers to related specs and projects. Signed-off-by: Harald Welte <lafo...@gnumonks.org> --- Documentation/networking/gtp.txt | 135 +++ 1 file changed, 135 insertions(+) create mode 100644 Documentation/networking/gtp.txt diff --git a/Documentation/networking/gtp.txt b/Documentation/networking/gtp.txt new file mode 100644 index ..93e96750f103 --- /dev/null +++ b/Documentation/networking/gtp.txt @@ -0,0 +1,135 @@ +The Linux kernel GTP tunneling module +== +Documentation by Harald Welte <lafo...@gnumonks.org> + +In 'drivers/net/gtp.c' you are finding a kernel-level implementation +of a GTP tunnel endpoint. + +== What is GTP == + +GTP is the Generic Tunnel Protocol, which is a 3GPP protocol used for +tunneling User-IP payload between a mobile station (phone, modem) +and the interconnection between an external packet data network (such +as the internet). + +So when you start a 'data connection' from your mobile phone, the +phone will use the control plane to signal for the establishment of +such a tunnel between that external data network and the phone. The +tunnel endpoints thus reside on the phone and in the gateway. All +intermediate nodes just transport the encapsulated packet. + +The phone itself does not implement GTP but uses some other +technology-dependent protocol stack for transmitting the user IP +payload, such as LLC/SNDCP/RLC/MAC. + +At some network element inside the cellular operator infrastructure +(SGSN in case of GPRS/EGPRS or classic UMTS, hNodeB in case of a 3G +femtocell, eNodeB in case of 4G/LTE), the cellular protocol stacking +is translated into GTP *without breaking the end-to-end tunnel*. So +intermediate nodes just perform some specific relay function. + +At some point the GTP packet ends up on the so-called GGSN (GSM/UMTS) +or P-GW (LTE), which terminates the tunnel, decapsulates the packet +and forwards it onto an external packet data network. This can be +public internet, but can also be any private IP network (or even +theoretically some non-IP network like X.25). + +You can find the protocol specification in 3GPP TS 29.060, available +publicly via the 3GPP website at http://www.3gpp.org/DynaReport/29060.htm + +A direct PDF link to v13.6.0 is provided for convenience below: +http://www.etsi.org/deliver/etsi_ts/129000_129099/129060/13.06.00_60/ts_129060v130600p.pdf + +== The Linux GTP tunnelling module == + +The module implements the function of a tunnel endpoint, i.e. it is +able to decapsulate tunneled IP packets in the uplink originated by +the phone, and encapsulate raw IP packets received from the external +packet network in downlink towards the phone. + +It *only* implements the so-called 'user plane', carrying the User-IP +payload, called GTP-U. It does not implement the 'control plane', +which is a signaling protocol used for establishment and teardown of +GTP tunnels (GTP-C). + +So in order to have a working GGSN/P-GW setup, you will need a +userspace program that implements the GTP-C protocol and which then +uses the netlink interface provided by the GTP-U module in the kernel +to configure the kernel module. + +This split architecture follows the tunneling modules of other +protocols, e.g. PPPoE or L2TP, where you also run a userspace daemon +to handle the tunnel establishment, authentication etc. and only the +data plane is accelerated inside the kernel. + +Don't be confused by terminology: The GTP User Plane goes through +kernel accelerated path, while the GTP Control Plane goes to +Userspace :) + +The official homepge of the module is at +https://osmocom.org/projects/linux-kernel-gtp-u/wiki + +== Userspace Programs with Linux Kernel GTP-U support == + +At the time of this writing, there are at least two Free Software +implementations that implement GTP-C and can use the netlink interface +to make use of the Linux kernel GTP-U support: + +* OpenGGSN (classic 2G/3G GGSN in C): + https://osmocom.org/projects/openggsn/wiki/OpenGGSN + +* ergw (GGSN + P-GW in Erlang): + https://github.com/travelping/ergw + +== Userspace Library / Command Line Utilities == + +There is a userspace library called 'libgtpnl' which is based on +libmnl and which implements a C-language API towards the netlink +interface provided by the Kernel GTP module: + +http://git.osmocom.org/libgtpnl/ + +== Protocol Versions == + +There are two different versions of GTP-U: v0 and v1. Both are +implemented in the Kernel GTP module. Version 0 is a legacy version, +and deprecated from recent 3GPP specifications. + +There are three versions of GTP-C: v0, v1, and v2. As the kernel +doesn't implement GTP-C, we don't have to worry about this. It's the +responsibility of the control plane implementation in userspace t
Re: RFC: unit tests for kernel GTP module
Hi Andreas, On Fri, Feb 17, 2017 at 04:02:59PM +0100, Andreas Schultz wrote: > The test suite is proprietary, so we can only share the results but > not the test setup itself. it would be great to have some CI setup where both current stable as well as a development branch of the code is tested, and reports published regularly. Not sure how realistic that is. > We more or less removed static GTP tunnels. The tools in libgtpnl > only work when an application is keeping the GTP socket alive. Well, then those tools need to be adapted accordingly. That's what I also indicated in other mails: Changes in the kernel GTP code should always be followed-up with correspondign changes in libgtpnl and associated tools. I mean, the tool could just open the socket and then continue to run until terminated explicitly... I think there's a lot of value in some very low-level tools for testing and experimentation, without the complexity of configuring + running an Erlang GGSN/P-GW with all its dependencies. -- - Harald Welte <lafo...@gnumonks.org> http://laforge.gnumonks.org/ "Privacy in residential applications is a desirable marketing option." (ETSI EN 300 175-7 Ch. A6)
RFC: unit tests for kernel GTP module
Dear GTP-interested folks, I would love to somehow get towards some degree of unit testing (or even "continuous integration") for teh kernel GTP code. We currently have the original code in the kernel, we had some recent small fixes and now are getting more patches into place. With relatively few active users out there (and probably none of them in production yet), it's particularly easy to introduce regressions while working on the code. Also, having tested new code even against a test set with limited covrage could help to get more confidence in new patches and thus get them merged sooner. Using tools like sgsnemu of OpenGGSN and the command line tools included in libgtpnl, it should be possibel to cook up some scripts for testing. Even the most basic set of tests would be an improvement over what we have now. One could also think about pcap replay to test with hand-crafted or real-world packets from other GTP implementations. As much as I'd like to put something like this into place myself, I don't think I will be able to work much on this in the near future. The GTP module at this point is a pure hobby and contrary to some years ago while I started it, I don't have any contract work in the GTP area at this point, so other projects currently unfortunately get more attention. So in case somebody among the GTP-interested parties (Travelping, OAI, ...) would want to do something in terms of testing, I'd be more than happy if somebody would step ahead. Otherwise it's all just vapourware going to end up on my ever-growing TODO list :/ Also, if netdev folks have some ideas/pointers about possible frameworks/tools for this kind of testing [it must exist for at least some other kernel networking code?]: Please let me know. I'd be interested to have a look if there's something that can be used as a basis (starting network namespaces, sending/transmitting packets, test case startup/teardown, ...) My "old school" approach would have been to start one or multiple user-mode-linux kernels (those that are to be tested), and then have scripts that set up a gtp socket and gtp tunnels via the libgtp command line tools, and throw packets at that. But I'm sure there must be quite powerful frameworks for that kind of testing in the 21st century? How do other tunneling implementations handle this? Regards, Harald -- - Harald Welte <lafo...@gnumonks.org> http://laforge.gnumonks.org/ "Privacy in residential applications is a desirable marketing option." (ETSI EN 300 175-7 Ch. A6)
Re: [PATCH 1/1] gtp: support SGSN-side tunnels
Hi Andreas, Pablo, Jonas, I think that the SGSN/GGSN role flag (or whatever it may end up being called) logically belongs in the gtp-device at this point, and will in the future belong to the UDP/GTP-socket (with Andreas' proposed changes). Having it per-pdp-context indeed seems odd and just provide a way to create broken configurations (and increase the memory use per pdp context, of which you have many more than netdevs or gtp-sockets). -- - Harald Welte <lafo...@gnumonks.org> http://laforge.gnumonks.org/ "Privacy in residential applications is a desirable marketing option." (ETSI EN 300 175-7 Ch. A6)
Re: [PATCH 1/1] gtp: support SGSN-side tunnels
Hi Jonas, On Mon, Feb 06, 2017 at 02:33:07PM +0100, Jonas Bonn wrote: > Fair enough. The use-case I am looking at involves PGW load-testing where > the simulated load is generated locally on the SGSN so it _is_ seeing IP > packets and the SNDCP is left out altogether. Ok, it would have been useful to document that test-only feature in the changelog and/or code. Like "support simulated RAN-side tunnels" or "support SGSN/S-GW simulation". > Perhaps this is too pathological to warrant messing with the upstream > driver... I don't know: the symmetry does not cost much even if it's > of limited use. There are plenty of features in the mainline kernel related to testing, see pktgen for example. So I think if it doesn't impose complexity, performance issues or stretches the existing architecture, I think there's no reason to keep it out. Looking at the code, I think the one conditional on the flags is not going to kill significant performance of the "normal" use case. But that's of course just guessing, without any benchmark to back that up. Semantically, I'm not sure if the FLAGS and the re-use of the SGSN_ADDRESS TLV is the best choice. If suddenly the meaning of the TLV is "Peer GSN Address" then it should be called that way. We could have a #define SGSN_ADDRESS to GSN_PEER_ADDRESS to make old code compile. I'll let Pablo respond to this as he came up with the netlink interface, as far as I can remember :) Also, like with any changes to the kernel and netlink interface code, I think we should always mandate similar changes to be made to libgtpnl so the feature can actually be used/tested with the standard tools/utilities available to anyone. > Couldn't the SNDCP theoretically be a separate node and push IP packets to > the SGSN, thus making this useful? Perhaps it's a stretch... No, because you would introduce an hop (or even two!) at the IP level, breaking * the notion of who the remote IP address is (remote poin-to-point address) of the PDP context * packets get modified (TTL decrement, ...) where they are not supposed to * you suddenly might get TTL exceeded, dest unreachable, ...) out of nowhere into your user IP * you introduce serious security issues by having the kernel IP routing code between the outer IP (the operator RAN/core network) and the inner user IP payload. Regards, Harald -- - Harald Welte <lafo...@gnumonks.org> http://laforge.gnumonks.org/ "Privacy in residential applications is a desirable marketing option." (ETSI EN 300 175-7 Ch. A6)
Re: [PATCH net-next v2 3/6] gtp: unify genl_find_pdp and prepare for per socket lookup
On Mon, Jan 30, 2017 at 05:37:10PM +0100, Andreas Schultz wrote: > Signed-off-by: Andreas Schultz <aschu...@tpip.net> Acked-by: Harald Welte <lafo...@gnumonks.org> -- - Harald Welte <lafo...@gnumonks.org> http://laforge.gnumonks.org/ "Privacy in residential applications is a desirable marketing option." (ETSI EN 300 175-7 Ch. A6)
Re: [PATCH net-next v2 5/6] gtp: add socket to pdp context
Dear All, On Thu, Feb 02, 2017 at 04:07:23PM +0100, Andreas Schultz wrote: > - On Feb 2, 2017, at 3:46 PM, pablo pa...@netfilter.org wrote: > > On Thu, Feb 02, 2017 at 03:38:07PM +0100, Andreas Schultz wrote: > >> - On Feb 2, 2017, at 3:28 PM, pablo pa...@netfilter.org wrote: > >> > I suggest you just call kfree_rcu() from 4/6. > >> > > >> > Regarding holding socket reference, see my comment for patch 1/6. > >> > >> This is going to be a problem at this stage of the changes. > >> > >> The final goal is to have a reference from the socket to the pdp context. > > > > Is this just a cleanup? Or you need this sk caching for some follow up > > work? > > It's not caching, the plan is to completely remove the socket from the > GTP netdevice (as far as that is possible without breaking the existing API). I agree this is the way to go. When I originally thought about the GTP kernel tunneling module early on, I was not aware of the fact that operators actually in practise run multiple "virtual GGSNs" on one IP address/port. From a pure technical point of view you would say "why bother"? They could just use separate IP addresses for each of them. However, the reailty is that each new IP address that an operator uses for a GGSN results in paper forms required to be exchanged between this operator and all his roming partners, followed-up by manual re-configuration of the policies on all of those roaming partners. This is time-consuming and error-prone, but hey, it's how the procedures between GSMA members seem to work ;) > A GGSN or PGW can serve multiple APN's on the same GTP-U socket. Those APN's > can have overlapping IP address ranges. The only sensible way to handle > this, is to have a netdevice per APN. This breaks the current 1:1 relation > between sockets and netdevices. Indeed. So the question is how to do this best and how to keep backwards compatibility of the netlink interface. I don't claim to have answers to that, sorry. -- - Harald Welte <lafo...@gnumonks.org> http://laforge.gnumonks.org/ "Privacy in residential applications is a desirable marketing option." (ETSI EN 300 175-7 Ch. A6)
Re: [PATCH net-next v2 0/6] gtp: misc improvements
Hi Dave and List, On Tue, Jan 31, 2017 at 01:05:43PM -0500, David Miller wrote: > Please someone with GTP knowledge properly review this series. I am happy to provide review, but I was travelling and my time to work on things outside my dayjob is typically quite limited. So I'd like to ask for a bit more patience for patch review from me. Thanks for your understanding. Regards, Harald -- - Harald Welte <lafo...@gnumonks.org> http://laforge.gnumonks.org/ "Privacy in residential applications is a desirable marketing option." (ETSI EN 300 175-7 Ch. A6)
Re: [PATCH net-next v2 1/6] gtp: make GTP sockets in gtp_newlink optional
Hi Andreas, my kernel coding skills are getting a bit rusty (no pun intended), and I'll think others on this list are more capable to do so. But let me at least provide feedback from the "3GPP / GTP side": On Mon, Jan 30, 2017 at 05:37:08PM +0100, Andreas Schultz wrote: > Having both GTPv0-U and GTPv1-U is not always desirable. > Fallback from GTPv1-U to GTPv0-U was depreciated from 3GPP > Rel-8 onwards. Post Rel-8 implementation are discuraged > from listening on the v0 port (see 3GPP TS 29.281, Sect. 1). I confirm this and I think the related change should be applied. > A future change will completely decouple the sockets from the > network device. Till then, at least one of the sockets needs to > be specified (either v0 or v1), the other is optional. Makes sense. -- - Harald Welte <lafo...@netfilter.org> http://netfilter.org/ "Fragmentation is like classful addressing -- an interesting early architectural error that shows how much experimentation was going on while IP was being designed."-- Paul Vixie
Re: [PATCH net-next v2 2/6] gtp: merge gtp_get_net and gtp_genl_find_dev
Hi Andreas, On Mon, Jan 30, 2017 at 05:37:09PM +0100, Andreas Schultz wrote: > Both function are always used together with the final goal to > get the gtp_dev. This simplifies the code by merging them together. Ok, some code restructuring / unification, seems useful. However: > - netdev_dbg(dev, "GTPv0-U: update tunnel id = %llx (pdp > %p)\n", > -pctx->u.v0.tid, pctx); > + pr_debug("GTPv0-U: update tunnel id = %llx (pdp %p)\n", > + pctx->u.v0.tid, pctx); (and other related changes) appear to be purely cosmetic and should thus be unrelated to the function merging described in the change log message. -- - Harald Welte <lafo...@netfilter.org> http://netfilter.org/ "Fragmentation is like classful addressing -- an interesting early architectural error that shows how much experimentation was going on while IP was being designed."-- Paul Vixie
Re: [PATCH 1/1] gtp: support SGSN-side tunnels
Dear Jonas, On Fri, Feb 03, 2017 at 10:12:31AM +0100, Jonas Bonn wrote: > The GTP-tunnel driver is explicitly GGSN-side as it searches for PDP > contexts based on the incoming packets _destination_ address. If we > want to write an SGSN, then we want to be idenityfing PDP contexts > based on _source_ address. A SGSN should never see the "native" User-IP payload. It either has a Gb interface towards the BSS which has a BSSGP/NS/UDP/IP protocol stacking (for GERAN) or an IuUP or GTP stacking (for UTRAN). The user-ip tunnel (PDP context) exists between the mobile station and the GGSN. Any intermediate nodes (BTS, BSC, NodeB, RNC, SGSN, ...) do not appear as intermediate IP Layer nodes in that User IP tunnel, but only serve to transparently pass the user-ip inside that tunnel between the two tunnel end-points. As such, I am very surprised by your patch. Exposing the User IP to the Linux network stack in the SGSN seems to be a severe layering violation and contradict everything I know about 3GPP network architecture. But maybe I'm missing something here? Please explain. The only SGSN-level user plane acceleration that I can think of is quite different: For an UMTS SGSN that only supports IuPS, and only supports IuPS over IP (which is a sub-class of a sub-class of all use cases), what would make sense is some Kernel-level support to map from one GTP socket/tunnel to another GTP socket/tunnel based on the TEIDs. So basically you have a GTP tunnel on the RAN side and another GTP tunnel on the CN side, without any decapsultaion. The TEIDs on both sides *could* be identical, or *cold* be separate, as a matter of implementation policy. The IP addresses /port numbers on both sides will in almost all non-laboratory use cases be separate, as an operator typically doesn't want a transparent/routed IP network between the RAN and Core Network (CN). So the GTP tunnels between RNC/hNodeB/heNodeB on the RAN side get mapped 1:1 to GTP tunnels between SGSN and GGSN on the CN side. However, as no encapsulation/decapsulation is performed, this is outside of the scope of the current kernel GTP tunneling module. Rather, it's more something similar to static NAT between two pairs of addresses. Regards, Harald -- - Harald Welte <lafo...@gnumonks.org> http://laforge.gnumonks.org/ "Privacy in residential applications is a desirable marketing option." (ETSI EN 300 175-7 Ch. A6)
Re: [PATCH net-next v2 4/6] gtp: consolidate pdp context destruction into helper
Hi Andreas, On Mon, Jan 30, 2017 at 05:37:11PM +0100, Andreas Schultz wrote: > Consolidate duplicate code into helper. Makes complete sense. However: > +static void pdp_context_delete(struct pdp_ctx *pctx); > + why introduce the forward-declaration and then define the function further down in the file? I think in general, it is preferred to put helper functions on top, before their first use, so forward declarations can be avoided? Particularly if the helper function doesn't call any other functions that are not yet declared at this point. Please note the above might just be my personal taste, not sure if that's a general habit in the kernel networking code these days. So with or without the re-ordering: Acked-by: Harald Welte <lafo...@gnumonks.org> -- - Harald Welte <lafo...@gnumonks.org> http://laforge.gnumonks.org/ "Privacy in residential applications is a desirable marketing option." (ETSI EN 300 175-7 Ch. A6)
Re: [PATCH 0/5] simple gtp improvements
Hi Andreas, I agree with your changes (particularly those related to 3GPP specs) like 2/5 and 5/5. Also, 1/5 is of course obvious. For kernel topics like 3/5 and 4/5 I trust Pablo and the general netdev crew to have better judgement than me. -- - Harald Welte <lafo...@netfilter.org> http://netfilter.org/ "Fragmentation is like classful addressing -- an interesting early architectural error that shows how much experimentation was going on while IP was being designed."-- Paul Vixie
Re: [PATCH v2 00/18] gtp: fixes and support multiple VRF's per GTP socket
Hi Andreas, On Tue, Jan 24, 2017 at 04:28:30PM +0100, Andreas Schultz wrote: > During that work some smaller problems where found and fixes for them are > included. As this is a rather large patch-set, and I think it might be better to split it in clear bugfixes of the existing code (or no-brainers like the module alias) and in introducing new featur regarding socket/netdev. However, that's just my comment "from the side", I'm not the one who needs to be convinced in terms of merging them :) I would just assume that the clear fixes + no-brainers could go in right away without any further review... -- - Harald Welte <lafo...@netfilter.org> http://netfilter.org/ "Fragmentation is like classful addressing -- an interesting early architectural error that shows how much experimentation was going on while IP was being designed."-- Paul Vixie
[PATCH] fix br_fdb_fini() section mismatch
When compiling a kernel (current linus git or 2.6.24-rc4) with built-in CONFIG_BRIDGE, I get the following error: LD .tmp_vmlinux1 `br_fdb_fini' referenced in section `.init.text' of net/built-in.o: defined in discarded section `.exit.text' of net/built-in.o make: *** [.tmp_vmlinux1] Error 1 This patch fixes it. Signed-off-by: Harald Welte [EMAIL PROTECTED] diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c index eb57502..bc40377 100644 --- a/net/bridge/br_fdb.c +++ b/net/bridge/br_fdb.c @@ -44,7 +44,7 @@ int __init br_fdb_init(void) return 0; } -void __exit br_fdb_fini(void) +void br_fdb_fini(void) { kmem_cache_destroy(br_fdb_cache); } -- - Harald Welte [EMAIL PROTECTED] http://gnumonks.org/ Privacy in residential applications is a desirable marketing option. (ETSI EN 300 175-7 Ch. A6) -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH/RFC 00/10] Transparent proxying patches version 4
On Sun, Jan 07, 2007 at 05:11:06PM +0100, Lennert Buytenhek wrote: On Sun, Jan 07, 2007 at 03:11:34PM +0100, Harald Welte wrote: So instead of using NAT to dynamically redirect traffic to local addresses, we now rely on native non-locally-bound sockets and do early socket lookups for inbound IPv4 packets. It's good to see a solid implementation of this 'old idea'. Just as a quick historical note to netdev: This is the way how the netfilter project advised the balabit guys to implement fully transparent proxy support, after having seen the complexity of the old nat-based TPROXY patches. Didn't rusty tell the balabit guys to use the NAT approach? that was originally, way back. It turned out to be a bad idea, after all... way too complex. At least that's how I look at it. Too sad :( Rusty and me then had the idea about the routing based approach at some point, if I remember correctly. We talked about it with Krisztian and Balazs at least on one occasion. All that isn't really important. All I wanted to say was: I (and AFAIR the netfilter core team) believe this is the way to implement good support for transparent proxying. It's already the second completely independent implementation, let's merge it after all. -- - Harald Welte [EMAIL PROTECTED] http://netfilter.org/ Fragmentation is like classful addressing -- an interesting early architectural error that shows how much experimentation was going on while IP was being designed.-- Paul Vixie signature.asc Description: Digital signature
Re: [PATCH/RFC 00/10] Transparent proxying patches version 4
Hi Krisztian! On Wed, Jan 03, 2007 at 05:33:57PM +0100, KOVACS Krisztian wrote: So instead of using NAT to dynamically redirect traffic to local addresses, we now rely on native non-locally-bound sockets and do early socket lookups for inbound IPv4 packets. It's good to see a solid implementation of this 'old idea'. Just as a quick historical note to netdev: This is the way how the netfilter project advised the balabit guys to implement fully transparent proxy support, after having seen the complexity of the old nat-based TPROXY patches. So I personally support this patchset and vote for it to be included (with whatever modifications netdev deems apropriate) It might be that there now is the experimental netchannels system which might provide an even better way for transparent proxy support. However, ever since ip_tables was merged in the 2.3.x days, we have lacked good support for transparent proxies. Now that the first incarnation of the NAT based TPROXY patch for 2.4.x had to be developed and maintained out-of-tree for many years, I definitely think it's better to merge the new, way less intrusive, patchset. Some interested party can work on a netchannels implementation later on, but that's the next generation... Cheers, -- - Harald Welte [EMAIL PROTECTED] http://netfilter.org/ Fragmentation is like classful addressing -- an interesting early architectural error that shows how much experimentation was going on while IP was being designed.-- Paul Vixie signature.asc Description: Digital signature
Re: [IPROUTE] manpage for lnstat
On Thu, Oct 26, 2006 at 10:41:26PM +0200, Michael Prokop wrote: Hello, I wrote a manpage for lnstat, would be great if it could be applied to the next release. Stephen: Please include it into your next release, looks fine to me. @Harald, I'm following your 'If somebody wants to do a manpage, feel free to send me a patch :)' of lnstat's README. :) thanks a lot! -- - Harald Welte [EMAIL PROTECTED] http://gnumonks.org/ We all know Linux is great...it does infinite loops in 5 seconds. -- Linus pgpudwX1x8eA0.pgp Description: PGP signature
[BUG 2.6.18] unaligned access in ipvv6_rcv, nf_ip6_checksum, tcp_error, __ipv6_addr_type, fib6_lookup_1
Hi! I've just built 2.6.18 on a sparc64 box (Ultra 5) using gcc-3.3.5 (debian woody). After booting the kernel, I get tons of unaligned access messages related to various bits of the IPv6 code: Kernel unaligned access at TPC[10022cf0] ipv6_rcv+0xb8/0x320 [ipv6] Kernel unaligned access at TPC[10023800] __ipv6_addr_type+0x8/0x140 [ipv6] Kernel unaligned access at TPC[1002fd64] fib6_lookup_1+0x2c/0x120 [ipv6] Kernel unaligned access at TPC[10093878] tcp_error+0x40/0x2c0 [nf_conntrack] Kernel unaligned access at TPC[1004ce54] nf_ip6_checksum+0x13c/0x1c0 [ipv6] Kernel unaligned access at TPC[1004ce58] nf_ip6_checksum+0x140/0x1c0 [ipv6] Kernel unaligned access at TPC[1004ce60] nf_ip6_checksum+0x148/0x1c0 [ipv6] As I'm not really familiar at all with sparc assembly, the usual look at objectdump, try to see what it does, look at corresponding source code strategy doesn't work for me in this case. There could be 7 different bugs, but I think it's more likely that some common data structure is misaligned and thus causes unaligned accesses all over the place. I've put the ipv6.ko and nf_conntrack.ko modules online at http://people.netfilter.org/laforge/tmp/2618_bug/ This could be a known issue, but I couldn't find any reference to it on netdev, sparclinux or via google. Any assistance is appreciated, thanks! -- - Harald Welte [EMAIL PROTECTED] http://gnumonks.org/ We all know Linux is great...it does infinite loops in 5 seconds. -- Linus pgpmlwbGFRFtJ.pgp Description: PGP signature
Re: driver for pptp
On Sat, Jun 03, 2006 at 11:06:19AM +0400, [EMAIL PROTECTED] wrote: I have developed the driver for Point-to-Point Tunneling Protocol (PPTP). great news. something that I always thought of a nice-to-have. I have published the project on http://accel-pptp.sourceforge.net/ Please don't expect Linux Kernel networking developers to actually go to sourceforge download and extract code that you want to have reviewed/submitted. Please read Documentation/SubmittingPatches (and CodingStyle) and submit your kernel patch to netdev. Hope this driver will go to a kernel tree and will make linux more productive. not without you pushing it actively and getting through review cycles (which I hope you will!). Some initial comments: 1) why wasn't it possible to use the PPPoX infrastructure of the kernel which is already being used by PPPoE ? Or at least model it somehow similar to the existing PPPoE/PPPoX infrastructure? 2) why are you using a timer for asynchronous processing of GRE frames? First of all, why does it have to happen asynchronously at all? Secondly, why using a timer when there's nothing time related (or do I miss something)? If deferred, out-of-context execution is required, there are other primitives such as tasklets. 3) you conflict with the ip_gre.c genric GRE encapsulation driver. this is because both want to reigster a proto handler for GRE. Ideally, there needs to be another demultiplex between the GRE protocl and its users. The code registered for GRE would look at the packet and determine whether e.g. it is a PPTP GRE packet and then pass it on to the pptp module. 4) your code doesn't look nonlinear skb clean 5) why did you chose to implement /dev/pptp rather than a socket family like the existing pppox/pppoe code? 6) lots of codingstyle issues -- - Harald Welte [EMAIL PROTECTED] http://gnumonks.org/ We all know Linux is great...it does infinite loops in 5 seconds. -- Linus pgpuEiqiED7WD.pgp Description: PGP signature
Re: [RFC/PATCH 1/2] in-kernel sockets API
On Tue, Jun 13, 2006 at 02:12:41PM -0700, Daniel Phillips wrote: This has the makings of a nice stable internal kernel api. Why do we want to provide this nice stable internal api to proprietary modules? because there is IMHO legally nothing we can do about it anyway. Use of an industry-standard API that is provided in multiple operating system is one of the clearest idnication of some program _not_ being a derivative work. Whether we like it or not, it doesn't really matter if we export them GPL-only or not. Anybody using those scoket API calls will be having an easy time arguing in favor of non-derivative work. The GPL doesn't extend beyon what copyright law allows you to do... -- - Harald Welte [EMAIL PROTECTED] http://gnumonks.org/ We all know Linux is great...it does infinite loops in 5 seconds. -- Linus pgpFWlVVgch8f.pgp Description: PGP signature
Re: [RFC/PATCH 1/2] in-kernel sockets API
On Wed, Jun 14, 2006 at 04:29:04PM +0200, Erik Mouw wrote: On Wed, Jun 14, 2006 at 03:30:22PM +0200, Harald Welte wrote: On Tue, Jun 13, 2006 at 02:12:41PM -0700, Daniel Phillips wrote: This has the makings of a nice stable internal kernel api. Why do we want to provide this nice stable internal api to proprietary modules? because there is IMHO legally nothing we can do about it anyway. Use of an industry-standard API that is provided in multiple operating system is one of the clearest idnication of some program _not_ being a derivative work. IMHO there is no industry-standard API for in-kernel use of sockets. There is however one for user space. it doesn't matter in what space you are. If the API really is similar enough, then any piece of code (no matter where it was originally intended to run) will be able to work with any such socket API. The whole point of this is: Where is the derivation of an existing work? I can write a program against some BSD socket api somewhere, and I can easily make it use the proposed in-kernel sockets API. No derivation of anything that is inside the kernel and GPL licensed. (IANAL, etc) Neither am I, but I'm constantly dealing with legal questions related to the GPL while running gpl-violations.org. -- - Harald Welte [EMAIL PROTECTED] http://gnumonks.org/ We all know Linux is great...it does infinite loops in 5 seconds. -- Linus pgpIzJfnLL2Vq.pgp Description: PGP signature
Re: [RFC] Geographical/regulatory information for ieee80211
On Wed, Apr 26, 2006 at 07:54:31PM -0500, Larry Finger wrote: I am leaning toward putting the geographical information into a userland daemon. I like that idea very much. This is all control metadate that doesn't really need to be in the kernel. That way we won't have to patch the kernel every time a country modifies its regulations. that's another advantage. In addition, the kernel will be smaller. The downside is that the daemon will have to be updated and supplied in some convenient form, perhaps as part of a wireless tools package. Ideally the daemon would get the table of country restrictions from a policy file (some human-readable ascii?). That file can then be downloaded by a cronjob to keep it updated, if desired. A bit like the PCI/USB device id databases... -- - Harald Welte [EMAIL PROTECTED] http://gnumonks.org/ Privacy in residential applications is a desirable marketing option. (ETSI EN 300 175-7 Ch. A6) pgpYfGS2xTcML.pgp Description: PGP signature
Re: [RFC: 2.6 patch] the overdue removal of ip{,6}_queue
On Sun, Apr 09, 2006 at 08:40:05PM +0200, Patrick McHardy wrote: Adrian Bunk wrote: This patch contains the overdue removal of ip{,6}_queue. I think this is too early, none of the distributions seem to have picked up the compat library already. I believe (not sure, Harald?) that it requires at least recompilation of programs using libipq, so the 6 month period seems too short. Yes, apparently the distro's don't seem to be picking up the new userspace libraries. The question is, however, whether any further delay really is the right motivation for them to pick them up soon ;) Maybe we should add a printk ('app foo is using obsolete ip_queue system'). It also appears the compat library won't work on older system, could you elaborate on that? so I have to admit I'm not sure about the removal at all. I'm still for the removal, but probably giving it some more time. -- - Harald Welte [EMAIL PROTECTED] http://netfilter.org/ Fragmentation is like classful addressing -- an interesting early architectural error that shows how much experimentation was going on while IP was being designed.-- Paul Vixie pgpjVgSowAMmi.pgp Description: PGP signature
Re: [RFC: 2.6 patch] the overdue removal of ip{,6}_queue
On Tue, Apr 11, 2006 at 04:42:50PM +0200, Patrick McHardy wrote: Maybe we should add a printk ('app foo is using obsolete ip_queue system'). Good idea, that will probably help speed it up. But I still think we need to give them at least another six month. ok. I'll prepare a patch for both the printk and the update of feature-removal-schedule. I think we need to do two things: - make libipq_compat a drop-in replacement that doesn't require recompilation libipq is not distributed as a shared library (at least not by us), so I don't see any purpose for doing so. Do you think anyone is going to re-link statically linked code against the new lib - make it work with both ipq and nfnetlink_queue that should be possible, though. I still don't really think that it is worth all the effort, especially since there is only one hand full of applications using libipq. The backwards compatibility library is expected to perform a lot worse thna both the old libipq as well as the new native libnetfilter_queue due to additional data copies, etc. When I wrote it, it was more intended as some intermediate aid, something that helps while people migrate. We shouldn't make it too perfect, otherwise they won't migrate at all. -- - Harald Welte [EMAIL PROTECTED] http://netfilter.org/ Fragmentation is like classful addressing -- an interesting early architectural error that shows how much experimentation was going on while IP was being designed.-- Paul Vixie pgpUyDjQahTpM.pgp Description: PGP signature
Re: [PATCH] [NETFILTER] nf_conntrack: clean up to reduce size of 'struct nf_conn'
On Mon, Feb 13, 2006 at 10:32:49AM +0900, Yasuyuki KOZAKAI wrote: Hi, Harald, From: Harald Welte [EMAIL PROTECTED] Date: Sun, 12 Feb 2006 18:56:22 +0100 [NETFILTER] nf_conntrack: clean up to reduce size of 'struct nf_conn' This patch moves all helper related data fields of 'struct nf_conn' into a separate structure 'struct nf_conn_help'. This new structure is only present in conntrack entries for which we actually have a helper loaded. Also, this patch cleans up the nf_conntrack 'features' mechanism to resemble what the original idea was: Just glue the feature-specific data structures at the end of 'struct nf_conn', and explicitly re-calculate the pointer to it when needed rather than keeping pointers around. Saves 20 bytes per conntrack on my x86_64 box. A non-helped conntrack is 276 bytes. We still need to save another 20 bytes in order to fit into to target of 256bytes. Signed-off-by: Harald Welte [EMAIL PROTECTED] This patch seems not to include fixes I pointed out and you said fixed. Please confirm if you really sent the intended one. oops, thanks for checking. Dave: Please hold, I sent an old version. You'll get a new one ASAP. -- - Harald Welte [EMAIL PROTECTED] http://netfilter.org/ Fragmentation is like classful addressing -- an interesting early architectural error that shows how much experimentation was going on while IP was being designed.-- Paul Vixie pgpRGzy12Mxgh.pgp Description: PGP signature
Re: [PATCH] [NETFILTER] nf_conntrack: clean up to reduce size of 'struct nf_conn'
Hi Dave! This is the correct (latest) version of this patch. Sorry for the confusion. Please apply to net-2.6.17, thanks! [NETFILTER] nf_conntrack: clean up to reduce size of 'struct nf_conn' This patch moves all helper related data fields of 'struct nf_conn' into a separate structure 'struct nf_conn_help'. This new structure is only present in conntrack entries for which we actually have a helper loaded. Also, this patch cleans up the nf_conntrack 'features' mechanism to resemble what the original idea was: Just glue the feature-specific data structures at the end of 'struct nf_conn', and explicitly re-calculate the pointer to it when needed rather than keeping pointers around. Saves 20 bytes per conntrack on my x86_64 box. A non-helped conntrack is 276 bytes. We still need to save another 20 bytes in order to fit into to target of 256bytes. Signed-off-by: Harald Welte [EMAIL PROTECTED] --- commit aba8cf3ac5b60e10eb1ce9f30b6bb4c007ab9868 tree 758aedbe80b8306c1991a05c47eab8905d33a13b parent 94d3d40c84672b74e59ea5252f61602610e1513e author Harald Welte [EMAIL PROTECTED] Sun, 29 Jan 2006 19:19:06 +0100 committer Harald Welte [EMAIL PROTECTED] Sun, 29 Jan 2006 19:19:06 +0100 include/net/netfilter/nf_conntrack.h | 56 +++ net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c | 22 ++--- net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c | 39 +--- net/netfilter/nf_conntrack_core.c | 117 ++-- net/netfilter/nf_conntrack_ftp.c |2 net/netfilter/nf_conntrack_netlink.c | 39 +--- net/netfilter/nf_conntrack_standalone.c|1 net/netfilter/xt_helper.c |8 +- 8 files changed, 147 insertions(+), 137 deletions(-) diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h index 6d075ca..9d1f0e6 100644 --- a/include/net/netfilter/nf_conntrack.h +++ b/include/net/netfilter/nf_conntrack.h @@ -67,6 +67,18 @@ do { \ struct nf_conntrack_helper; +/* nf_conn feature for connections that have a helper */ +struct nf_conn_help { + /* Helper. if any */ + struct nf_conntrack_helper *helper; + + union nf_conntrack_help help; + + /* Current number of expected connections */ + unsigned int expecting; +}; + + #include net/netfilter/ipv4/nf_conntrack_ipv4.h struct nf_conn { @@ -81,6 +93,9 @@ struct nf_conn /* Have we seen traffic both ways yet? (bitset) */ unsigned long status; + /* If we were expected by an expectation, this will be it */ + struct nf_conn *master; + /* Timer function; drops refcnt when it goes off. */ struct timer_list timeout; @@ -88,38 +103,22 @@ struct nf_conn /* Accounting Information (same cache line as other written members) */ struct ip_conntrack_counter counters[IP_CT_DIR_MAX]; #endif - /* If we were expected by an expectation, this will be it */ - struct nf_conn *master; - - /* Current number of expected connections */ - unsigned int expecting; /* Unique ID that identifies this conntrack*/ unsigned int id; - /* Helper. if any */ - struct nf_conntrack_helper *helper; - /* features - nat, helper, ... used by allocating system */ u_int32_t features; - /* Storage reserved for other modules: */ - - union nf_conntrack_proto proto; - #if defined(CONFIG_NF_CONNTRACK_MARK) u_int32_t mark; #endif - /* These members are dynamically allocated. */ - - union nf_conntrack_help *help; + /* Storage reserved for other modules: */ + union nf_conntrack_proto proto; - /* Layer 3 dependent members. (ex: NAT) */ - union { - struct nf_conntrack_ipv4 *ipv4; - } l3proto; - void *data[0]; + /* features dynamically at the end: helper, nat (both optional) */ + char data[0]; }; struct nf_conntrack_expect @@ -373,10 +372,23 @@ nf_conntrack_expect_event(enum ip_conntr #define NF_CT_F_NUM4 extern int -nf_conntrack_register_cache(u_int32_t features, const char *name, size_t size, - int (*init_conntrack)(struct nf_conn *, u_int32_t)); +nf_conntrack_register_cache(u_int32_t features, const char *name, size_t size); extern void nf_conntrack_unregister_cache(u_int32_t features); +/* valid combinations: + * basic: nf_conn, nf_conn .. nf_conn_help + * nat: nf_conn .. nf_conn_nat, nf_conn .. nf_conn_nat, nf_conn help + */ +static inline struct nf_conn_help *nfct_help(const struct nf_conn *ct) +{ + unsigned int offset = sizeof(struct nf_conn); + + if (!(ct-features NF_CT_F_HELP)) + return NULL; + + return (struct nf_conn_help *) ((void *)ct + offset); +} + #endif /* __KERNEL__ */ #endif /* _NF_CONNTRACK_H */ diff --git a/net/ipv4/netfilter
[PATCH] 2.6.16: [NETFILTER] Fix Kconfig menu level for x_tables
Hi Dave, please apply this to the 2.6.16 tree and push it to Linus. Thanks! [NETFILTER] Fix Kconfig menu level for x_tables The new x_tables related Kconfig options appear at the wrong menu level without this patch. Signed-off-by: Harald Welte [EMAIL PROTECTED] --- commit 2d3e1788278398e3d9f33409066f3232730336a6 tree d23977bfeafb91fdbc5e5400d516ef623cde5a1d parent 0b2d6ca5297c6343d9ae4c2259f85046e487ed01 author Harald Welte [EMAIL PROTECTED] Mon, 13 Feb 2006 12:04:25 +0100 committer Harald Welte [EMAIL PROTECTED] Mon, 13 Feb 2006 12:04:25 +0100 net/netfilter/Kconfig |4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig index 99c0a0f..0e55012 100644 --- a/net/netfilter/Kconfig +++ b/net/netfilter/Kconfig @@ -102,8 +102,6 @@ config NF_CT_NETLINK help This option enables support for a netlink-based userspace interface -endmenu - config NETFILTER_XTABLES tristate Netfilter Xtables support (required for ip_tables) help @@ -361,3 +359,5 @@ config NETFILTER_XT_MATCH_TCPMSS To compile it as a module, choose M here. If unsure, say N. +endmenu + -- - Harald Welte [EMAIL PROTECTED] http://netfilter.org/ Fragmentation is like classful addressing -- an interesting early architectural error that shows how much experimentation was going on while IP was being designed.-- Paul Vixie pgpHFnw0n6kSi.pgp Description: PGP signature
[PATCH 1/1] [NETFILTER] bridge: fix link error with recent netfilter-bridge change
[NETFILTER] bridge: fix link error with recent netfilter-bridge change commit 5dce971acf2ae20c80d5e9d1f6bbf17376870911 removed the has_bridge_parent macro from net/bridge/br_netfilter.c, resulting in the following warning at compile time: net/bridge/br_netfilter.c: In function `br_nf_post_routing': net/bridge/br_netfilter.c:808: warning: implicit declaration of function `has_bridge_parent' and this error at link time: net/built-in.o(.text+0xeae28): In function `br_nf_post_routing': net/bridge/br_netfilter.c:808: undefined reference to `has_bridge_parent' make: *** [.tmp_vmlinux1] Error 1 The patch below fixes the problem. Signed-off-by: Jesper Juhl [EMAIL PROTECTED] Signed-off-by: Harald Welte [EMAIL PROTECTED] --- commit d63ec539a20c35537c930ebadeac0f58bccee86f tree 65a56c18c840f137bf1c2f778af6afba215d143e parent de7cdf0307f39c1399922431144d2dc8f327ee42 author Jesper Juhl [EMAIL PROTECTED] Sun, 12 Feb 2006 12:35:40 +0100 committer Harald Welte [EMAIL PROTECTED] Sun, 12 Feb 2006 12:35:40 +0100 net/bridge/br_netfilter.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c index b501816..c06cb09 100644 --- a/net/bridge/br_netfilter.c +++ b/net/bridge/br_netfilter.c @@ -805,7 +805,7 @@ static unsigned int br_nf_post_routing(u print_error: if (skb-dev != NULL) { printk([%s], skb-dev-name); - if (has_bridge_parent(skb-dev)) + if (bridge_parent(skb-dev)) printk([%s], bridge_parent(skb-dev)-name); } printk( head:%p, raw:%p, data:%p\n, skb-head, skb-mac.raw, -- - Harald Welte [EMAIL PROTECTED] http://netfilter.org/ Fragmentation is like classful addressing -- an interesting early architectural error that shows how much experimentation was going on while IP was being designed.-- Paul Vixie - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] [NETFILTER] nf_conntrack: clean up to reduce size of 'struct nf_conn'
Hi Dave! This is net-2.6.17 stuff, please apply. Thanks! [NETFILTER] nf_conntrack: clean up to reduce size of 'struct nf_conn' This patch moves all helper related data fields of 'struct nf_conn' into a separate structure 'struct nf_conn_help'. This new structure is only present in conntrack entries for which we actually have a helper loaded. Also, this patch cleans up the nf_conntrack 'features' mechanism to resemble what the original idea was: Just glue the feature-specific data structures at the end of 'struct nf_conn', and explicitly re-calculate the pointer to it when needed rather than keeping pointers around. Saves 20 bytes per conntrack on my x86_64 box. A non-helped conntrack is 276 bytes. We still need to save another 20 bytes in order to fit into to target of 256bytes. Signed-off-by: Harald Welte [EMAIL PROTECTED] --- commit aba8cf3ac5b60e10eb1ce9f30b6bb4c007ab9868 tree 758aedbe80b8306c1991a05c47eab8905d33a13b parent 94d3d40c84672b74e59ea5252f61602610e1513e author Harald Welte [EMAIL PROTECTED] Sun, 29 Jan 2006 19:19:06 +0100 committer Harald Welte [EMAIL PROTECTED] Sun, 29 Jan 2006 19:19:06 +0100 include/net/netfilter/nf_conntrack.h | 56 +++ net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c | 22 ++--- net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c | 39 +--- net/netfilter/nf_conntrack_core.c | 117 ++-- net/netfilter/nf_conntrack_ftp.c |2 net/netfilter/nf_conntrack_netlink.c | 39 +--- net/netfilter/nf_conntrack_standalone.c|1 net/netfilter/xt_helper.c |8 +- 8 files changed, 147 insertions(+), 137 deletions(-) diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h index 6d075ca..9d1f0e6 100644 --- a/include/net/netfilter/nf_conntrack.h +++ b/include/net/netfilter/nf_conntrack.h @@ -67,6 +67,18 @@ do { \ struct nf_conntrack_helper; +/* nf_conn feature for connections that have a helper */ +struct nf_conn_help { + /* Helper. if any */ + struct nf_conntrack_helper *helper; + + union nf_conntrack_help help; + + /* Current number of expected connections */ + unsigned int expecting; +}; + + #include net/netfilter/ipv4/nf_conntrack_ipv4.h struct nf_conn { @@ -81,6 +93,9 @@ struct nf_conn /* Have we seen traffic both ways yet? (bitset) */ unsigned long status; + /* If we were expected by an expectation, this will be it */ + struct nf_conn *master; + /* Timer function; drops refcnt when it goes off. */ struct timer_list timeout; @@ -88,38 +103,22 @@ struct nf_conn /* Accounting Information (same cache line as other written members) */ struct ip_conntrack_counter counters[IP_CT_DIR_MAX]; #endif - /* If we were expected by an expectation, this will be it */ - struct nf_conn *master; - - /* Current number of expected connections */ - unsigned int expecting; /* Unique ID that identifies this conntrack*/ unsigned int id; - /* Helper. if any */ - struct nf_conntrack_helper *helper; - /* features - nat, helper, ... used by allocating system */ u_int32_t features; - /* Storage reserved for other modules: */ - - union nf_conntrack_proto proto; - #if defined(CONFIG_NF_CONNTRACK_MARK) u_int32_t mark; #endif - /* These members are dynamically allocated. */ - - union nf_conntrack_help *help; + /* Storage reserved for other modules: */ + union nf_conntrack_proto proto; - /* Layer 3 dependent members. (ex: NAT) */ - union { - struct nf_conntrack_ipv4 *ipv4; - } l3proto; - void *data[0]; + /* features dynamically at the end: helper, nat (both optional) */ + char data[0]; }; struct nf_conntrack_expect @@ -373,10 +372,23 @@ nf_conntrack_expect_event(enum ip_conntr #define NF_CT_F_NUM4 extern int -nf_conntrack_register_cache(u_int32_t features, const char *name, size_t size, - int (*init_conntrack)(struct nf_conn *, u_int32_t)); +nf_conntrack_register_cache(u_int32_t features, const char *name, size_t size); extern void nf_conntrack_unregister_cache(u_int32_t features); +/* valid combinations: + * basic: nf_conn, nf_conn .. nf_conn_help + * nat: nf_conn .. nf_conn_nat, nf_conn .. nf_conn_nat, nf_conn help + */ +static inline struct nf_conn_help *nfct_help(const struct nf_conn *ct) +{ + unsigned int offset = sizeof(struct nf_conn); + + if (!(ct-features NF_CT_F_HELP)) + return NULL; + + return (struct nf_conn_help *) ((void *)ct + offset); +} + #endif /* __KERNEL__ */ #endif /* _NF_CONNTRACK_H */ diff --git a/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c b/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c
Re: [PATCH] [NETFILTER] nfnetlink_log: add sequence numbers for log events
On Tue, Jan 31, 2006 at 12:32:21AM +0100, Patrick McHardy wrote: Harald Welte wrote: Hi Dave, please apply, thanks! [NETFILTER] nfnetlink_log: add sequence numbers for log events By using a sequence number for every logged netfilter event, we can determine from userspace whether logging information was lots somewhere downstream. BTW, I have a patch I wanted to submit on top of this, which changes the *LOG targets to do reliable logging, which means if we encounter any errors during logging (for example from netlink), the packet will be dropped. This makes as sure as possible that no connections will be silently accepted. Its a slight change of user-visible behaviour, but since it only affects corner-cases I think it should be OK. I could add some flags to retain the current behaviour, but I think its not worth it. I think it is very much required to have such a flag. (we can actually add it to the nfnetlink_log flags). It really depends on your setup. Some people really really want to have logging reliable, others fear that they might easily be DoS'ed, if logging has higher priority than packet forwarding. Any objections? If it's optional (and the default is unreliable), then I think it's a great idea. -- - Harald Welte [EMAIL PROTECTED] http://netfilter.org/ Fragmentation is like classful addressing -- an interesting early architectural error that shows how much experimentation was going on while IP was being designed.-- Paul Vixie pgpnfOO9W6O2I.pgp Description: PGP signature
ip_conntrack related slab error (Re: Fw: Re: 2.6.16-rc1-mm3)
Begin forwarded message: Date: Sat, 28 Jan 2006 00:47:06 +1300 From: Reuben Farrelly [EMAIL PROTECTED] To: Andrew Morton [EMAIL PROTECTED] Cc: linux-kernel@vger.kernel.org, netdev@vger.kernel.org Subject: Re: 2.6.16-rc1-mm3 Just triggered this one, which had a fairly bad effect on connectivity to the box: i2c /dev entries driver slab error in kmem_cache_destroy(): cache `ip_conntrack': Can't free all objects [b010412b] show_trace+0xd/0xf [b01041cc] dump_stack+0x17/0x19 [b0155d04] kmem_cache_destroy+0x9b/0x1a9 [f0ebf701] ip_conntrack_cleanup+0x5d/0x10e [ip_conntrack] [f0ebe31e] init_or_cleanup+0x1f8/0x283 [ip_conntrack] [f0ec2c4e] fini+0xa/0x66 [ip_conntrack] [b0136d06] sys_delete_module+0x161/0x1fb [b0102b3f] sysenter_past_esp+0x54/0x75 Removing netfilter NETLINK layer. [EMAIL PROTECTED] log]# I was just reading IMAP mail at the time, ie same as I'd been doing for an hour or two beforehand and not altering config of the box in any way. I was able to log on via console but lost all network connectivity and had to reboot :( The codepath you see in that backtrace is only hit during load or removal of the 'ip_conntrack' module. While this certainly still should not oops, your description of 'not doing anything but IMAP reading' is certainly not true. Could you please describe what actually happened when that bug happened? It looks to me that you were unloading ip_conntrack_netlink.ko followed by ip_conntrack.ko. Generic details such as .config is at http://www.reub.net/files/kernel/ You don't have permission to access /files/kernel/ on this server. -- - Harald Welte [EMAIL PROTECTED] http://netfilter.org/ Fragmentation is like classful addressing -- an interesting early architectural error that shows how much experimentation was going on while IP was being designed.-- Paul Vixie pgp97NXr97jrX.pgp Description: PGP signature
Re: [Netlink] virtual interface
On Tue, Jan 24, 2006 at 09:19:02PM +0100, Mathieu Geli wrote: Here is a patch that add a netlink virtual interface. I really like the possibility that it offers. In fact, I always wanted something like a 'promiscuous netlink socket'. Whether it is the right thing to add a full-blown net_device for it, I don't know. This really only is a kludge for scapy/ethereal/whatever, since they just expect tu use network interfaces and AF_PACKET sockets. The question is whether we should introduce a net_device to work around scapy/ethereal limitations - or whether we'd rather have some different interface. I would like to hear comments from Dave, Jamal, Thomas Graf, Patrick McHardy on this... btw: There seem to be a number of coding style and whitespace issue in your patch. Please make sure you follow Documentation/CodingStyle as closely as possible. Dissectors are being developed for scapy [1] at the moment. please note that a number of netlink related data structures are host endian, and others are network endian, where again others are mixed host/network endian. This can create soem headache while writing dissectors ;) -- - Harald Welte [EMAIL PROTECTED] http://gnumonks.org/ Privacy in residential applications is a desirable marketing option. (ETSI EN 300 175-7 Ch. A6) pgpzqQ2CKzt4F.pgp Description: PGP signature
Re: wireless-2.6 status (25 January 2006)
On Fri, Jan 27, 2006 at 07:33:05AM -0500, John W. Linville wrote: On Fri, Jan 27, 2006 at 09:17:50AM +, Christoph Hellwig wrote: Folks, please stop these stupid ideas. There's a free driver, let's improve and merge it. That's a thousand times better than any half-free driver with buggy binary blobs. I presume you mean the ath-driver.org stuff? I'm perfectly open to using that code, but some had raised issues about the reverse-engineering process used to create it. Has that been resolved? It is unclear. I'm in contact with lawyers from the institute for low and open source software here in Germany, who conducted an analysis on reverse engineering. The result of their analysis leaves more open questions than it answers. The first problem is that there appears to be no precedent whatsoever on the reverse engineering interoperability exceptions. Also, the wording of those exceptions seem to assume software/software interoperability, not software/hardware. Furthermore, the .de implementation on the EU directive on 'reverse engineering' explicitly states 'decompilation' is forbidden. Technically speaking, that's different from disassembly. But what would a court rule? Also, the interoperability exceptions seem to be written with closed-source software in mind, since the result of such conditionally allowed re-engineering is not allowed to be published. So for now, I personally would ignore the concerns. Please tell me how many Linux device drivers we would have, if nobody had ever looked into some disassembled proprietary driver? Don't get me wrong, I'm very much against any kind of 'carbon copying' code from a proprietary driver into a free one. However, merely looking at existing proprietary code in order to figure out which bit of a register resembles what function, and then implementing a driver from scratch using that information seems pretty safe to me. Actually, running the proprietary code in a simulator and just looking at the inputs/outputs of that simulator seems to be the safest way. Also, who will be able to prove that you actually looked at some assembly instructions rather than observing the proprietary program running in a simulator? -- - Harald Welte [EMAIL PROTECTED] http://gnumonks.org/ Privacy in residential applications is a desirable marketing option. (ETSI EN 300 175-7 Ch. A6) pgpY8sDn6yEHs.pgp Description: PGP signature
[PATCH] [NETFILTER] nfnetlink_log: add sequence numbers for log events
Hi Dave, please apply, thanks! [NETFILTER] nfnetlink_log: add sequence numbers for log events By using a sequence number for every logged netfilter event, we can determine from userspace whether logging information was lots somewhere downstream. The user has a choice of either having per-instance local sequence counters, or using a global sequence counter, or both. Signed-off-by: Harald Welte [EMAIL PROTECTED] --- commit e3c7a1f99300fbd6de35a40fcd9c4dc1b0fbfee2 tree fa0df65edd0ec729af8033c0605b19dc49c1a688 parent 1259fffe55307fa41e932a594442aa78e1e37cfd author Harald Welte [EMAIL PROTECTED] Thu, 26 Jan 2006 15:05:29 +0100 committer Harald Welte [EMAIL PROTECTED] Thu, 26 Jan 2006 15:05:29 +0100 include/linux/netfilter/nfnetlink_log.h |6 net/netfilter/nfnetlink_log.c | 46 +++ 2 files changed, 52 insertions(+), 0 deletions(-) diff --git a/include/linux/netfilter/nfnetlink_log.h b/include/linux/netfilter/nfnetlink_log.h index b04b038..a7497c7 100644 --- a/include/linux/netfilter/nfnetlink_log.h +++ b/include/linux/netfilter/nfnetlink_log.h @@ -47,6 +47,8 @@ enum nfulnl_attr_type { NFULA_PAYLOAD, /* opaque data payload */ NFULA_PREFIX, /* string prefix */ NFULA_UID, /* user id of socket */ + NFULA_SEQ, /* instance-local sequence number */ + NFULA_SEQ_GLOBAL, /* global sequence number */ __NFULA_MAX }; @@ -77,6 +79,7 @@ enum nfulnl_attr_config { NFULA_CFG_NLBUFSIZ, /* u_int32_t buffer size */ NFULA_CFG_TIMEOUT, /* u_int32_t in 1/100 s */ NFULA_CFG_QTHRESH, /* u_int32_t */ + NFULA_CFG_FLAGS,/* u_int16_t */ __NFULA_CFG_MAX }; #define NFULA_CFG_MAX (__NFULA_CFG_MAX -1) @@ -85,4 +88,7 @@ enum nfulnl_attr_config { #define NFULNL_COPY_META 0x01 #define NFULNL_COPY_PACKET 0x02 +#define NFULNL_CFG_F_SEQ 0x0001 +#define NFULNL_CFG_F_SEQ_GLOBAL0x0002 + #endif /* _NFNETLINK_LOG_H */ diff --git a/net/netfilter/nfnetlink_log.c b/net/netfilter/nfnetlink_log.c index e10512e..a98d0b7 100644 --- a/net/netfilter/nfnetlink_log.c +++ b/net/netfilter/nfnetlink_log.c @@ -11,6 +11,10 @@ * it under the terms of the GNU General Public License version 2 as * published by the Free Software Foundation. * + * 2006-01-26 Harald Welte [EMAIL PROTECTED] + * - Add optional local and global sequence number to detect lost + * events from userspace + * */ #include linux/module.h #include linux/skbuff.h @@ -68,11 +72,14 @@ struct nfulnl_instance { unsigned int nlbufsiz; /* netlink buffer allocation size */ unsigned int qthreshold;/* threshold of the queue */ u_int32_t copy_range; + u_int32_t seq; /* instance-local sequential counter */ u_int16_t group_num;/* number of this queue */ + u_int16_t flags; u_int8_t copy_mode; }; static DEFINE_RWLOCK(instances_lock); +static atomic_t global_seq; #define INSTANCE_BUCKETS 16 static struct hlist_head instance_table[INSTANCE_BUCKETS]; @@ -310,6 +317,16 @@ nfulnl_set_qthresh(struct nfulnl_instanc return 0; } +static int +nfulnl_set_flags(struct nfulnl_instance *inst, u_int16_t flags) +{ + spin_lock_bh(inst-lock); + inst-flags = ntohs(flags); + spin_unlock_bh(inst-lock); + + return 0; +} + static struct sk_buff *nfulnl_alloc_skb(unsigned int inst_size, unsigned int pkt_size) { @@ -373,6 +390,8 @@ static void nfulnl_timer(unsigned long d spin_unlock_bh(inst-lock); } +/* This is an inline function, we don't really care about a long + * list of arguments */ static inline int __build_packet_message(struct nfulnl_instance *inst, const struct sk_buff *skb, @@ -511,6 +530,17 @@ __build_packet_message(struct nfulnl_ins read_unlock_bh(skb-sk-sk_callback_lock); } + /* local sequence number */ + if (inst-flags NFULNL_CFG_F_SEQ) { + tmp_uint = htonl(inst-seq++); + NFA_PUT(inst-skb, NFULA_SEQ, sizeof(tmp_uint), tmp_uint); + } + /* global sequence number */ + if (inst-flags NFULNL_CFG_F_SEQ_GLOBAL) { + tmp_uint = atomic_inc_return(global_seq); + NFA_PUT(inst-skb, NFULA_SEQ_GLOBAL, sizeof(tmp_uint), tmp_uint); + } + if (data_len) { struct nfattr *nfa; int size = NFA_LENGTH(data_len); @@ -603,6 +633,11 @@ nfulnl_log_packet(unsigned int pf, spin_lock_bh(inst-lock); + if (inst-flags NFULNL_CFG_F_SEQ) + size += NFA_SPACE(sizeof(u_int32_t)); + if (inst-flags NFULNL_CFG_F_SEQ_GLOBAL) + size += NFA_SPACE(sizeof(u_int32_t)); + qthreshold
Re: WCONF, netlink based WE replacement.
On Fri, Jan 13, 2006 at 02:15:03PM +0100, Thomas Graf wrote: * Michael Buesch [EMAIL PROTECTED] 2006-01-12 18:24 This is an attempt to rewrite the Wireless Extensions userspace API, using netlink sockets. There should also be a notification API, to inform userspace for changes (config changes, state changes, etc). It is not implemented, yet. I'll only comment on the netlink bits and leave the rest to others. I'd highly recommend the use of attributes instead of fixed message structures to allow the interface to be flexible to extensions while staying binary compatible. I can only second Thomas' suggestion. The power and flexibility of the use of netlink is only really unveiled if you use attributes. This way you can easily extend your data structures later on without breaking compatibility, etc. -- - Harald Welte [EMAIL PROTECTED] http://gnumonks.org/ Privacy in residential applications is a desirable marketing option. (ETSI EN 300 175-7 Ch. A6) pgpSNFb09iVlx.pgp Description: PGP signature
[PATCH 2/3] [NETFILTER] ip6tables: remove unused definitions
[NETFILTER] ip6tables: remove unused definitions These definitions ware used for only internal use in kernel = 2.6.13, which had not introduced the unified parser of IPv6 extension header yet. Signed-off-by: Yasuyuki Kozakai [EMAIL PROTECTED] Signed-off-by: Harald Welte [EMAIL PROTECTED] --- commit de63a83ad960be726fd487ce8f8a7115ecd015d5 tree 6499aea0dd46054b92a4393b4365d8b87b602cd1 parent 396728c2753f13781973532b4074bfdb398ad675 author Yasuyuki Kozakai [EMAIL PROTECTED] Mon, 16 Jan 2006 17:46:49 +0100 committer Harald Welte [EMAIL PROTECTED] Mon, 16 Jan 2006 17:46:49 +0100 include/linux/netfilter_ipv6/ip6t_ah.h |9 - include/linux/netfilter_ipv6/ip6t_esp.h |9 - include/linux/netfilter_ipv6/ip6t_frag.h |9 - include/linux/netfilter_ipv6/ip6t_opts.h |9 - include/linux/netfilter_ipv6/ip6t_rt.h |9 - 5 files changed, 0 insertions(+), 45 deletions(-) diff --git a/include/linux/netfilter_ipv6/ip6t_ah.h b/include/linux/netfilter_ipv6/ip6t_ah.h index c4f0793..8531879 100644 --- a/include/linux/netfilter_ipv6/ip6t_ah.h +++ b/include/linux/netfilter_ipv6/ip6t_ah.h @@ -18,13 +18,4 @@ struct ip6t_ah #define IP6T_AH_INV_LEN0x02/* Invert the sense of length. */ #define IP6T_AH_INV_MASK 0x03/* All possible flags. */ -#define MASK_HOPOPTS128 -#define MASK_DSTOPTS64 -#define MASK_ROUTING32 -#define MASK_FRAGMENT 16 -#define MASK_AH 8 -#define MASK_ESP4 -#define MASK_NONE 2 -#define MASK_PROTO 1 - #endif /*_IP6T_AH_H*/ diff --git a/include/linux/netfilter_ipv6/ip6t_esp.h b/include/linux/netfilter_ipv6/ip6t_esp.h index 01142b9..a91b6ab 100644 --- a/include/linux/netfilter_ipv6/ip6t_esp.h +++ b/include/linux/netfilter_ipv6/ip6t_esp.h @@ -7,15 +7,6 @@ struct ip6t_esp u_int8_t invflags; /* Inverse flags */ }; -#define MASK_HOPOPTS128 -#define MASK_DSTOPTS64 -#define MASK_ROUTING32 -#define MASK_FRAGMENT 16 -#define MASK_AH 8 -#define MASK_ESP4 -#define MASK_NONE 2 -#define MASK_PROTO 1 - /* Values for invflags field in struct ip6t_esp. */ #define IP6T_ESP_INV_SPI 0x01/* Invert the sense of spi. */ #define IP6T_ESP_INV_MASK 0x01/* All possible flags. */ diff --git a/include/linux/netfilter_ipv6/ip6t_frag.h b/include/linux/netfilter_ipv6/ip6t_frag.h index 449a57e..66070a0 100644 --- a/include/linux/netfilter_ipv6/ip6t_frag.h +++ b/include/linux/netfilter_ipv6/ip6t_frag.h @@ -21,13 +21,4 @@ struct ip6t_frag #define IP6T_FRAG_INV_LEN 0x02/* Invert the sense of length. */ #define IP6T_FRAG_INV_MASK 0x03/* All possible flags. */ -#define MASK_HOPOPTS128 -#define MASK_DSTOPTS64 -#define MASK_ROUTING32 -#define MASK_FRAGMENT 16 -#define MASK_AH 8 -#define MASK_ESP4 -#define MASK_NONE 2 -#define MASK_PROTO 1 - #endif /*_IP6T_FRAG_H*/ diff --git a/include/linux/netfilter_ipv6/ip6t_opts.h b/include/linux/netfilter_ipv6/ip6t_opts.h index e259b62..a07e363 100644 --- a/include/linux/netfilter_ipv6/ip6t_opts.h +++ b/include/linux/netfilter_ipv6/ip6t_opts.h @@ -20,13 +20,4 @@ struct ip6t_opts #define IP6T_OPTS_INV_LEN 0x01/* Invert the sense of length. */ #define IP6T_OPTS_INV_MASK 0x01/* All possible flags. */ -#define MASK_HOPOPTS128 -#define MASK_DSTOPTS64 -#define MASK_ROUTING32 -#define MASK_FRAGMENT 16 -#define MASK_AH 8 -#define MASK_ESP4 -#define MASK_NONE 2 -#define MASK_PROTO 1 - #endif /*_IP6T_OPTS_H*/ diff --git a/include/linux/netfilter_ipv6/ip6t_rt.h b/include/linux/netfilter_ipv6/ip6t_rt.h index f1070fb..5215602 100644 --- a/include/linux/netfilter_ipv6/ip6t_rt.h +++ b/include/linux/netfilter_ipv6/ip6t_rt.h @@ -30,13 +30,4 @@ struct ip6t_rt #define IP6T_RT_INV_LEN0x04/* Invert the sense of length. */ #define IP6T_RT_INV_MASK 0x07/* All possible flags. */ -#define MASK_HOPOPTS128 -#define MASK_DSTOPTS64 -#define MASK_ROUTING32 -#define MASK_FRAGMENT 16 -#define MASK_AH 8 -#define MASK_ESP4 -#define MASK_NONE 2 -#define MASK_PROTO 1 - #endif /*_IP6T_RT_H*/ -- - Harald Welte [EMAIL PROTECTED] http://netfilter.org/ Fragmentation is like classful addressing -- an interesting early architectural error that shows how much experimentation was going on while IP was being designed.-- Paul Vixie - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/3] [NETFILTER] ip6tables: whitespace and indent cosmetic cleanup
[NETFILTER] ip6tables: whitespace and indent cosmetic cleanup Signed-off-by: Yasuyuki Kozakai [EMAIL PROTECTED] Signed-off-by: Harald Welte [EMAIL PROTECTED] --- commit 663dafd271c08d6cf69ecaa1daf34380276b9945 tree 6194991675051520a186f40f587c1a55827a5081 parent de63a83ad960be726fd487ce8f8a7115ecd015d5 author Yasuyuki Kozakai [EMAIL PROTECTED] Mon, 16 Jan 2006 17:47:39 +0100 committer Harald Welte [EMAIL PROTECTED] Mon, 16 Jan 2006 17:47:39 +0100 net/ipv6/netfilter/ip6t_dst.c| 147 net/ipv6/netfilter/ip6t_eui64.c | 64 +- net/ipv6/netfilter/ip6t_frag.c | 153 - net/ipv6/netfilter/ip6t_hbh.c| 147 net/ipv6/netfilter/ip6t_ipv6header.c | 79 ++--- net/ipv6/netfilter/ip6t_owner.c | 28 ++--- net/ipv6/netfilter/ip6t_rt.c | 211 ++ 7 files changed, 417 insertions(+), 412 deletions(-) diff --git a/net/ipv6/netfilter/ip6t_dst.c b/net/ipv6/netfilter/ip6t_dst.c index 80fe826..b4c153a 100644 --- a/net/ipv6/netfilter/ip6t_dst.c +++ b/net/ipv6/netfilter/ip6t_dst.c @@ -36,19 +36,19 @@ MODULE_AUTHOR(Andras Kis-Szabo [EMAIL PROTECTED] #endif /* - * (Type 0xC0) 6 - * 0 - ignorable - * 1 - must drop the packet - * 2 - send ICMP PARM PROB regardless and drop packet - * 3 - Send ICMP if not a multicast address and drop packet + * (Type 0xC0) 6 + * 0 - ignorable + * 1 - must drop the packet + * 2 - send ICMP PARM PROB regardless and drop packet + * 3 - Send ICMP if not a multicast address and drop packet * (Type 0x20) 5 - * 0 - invariant - * 1 - can change the routing + * 0 - invariant + * 1 - can change the routing * (Type 0x1F) Type - * 0 - Pad1 (only 1 byte!) - * 1 - PadN LENGTH info (total length = length + 2) - * C0 | 2 - JUMBO 4 x x x x ( 64k ) - * 5 - RTALERT 2 x x + * 0 - Pad1 (only 1 byte!) + * 1 - PadN LENGTH info (total length = length + 2) + * C0 | 2 - JUMBO 4 x x x x ( 64k ) + * 5 - RTALERT 2 x x */ static int @@ -60,16 +60,16 @@ match(const struct sk_buff *skb, unsigned int protoff, int *hotdrop) { - struct ipv6_opt_hdr _optsh, *oh; - const struct ip6t_opts *optinfo = matchinfo; - unsigned int temp; - unsigned int ptr; - unsigned int hdrlen = 0; - unsigned int ret = 0; - u8 _opttype, *tp = NULL; - u8 _optlen, *lp = NULL; - unsigned int optlen; - + struct ipv6_opt_hdr _optsh, *oh; + const struct ip6t_opts *optinfo = matchinfo; + unsigned int temp; + unsigned int ptr; + unsigned int hdrlen = 0; + unsigned int ret = 0; + u8 _opttype, *tp = NULL; + u8 _optlen, *lp = NULL; + unsigned int optlen; + #if HOPBYHOP if (ipv6_find_hdr(skb, ptr, NEXTHDR_HOP, NULL) 0) #else @@ -77,42 +77,41 @@ match(const struct sk_buff *skb, #endif return 0; - oh = skb_header_pointer(skb, ptr, sizeof(_optsh), _optsh); - if (oh == NULL){ - *hotdrop = 1; - return 0; - } - - hdrlen = ipv6_optlen(oh); - if (skb-len - ptr hdrlen){ - /* Packet smaller than it's length field */ - return 0; - } - - DEBUGP(IPv6 OPTS LEN %u %u , hdrlen, oh-hdrlen); - - DEBUGP(len %02X %04X %02X , - optinfo-hdrlen, hdrlen, - (!(optinfo-flags IP6T_OPTS_LEN) || - ((optinfo-hdrlen == hdrlen) ^ - !!(optinfo-invflags IP6T_OPTS_INV_LEN; - - ret = (oh != NULL) - - (!(optinfo-flags IP6T_OPTS_LEN) || - ((optinfo-hdrlen == hdrlen) ^ - !!(optinfo-invflags IP6T_OPTS_INV_LEN))); - - ptr += 2; - hdrlen -= 2; - if ( !(optinfo-flags IP6T_OPTS_OPTS) ){ - return ret; + oh = skb_header_pointer(skb, ptr, sizeof(_optsh), _optsh); + if (oh == NULL) { + *hotdrop = 1; + return 0; + } + + hdrlen = ipv6_optlen(oh); + if (skb-len - ptr hdrlen) { + /* Packet smaller than it's length field */ + return 0; + } + + DEBUGP(IPv6 OPTS LEN %u %u , hdrlen, oh-hdrlen); + + DEBUGP(len %02X %04X %02X , + optinfo-hdrlen, hdrlen, + (!(optinfo-flags IP6T_OPTS_LEN) || + ((optinfo-hdrlen == hdrlen) ^ +!!(optinfo-invflags IP6T_OPTS_INV_LEN; + + ret = (oh != NULL) + (!(optinfo-flags IP6T_OPTS_LEN) || + ((optinfo-hdrlen == hdrlen) ^ + !!(optinfo-invflags IP6T_OPTS_INV_LEN))); + + ptr += 2; + hdrlen
[PATCH 1/3] [NETFILTER] Makefile cleanup
[NETFILTER] Makefile cleanup These are replaced with x_tables matches and no longer exist. Signed-off-by: Yasuyuki Kozakai [EMAIL PROTECTED] Signed-off-by: Harald Welte [EMAIL PROTECTED] --- commit 396728c2753f13781973532b4074bfdb398ad675 tree cefbd947576c9ec4511153242109a6566079ebdf parent 78dbb647a1b797fcda30a65557ca21b3b73ee097 author Yasuyuki Kozakai [EMAIL PROTECTED] Mon, 16 Jan 2006 17:45:12 +0100 committer Harald Welte [EMAIL PROTECTED] Mon, 16 Jan 2006 17:45:12 +0100 net/ipv4/netfilter/Makefile |1 - net/ipv6/netfilter/Makefile |1 - 2 files changed, 0 insertions(+), 2 deletions(-) diff --git a/net/ipv4/netfilter/Makefile b/net/ipv4/netfilter/Makefile index bcefe64..e5c5b32 100644 --- a/net/ipv4/netfilter/Makefile +++ b/net/ipv4/netfilter/Makefile @@ -46,7 +46,6 @@ obj-$(CONFIG_IP_NF_NAT) += iptable_nat.o obj-$(CONFIG_IP_NF_RAW) += iptable_raw.o # matches -obj-$(CONFIG_IP_NF_MATCH_HELPER) += ipt_helper.o obj-$(CONFIG_IP_NF_MATCH_HASHLIMIT) += ipt_hashlimit.o obj-$(CONFIG_IP_NF_MATCH_IPRANGE) += ipt_iprange.o obj-$(CONFIG_IP_NF_MATCH_MULTIPORT) += ipt_multiport.o diff --git a/net/ipv6/netfilter/Makefile b/net/ipv6/netfilter/Makefile index 663b474..db6073c 100644 --- a/net/ipv6/netfilter/Makefile +++ b/net/ipv6/netfilter/Makefile @@ -4,7 +4,6 @@ # Link order matters here. obj-$(CONFIG_IP6_NF_IPTABLES) += ip6_tables.o -obj-$(CONFIG_IP6_NF_MATCH_LENGTH) += ip6t_length.o obj-$(CONFIG_IP6_NF_MATCH_RT) += ip6t_rt.o obj-$(CONFIG_IP6_NF_MATCH_OPTS) += ip6t_hbh.o ip6t_dst.o obj-$(CONFIG_IP6_NF_MATCH_IPV6HEADER) += ip6t_ipv6header.o -- - Harald Welte [EMAIL PROTECTED] http://netfilter.org/ Fragmentation is like classful addressing -- an interesting early architectural error that shows how much experimentation was going on while IP was being designed.-- Paul Vixie - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] fix ip[6]t_policy compiler warnings after x_tables merge
Hi Dave! Please apply, thanks. [NETFILTER] ip[6]t_policy: Fix compilation warnings ip[6]t_policy argument conversion slipped when merging with x_tables Signed-off-by: Benoit Boissinot [EMAIL PROTECTED] Signed-off-by: Harald Welte [EMAIL PROTECTED] --- a/net/ipv4/netfilter/ipt_policy.c 2006-01-14 16:11:52.0 +0100 +++ b/net/ipv4/netfilter/ipt_policy.c 2006-01-14 18:55:26.0 +0100 @@ -95,7 +95,10 @@ static int match(const struct sk_buff *skb, const struct net_device *in, const struct net_device *out, - const void *matchinfo, int offset, int *hotdrop) + const void *matchinfo, + int offset, + unsigned int protoff, + int *hotdrop) { const struct ipt_policy_info *info = matchinfo; int ret; @@ -113,7 +116,7 @@ return ret; } -static int checkentry(const char *tablename, const struct ipt_ip *ip, +static int checkentry(const char *tablename, const void *ip_void, void *matchinfo, unsigned int matchsize, unsigned int hook_mask) { --- a/net/ipv6/netfilter/ip6t_policy.c 2006-01-14 16:11:52.0 +0100 +++ b/net/ipv6/netfilter/ip6t_policy.c 2006-01-14 18:52:58.0 +0100 @@ -118,7 +118,7 @@ return ret; } -static int checkentry(const char *tablename, const struct ip6t_ip6 *ip, +static int checkentry(const char *tablename, const void *ip_void, void *matchinfo, unsigned int matchsize, unsigned int hook_mask) { -- - Harald Welte [EMAIL PROTECTED] http://netfilter.org/ Fragmentation is like classful addressing -- an interesting early architectural error that shows how much experimentation was going on while IP was being designed.-- Paul Vixie pgpdPv8g0AvFU.pgp Description: PGP signature
Re: Resubmit: [PATCH] natsemi: NAPI support
On Wed, Jan 04, 2006 at 07:32:49AM +, Mark Brown wrote: This patch against 2.6.14 converts the natsemi driver to use NAPI. It was originally based on one written by Harald Welte, though it has since been modified quite a bit, most extensively in order to remove the ability to disable NAPI since none of the other drivers seem to provide that functionality any more. Mark, sorry for not responding earlier to your emails with regard to your natsemi patch. Thanks for pushing this persistently. From reviewing your patch, I personally thin it's fine to be merged. I will test it on my natsemi based boxes later today. -- - Harald Welte [EMAIL PROTECTED] http://gnumonks.org/ Privacy in residential applications is a desirable marketing option. (ETSI EN 300 175-7 Ch. A6) pgpXjBH2ZtIut.pgp Description: PGP signature
Re: Integrating IXP4xx NPE support in kernel (with IXP4xx accesslibrary)
On Tue, Dec 27, 2005 at 09:22:31AM -0500, jamal wrote: On Tue, 2005-27-12 at 15:05 +0100, Lennert Buytenhek wrote: On Tue, Dec 27, 2005 at 08:49:50AM -0500, jamal wrote: A question for you: Does the 4XX allow you to rewrite the microcode? As far as I know, there are no docs for this, at least not any publicly available ones. Chapter 4 in the IXP42X Developer's Manual is titled Network Processor Engines (NPE), but that's just a 2-page marketing blurb on how great their stuff is. If you ask Intel for docs, they just tell you to go look at the available (evil-licensed) source code. They probably will never open it up. which is quite surprising, given the fact that you get full documentation (even as printed books) for the ixp2xxx microengines. Yes, all the actual example microcode is evil-licensed, but as lennert has shown it is possible to develop gpl licensed microengine code using their docs. why can't they just do the same for ixp4xx? -- - Harald Welte [EMAIL PROTECTED] http://gnumonks.org/ Privacy in residential applications is a desirable marketing option. (ETSI EN 300 175-7 Ch. A6) pgpmx85Sbq20y.pgp Description: PGP signature
Re: 2.6.15-rc2-mm1
On Wed, Nov 23, 2005 at 11:22:18AM -0800, Andrew Morton wrote: Michal Piotrowski [EMAIL PROTECTED] wrote: Debug: sleeping function called from invalid context at include/asm/semaphore.h:123 in_atomic():1, irqs_disabled():0 [c0103be6] dump_stack+0x17/0x19 [c011a0c3] __might_sleep+0x9c/0xae [fd9a090d] translate_table+0x147/0xc14 [ip_tables] [fd9a2b2a] ipt_register_table+0x93/0x20d [ip_tables] [f98fe027] init+0x27/0x9e [iptable_filter] [c01376d0] sys_init_module+0xd7/0x26c [c0102cc7] sysenter_past_esp+0x54/0x75 --- | preempt count: 0001 ] | 1 level deep critical section nesting: .. [fd9a2aca] ipt_register_table+0x33/0x20d [ip_tables] .[f98fe027] .. ( = init+0x27/0x9e [iptable_filter]) ipt_register_table() does get_cpu() then calls translate_table(), and somewhere under translate_table() we do something which sleeps, only I'm not sure what it is - netfilter likes to hide things in unexpected places. I'll investigate this. the get_cpu() scheme was introduced as a fix for a different (less serious) problem. You'll get a reply until later today. -- - Harald Welte [EMAIL PROTECTED] http://netfilter.org/ Fragmentation is like classful addressing -- an interesting early architectural error that shows how much experimentation was going on while IP was being designed.-- Paul Vixie pgprVtGQHXCIr.pgp Description: PGP signature
Re: Fw: [Fwd: [Bug 5644] New: NFS v3 TCP 3-way handshake incorrect, iptables blocks access]
On Wed, Nov 23, 2005 at 02:44:19PM -0800, David S. Miller wrote: Please make sure this gets looked at, and at least reviewed. Jozsef Kadlecsik doesn't recall those patches/changes (even though he's our Mr. TCP state tracking and is indicated as the author of one of the two patches. I also don't recall having seen any of those patches before. But that doesn't mean all too much, my brain is like a sieve some times. Jozsef is most familiar with that code, he said he'll cook up a new patch and do nfsim testing over the next days. Also, it was indicated that those fixes did go mainline at some point? Can soembody please indicate when that was supposed to happen? I don't really recall any of this, and I would be surprised if we deliberately backed out such changes at a later point. Thanks! -- - Harald Welte [EMAIL PROTECTED] http://netfilter.org/ Fragmentation is like classful addressing -- an interesting early architectural error that shows how much experimentation was going on while IP was being designed.-- Paul Vixie pgpQbFIBHhzlb.pgp Description: PGP signature