Re: [PATCH net] bpf: split eBPF out of NET
On Mon, Oct 27, 2014 at 4:10 PM, David Miller wrote: > From: Alexei Starovoitov > Date: Thu, 23 Oct 2014 18:41:08 -0700 > >> introduce two configs: >> - hidden CONFIG_BPF to select eBPF interpreter that classic socket filters >> depend on >> - visible CONFIG_BPF_SYSCALL (default off) that tracing and sockets can use >> >> that solves several problems: >> - tracing and others that wish to use eBPF don't need to depend on NET. >> They can use BPF_SYSCALL to allow loading from userspace or select BPF >> to use it directly from kernel in NET-less configs. >> - in 3.18 programs cannot be attached to events yet, so don't force it on >> - when the rest of eBPF infra is there in 3.19+, it's still useful to >> switch it off to minimize kernel size >> >> Signed-off-by: Alexei Starovoitov >> --- >> >> bloat-o-meter on x64 shows: >> add/remove: 0/60 grow/shrink: 0/2 up/down: 0/-15601 (-15601) >> >> tested with many different config combinations. Hopefully didn't miss >> anything. > > Applied with two changes: > > 1) boolean --> bool > 2) Moved bloat-o-meter and testing information into commit message. > > Thanks. Thank you for taking care of it! -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH net] bpf: split eBPF out of NET
From: Alexei Starovoitov Date: Thu, 23 Oct 2014 18:41:08 -0700 > introduce two configs: > - hidden CONFIG_BPF to select eBPF interpreter that classic socket filters > depend on > - visible CONFIG_BPF_SYSCALL (default off) that tracing and sockets can use > > that solves several problems: > - tracing and others that wish to use eBPF don't need to depend on NET. > They can use BPF_SYSCALL to allow loading from userspace or select BPF > to use it directly from kernel in NET-less configs. > - in 3.18 programs cannot be attached to events yet, so don't force it on > - when the rest of eBPF infra is there in 3.19+, it's still useful to > switch it off to minimize kernel size > > Signed-off-by: Alexei Starovoitov > --- > > bloat-o-meter on x64 shows: > add/remove: 0/60 grow/shrink: 0/2 up/down: 0/-15601 (-15601) > > tested with many different config combinations. Hopefully didn't miss > anything. Applied with two changes: 1) boolean --> bool 2) Moved bloat-o-meter and testing information into commit message. Thanks. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH net] bpf: split eBPF out of NET
From: Alexei Starovoitov a...@plumgrid.com Date: Thu, 23 Oct 2014 18:41:08 -0700 introduce two configs: - hidden CONFIG_BPF to select eBPF interpreter that classic socket filters depend on - visible CONFIG_BPF_SYSCALL (default off) that tracing and sockets can use that solves several problems: - tracing and others that wish to use eBPF don't need to depend on NET. They can use BPF_SYSCALL to allow loading from userspace or select BPF to use it directly from kernel in NET-less configs. - in 3.18 programs cannot be attached to events yet, so don't force it on - when the rest of eBPF infra is there in 3.19+, it's still useful to switch it off to minimize kernel size Signed-off-by: Alexei Starovoitov a...@plumgrid.com --- bloat-o-meter on x64 shows: add/remove: 0/60 grow/shrink: 0/2 up/down: 0/-15601 (-15601) tested with many different config combinations. Hopefully didn't miss anything. Applied with two changes: 1) boolean -- bool 2) Moved bloat-o-meter and testing information into commit message. Thanks. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH net] bpf: split eBPF out of NET
On Mon, Oct 27, 2014 at 4:10 PM, David Miller da...@davemloft.net wrote: From: Alexei Starovoitov a...@plumgrid.com Date: Thu, 23 Oct 2014 18:41:08 -0700 introduce two configs: - hidden CONFIG_BPF to select eBPF interpreter that classic socket filters depend on - visible CONFIG_BPF_SYSCALL (default off) that tracing and sockets can use that solves several problems: - tracing and others that wish to use eBPF don't need to depend on NET. They can use BPF_SYSCALL to allow loading from userspace or select BPF to use it directly from kernel in NET-less configs. - in 3.18 programs cannot be attached to events yet, so don't force it on - when the rest of eBPF infra is there in 3.19+, it's still useful to switch it off to minimize kernel size Signed-off-by: Alexei Starovoitov a...@plumgrid.com --- bloat-o-meter on x64 shows: add/remove: 0/60 grow/shrink: 0/2 up/down: 0/-15601 (-15601) tested with many different config combinations. Hopefully didn't miss anything. Applied with two changes: 1) boolean -- bool 2) Moved bloat-o-meter and testing information into commit message. Thanks. Thank you for taking care of it! -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH net] bpf: split eBPF out of NET
On 10/24/2014 10:11 AM, Josh Triplett wrote: On Thu, Oct 23, 2014 at 10:32:50PM -0700, Alexei Starovoitov wrote: On Thu, Oct 23, 2014 at 8:23 PM, Josh Triplett wrote: On Thu, Oct 23, 2014 at 06:41:08PM -0700, Alexei Starovoitov wrote: introduce two configs: - hidden CONFIG_BPF to select eBPF interpreter that classic socket filters depend on - visible CONFIG_BPF_SYSCALL (default off) that tracing and sockets can use that solves several problems: - tracing and others that wish to use eBPF don't need to depend on NET. They can use BPF_SYSCALL to allow loading from userspace or select BPF to use it directly from kernel in NET-less configs. - in 3.18 programs cannot be attached to events yet, so don't force it on - when the rest of eBPF infra is there in 3.19+, it's still useful to switch it off to minimize kernel size Signed-off-by: Alexei Starovoitov Thanks for working on this! A few nits below, but otherwise this looks good to me. Once this gets appropriate reviews from net and bpf folks, please let me know if you want this to go through the net tree, the tiny tree, or some other tree. Thanks :) I've sent it to Dave and marked it as 'net', so it's for his net tree. I don't mind if he decides to steer it into net-next when it opens, since changing Kconfig is always tricky. I just felt that this patch deserves to be in 'net' and in 3.18-rc Ah, nice; yes, getting it into 3.18-rc would be excellent if possible. Fully agreed, BPF_SYSCALL defaulting to 'n' _for the time being_ would also give an option for reducing exposure until the API is further stabilized and in a ready-to-use state. bloat-o-meter on x64 shows: add/remove: 0/60 grow/shrink: 0/2 up/down: 0/-15601 (-15601) Very nice! Please do include the bloat-o-meter stats in the commit message. I don't think that's necessary. eBPF is in early stages of adoption. More things to come, so bloat-o-meter stats will be obsolete very quickly. I don't mean the full list of symbols, just the summary saying this saves 15k. It might probably help to more easily identify from the log which commits are related to a tinyfication perspective. Perhaps Dave can still squash that into the commit log. +# interpreter that classic socket filters depend on +config BPF + boolean s/boolean/bool/ Is there a difference? I thought it's an alias. It's an alias, but almost everything uses "bool": ~/src/linux$ git grep -w bool -- '*Kconfig*' | wc -l 7064 ~/src/linux$ git grep -w boolean -- '*Kconfig*' | wc -l 94 Actually, shouldn't we get rid of the alias then? Same accounts for def_bool and def_boolean ... it would help to avoid confusion to just have a single term for each. Anyway, the rest looks good to me, thanks. I am totally fine of having it under EXPERT for now for the reasons mentioned above. This can still be lifted later on. Acked-by: Daniel Borkmann -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH net] bpf: split eBPF out of NET
On Fri, Oct 24, 2014 at 10:11 AM, Josh Triplett wrote: >> >> +config BPF_SYSCALL >> >> + bool "Enable bpf() system call" if EXPERT >> >> + select ANON_INODES >> >> + select BPF >> >> + default n >> >> + help >> >> + Enable the bpf() system call that allows to manipulate eBPF >> >> + programs and maps via file descriptors. >> > >> > Not sure this one goes under EXPERT, especially since it currently has >> > "default n". >> >> I followed the same style as EPOLL, EVENTFD and others >> in the same category. > > I was thinking of CROSS_MEMORY_ATTACH and FHANDLE in the same file. Those indeed look like better examples. With if EXPERT and default n, you need to enable EXPERT before you can enable the syscall, which is probably not what you want. Thanks! Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH net] bpf: split eBPF out of NET
On Thu, Oct 23, 2014 at 10:32:50PM -0700, Alexei Starovoitov wrote: > On Thu, Oct 23, 2014 at 8:23 PM, Josh Triplett wrote: > > On Thu, Oct 23, 2014 at 06:41:08PM -0700, Alexei Starovoitov wrote: > >> introduce two configs: > >> - hidden CONFIG_BPF to select eBPF interpreter that classic socket filters > >> depend on > >> - visible CONFIG_BPF_SYSCALL (default off) that tracing and sockets can use > >> > >> that solves several problems: > >> - tracing and others that wish to use eBPF don't need to depend on NET. > >> They can use BPF_SYSCALL to allow loading from userspace or select BPF > >> to use it directly from kernel in NET-less configs. > >> - in 3.18 programs cannot be attached to events yet, so don't force it on > >> - when the rest of eBPF infra is there in 3.19+, it's still useful to > >> switch it off to minimize kernel size > >> > >> Signed-off-by: Alexei Starovoitov > > > > Thanks for working on this! A few nits below, but otherwise this looks > > good to me. Once this gets appropriate reviews from net and bpf folks, > > please let me know if you want this to go through the net tree, the tiny > > tree, or some other tree. > > Thanks :) > I've sent it to Dave and marked it as 'net', so it's for > his net tree. I don't mind if he decides to steer it into net-next > when it opens, since changing Kconfig is always tricky. > I just felt that this patch deserves to be in 'net' and in 3.18-rc Ah, nice; yes, getting it into 3.18-rc would be excellent if possible. > >> bloat-o-meter on x64 shows: > >> add/remove: 0/60 grow/shrink: 0/2 up/down: 0/-15601 (-15601) > > > > Very nice! Please do include the bloat-o-meter stats in the commit > > message. > > I don't think that's necessary. eBPF is in early stages of adoption. > More things to come, so bloat-o-meter stats will be obsolete > very quickly. I don't mean the full list of symbols, just the summary saying this saves 15k. > >> +# interpreter that classic socket filters depend on > >> +config BPF > >> + boolean > > > > s/boolean/bool/ > > Is there a difference? I thought it's an alias. It's an alias, but almost everything uses "bool": ~/src/linux$ git grep -w bool -- '*Kconfig*' | wc -l 7064 ~/src/linux$ git grep -w boolean -- '*Kconfig*' | wc -l 94 > >> +config BPF_SYSCALL > >> + bool "Enable bpf() system call" if EXPERT > >> + select ANON_INODES > >> + select BPF > >> + default n > >> + help > >> + Enable the bpf() system call that allows to manipulate eBPF > >> + programs and maps via file descriptors. > > > > Not sure this one goes under EXPERT, especially since it currently has > > "default n". > > I followed the same style as EPOLL, EVENTFD and others > in the same category. I was thinking of CROSS_MEMORY_ATTACH and FHANDLE in the same file. > >> +/* To execute LD_ABS/LD_IND instructions __bpf_prog_run() may call > >> + * skb_copy_bits(), so provide a weak definition of it for NET-less > >> config. > >> + */ > >> +int __weak skb_copy_bits(const struct sk_buff *skb, int offset, void *to, > >> + int len) > >> +{ > >> + return -EFAULT; > >> +} > > > > Please discuss this in the commit message. What are the implications of > > ending up with this implementation that always returns -EFAULT? > > because that's what real skb_copy_bits() would return. > In this case it's actually irrelevant, since non-socket programs > are not allowed to have LD_ABS/LD_IND instructions and > I'm only resolving linker error here. > But returning negative error helps prevent bugs in cases > where verifier or some in-kernel generated program uses > LD_ABS by mistake. Makes sense. > I don't think these type of explanations are necessary in > commit logs. > > >> @@ -6,7 +6,7 @@ menuconfig NET > >> bool "Networking support" > >> select NLATTR > >> select GENERIC_NET_UTILS > >> - select ANON_INODES > >> + select BPF > > > > Why does this not need to select ANON_INODES anymore? Did *only* BPF > > use that, so it only needs to occur via BPF_SYSCALL? If so, can you > > document that in the commit message? > > I hope that folks who were following this work on netdev > remember commit 38b3629adb8c04 that added it. > So here I'm actually removing this ANON_INODES dependency > from NET and moving it into BPF_SYSCALL where it belongs. Thanks for the clarification. > btw, the goal of this patch is not tinification, but rather being > good citizen and not forcing new syscall on everyone. A critical part of the tinification effort is not having the kernel get gratuitously bigger in other areas while we're trying to shrink it. So, I really appreciate your work. :) - Josh Triplett -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH net] bpf: split eBPF out of NET
On Thu, Oct 23, 2014 at 10:32:50PM -0700, Alexei Starovoitov wrote: On Thu, Oct 23, 2014 at 8:23 PM, Josh Triplett j...@joshtriplett.org wrote: On Thu, Oct 23, 2014 at 06:41:08PM -0700, Alexei Starovoitov wrote: introduce two configs: - hidden CONFIG_BPF to select eBPF interpreter that classic socket filters depend on - visible CONFIG_BPF_SYSCALL (default off) that tracing and sockets can use that solves several problems: - tracing and others that wish to use eBPF don't need to depend on NET. They can use BPF_SYSCALL to allow loading from userspace or select BPF to use it directly from kernel in NET-less configs. - in 3.18 programs cannot be attached to events yet, so don't force it on - when the rest of eBPF infra is there in 3.19+, it's still useful to switch it off to minimize kernel size Signed-off-by: Alexei Starovoitov a...@plumgrid.com Thanks for working on this! A few nits below, but otherwise this looks good to me. Once this gets appropriate reviews from net and bpf folks, please let me know if you want this to go through the net tree, the tiny tree, or some other tree. Thanks :) I've sent it to Dave and marked it as 'net', so it's for his net tree. I don't mind if he decides to steer it into net-next when it opens, since changing Kconfig is always tricky. I just felt that this patch deserves to be in 'net' and in 3.18-rc Ah, nice; yes, getting it into 3.18-rc would be excellent if possible. bloat-o-meter on x64 shows: add/remove: 0/60 grow/shrink: 0/2 up/down: 0/-15601 (-15601) Very nice! Please do include the bloat-o-meter stats in the commit message. I don't think that's necessary. eBPF is in early stages of adoption. More things to come, so bloat-o-meter stats will be obsolete very quickly. I don't mean the full list of symbols, just the summary saying this saves 15k. +# interpreter that classic socket filters depend on +config BPF + boolean s/boolean/bool/ Is there a difference? I thought it's an alias. It's an alias, but almost everything uses bool: ~/src/linux$ git grep -w bool -- '*Kconfig*' | wc -l 7064 ~/src/linux$ git grep -w boolean -- '*Kconfig*' | wc -l 94 +config BPF_SYSCALL + bool Enable bpf() system call if EXPERT + select ANON_INODES + select BPF + default n + help + Enable the bpf() system call that allows to manipulate eBPF + programs and maps via file descriptors. Not sure this one goes under EXPERT, especially since it currently has default n. I followed the same style as EPOLL, EVENTFD and others in the same category. I was thinking of CROSS_MEMORY_ATTACH and FHANDLE in the same file. +/* To execute LD_ABS/LD_IND instructions __bpf_prog_run() may call + * skb_copy_bits(), so provide a weak definition of it for NET-less config. + */ +int __weak skb_copy_bits(const struct sk_buff *skb, int offset, void *to, + int len) +{ + return -EFAULT; +} Please discuss this in the commit message. What are the implications of ending up with this implementation that always returns -EFAULT? because that's what real skb_copy_bits() would return. In this case it's actually irrelevant, since non-socket programs are not allowed to have LD_ABS/LD_IND instructions and I'm only resolving linker error here. But returning negative error helps prevent bugs in cases where verifier or some in-kernel generated program uses LD_ABS by mistake. Makes sense. I don't think these type of explanations are necessary in commit logs. @@ -6,7 +6,7 @@ menuconfig NET bool Networking support select NLATTR select GENERIC_NET_UTILS - select ANON_INODES + select BPF Why does this not need to select ANON_INODES anymore? Did *only* BPF use that, so it only needs to occur via BPF_SYSCALL? If so, can you document that in the commit message? I hope that folks who were following this work on netdev remember commit 38b3629adb8c04 that added it. So here I'm actually removing this ANON_INODES dependency from NET and moving it into BPF_SYSCALL where it belongs. Thanks for the clarification. btw, the goal of this patch is not tinification, but rather being good citizen and not forcing new syscall on everyone. A critical part of the tinification effort is not having the kernel get gratuitously bigger in other areas while we're trying to shrink it. So, I really appreciate your work. :) - Josh Triplett -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH net] bpf: split eBPF out of NET
On Fri, Oct 24, 2014 at 10:11 AM, Josh Triplett j...@joshtriplett.org wrote: +config BPF_SYSCALL + bool Enable bpf() system call if EXPERT + select ANON_INODES + select BPF + default n + help + Enable the bpf() system call that allows to manipulate eBPF + programs and maps via file descriptors. Not sure this one goes under EXPERT, especially since it currently has default n. I followed the same style as EPOLL, EVENTFD and others in the same category. I was thinking of CROSS_MEMORY_ATTACH and FHANDLE in the same file. Those indeed look like better examples. With if EXPERT and default n, you need to enable EXPERT before you can enable the syscall, which is probably not what you want. Thanks! Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say programmer or something like that. -- Linus Torvalds -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH net] bpf: split eBPF out of NET
On 10/24/2014 10:11 AM, Josh Triplett wrote: On Thu, Oct 23, 2014 at 10:32:50PM -0700, Alexei Starovoitov wrote: On Thu, Oct 23, 2014 at 8:23 PM, Josh Triplett j...@joshtriplett.org wrote: On Thu, Oct 23, 2014 at 06:41:08PM -0700, Alexei Starovoitov wrote: introduce two configs: - hidden CONFIG_BPF to select eBPF interpreter that classic socket filters depend on - visible CONFIG_BPF_SYSCALL (default off) that tracing and sockets can use that solves several problems: - tracing and others that wish to use eBPF don't need to depend on NET. They can use BPF_SYSCALL to allow loading from userspace or select BPF to use it directly from kernel in NET-less configs. - in 3.18 programs cannot be attached to events yet, so don't force it on - when the rest of eBPF infra is there in 3.19+, it's still useful to switch it off to minimize kernel size Signed-off-by: Alexei Starovoitov a...@plumgrid.com Thanks for working on this! A few nits below, but otherwise this looks good to me. Once this gets appropriate reviews from net and bpf folks, please let me know if you want this to go through the net tree, the tiny tree, or some other tree. Thanks :) I've sent it to Dave and marked it as 'net', so it's for his net tree. I don't mind if he decides to steer it into net-next when it opens, since changing Kconfig is always tricky. I just felt that this patch deserves to be in 'net' and in 3.18-rc Ah, nice; yes, getting it into 3.18-rc would be excellent if possible. Fully agreed, BPF_SYSCALL defaulting to 'n' _for the time being_ would also give an option for reducing exposure until the API is further stabilized and in a ready-to-use state. bloat-o-meter on x64 shows: add/remove: 0/60 grow/shrink: 0/2 up/down: 0/-15601 (-15601) Very nice! Please do include the bloat-o-meter stats in the commit message. I don't think that's necessary. eBPF is in early stages of adoption. More things to come, so bloat-o-meter stats will be obsolete very quickly. I don't mean the full list of symbols, just the summary saying this saves 15k. It might probably help to more easily identify from the log which commits are related to a tinyfication perspective. Perhaps Dave can still squash that into the commit log. +# interpreter that classic socket filters depend on +config BPF + boolean s/boolean/bool/ Is there a difference? I thought it's an alias. It's an alias, but almost everything uses bool: ~/src/linux$ git grep -w bool -- '*Kconfig*' | wc -l 7064 ~/src/linux$ git grep -w boolean -- '*Kconfig*' | wc -l 94 Actually, shouldn't we get rid of the alias then? Same accounts for def_bool and def_boolean ... it would help to avoid confusion to just have a single term for each. Anyway, the rest looks good to me, thanks. I am totally fine of having it under EXPERT for now for the reasons mentioned above. This can still be lifted later on. Acked-by: Daniel Borkmann dbork...@redhat.com -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH net] bpf: split eBPF out of NET
On Thu, Oct 23, 2014 at 8:23 PM, Josh Triplett wrote: > On Thu, Oct 23, 2014 at 06:41:08PM -0700, Alexei Starovoitov wrote: >> introduce two configs: >> - hidden CONFIG_BPF to select eBPF interpreter that classic socket filters >> depend on >> - visible CONFIG_BPF_SYSCALL (default off) that tracing and sockets can use >> >> that solves several problems: >> - tracing and others that wish to use eBPF don't need to depend on NET. >> They can use BPF_SYSCALL to allow loading from userspace or select BPF >> to use it directly from kernel in NET-less configs. >> - in 3.18 programs cannot be attached to events yet, so don't force it on >> - when the rest of eBPF infra is there in 3.19+, it's still useful to >> switch it off to minimize kernel size >> >> Signed-off-by: Alexei Starovoitov > > Thanks for working on this! A few nits below, but otherwise this looks > good to me. Once this gets appropriate reviews from net and bpf folks, > please let me know if you want this to go through the net tree, the tiny > tree, or some other tree. Thanks :) I've sent it to Dave and marked it as 'net', so it's for his net tree. I don't mind if he decides to steer it into net-next when it opens, since changing Kconfig is always tricky. I just felt that this patch deserves to be in 'net' and in 3.18-rc >> bloat-o-meter on x64 shows: >> add/remove: 0/60 grow/shrink: 0/2 up/down: 0/-15601 (-15601) > > Very nice! Please do include the bloat-o-meter stats in the commit > message. I don't think that's necessary. eBPF is in early stages of adoption. More things to come, so bloat-o-meter stats will be obsolete very quickly. >> +# interpreter that classic socket filters depend on >> +config BPF >> + boolean > > s/boolean/bool/ Is there a difference? I thought it's an alias. >> +config BPF_SYSCALL >> + bool "Enable bpf() system call" if EXPERT >> + select ANON_INODES >> + select BPF >> + default n >> + help >> + Enable the bpf() system call that allows to manipulate eBPF >> + programs and maps via file descriptors. > > Not sure this one goes under EXPERT, especially since it currently has > "default n". I followed the same style as EPOLL, EVENTFD and others in the same category. >> +/* To execute LD_ABS/LD_IND instructions __bpf_prog_run() may call >> + * skb_copy_bits(), so provide a weak definition of it for NET-less config. >> + */ >> +int __weak skb_copy_bits(const struct sk_buff *skb, int offset, void *to, >> + int len) >> +{ >> + return -EFAULT; >> +} > > Please discuss this in the commit message. What are the implications of > ending up with this implementation that always returns -EFAULT? because that's what real skb_copy_bits() would return. In this case it's actually irrelevant, since non-socket programs are not allowed to have LD_ABS/LD_IND instructions and I'm only resolving linker error here. But returning negative error helps prevent bugs in cases where verifier or some in-kernel generated program uses LD_ABS by mistake. I don't think these type of explanations are necessary in commit logs. >> @@ -6,7 +6,7 @@ menuconfig NET >> bool "Networking support" >> select NLATTR >> select GENERIC_NET_UTILS >> - select ANON_INODES >> + select BPF > > Why does this not need to select ANON_INODES anymore? Did *only* BPF > use that, so it only needs to occur via BPF_SYSCALL? If so, can you > document that in the commit message? I hope that folks who were following this work on netdev remember commit 38b3629adb8c04 that added it. So here I'm actually removing this ANON_INODES dependency from NET and moving it into BPF_SYSCALL where it belongs. btw, the goal of this patch is not tinification, but rather being good citizen and not forcing new syscall on everyone. It was tested with upcoming tracing patches that select BPF instead of NET. It will also help parallelize the development, since my old predicate-tree into eBPF optimization for vanilla tracing filters: http://lwn.net/Articles/598545/ can potentially go into tip tree a release earlier. Back then full NET dependency was a show stopper. This patch finally addresses it. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH net] bpf: split eBPF out of NET
On Thu, Oct 23, 2014 at 06:41:08PM -0700, Alexei Starovoitov wrote: > introduce two configs: > - hidden CONFIG_BPF to select eBPF interpreter that classic socket filters > depend on > - visible CONFIG_BPF_SYSCALL (default off) that tracing and sockets can use > > that solves several problems: > - tracing and others that wish to use eBPF don't need to depend on NET. > They can use BPF_SYSCALL to allow loading from userspace or select BPF > to use it directly from kernel in NET-less configs. > - in 3.18 programs cannot be attached to events yet, so don't force it on > - when the rest of eBPF infra is there in 3.19+, it's still useful to > switch it off to minimize kernel size > > Signed-off-by: Alexei Starovoitov Thanks for working on this! A few nits below, but otherwise this looks good to me. Once this gets appropriate reviews from net and bpf folks, please let me know if you want this to go through the net tree, the tiny tree, or some other tree. > > bloat-o-meter on x64 shows: > add/remove: 0/60 grow/shrink: 0/2 up/down: 0/-15601 (-15601) Very nice! Please do include the bloat-o-meter stats in the commit message. > tested with many different config combinations. Hopefully didn't miss > anything. > > init/Kconfig| 14 ++ > kernel/Makefile |2 +- > kernel/bpf/Makefile |6 +++--- > kernel/bpf/core.c |9 + > net/Kconfig |2 +- > 5 files changed, 28 insertions(+), 5 deletions(-) > > diff --git a/init/Kconfig b/init/Kconfig > index 3ee28ae..340fd92 100644 > --- a/init/Kconfig > +++ b/init/Kconfig > @@ -1341,6 +1341,10 @@ config SYSCTL_ARCH_UNALIGN_ALLOW > config HAVE_PCSPKR_PLATFORM > bool > > +# interpreter that classic socket filters depend on > +config BPF > + boolean s/boolean/bool/ > + > menuconfig EXPERT > bool "Configure standard kernel features (expert users)" > # Unhide debug options, to make the on-by-default options visible > @@ -1521,6 +1525,16 @@ config EVENTFD > > If unsure, say Y. > > +# syscall, maps, verifier > +config BPF_SYSCALL > + bool "Enable bpf() system call" if EXPERT > + select ANON_INODES > + select BPF > + default n > + help > + Enable the bpf() system call that allows to manipulate eBPF > + programs and maps via file descriptors. Not sure this one goes under EXPERT, especially since it currently has "default n". > --- a/kernel/bpf/core.c > +++ b/kernel/bpf/core.c > @@ -655,3 +655,12 @@ void bpf_prog_free(struct bpf_prog *fp) > schedule_work(>work); > } > EXPORT_SYMBOL_GPL(bpf_prog_free); > + > +/* To execute LD_ABS/LD_IND instructions __bpf_prog_run() may call > + * skb_copy_bits(), so provide a weak definition of it for NET-less config. > + */ > +int __weak skb_copy_bits(const struct sk_buff *skb, int offset, void *to, > + int len) > +{ > + return -EFAULT; > +} Please discuss this in the commit message. What are the implications of ending up with this implementation that always returns -EFAULT? > diff --git a/net/Kconfig b/net/Kconfig > index 6272420..99815b5 100644 > --- a/net/Kconfig > +++ b/net/Kconfig > @@ -6,7 +6,7 @@ menuconfig NET > bool "Networking support" > select NLATTR > select GENERIC_NET_UTILS > - select ANON_INODES > + select BPF Why does this not need to select ANON_INODES anymore? Did *only* BPF use that, so it only needs to occur via BPF_SYSCALL? If so, can you document that in the commit message? > ---help--- > Unless you really know what you are doing, you should say Y here. > The reason is that some programs need kernel networking support even > -- > 1.7.9.5 > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH net] bpf: split eBPF out of NET
introduce two configs: - hidden CONFIG_BPF to select eBPF interpreter that classic socket filters depend on - visible CONFIG_BPF_SYSCALL (default off) that tracing and sockets can use that solves several problems: - tracing and others that wish to use eBPF don't need to depend on NET. They can use BPF_SYSCALL to allow loading from userspace or select BPF to use it directly from kernel in NET-less configs. - in 3.18 programs cannot be attached to events yet, so don't force it on - when the rest of eBPF infra is there in 3.19+, it's still useful to switch it off to minimize kernel size Signed-off-by: Alexei Starovoitov --- bloat-o-meter on x64 shows: add/remove: 0/60 grow/shrink: 0/2 up/down: 0/-15601 (-15601) tested with many different config combinations. Hopefully didn't miss anything. init/Kconfig| 14 ++ kernel/Makefile |2 +- kernel/bpf/Makefile |6 +++--- kernel/bpf/core.c |9 + net/Kconfig |2 +- 5 files changed, 28 insertions(+), 5 deletions(-) diff --git a/init/Kconfig b/init/Kconfig index 3ee28ae..340fd92 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -1341,6 +1341,10 @@ config SYSCTL_ARCH_UNALIGN_ALLOW config HAVE_PCSPKR_PLATFORM bool +# interpreter that classic socket filters depend on +config BPF + boolean + menuconfig EXPERT bool "Configure standard kernel features (expert users)" # Unhide debug options, to make the on-by-default options visible @@ -1521,6 +1525,16 @@ config EVENTFD If unsure, say Y. +# syscall, maps, verifier +config BPF_SYSCALL + bool "Enable bpf() system call" if EXPERT + select ANON_INODES + select BPF + default n + help + Enable the bpf() system call that allows to manipulate eBPF + programs and maps via file descriptors. + config SHMEM bool "Use full shmem filesystem" if EXPERT default y diff --git a/kernel/Makefile b/kernel/Makefile index dc5c775..17ea6d4 100644 --- a/kernel/Makefile +++ b/kernel/Makefile @@ -86,7 +86,7 @@ obj-$(CONFIG_RING_BUFFER) += trace/ obj-$(CONFIG_TRACEPOINTS) += trace/ obj-$(CONFIG_IRQ_WORK) += irq_work.o obj-$(CONFIG_CPU_PM) += cpu_pm.o -obj-$(CONFIG_NET) += bpf/ +obj-$(CONFIG_BPF) += bpf/ obj-$(CONFIG_PERF_EVENTS) += events/ diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile index 4542723..0daf7f6 100644 --- a/kernel/bpf/Makefile +++ b/kernel/bpf/Makefile @@ -1,5 +1,5 @@ -obj-y := core.o syscall.o verifier.o - +obj-y := core.o +obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o ifdef CONFIG_TEST_BPF -obj-y += test_stub.o +obj-$(CONFIG_BPF_SYSCALL) += test_stub.o endif diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c index f0c30c5..d6594e4 100644 --- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -655,3 +655,12 @@ void bpf_prog_free(struct bpf_prog *fp) schedule_work(>work); } EXPORT_SYMBOL_GPL(bpf_prog_free); + +/* To execute LD_ABS/LD_IND instructions __bpf_prog_run() may call + * skb_copy_bits(), so provide a weak definition of it for NET-less config. + */ +int __weak skb_copy_bits(const struct sk_buff *skb, int offset, void *to, +int len) +{ + return -EFAULT; +} diff --git a/net/Kconfig b/net/Kconfig index 6272420..99815b5 100644 --- a/net/Kconfig +++ b/net/Kconfig @@ -6,7 +6,7 @@ menuconfig NET bool "Networking support" select NLATTR select GENERIC_NET_UTILS - select ANON_INODES + select BPF ---help--- Unless you really know what you are doing, you should say Y here. The reason is that some programs need kernel networking support even -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH net] bpf: split eBPF out of NET
introduce two configs: - hidden CONFIG_BPF to select eBPF interpreter that classic socket filters depend on - visible CONFIG_BPF_SYSCALL (default off) that tracing and sockets can use that solves several problems: - tracing and others that wish to use eBPF don't need to depend on NET. They can use BPF_SYSCALL to allow loading from userspace or select BPF to use it directly from kernel in NET-less configs. - in 3.18 programs cannot be attached to events yet, so don't force it on - when the rest of eBPF infra is there in 3.19+, it's still useful to switch it off to minimize kernel size Signed-off-by: Alexei Starovoitov a...@plumgrid.com --- bloat-o-meter on x64 shows: add/remove: 0/60 grow/shrink: 0/2 up/down: 0/-15601 (-15601) tested with many different config combinations. Hopefully didn't miss anything. init/Kconfig| 14 ++ kernel/Makefile |2 +- kernel/bpf/Makefile |6 +++--- kernel/bpf/core.c |9 + net/Kconfig |2 +- 5 files changed, 28 insertions(+), 5 deletions(-) diff --git a/init/Kconfig b/init/Kconfig index 3ee28ae..340fd92 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -1341,6 +1341,10 @@ config SYSCTL_ARCH_UNALIGN_ALLOW config HAVE_PCSPKR_PLATFORM bool +# interpreter that classic socket filters depend on +config BPF + boolean + menuconfig EXPERT bool Configure standard kernel features (expert users) # Unhide debug options, to make the on-by-default options visible @@ -1521,6 +1525,16 @@ config EVENTFD If unsure, say Y. +# syscall, maps, verifier +config BPF_SYSCALL + bool Enable bpf() system call if EXPERT + select ANON_INODES + select BPF + default n + help + Enable the bpf() system call that allows to manipulate eBPF + programs and maps via file descriptors. + config SHMEM bool Use full shmem filesystem if EXPERT default y diff --git a/kernel/Makefile b/kernel/Makefile index dc5c775..17ea6d4 100644 --- a/kernel/Makefile +++ b/kernel/Makefile @@ -86,7 +86,7 @@ obj-$(CONFIG_RING_BUFFER) += trace/ obj-$(CONFIG_TRACEPOINTS) += trace/ obj-$(CONFIG_IRQ_WORK) += irq_work.o obj-$(CONFIG_CPU_PM) += cpu_pm.o -obj-$(CONFIG_NET) += bpf/ +obj-$(CONFIG_BPF) += bpf/ obj-$(CONFIG_PERF_EVENTS) += events/ diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile index 4542723..0daf7f6 100644 --- a/kernel/bpf/Makefile +++ b/kernel/bpf/Makefile @@ -1,5 +1,5 @@ -obj-y := core.o syscall.o verifier.o - +obj-y := core.o +obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o ifdef CONFIG_TEST_BPF -obj-y += test_stub.o +obj-$(CONFIG_BPF_SYSCALL) += test_stub.o endif diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c index f0c30c5..d6594e4 100644 --- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -655,3 +655,12 @@ void bpf_prog_free(struct bpf_prog *fp) schedule_work(aux-work); } EXPORT_SYMBOL_GPL(bpf_prog_free); + +/* To execute LD_ABS/LD_IND instructions __bpf_prog_run() may call + * skb_copy_bits(), so provide a weak definition of it for NET-less config. + */ +int __weak skb_copy_bits(const struct sk_buff *skb, int offset, void *to, +int len) +{ + return -EFAULT; +} diff --git a/net/Kconfig b/net/Kconfig index 6272420..99815b5 100644 --- a/net/Kconfig +++ b/net/Kconfig @@ -6,7 +6,7 @@ menuconfig NET bool Networking support select NLATTR select GENERIC_NET_UTILS - select ANON_INODES + select BPF ---help--- Unless you really know what you are doing, you should say Y here. The reason is that some programs need kernel networking support even -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH net] bpf: split eBPF out of NET
On Thu, Oct 23, 2014 at 06:41:08PM -0700, Alexei Starovoitov wrote: introduce two configs: - hidden CONFIG_BPF to select eBPF interpreter that classic socket filters depend on - visible CONFIG_BPF_SYSCALL (default off) that tracing and sockets can use that solves several problems: - tracing and others that wish to use eBPF don't need to depend on NET. They can use BPF_SYSCALL to allow loading from userspace or select BPF to use it directly from kernel in NET-less configs. - in 3.18 programs cannot be attached to events yet, so don't force it on - when the rest of eBPF infra is there in 3.19+, it's still useful to switch it off to minimize kernel size Signed-off-by: Alexei Starovoitov a...@plumgrid.com Thanks for working on this! A few nits below, but otherwise this looks good to me. Once this gets appropriate reviews from net and bpf folks, please let me know if you want this to go through the net tree, the tiny tree, or some other tree. bloat-o-meter on x64 shows: add/remove: 0/60 grow/shrink: 0/2 up/down: 0/-15601 (-15601) Very nice! Please do include the bloat-o-meter stats in the commit message. tested with many different config combinations. Hopefully didn't miss anything. init/Kconfig| 14 ++ kernel/Makefile |2 +- kernel/bpf/Makefile |6 +++--- kernel/bpf/core.c |9 + net/Kconfig |2 +- 5 files changed, 28 insertions(+), 5 deletions(-) diff --git a/init/Kconfig b/init/Kconfig index 3ee28ae..340fd92 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -1341,6 +1341,10 @@ config SYSCTL_ARCH_UNALIGN_ALLOW config HAVE_PCSPKR_PLATFORM bool +# interpreter that classic socket filters depend on +config BPF + boolean s/boolean/bool/ + menuconfig EXPERT bool Configure standard kernel features (expert users) # Unhide debug options, to make the on-by-default options visible @@ -1521,6 +1525,16 @@ config EVENTFD If unsure, say Y. +# syscall, maps, verifier +config BPF_SYSCALL + bool Enable bpf() system call if EXPERT + select ANON_INODES + select BPF + default n + help + Enable the bpf() system call that allows to manipulate eBPF + programs and maps via file descriptors. Not sure this one goes under EXPERT, especially since it currently has default n. --- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -655,3 +655,12 @@ void bpf_prog_free(struct bpf_prog *fp) schedule_work(aux-work); } EXPORT_SYMBOL_GPL(bpf_prog_free); + +/* To execute LD_ABS/LD_IND instructions __bpf_prog_run() may call + * skb_copy_bits(), so provide a weak definition of it for NET-less config. + */ +int __weak skb_copy_bits(const struct sk_buff *skb, int offset, void *to, + int len) +{ + return -EFAULT; +} Please discuss this in the commit message. What are the implications of ending up with this implementation that always returns -EFAULT? diff --git a/net/Kconfig b/net/Kconfig index 6272420..99815b5 100644 --- a/net/Kconfig +++ b/net/Kconfig @@ -6,7 +6,7 @@ menuconfig NET bool Networking support select NLATTR select GENERIC_NET_UTILS - select ANON_INODES + select BPF Why does this not need to select ANON_INODES anymore? Did *only* BPF use that, so it only needs to occur via BPF_SYSCALL? If so, can you document that in the commit message? ---help--- Unless you really know what you are doing, you should say Y here. The reason is that some programs need kernel networking support even -- 1.7.9.5 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH net] bpf: split eBPF out of NET
On Thu, Oct 23, 2014 at 8:23 PM, Josh Triplett j...@joshtriplett.org wrote: On Thu, Oct 23, 2014 at 06:41:08PM -0700, Alexei Starovoitov wrote: introduce two configs: - hidden CONFIG_BPF to select eBPF interpreter that classic socket filters depend on - visible CONFIG_BPF_SYSCALL (default off) that tracing and sockets can use that solves several problems: - tracing and others that wish to use eBPF don't need to depend on NET. They can use BPF_SYSCALL to allow loading from userspace or select BPF to use it directly from kernel in NET-less configs. - in 3.18 programs cannot be attached to events yet, so don't force it on - when the rest of eBPF infra is there in 3.19+, it's still useful to switch it off to minimize kernel size Signed-off-by: Alexei Starovoitov a...@plumgrid.com Thanks for working on this! A few nits below, but otherwise this looks good to me. Once this gets appropriate reviews from net and bpf folks, please let me know if you want this to go through the net tree, the tiny tree, or some other tree. Thanks :) I've sent it to Dave and marked it as 'net', so it's for his net tree. I don't mind if he decides to steer it into net-next when it opens, since changing Kconfig is always tricky. I just felt that this patch deserves to be in 'net' and in 3.18-rc bloat-o-meter on x64 shows: add/remove: 0/60 grow/shrink: 0/2 up/down: 0/-15601 (-15601) Very nice! Please do include the bloat-o-meter stats in the commit message. I don't think that's necessary. eBPF is in early stages of adoption. More things to come, so bloat-o-meter stats will be obsolete very quickly. +# interpreter that classic socket filters depend on +config BPF + boolean s/boolean/bool/ Is there a difference? I thought it's an alias. +config BPF_SYSCALL + bool Enable bpf() system call if EXPERT + select ANON_INODES + select BPF + default n + help + Enable the bpf() system call that allows to manipulate eBPF + programs and maps via file descriptors. Not sure this one goes under EXPERT, especially since it currently has default n. I followed the same style as EPOLL, EVENTFD and others in the same category. +/* To execute LD_ABS/LD_IND instructions __bpf_prog_run() may call + * skb_copy_bits(), so provide a weak definition of it for NET-less config. + */ +int __weak skb_copy_bits(const struct sk_buff *skb, int offset, void *to, + int len) +{ + return -EFAULT; +} Please discuss this in the commit message. What are the implications of ending up with this implementation that always returns -EFAULT? because that's what real skb_copy_bits() would return. In this case it's actually irrelevant, since non-socket programs are not allowed to have LD_ABS/LD_IND instructions and I'm only resolving linker error here. But returning negative error helps prevent bugs in cases where verifier or some in-kernel generated program uses LD_ABS by mistake. I don't think these type of explanations are necessary in commit logs. @@ -6,7 +6,7 @@ menuconfig NET bool Networking support select NLATTR select GENERIC_NET_UTILS - select ANON_INODES + select BPF Why does this not need to select ANON_INODES anymore? Did *only* BPF use that, so it only needs to occur via BPF_SYSCALL? If so, can you document that in the commit message? I hope that folks who were following this work on netdev remember commit 38b3629adb8c04 that added it. So here I'm actually removing this ANON_INODES dependency from NET and moving it into BPF_SYSCALL where it belongs. btw, the goal of this patch is not tinification, but rather being good citizen and not forcing new syscall on everyone. It was tested with upcoming tracing patches that select BPF instead of NET. It will also help parallelize the development, since my old predicate-tree into eBPF optimization for vanilla tracing filters: http://lwn.net/Articles/598545/ can potentially go into tip tree a release earlier. Back then full NET dependency was a show stopper. This patch finally addresses it. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/