On 5/29/26 16:32, Pavel Tikhomirov wrote:
On 5/29/26 15:14, Vladimir Riabchun wrote:
On 5/29/26 14:20, Pavel Tikhomirov wrote:
Expose the per-VE BPF program load limit via two ve cgroup files:
bpf_prog_max_nr - rw, writable only from ve0, restricts loads
bpf_prog_avail_nr - ro, remaining quota
Writes adjust the avail counter by the delta so that already-loaded
programs are not retroactively rejected when the cap is lowered.
https://virtuozzo.atlassian.net/browse/VSTOR-131947
Signed-off-by: Pavel Tikhomirov <[email protected]>
Feature: ve: allow BPF in Containers
---
kernel/ve/ve.c | 39 +++++++++++++++++++++++++++++++++++++++
1 file changed, 39 insertions(+)
diff --git a/kernel/ve/ve.c b/kernel/ve/ve.c
index 48da546117bb7..9c3be61a4366a 100644
--- a/kernel/ve/ve.c
+++ b/kernel/ve/ve.c
@@ -1315,6 +1315,35 @@ static s64 ve_netif_avail_nr_read(struct cgroup_subsys_state *css, struct cftype
return atomic_read(&css_to_ve(css)->netif_avail_nr);
}
+static u64 ve_bpf_prog_max_nr_read(struct cgroup_subsys_state *css, struct cftype *cft)
+{
+ return css_to_ve(css)->bpf_prog_max_nr;
This read is not protected by ve->op_sem, so it can race with a concurrent write.
We only serialize writes against other writes, to keep the bpf_prog_max_nr and
bpf_prog_avail_nr pair fully consistent. We are fine with eventual consistency
on reads, which lets us avoid extra locking.
That is fine unless someone relies on this value being exact, e.g. in tests.
+}
+
+static int ve_bpf_prog_max_nr_write(struct cgroup_subsys_state *css, struct cftype *cft, u64 val)
+{
+ struct ve_struct *ve = css_to_ve(css);
+ int delta;
+
+ if (!ve_is_super(get_exec_env()))
+ return -EPERM;
+
+ if (val > INT_MAX)
+ return -EOVERFLOW;
+
+ down_write(&ve->op_sem);
+ delta = val - ve->bpf_prog_max_nr;
+ ve->bpf_prog_max_nr = val;
+ atomic_add(delta, &ve->bpf_prog_avail_nr);
We should check that ve->bpf_prog_avail_nr + delta >= 0, otherwise we can end up
with more programs loaded than the limit allows.
That is OK: existing programs may exceed the new limit because they are already
loaded. We only reject new loads until avail_nr becomes positive again.
Fair point.
+ up_write(&ve->op_sem);
+ return 0;
+}
+
+static s64 ve_bpf_prog_avail_nr_read(struct cgroup_subsys_state *css, struct cftype *cft)
+{
+ return atomic_read(&css_to_ve(css)->bpf_prog_avail_nr);
+}
+
static int ve_os_release_read(struct seq_file *sf, void *v)
{
struct cgroup_subsys_state *css = seq_css(sf);
@@ -1786,6 +1815,16 @@ static struct cftype ve_cftypes[] = {
.name = "netif_avail_nr",
.read_s64 = ve_netif_avail_nr_read,
},
+ {
+ .name = "bpf_prog_max_nr",
+ .flags = CFTYPE_NOT_ON_ROOT,
+ .read_u64 = ve_bpf_prog_max_nr_read,
+ .write_u64 = ve_bpf_prog_max_nr_write,
+ },
+ {
+ .name = "bpf_prog_avail_nr",
+ .read_s64 = ve_bpf_prog_avail_nr_read,
Why a signed value?
It can go negative if the limit is set below the number of programs already loaded.
Agreed.
+ },
{
.name = "os_release",
.max_write_len = __NEW_UTS_LEN + 1,
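
The accounting discussed in the review comments above (a write shifts the avail
counter by the change in the cap, and the counter may go negative) can be
modeled in userspace. This is a minimal sketch, not the kernel code: struct
model_ve and all model_* helpers are hypothetical names, locking and atomics
are left out, and the load/unload helpers assume that a load is admitted only
while the avail counter is positive. That matches how the thread describes the
behaviour, but the load path itself is not part of this patch.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>

/* Userspace model only; every model_* name here is hypothetical. */
struct model_ve {
	int max_nr;   /* stands in for ve->bpf_prog_max_nr */
	int avail_nr; /* stands in for ve->bpf_prog_avail_nr, may go negative */
};

/* Mirrors the write handler: shift avail by the change in the cap. */
static int model_set_max(struct model_ve *ve, unsigned long long val)
{
	int delta;

	if (val > INT_MAX)
		return -EOVERFLOW;

	delta = (int)val - ve->max_nr;
	ve->max_nr = (int)val;
	ve->avail_nr += delta;
	return 0;
}

/* Assumed load-side check: admit a program only while avail is positive. */
static bool model_try_load(struct model_ve *ve)
{
	if (ve->avail_nr <= 0)
		return false;
	ve->avail_nr--;
	return true;
}

/* Assumed unload side: hand one slot back. */
static void model_unload(struct model_ve *ve)
{
	ve->avail_nr++;
}

/* Walks through the "cap lowered below current usage" case. */
static void model_demo(void)
{
	struct model_ve ve = { .max_nr = 0, .avail_nr = 0 };
	int i;

	assert(model_set_max(&ve, 4) == 0);   /* avail: 4 */
	for (i = 0; i < 3; i++)
		assert(model_try_load(&ve));  /* avail: 1 */

	assert(model_set_max(&ve, 1) == 0);   /* delta is -3 */
	assert(ve.avail_nr == -2);            /* loaded programs are kept */
	assert(!model_try_load(&ve));         /* new loads are refused */

	for (i = 0; i < 3; i++)
		model_unload(&ve);            /* avail: 1 */
	assert(model_try_load(&ve));          /* loads are admitted again */
	assert(ve.avail_nr == 0);
}
```

Calling model_demo() from a test harness exercises the case raised in review:
lowering the cap from 4 to 1 with three programs loaded leaves avail at -2, so
nothing is evicted, but no new load is admitted until enough programs are
unloaded for the counter to become positive again.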
--
Best regards, Riabchun Vladimir
Linux Kernel Developer, Virtuozzo