On 08/09/2018 04:02 AM, Daniel Borkmann wrote:
On 08/09/2018 06:48 AM, Alexei Starovoitov wrote:
On Wed, Aug 08, 2018 at 10:08:47PM -0500, Mauricio Vasquez wrote:
And how about adding three new helpers: push/pop/peek as well?
Reusing lookup/update is neat, but does lookup == pop
or does lookup == peek ?
I suspect it will be confusing.
Three new helpers cost nothing, but will make bpf progs easier to read.
I agree. I have one doubt here, update/lookup/delete is implemented in all
map types, if the operation is not supported it returns -EINVAL.
For push/pop/peek, should we implement it in all maps or is it better to
check the map type before invoking map->ops->push/pop/seek?
(Maybe checking if map->ops->xxx is NULL will also work)
Since push/pop/peak are only for this queue/stack I thought we won't
be adding 'ops' for them and just call the helpers from progs.
But now I'm having second thoughts, since 3 new commands for syscall
feels like overkill.
At the same time I still don't feel that lookup == pop is the right alias.
Also what peak is going to alias to ?
May be let's go back to your original idea with a tweak:
push == update
peak == lookup
pop = lookup + delete
In other words in case of stack the bpf_map_lookup_elem will return
the pointer to value of top element in the stack and
bpf_map_delete_elem will delete that top element.
Then in user space we can introduce push/pop always_inline functions like:
void bpf_push(struct bpf_map *map, void *value)
{
bpf_map_update_elem(map, NULL/*key*/, value);
}
void *bpf_pop(struct bpf_map *map)
{
void * val = bpf_map_lookup_elem(map, NULL/*key*/);
bpf_map_delete_elem(map, NULL/*key*/);
return val;
}
Thoughts?
I actually think that having new push/peak/pop BPF map helpers would
be fine, as well as having them sit in map->ops. Initially I'd probably
leave out the syscall ops counterparts so they can be used only from
program.
I also think that having the new helpers is good. I don't have a strong
opinion about saving them in map->ops or not, in any case it has to be
verified that push/pop/peek are only invoked in stack/queue maps. I
think we could force in on the verifier, would it be ok?
Potentially array and per-cpu array could implement them even by having
an internal pointer to the current slot which we move on push/pop. This
would potentially also avoid all the RCU callbacks to free the elements,
elem allocations, list locking, and improve cache locality compared to
the current implementation.
I could change the implementation, a linked list is used when
BPF_F_NO_PREALLOC is passed, otherwise a plain array with internal
indexes is used.
I have a question about RCU and a possible array-based implementation.
It should be guarantee that a pointer from a pop() operation is valid
for the whole program duration.
If an eBPF program A calls pop() it will get a pointer to the head in
the array, and the head index will move, it is possible that a second
program B (running on a different CPU) pushes a new element overwriting
the element pointed by A.
I think we need to use a more sophisticated mechanism than just an array
and two indexes. Am I missing something?
Agree that existing ops are not the right alias, but deferring to user
space as inline function also doesn't really seem like a good fit, imho,
so I'd prefer rather to have something native. (Aside from that, the
above inline bpf_pop() would also race between CPUs.)
I think we should have push/pop/peek syscalls as well, having a
bpf_pop() that is race prone would create problems. Users expected maps
operations to be safe, so having one that is not will confuse them.
Thanks,
Daniel
Thanks,
Mauricio.