Ben Pfaff <[email protected]> writes:
> On Thu, Aug 11, 2016 at 11:33:59AM -0400, Aaron Conole wrote:
>> Ben Pfaff <[email protected]> writes:
>>
>> > On Mon, Aug 01, 2016 at 08:24:40AM -0400, Aaron Conole wrote:
>> >> Aaron Conole <[email protected]> writes:
>> >>
>> >> > Currently, when using Open vSwitch with DPDK and qemu guests, the
>> >> > recommended
>> >> > method for joining the guests is via the dpdkvhostuser interface. This
>> >> > interface uses Unix Domain sockets to communicate. When these sockets
>> >> > are
>> >> > created, they inherit the permissions and ownership from the
>> >> > vswitchd process.
>> >> > This can lead to an undesirable state where the QEMU process
>> >> > cannot use the
>> >> > socket file until manual intervention is performed (via `chown`
>> >> > and/or `chmod`
>> >> > calls).
>> >> >
>> >> > This patchset gives the ability to set the permissions and
>> >> > ownership of all
>> >> > dpdkvhostuser sockets from the database, avoiding the manual
>> >> > intervention
>> >> > required to connect QEMU and OVS via DPDK.
>> >> >
>> >> > The first patch adds chmod and chown calls to lib, with unit tests. The
>> >> > second patch hooks those calls into the netdev_dpdk_vhost_user_construct
>> >> > function, after the socket is created.
>> >> >
>> >> > Changes from v2:
>> >> > * Added a new 2nd patch to series for chmod/chown on already
>> >> > opened files.
>> >> > There exist known implementations for other systems, including
>> >> > FreeBSD, but
>> >> > only linux is implemented. ENOTSUP is set when these calls fail
>> >> > on non-linux
>> >> > systems.
>> >>
>> >> Ping. I was hoping to have this applied before the 2.6 branch was cut;
>> >> is that possible?
>> >
>> > The code in patch 2, which goes through every fd in /proc to find the
>> > socket it needs because DPDK will not tell it the fd and will not set
>> > the needed permissions, is appalling. It may be the biggest kluge I've
>> > seen this year.
>>
>> Ouch. I'm sure that's an accomplishment I don't want associated with my
>> name. Do you take issue with the design (I agree it's awful to need
>> anything like this)? Did the implementation really smell that bad?
>
> It's not about the code (and it's certainly not ad hominem). It's about
> the whole technique. The idea of opening a socket, then scanning the
> whole fd table by reading through the file system to determine the
> number of the fd that was opened, is appalling.
>
>> > This would be slightly less appalling if it used fstat() and
>> > getsockname() instead of /proc. Still not great.
>>
>> I disagree - patch 2/3 wouldn't exist. [...]
>
> The approach I was talking about was to scan the fd table using a loop
> over the valid fds, calling fstat() on each to find out whether it's a
> socket then getsockname() to find out whether it's the right socket.
>
> Come to think of it, I don't know what happens if you try to
> getsockname() on a socket whose name is longer than whatever fits in
> sockaddr_un. Dammit.
>
>> [...] That goes to the last sentence
>> in this. Both of those calls will work for a descriptor (and notice
>> that patch 1/3 uses them). The issue I have is trying to go from a
>> pathname to a descriptor (which is not a standardized interface for a
>> plethora of reasons). If I had the descriptor, though, patch 2/3 would
>> never have been submitted (because it would be unneeded - I do
>> fchmod/fchown in patch 1/3).
>
> Yes, right. I understand the problem, I just don't understand why it
> exists. The DPDK maintainers can solve it instantly in 5 minutes of
> work.
>
>> > This definitely needs comments to explain why it exists and what it's
>> > doing.
>
> Thanks for the explanation. In this case, I meant that the *code* needs
> to have comments, since it had none. It took quite a while from looking
> at the code to figure out what was going on, and longer to figure out
> why. Comments should make this obvious to the reader.
>
> [...]
>
>> > This could be properly fixed in DPDK by adding a 1-line helper function
>> > that returns the fd for a vhost-user server.
>>
>> +1
>> I agree strongly. However, Christian Ehrhardt and myself have each
>> submitted patches[5][6] to try and make this kind of thing non-odius.
>> These patches have not been accepted.
>
> Neither of those is a simple "return server->listen_fd;" function. Any
> chance that would be accepted? It would solve the problem just fine.
I don't know, but I'll outline my plans, and if you agree, then I'll
put them into action.
First, I'll drop patch 2/3.
Second, I'll take your suggestion below, and implement it in the
netdev-dpdk area (as part of "patch 3/3", which will become 2/2).
I'll submit the above by mid-day Friday, tested.
Third, I'll cook up the DPDK server socket fd patch, and try to get that
part of a future DPDK release. If accepted, then when OvS ports to that
release, I will submit a patch removing your suggestion, and using the
file descriptor instead (or whatever becomes of it).
If this sounds acceptable, I'll start in on it ASAP.
>> If you think that patch 2/3 is that maloderous that it cannot be
>> accepted, then there will always probably be this kind of disjointed
>> file-system permissions gap for vhost-user server mode interfaces.
>> Unless you have a better suggestion - I'm all ears then :)
>
> I do have one suggestion. Something like this:
>
> for (int i = 0; i < 10; i++) {
> /* Find the first available file descriptor, then close it. */
> int fd = socket(...);
> close(fd);
>
> /* Create server socket. */
> server = dpdk_create_server_socket(filename, ...);
>
> /* Check that fd is the new server socket fd.
> * This is normally true unless some other thread created a fd
> * concurrently. */
> struct stat s;
> struct sockaddr_un sun;
> socklen_t sun_len = sizeof sun;
> if (!stat(fd, &s) && S_ISSOCK(s.st_mode) &&
> getsockname(fd, &sun, &sun_len) && !strcmp(filename, sun.sun_path)) {
> /* Yay, we found the fd. */
> return success;
> }
>
> /* Race. Try again. */
> dpdk_close_server_socket(server);
> }
Thanks, Ben. I am not opposed to going with this approach.
Unfortunately, there are no truly good solutions I see here at
the moment.
-Aaron
_______________________________________________
dev mailing list
[email protected]
http://openvswitch.org/mailman/listinfo/dev