On Sat, May 24, 2014 at 12:44:40PM +0100, Thomas Graf wrote:
> On 05/23/14 at 08:47am, Ben Pfaff wrote:
> > On Fri, May 23, 2014 at 01:15:58AM +0200, Thomas Graf wrote:
> > > Signed-off-by: Thomas Graf <[email protected]>
> >
> > nl_sock_recv__() compares its 'events' member against EPOLLERR with ==,
> > but EPOLL* are bitmasks so should it use & to test for a single bit?
> > Actually I'm not sure what the 'events' argument means. The comments
> > don't appear to fully explain it, the code only interprets EPOLLERR, and
> > the only effect of EPOLLERR is to fall back to linear receive.
>
> The current implementation matches == EPOLLERR because the kernel
> currently gurantees that EPOLLERR is mutual exclusive with any
> other bit for mmap netlink on the receive side.
>
> It's very fragile though and will need a more generic implementation.
> However, how to handle EPOLLIN | EPOLLERR? It would mean that data
> is on the ring but recvmsg() needs to be invoked to retrieve the
> error condition. Do we discard the data and return the error? Do
> we ignore the error?
>
> This is partly undefined API right now and needs to be clearly
> defined.
I guess that part of the problem here may be that it's not clear to what
extent OVS actually depends on accurate error reporting. Errors don't
come up very much in Netlink, since there's no "wire" across which data
can be lost or corrupted. The most important error to OVS is ENOBUFS,
which indicates that some message was lost. OVS userspace uses ENOBUFS
in a couple of ways:
* ENOBUFS is critical for transactions on Netlink sockets
(e.g. nl_sock_transact_multiple()), because it indicates that
the reply was lost (due to lack of buffer space) and that
therefore the request must be retransmitted. If ENOBUFS is
lost, then the request will never be retransmitted and thus
the transaction will hang forever.
* dpif-linux uses ENOBUFS as an indication that packets that
"missed" in the kernel datapath could not be sent to userspace
because of a lack of buffer space. It only uses it for
logging, however, so it is not critical.
If we never use mmapped sockets for transactions, then not receiving
these errors, or not reliably receiving them, is not a huge problem. It
will only reduce the extent to which OVS can log packet floods.
Does that help at all?
> > nl_sock_recv() and nl_sock_recv_events() must be thread-safe (given
> > independent buffers) but it appears to me that nl_sock_recv_mmap() does
> > not do anything to ensure that a single frame is consumed only once; I
> > don't see any synchronization around the rx_ring pointer.
>
> Absolutely. It currently relies on a individual socket per thread.
> If that cannot be guaranteed anymore we'll require synchronization.
I don't think we guarantee that. I think that Alex and Ethan know the
design here better than me now.
If we don't guarantee this, then netlink-socket.h needs to make it
clear. On master, it says that nl_sock_recv() is essentially
thread-safe:
* - nl_sock_recv() is conditionally thread-safe: it may be called from
* different threads with the same nl_sock, but each caller must provide
* an independent receive buffer.
I think I saw an update to the comment in the series, but I don't think
the update removed the thread-safety.
> > nl_sock_recv_mmap() delegates to nl_sock_recv_linear() in the
> > NL_MMAP_STATUS_COPY case, but it treats the nl_sock_recv_linear() return
> > value as if a negative value indicated error, which it doesn't (it never
> > returns negative). Also upon error from nl_sock_recv_linear() in that
> > case one wonders whether that frame should be skipped (could we
> > otherwise end up looping forever on this frame?).
>
> Good catch. The (retval < 0) is dead code right now. I'm not sure
> that I've encountered a case where a linear receive would return
> an error. Socket errors are handled via EPOLLERR and would never
> go through nl_sock_recv_mmap().
>
> I'll need to dig through the kernel code and come up with proper
> documentation on what to do if a requested linear receive fails. We
> definitely need to skip the frame at some point but we might be
> able to retry for some error codes apart from EINTR.
Thanks for taking a look.
_______________________________________________
dev mailing list
[email protected]
http://openvswitch.org/mailman/listinfo/dev