On Fri, May 27, 2016 at 04:18:48PM +0200, David Coppa wrote:
> On Fri, 27 May 2016, Carlin Bingham wrote:
>
> > On Fri, May 27, 2016 at 01:07:09AM +0200, Theo Buehler wrote:
> > > On Thu, May 26, 2016 at 05:54:30PM -0400, Andre Smagin wrote:
> > > > On Sat, 14 May 2016 21:01:29 +0200 (CEST)
> > > > [email protected] wrote:
> > > >
> > > > > >Synopsis: radeon(4) drm crashing on current/amd64
> > > > [...]
> > > > > drm:pid77501:radeon_fence_wait_empty_locked *ERROR* error waiting for
> > > > > ring[3] to become idle (-1601868)
> > > >
> > > >
> > > > I am seeing the same issue, very infrequently (maybe once every week
> > > > or two):
> > > >
> > > > drm:pid55825:radeon_fence_wait_empty_locked *ERROR* error waiting for
> > > > ring[3] to become idle (-6007676)
> > > > i3(49392): syscall 97 "inet"
> > > >
> > > > Not sure what happens to i3 as X crashes, but I get that pledge message
> > > > every time.
> > > > (Previously mentioned i3 to dcoppa, but before realizing it was related
> > > > to radeon issue.)
> > >
> > > I combed through the i3 source code hoping to get an indication of what
> > > might cause that socket(2) call to break i3's pledge promise.
> > > I couldn't find anything: all socket calls are with AF_LOCAL that should
> > > be covered by the "unix" pledge.
> > >
> > > Without seeing a ktrace output, I don't think I can make any progress
> > > here.
> > >
> >
> > i3's restore_xcb_check_cb() (src/restore_layout.c), if it sees that the
> > connection to X has been lost, it calls restore_connect() which calls
> > libxcb's xcb_connect().
> >
> > In libxcb that calls xcb_connect_to_display_with_auth_info() which calls
> > _xcb_open(), which calls _xcb_open_unix() and usually that would be it,
> > but if opening the unix socket fails (because X has fallen over) it tries
> > again to connect by calling _xcb_open_tcp() which sets up an AF_INET
> > addrinfo and passes that to _xcb_socket()... and you can probably guess
> > what happens next.
>
> Fallback code could be removed with no (imho) dramatic consequences.

That seems to be enough to fix the present issue, but there is still
this part at the start of _xcb_open():

    if ((!protocol || (strcmp("unix",protocol) != 0)) &&
        (*host != '\0') && (strcmp("unix",host) != 0))
    {
        /* display specifies TCP */
        unsigned short port = X_TCP_PORT + display;
        return _xcb_open_tcp(host, protocol, port);
    }

Can't this also be triggered in some circumstances in Carlin's codepath?
Unfortunately, I don't know my way around X well enough to make this
happen.

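
To make it easier to reason about which host/protocol pairs hit that
early branch, here is the condition pulled out into a standalone C
sketch. The helper name takes_tcp_branch() is made up for illustration
and is not part of libxcb; it only mirrors the test quoted above and,
like that test, assumes host is non-NULL. How a given DISPLAY value is
split into host and protocol happens elsewhere in libxcb and is not
modelled here.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not libxcb code: returns 1 when the condition at
 * the start of _xcb_open() would send the connection to _xcb_open_tcp(),
 * 0 otherwise. host must not be NULL; protocol may be NULL.
 */
int
takes_tcp_branch(const char *host, const char *protocol)
{
	return (!protocol || strcmp("unix", protocol) != 0) &&
	    *host != '\0' && strcmp("unix", host) != 0;
}
```

With an empty host and a NULL protocol this returns 0, so the early
branch is skipped and only the fallback removed by the diff could reach
_xcb_open_tcp(). With a non-empty host other than "unix", and a protocol
that is NULL or anything other than "unix", it returns 1, so the TCP
path would still be taken regardless of the diff.
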
> Index: src/xcb_util.c
> ===================================================================
> RCS file: /cvs/xenocara/dist/libxcb/src/xcb_util.c,v
> retrieving revision 1.11
> diff -u -p -u -p -r1.11 xcb_util.c
> --- src/xcb_util.c 2 Feb 2016 18:42:22 -0000 1.11
> +++ src/xcb_util.c 27 May 2016 14:18:23 -0000
> @@ -297,11 +297,6 @@ static int _xcb_open(const char *host, c
>      fd = _xcb_open_unix(protocol, file);
>      free(file);
>  
> -    if (fd < 0 && !protocol && *host == '\0') {
> -        unsigned short port = X_TCP_PORT + display;
> -        fd = _xcb_open_tcp(host, protocol, port);
> -    }
> -
>      return fd;
>  #endif /* !_WIN32 */
>      return -1; /* if control reaches here then something has gone wrong */