hello. My suggestion is to have two windows open to the box while
you're debugging this. In the first window, run the command that blocks.
In the second, do a ps -l<pid> of the process ID of the blocking process.
That should tell you right where, inside the kernel, your process is stuck.
The field you're interested in is the wchan field. If you post that to the
list, I think you'll get a lot more insight into what's going on and it
will help you focus your efforts.
-thanks
-Brian
On Feb 22, 11:41am, Rob Newberry wrote:
} Subject: Re: problems with USB/CDC serial (umodem) - devices work with Lin
} Iâve been debugging this from as many angles as I can, and right now, the
summary question is this:
}
} Why would âread( fd, but, 1);â block for almost 30 seconds, when:
}
} - the descriptor is in non-blocking mode
} - poll indicates the descriptor is safely readable
} - top indicates there is no CPU load (machine is also
interactively responsive)
}
} Any ideas? I think my next step is to trace down through the âreadâ
system call and try to see where itâs blocking, but so far Iâve gotten a
little lost trying to figure out the path of the read call into the TTY code
(mostly because I donât know all this code, and Iâm slowly figuring out how
â/dev/ttyU0â ends up mapping to a vnode, how the v_ops get mapped, where
TTYâs buffering gets mapped in, etc. I will eventually figure this out, but
itâs a bit painful just reading and adding printfs :-).
}
}
}
} Hereâs some more details:
}
} It may indeed be unrelated to USB/umodem.
}
} The code does use select() and poll() to wait for input from the serial port.
}
} Eventually, the code decides that it can read without blocking from the port
(both select and poll have indicated that the descriptor is safely readable).
}
} It then calls "read( fd, but, 1 );â (the code is adamant about reading a
byte at a time).
}
} This call blocks for a LONG time.
}
} But the descriptor is in non-blocking mode. Iâve added debug code just
before the âreadâ call that (a) checks the flags to make sure the
descriptor is non-blocking, (b) does a âpollâ once more to make sure the
descriptor appears readable, and (c) prints out the time before the call â
and then I added a print out of the time AFTER the âread()â call completes
(using clock_gettime(CLOCK_MONOTONIC));
}
} In every case, the descriptor is in non-blocking mode, and poll says it is
readable.
}
} But the âreadâ call takes 25 seconds or more!
}
} Iâve got âtopâ running in another shell â thereâs no CPU load while
I run this.
}
} Hereâs the output:
}
} FDWrapper::read 0x7f7fffab3479, 1
} in non-blocking mode (flags = 0x00004006)
} poll indicates descriptor is readable (0x00000040)
} time before read = 8436, 316529950
} FDWrapper::read - got 1
} time after read = 8461, 854271618
}
}
} Hereâs the code that generates the above output:
}
}
} ssize_t
} FDWrapper::read(void* data, size_t len)
} {
}
} printf("FDWrapper::read %p, %u\n", data, (unsigned int)len );
}
} {
} int fl;
} fl = fcntl( mFDRead, F_GETFL );
} if ( fl & O_NONBLOCK )
} {
} printf( "\tin non-blocking mode (flags = 0x%08X)\n", fl
);
} }
} else
} {
} printf( "\tin BLOCKING mode (flags = 0x%08X)\n", fl );
} }
} }
}
} {
} struct pollfd pfd;
} int flags = POLLRDNORM|POLLERR|POLLNVAL|POLLHUP;
} int err;
}
} pfd.fd = mFDRead;
} pfd.events = flags;
} pfd.revents = 0;
}
} err = poll( &pfd, 1, 0 );
} if ( ( err > 0 ) && ( pfd.revents & flags ) )
} {
} printf( "\tpoll indicates descriptor is readable
(0x%08lX)\n", (long)(pfd.revents & flags ) );
} }
} else
} {
} printf( "\tpoll indicates descriptor is NOT readable
(err = %d, 0x%08lX)\n", err, (long)(pfd.revents & flags ) );
} }
} }
}
} {
} struct timespec start;
} clock_gettime( CLOCK_MONOTONIC, &start );
} printf( "\ttime before read = %lu, %lu\n", start.tv_sec,
start.tv_nsec );
} }
}
} ssize_t ret = ::read(mFDRead, data, len);
} printf("\tFDWrapper::read - got %d\n", (int)ret );
}
} {
} struct timespec stop;
} clock_gettime( CLOCK_MONOTONIC, &stop );
} printf( "\ttime after read = %lu, %lu\n", stop.tv_sec,
stop.tv_nsec );
} }
}
}
} â¦
}
}
}
}
}
}
}
} > On Feb 18, 2019, at 11:05 AM, Michael van Elst <[email protected]> wrote:
} >
} > [email protected] (Martin Husemann) writes:
} >
} >> On Mon, Feb 18, 2019 at 09:11:51AM -0800, Rob Newberry wrote:
} >>> I'm pretty sure there are differences in how NetBSD's USB modem class
} >>> driver interacts with the device from how other platforms do. For the
} >>> time being, I'm trying to read the code of the NetBSD and the FreeBSD
} >>> drivers, along with the USB CDC spec, to see what's going on.
} >
} >> Hmm, I thought I would be using umodem, but actually it is u3g.
} >> However, I am pretty sure that at that level the driver does not
} >> do anything special but just pass on the data in both ways.
} >
} > I'm using umodem for a 3G interface, nothing special there.
} >
} > You also see from the traces that the same data is sent and the
} > first byte received is also the same. This could be as simple
} > as a timeout on NetBSD, I first thought something was using very
} > small timeouts that cannot be handled with our low HZ value, but
} > the code seems to use only timeouts of at least several seconds.
} >
} > The code also uses a rare combination of select() and poll() to
} > wait for input. I tend to believe the problem is not related to
} > USB or umodem at all.
} >
} > --
} > --
} > Michael van Elst
} > Internet: [email protected]
} > "A potential Snark may lurk in every tree."
}
}
>-- End of excerpt from Rob Newberry