Hi Andrew,

> please note that the Lustre client does not write data to a disk
> device, rather it sends and receives data through network,

Yes, I've heard about that.

> particularly, through a socket in case of ksocklnd.

A socket isn't considered "slow" because of the speed of the network;
it's considered "slow" so that the application is allowed to handle IPC
with unresponsive peers.

> Even if the call may not be considered slow with respect to
> indefiniteness, it still is slow as compared to the "fast I/O" (local
> disk I/O). I can recall several requests of adding even more
> interruption points made by people who were looking for the support
> of premature read/write termination (and, I believe, l_wait_event
> usage in 1.8 is such that since a network request has been sent, the
> read/write system call cannot be interrupted until the corresponding
> lustre timeout has happened or a reply has been received)

Looking at 1.8.3, I see that l_wait_event() allows calls to be
interrupted under certain circumstances (if the timeout has expired, or
no timeout was specified, ...). But then only if the pending signal
belongs to LUSTRE_FATAL_SIGS: SIGKILL, SIGINT, SIGTERM, SIGQUIT, or
SIGALRM.

I guess the assumption is that all of these signals are fatal anyway,
and that delivering them is useful to users who change their minds about
untarring the Encyclopedia Britannica, and then go complain that Lustre
breaks Ctrl-C. Fine. Aside from the fact that the latter four may not
be fatal, and that this may cause some unexpected breakage among
unsuspecting applications that handle these signals for purposes other
than process termination...whatever. I'm giving up on this point.

I also noticed that the signal mask handling in l_wait_event is slightly
defective. In the cases where l_wait_event would allow the call to be
interrupted, it sets the caller's mask to allow LUSTRE_FATAL_SIGS.
Consider the following sequence of events for a process P:

  P blocks SIGALRM.
  SIGALRM is sent to P.
  P calls open().
  RPC to MDS times out.
  l_wait_event unblocks LUSTRE_FATAL_SIGS.
  l_wait_event determines that SIGALRM is deliverable.
  l_wait_event restores the signal mask.
  l_wait_event returns -EINTR.
  open() returns -EINTR.

Thus open() is interrupted by the non-delivery of a blocked signal.
It's easy to reproduce, if somewhat obscure.

Best,
John

--
John L. Hammond, Ph.D.
ICES, The University of Texas at Austin
[email protected]
(512) 471-9304
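
For anyone who wants to see the mask problem without a Lustre mount, here is a minimal user-space sketch of the sequence above. It is not Lustre code: the two check functions and demo() are hypothetical names, and the "ignoring mask" check is only a model of the behaviour described in this message, assuming the wait treats any pending LUSTRE_FATAL_SIGS member as deliverable. It needs a Unix system and Python 3.8 or later (for signal.raise_signal).

```python
import signal

# Mirrors the LUSTRE_FATAL_SIGS list quoted in the message above.
FATAL_SIGS = {signal.SIGKILL, signal.SIGINT, signal.SIGTERM,
              signal.SIGQUIT, signal.SIGALRM}


def interrupts_ignoring_mask(pending, blocked):
    """Model of the reported behaviour: any pending fatal signal ends the wait."""
    return bool(pending & FATAL_SIGS)


def interrupts_respecting_mask(pending, blocked):
    """Hypothetical alternative: signals the caller had blocked do not count."""
    return bool((pending & FATAL_SIGS) - blocked)


def demo():
    # P blocks SIGALRM.
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGALRM})
    try:
        # SIGALRM is sent to P; it stays pending because it is blocked.
        signal.raise_signal(signal.SIGALRM)
        pending = signal.sigpending()
        # Blocking an empty set changes nothing and returns the current mask.
        blocked = signal.pthread_sigmask(signal.SIG_BLOCK, set())
        return (interrupts_ignoring_mask(pending, blocked),
                interrupts_respecting_mask(pending, blocked))
    finally:
        # Consume the pending SIGALRM so that restoring the mask does not
        # deliver it (the default action would terminate the process).
        if signal.SIGALRM in signal.sigpending():
            signal.sigwait({signal.SIGALRM})
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
```

Calling demo() returns a pair of booleans: the first says whether the mask-ignoring check would report an interruption, the second whether a check that subtracts the caller's blocked set would. With SIGALRM blocked and pending, the first is True and the second is False, which is the open() returning -EINTR case from the sequence.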
