Hi,
I've upgraded LPRng from 3.6.13 to 3.7.4 on a Sun box running Solaris
7.0. Several times a day we experience a "hanging lpq", and at the same
time one lpd process starts consuming a lot of CPU. The only solution
I've found is to kill the offending process.
It appears that two lpd processes cause a deadlock by setting
advisory locks on /var/spool/lpd/<bounce queue>/lpq and
/var/spool/lpd/<real queue>/lpq.0. Yes, I'm using bounce queues for
filtering.
The print server supports 30 printers and 600-700 clients.
Has anyone seen this behavior before?
Below is the output from top, truss, pfiles and ls.
(1) A "hanging lpq". The output from top shows that process 25884
(lpd) consumes a lot of CPU (one of the two CPUs).
# top
last pid: 27159; load averages: 1.08, 1.09, 1.11
64 processes: 61 sleeping, 1 stopped, 2 on cpu
CPU states: 32.9% idle, 17.8% user, 47.5% kernel, 1.8% iowait, 0.0% swap
Memory: 224M real, 19M free, 32M swap in use, 523M swap free
PID USERNAME THR PRI NICE SIZE RES STATE TIME CPU COMMAND
25884 root 1 0 0 2316K 1648K cpu/0 70:21 47.62% lpd
27159 daemon 1 60 0 2268K 1624K sleep 0:00 0.72% ifhp
1444 root 1 58 0 1588K 1288K cpu/2 11:48 0.36% top-sun4m-5.7
...
...
(2) Output from truss reveals that the process is trying very hard to
obtain a lock on the file "lpq": the non-blocking F_SETLK fails with
EAGAIN, the blocking F_SETLKW returns EDEADLK immediately instead of
sleeping, and the loop retries at once. Since neither call ever
blocks, the process spins and burns CPU.
# truss -p 25884
open("lpq", O_RDWR|O_APPEND|O_CREAT|O_NOCTTY, 0600) = 113
umask(077) = 0
fstat(113, 0xEFFFBEA0) = 0
fcntl(113, F_SETLK, 0xEFFFBD04) Err#11 EAGAIN
fcntl(113, F_SETLKW, 0xEFFFBD04) Err#45 EDEADLK
close(113) = 0
umask(0) = 077
open("lpq", O_RDWR|O_APPEND|O_CREAT|O_NOCTTY, 0600) = 113
umask(077) = 0
fstat(113, 0xEFFFBEA0) = 0
fcntl(113, F_SETLK, 0xEFFFBD04) Err#11 EAGAIN
fcntl(113, F_SETLKW, 0xEFFFBD04) Err#45 EDEADLK
close(113) = 0
umask(0) = 077
open("lpq", O_RDWR|O_APPEND|O_CREAT|O_NOCTTY, 0600) = 113
umask(077) = 0
fstat(113, 0xEFFFBEA0) = 0
fcntl(113, F_SETLK, 0xEFFFBD04) Err#11 EAGAIN
fcntl(113, F_SETLKW, 0xEFFFBD04) Err#45 EDEADLK
close(113) = 0
umask(0) = 077
open("lpq", O_RDWR|O_APPEND|O_CREAT|O_NOCTTY, 0600) = 113
umask(077) = 0
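The EAGAIN half of that trace is easy to reproduce outside LPRng. Below is a minimal Python sketch (my illustration, not LPRng's C code): one process holds an exclusive fcntl lock on a file named "lpq", and a second process's non-blocking F_SETLK attempt fails with EAGAIN, just as in the trace. The helper function name and file layout are mine.

```python
import errno, fcntl, os, tempfile

def try_lock_nonblocking(path):
    """Open the file and attempt a non-blocking exclusive lock (F_SETLK).
    Returns the fd on success, None if another process holds the lock."""
    fd = os.open(path, os.O_RDWR | os.O_APPEND | os.O_CREAT, 0o600)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)  # F_SETLK
        return fd
    except OSError as e:
        os.close(fd)
        if e.errno in (errno.EAGAIN, errno.EACCES):
            return None  # lock is held elsewhere, as in the truss output
        raise

path = os.path.join(tempfile.mkdtemp(), "lpq")
holder = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
fcntl.lockf(holder, fcntl.LOCK_EX)  # this process now holds the lock

pid = os.fork()
if pid == 0:
    # In the child (a separate process, so its lock conflicts with ours),
    # the non-blocking attempt must fail while the parent holds the lock.
    os._exit(0 if try_lock_nonblocking(path) is None else 1)
_, status = os.waitpid(pid, 0)
child_saw_eagain = (os.WEXITSTATUS(status) == 0)
```

Note that POSIX fcntl locks are owned per process, which is why the child's attempt conflicts with the parent's lock even though the child inherited the parent's descriptors.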
(3) Running pfiles on the process gives fstat and fcntl information
for all open files. Note that file descriptor 113 (inode 132494) is
locked by process 25885.
# /usr/proc/bin/pfiles 25884
25884: /opt/fw/LPRng/current/sbin/lpd
Current rlimit: 1024 file descriptors
...
23: S_IFREG mode:0600 dev:32,136 ino:29459 uid:1 gid:12 size:68
O_RDWR|O_APPEND
24: S_IFREG mode:0600 dev:32,136 ino:67728 uid:1 gid:12 size:68
O_RDWR|O_APPEND
25: S_IFREG mode:0600 dev:32,136 ino:8847 uid:1 gid:12 size:68
O_RDWR|O_APPEND
...
111: S_IFREG mode:0600 dev:32,136 ino:53010 uid:1 gid:12 size:68
O_RDWR|O_APPEND
112: S_IFREG mode:0600 dev:32,136 ino:55955 uid:1 gid:12 size:68
O_RDWR|O_APPEND
113: S_IFREG mode:0600 dev:32,136 ino:132494 uid:1 gid:12 size:68
O_RDWR|O_APPEND
advisory write lock set by process 25885
(4) Running truss on pid 25885 shows that the process is put to
sleep while waiting for a lock on file descriptor 7.
# truss -p 25885
fcntl(7, F_SETLKW, 0xEFFF8C54) (sleeping...)
(5) Running pfiles on the same process shows that file descriptor 7 is
locked by pid 25884 (inode 67728). Hence, pids 25884 and 25885 are
deadlocked: one process sleeps in F_SETLKW while the other loops
forever.
# /usr/proc/bin/pfiles 25885
25885: /opt/fw/LPRng/current/sbin/lpd
Current rlimit: 1024 file descriptors
0: S_IFCHR mode:0666 dev:85,10 ino:91367 uid:0 gid:3 rdev:13,2
O_RDONLY
1: S_IFCHR mode:0666 dev:85,10 ino:91367 uid:0 gid:3 rdev:13,2
O_WRONLY
2: S_IFCHR mode:0666 dev:85,10 ino:91367 uid:0 gid:3 rdev:13,2
O_WRONLY
3: S_IFIFO mode:0000 dev:170,0 ino:128364 uid:1 gid:12 size:0
O_RDWR
4: S_IFSOCK mode:0666 dev:169,0 ino:24393 uid:0 gid:0 size:0
O_RDWR
5: S_IFDOOR mode:0444 dev:174,0 ino:36473 uid:0 gid:0 size:0
O_RDONLY|O_LARGEFILE FD_CLOEXEC door to nscd[3614]
6: S_IFREG mode:0600 dev:32,136 ino:132494 uid:1 gid:12 size:68
O_RDWR|O_APPEND
7: S_IFREG mode:0600 dev:32,136 ino:67728 uid:1 gid:12 size:68
O_RDWR|O_APPEND
advisory write lock set by process 25884
(6) The file names corresponding to inodes 132494 and 67728 are:
(bounce queue)
# ls -i /var/spool/lpd/tommy_hqoh/lpq
132494 /var/spool/lpd/tommy_hqoh/lpq
(real queue)
# ls -i /var/spool/lpd/tommy_real/lpq.0
67728 /var/spool/lpd/tommy_real/lpq.0
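For what it's worth, the crossed-lock cycle itself can be demonstrated in isolation. The following Python sketch (again my illustration, not LPRng code; the file names merely mirror the two queue files above) has two processes each take a lock on one file and then issue a blocking F_SETLKW on the other's file. The kernel's deadlock detection then fails exactly one of the two F_SETLKW calls with EDEADLK, matching the truss output.

```python
import errno, fcntl, os, signal, tempfile

signal.alarm(10)  # safety net: abort the demo rather than hang

d = tempfile.mkdtemp()
fa = os.open(os.path.join(d, "lpq"), os.O_RDWR | os.O_CREAT, 0o600)    # stand-in for the bounce queue's lpq
fb = os.open(os.path.join(d, "lpq.0"), os.O_RDWR | os.O_CREAT, 0o600)  # stand-in for the real queue's lpq.0

r1, w1 = os.pipe()  # parent -> child rendezvous
r2, w2 = os.pipe()  # child -> parent rendezvous

pid = os.fork()
if pid == 0:
    fcntl.lockf(fb, fcntl.LOCK_EX)      # child locks lpq.0 first
    os.write(w2, b"x")                  # tell the parent our lock is held
    os.read(r1, 1)                      # wait until the parent holds lpq
    try:
        fcntl.lockf(fa, fcntl.LOCK_EX)  # blocking F_SETLKW on lpq
        os._exit(0)
    except OSError as e:
        os._exit(1 if e.errno == errno.EDEADLK else 2)

fcntl.lockf(fa, fcntl.LOCK_EX)          # parent locks lpq first
os.read(r2, 1)                          # wait until the child holds lpq.0
os.write(w1, b"x")                      # both locks held: go
parent_deadlock = False
try:
    fcntl.lockf(fb, fcntl.LOCK_EX)      # blocking F_SETLKW on lpq.0 closes the cycle
except OSError as e:
    parent_deadlock = (e.errno == errno.EDEADLK)
    fcntl.lockf(fa, fcntl.LOCK_UN)      # back off so the child can finish
_, status = os.waitpid(pid, 0)
child_deadlock = (os.WEXITSTATUS(status) == 1)
# The kernel breaks the cycle by failing exactly one F_SETLKW with EDEADLK;
# the other process then acquires its lock once the loser backs off or exits.
```

This mirrors what the traces show: one lpd (the "loser") keeps getting EDEADLK and retries in a tight loop, while the other sleeps in F_SETLKW holding its lock.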
(7) Printcap entries
tommy_real:cm=tek560
:ifhp=model=phaser,dev=tommy%9100
:lp=/dev/null
:rw
:sd=/var/spool/lpd/%P
:lf=log
:ps=status
:if=/opt/fw/LPRng/current/libexec/filters/ifhp
:server
tommy_hqoh:cm=tek560
:lp=tommy_real@localhost
:lpd_bounce
:ab
:ifhp=model=phaser,appsocket@
:if=/opt/fw/LPRng/current/libexec/filters/tek560_hq
:bp=/opt/fw/LPRng/current/libexec/filters/psbanner.tek560
:sd=/var/spool/lpd/%P
:server
(8) Misc
LPRng 3.7.4 was compiled with gcc 2.95.2, configured as
# ./configure --prefix=/opt/fw/LPRng/3.7.4 --disable-gdbm
-Anders
--
Anders Sundin phone: +46 920 202020
EPL/I/I fax: +46 920 202906
Ericsson Erisoft AB