Hi all,
This is in connection with the fairly frequent runaway lpd processes that
we're seeing with LPRng-3.8.27 under RH9 - by "fairly frequent", I
mean 5 or more a week across 5 print servers, serving about 60 printers
between them. A description of the problem, taken from my previous mail
to the list, is appended at the end of this message. It's definitely not
connected with futexes, as I originally thought, since I see the same
symptoms when running without futexes.
I have turned on debugging at level 3 on a queue to try to get some
information. This is the end of the log for the most recent queue
I've seen this problem on:
2004-08-24-17:19:56.177 cessnock [14080] (Server) at3: cleanup: done, exit(0)
2004-08-24-17:21:13.051 cessnock [14089] (Server) at3: Update_spool_info: file 'control.pr'
2004-08-24-17:21:13.051 cessnock [14089] (Server) at3: Get_spool_control: file 'control.pr'
2004-08-24-17:21:13.051 cessnock [14089] (Server) at3: Get_file_image: 'control.pr', maxsize 0
2004-08-24-17:21:13.051 cessnock [14089] (Server) at3: Checkread: file 'control.pr'
2004-08-24-17:21:13.051 cessnock [14089] (Server) at3: Checkread: 'control.pr' fd 5, size 73
2004-08-24-17:21:13.051 cessnock [14089] (Server) at3: Get_fd_image: fd 5
2004-08-24-17:21:13.051 cessnock [14089] (Server) at3: Get_fd_image: len 73 'debug=3
printing_aborted=0x0
pr'
2004-08-24-17:21:13.051 cessnock [14089] (Server) at3: *** Dump_subserver_info: 'Do_queue_jobs - after setup' - 1 subservers
2004-08-24-17:21:13.051 cessnock [14089] (Server) at3: server 0 - 0x80c4498, count 7, max 102, list 0x80c6c80
2004-08-24-17:21:13.051 cessnock [14089] (Server) at3: [ 0] 0x80c5e80 ='debug=3'
2004-08-24-17:21:13.051 cessnock [14089] (Server) at3: [ 1] 0x80c3fb8 ='printer=at3'
2004-08-24-17:21:13.051 cessnock [14089] (Server) at3: [ 2] 0x80c5eb0 ='printing_aborted=0x0'
2004-08-24-17:21:13.051 cessnock [14089] (Server) at3: [ 3] 0x80c3f78 ='printing_disabled=0x0'
2004-08-24-17:21:13.051 cessnock [14089] (Server) at3: [ 4] 0x80c73b8 ='queue_control_file=control.pr'
2004-08-24-17:21:13.051 cessnock [14089] (Server) at3: [ 5] 0x80c7660 ='spooldir=/var/spool/lpd/at3'
2004-08-24-17:21:13.051 cessnock [14089] (Server) at3: [ 6] 0x80c76b8 ='spooling_disabled=0x0'
... at this point nothing more is written to the log file. The logs
from other runaway processes end the same way.
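
To compare where the logs stop across several runaways, something like
the following could pull out the last entry logged by each pid. This is
only a rough sketch: it assumes every entry in the log file starts with
"timestamp host [pid] (role) queue:" as in the excerpt above, and the
function name last_message_per_pid is made up for illustration.

```python
import re

# Assumed entry layout, taken from the excerpt above:
#   2004-08-24-17:21:13.051 cessnock [14089] (Server) at3: message...
LINE_RE = re.compile(r"^(\S+) (\S+) \[(\d+)\] \((\w+)\) (\S+): (.*)$")


def last_message_per_pid(lines):
    """Map each pid to the (timestamp, message) of its last log entry.

    Lines that do not match the assumed layout (for example the
    continuation lines of a multi-line Get_fd_image dump) are skipped.
    """
    last = {}
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match is None:
            continue
        timestamp, _host, pid, _role, _queue, message = match.groups()
        last[int(pid)] = (timestamp, message)
    return last
```

Calling it on an open log file (the file name is whatever the queue's
log is called on your system) and looking up the pid of the runaway
server shows the last thing that process managed to log.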
I've also attached gdb to the runaway process a couple of times and
it's been stuck at the same point each time, namely inside the call to
'opendir (".")' in Scan_queue in getqueue.c (line 82), with
malloc_consolidate at the top of the stack. Here's the backtrace from
gdb:
(gdb) bt
#0 0x4023697b in malloc_consolidate () from /lib/i686/libc.so.6
#1 0x40236007 in _int_malloc () from /lib/i686/libc.so.6
#2 0x40235a34 in calloc () from /lib/i686/libc.so.6
#3 0x4026bdc7 in opendir () from /lib/i686/libc.so.6
#4 0x0804fa53 in Scan_queue (spool_control=0x80bc640,
    sort_order=0x80bc0b4,
    pprintable=0xbfff97bc, pheld=0xbfff97c0, pmove=0xbfff97c4,
    only_queue_process=1, perr=0xbfff97c8, pdone=0xbfff97cc,
    remove_prefix=0x0, remove_suffix=0x0) at common/getqueue.c:82
#5 0x0806280b in Do_queue_jobs (name=0x11 <Address 0x11 out of bounds>,
    subserver=0) at common/lpd_jobs.c:566
#6 0x080714dc in Receive_secure (sock=0xbfffa330,
    input=0x808ab58 "U\211�WVS\203�\024\213}$\213u �E�")
    at common/lpd_secure.c:247
#7 0x080617d5 in Service_lpd (talk=-1,
    from_addr=0xbfffa360 "129.215.45.134 port 43234")
    at common/lpd_dispatch.c:341
#8 0x080614d0 in Service_connection (args=0xbfffa360)
    at common/lpd_dispatch.c:310
#9 0x0805d8c5 in Do_work (name=0x809fd68 "server", args=0xbfffa490)
    at common/linelist.c:3847
#10 0x0805d676 in Make_lpd_call (name=0x809fd68 "server",
    passfd=0xbfffa4a0,
    args=0xbfffa490) at common/linelist.c:3820
#11 0x0805d9e7 in Start_worker (name=0x809fd68 "server",
    parms=0xbfffa500,
    fd=14) at common/linelist.c:3876
#12 0x0804d3a7 in Accept_connection (sock=8, lpd_socket=0,
    unix_socket=0)
    at common/lpd.c:1008
#13 0x0804be3a in main (argc=1, argv=0xbfffa7b4, envp=0x1)
    at common/lpd.c:687
#14 0x401d7a67 in __libc_start_main () from /lib/i686/libc.so.6
(gdb)
But I'm a bit stuck as to what I can do next to try and work out what
is happening to these processes to make them run away. I can't debug
into the libc calls. Where is "." when the opendir is called?
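
On the "." question: on Linux the working directory of a running
process should be visible as the /proc/<pid>/cwd symlink, so it can be
read without stopping the process. A minimal sketch, assuming /proc is
mounted and the script runs as root or as the owner of the lpd process
(the helper name cwd_of is made up for illustration):

```python
import os


def cwd_of(pid):
    """Return the current working directory of process `pid`.

    Linux only: reads the /proc/<pid>/cwd symlink, which the kernel
    points at the working directory of that process. Raises OSError
    if the process does not exist or the link cannot be read.
    """
    return os.readlink("/proc/%d/cwd" % pid)
```

Calling cwd_of() with the pid of the stuck server process would show
which directory the opendir (".") is scanning.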
I'd very much appreciate any advice on what to try next. Is it an
RH9 problem or an LPRng problem?
Cheers
Toby Blake
University of Edinburgh
> I'm running LPRng-3.8.27.
>
> However, I'm still seeing runaway lpd processes - it's always the
> 'server' process and it consumes as much CPU as it can - an lpc kill
> fixes the problem, but obviously this impacts on the overall
> reliability of the printing system - it's happened twice in the last
> day or so.
>
> I was wondering if anyone else has seen this problem at all. Here's
> an example of it happening:
>
> [kant]toby: lpq -Pat8
> Printer: [EMAIL PROTECTED] 'HP Laserjet 8150DN in 5.05 (Level 5 West Lab) AT'
> Queue: 7 printable jobs
> Server: pid 8178 active
> Status: job '[EMAIL PROTECTED]' removed at 16:02:46.028
> Rank Owner/ID Pr/Class Job Files Size Time
> 1 [EMAIL PROTECTED] A 372 print.ps 224344 16:04:36
> [kant]toby:
>
> .. with the 8178 process chewing up all CPU, until an lpc kill kills
> this process and gets the queue moving again. Note that strace
> doesn't reveal anything at all - not a single line of output. I have
> enabled debugging on this queue, so will hopefully get some
> information if/when I get the next runaway.