Thanks! Some of these look interesting. Let me go through
them and make some suggestions.
> Date: Fri, 16 Jun 2000 10:19:36 +0200
> From: Gunther Schlegel <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Subject: Re: Memory leak
>
> Hi,
>
> > I am extremely interested in this.
> > Can you reliably cause this leak?
>
> The lpd process gets bigger all the time until I restart it. It is definitely
> there. It may be related to my "crude" setup. I'll explain the setup and the
> common problems to you.
>
> We are developing software and hosting servers for airfreight forwarding
> companies. The applications are character based, so all the employees just
> telnet into the server over a small-bandwidth wide area network. The network is
> used for remote printing as well, the client uses DLINK and HP external
> printservers.
>
> I use the lpd protocol for the DLINK devices and the HP protocol for the HP
> printservers.
>
> Freight forwarders tend to print a lot, and while labels, loading lists,
> invoices and airwaybills are short documents there are also guys creating 1MB
> documents of pure text. They call it cost control and the jobs jam the
> printing system. At least Sun's one. ;)
>
> Printing is crucial to the freight forwarders, and the system must not lose
> any job. I modified the standard lprng config file to reflect the slow
> environment and to prevent the system from killing jobs on its own. However,
> some jobs get stalled, I do not know why; it tends to be the bigger ones on
> DLINK-attached printers. A stalled job results in a jammed queue, and lprng
> does not recognize this, cancel the job, or reactivate it.
Right. I know the problem and the fix, but it is rather
difficult to handle well. When you close a socket or connection
the system tries to send the data that is in its local buffers
out and then does the 'HALF-CLOSE' part of the TCP/IP protocol.
It then waits for the other direction of the connection to close.
Once this has been done, then the TCP/IP stack marks the connection
as finished.
The problem is how the user can invoke and monitor this process.
There are several different ways this happens, depending on the
manufacturer and version of the TCP/IP stack.
When you do a close() on a socket this activity is triggered.
On some systems, you IMMEDIATELY return from the close
by default.
On some systems, you return from the close when the 'HALF-CLOSE'
has been acknowledged.
On some systems you can set an option, the SO_LINGER socket option.
Let's see what the BSDi manual page says:
SO_LINGER controls the action taken when unsent messages are queued on
socket and a close(2) is performed. If the socket promises reliable de-
livery of data and SO_LINGER is set, the system will block the process on
the close attempt until it is able to transmit the data or until it de-
cides it is unable to deliver the information (a timeout period, termed
the linger interval, is specified in seconds in the setsockopt() call
when SO_LINGER is requested). If SO_LINGER is disabled and a close is
issued, the system will process the close in a manner that allows the
process to continue as quickly as possible.
As we see, this will block until the 'HALF-CLOSE' acknowledgement
has been received.
But what about status? The printer at the other end may still be
sending status. You really need to wait until it closes the connection.
If you do a close you cannot monitor status. You can try
using the 'shutdown' system call - this will cause the TCP/IP connection
to be HALF-CLOSED (by the way, you return IMMEDIATELY from this call, unlike
a close with SO_LINGER set). Then you should wait for
the other end to close its side of the connection.
Well, I have bad news, worse news, and HORRIBLE news:
various TCP/IP stacks and applications that use them are broken,
but broken in different manners.
On some printers, if a PostScript or PCL job is still
printing OR the system expects more input, then for some reason
it (the part of the system reading the input file)
sees the 'END OF FILE' on input but does not do a similar
connection close operation after it finishes printing. Instead,
it keeps flinging ACK packets back at the host until it gets
a RST, and then closes the connection.
I might add that these printers DO see a connection OPEN and reset themselves
on each new connection, so it is not totally disastrous.
Some printers which are on parallel ports of JetDirect boxes see
neither the connection open nor the close - and the JetDirect box will
ignore the 'HALF-CLOSE' and not close its connection. This
has been fixed in later releases of the JetDirect software,
and does not appear to be present on the ones that have the HTTP
configuration capability - probably because they discovered this
when they implemented the HTTP server (:-)
DLINK boxes have several different versions - get the latest firmware
and update - you may have this problem go away.
At one time I had added a flag to force LPRng to CLOSE the connection
(close_on_eoj????) rather than do a 'shutdown' on the connection,
but I seem to have taken it out, having resolved the problem
in another way.
>
> I put the following line in root's cron to automatically reactivate hung
> jobs:
> 02,07,12,17,22,27,32,37,42,47,52,57 * * * * /usr/local/sbin/lpc redo all all >
> /dev/null
I like this idea. You can also look for
jobs that have been stalled. You need a small
Perl script that looks for 'Printer' and then 'stalled',
and then does an 'lpc kill' on that queue.
use FileHandle;

my $status = new FileHandle( "lpq -a |" );
my $printer = "";
while( <$status> ){
    if( /^Printer/ ){
        # line looks like "Printer: queue@host ..." - field 1 is the queue
        $printer = (split(/\s+/, $_))[1];
    } elsif( /stalled/ ){
        my @errmsg = `lpc kill $printer 2>&1`;
        print "restarted $printer - @errmsg\n";
    }
}
>
> We do not have a problem with missing jobs since that time.
>
> The machines environment:
> Sun Enterprise 450 Server running Solaris 7, lprng 3.6.14 compiled using gcc
> 2.95.2.
pretty standard.
> #uname -a
> SunOS tom 5.7 Generic_106541-08 sun4u sparc SUNW,Ultra-4
>
> I compiled LPRng that way:
>
> ./configure --with-printcap_path=/etc/printcap
> gmake clean
> gmake all
> gmake install
>
> That is the resulting file:
> [root@tom /var/adm]# ll /usr/local/sbin/lpd
> -rwsr-xr-x 1 root other 395644 Jun 8 15:35 /usr/local/sbin/lpd
>
> Should be pretty much standard.
>
> Initially the LPD process is about 2MB in size, and it gains an additional 3-5
> MB per day. When it reaches about 100MB it normally cannot fork anymore.
>
> I am attaching the files /etc/printcap, /var/adm/messages (because it has some
> maybe interesting log entries from the lpd process) and /usr/local/etc/lpd.conf.
>
> If there is any way to gather more information important to you just let me
> know.
>
> best regards, Gunther
>
> --
> Gunther Schlegel Riege Software International GmbH
> System Administration Divison Manager Mollsfeld 10
> 40670 Meerbusch, Germany
> Email: [EMAIL PROTECTED] Phone: +49-2159-9148-0
> Fax: +49-2159-9148-11
> -------------------------------------------------------------------------
> [Attachment: messages.filtered]
>
> Jun 14 08:17:01 tom Waiting[28969]: LPD: fork failed! LPD not accepting any requests
> Jun 14 08:27:00 tom Waiting[28969]: LPD: fork failed! LPD not accepting any requests
> Jun 14 08:42:00 tom Waiting[28969]: LPD: fork failed! LPD not accepting any requests
> Jun 14 08:47:01 tom Waiting[28969]: LPD: fork failed! LPD not accepting any requests
MMMM.... This message usually comes up when there are too many processes.
>
> [Attachment: lpd.conf]
>
> # lpd.conf generated from Makefile on Sat Feb 12 16:20:42 MET 2000
>
> bk
> exit_linger_timeout=300
> force_localhost@
> lpd_force_poll
> network_connect_grace=1
> reuse_addr
> save_on_error
> send_failure_action=hold
> send_job_rw_timeout=600
> stalled_time=300
Looks normal...
I will try to see if I can get the same effects.
Patrick
-----------------------------------------------------------------------------
If you need help, send email to [EMAIL PROTECTED] (or lprng-requests
or lprng-digest-requests) with the word 'help' in the body. For the impatient,
to subscribe to a list with name LIST, send mail to [EMAIL PROTECTED]
with: | example:
subscribe LIST <mailaddr> | subscribe lprng-digest [EMAIL PROTECTED]
unsubscribe LIST <mailaddr> | unsubscribe lprng [EMAIL PROTECTED]
If you have major problems, send email to [EMAIL PROTECTED] with the word
LPRNGLIST in the SUBJECT line.
-----------------------------------------------------------------------------