One of the joyful things that we have to deal with is the problem
of misbehaving printers. The main approach that I took was to assume
that we would use 'error codes' from filters and 'timeouts' to handle
these problems. While this information is available in the LPRng and
ifhp HOWTOs, I think it might be a good idea to put this in one place.
I may incorporate this into the LPRng HOWTO as well.
To see which entries control this behaviour, look at the
timeout and interval options in the lpd.conf file. Note that all
of the timing values are in seconds.
The Really Important Stuff:
# Purpose: connection control for remote printers
# default connect_interval=10 (INTEGER)
# Purpose: connection timeout for remote printers
# default connect_timeout=10 (INTEGER)
# Purpose: interval at which to check OF filter for error status
# default filter_poll_interval=30 (INTEGER)
# Purpose: connection control for remote network printers
# default network_connect_grace=0 (INTEGER)
# Purpose: maximum connection interval
# default max_connect_interval=60 (INTEGER)
# Purpose: timeout for read/write lpr IO operations
# default send_job_rw_timeout=6000 (INTEGER)
# Purpose: timeout for read/write status or control operations
# default send_query_rw_timeout=30 (INTEGER)
# Purpose: stalled job timeout
# default stalled_time=120 (INTEGER)
Network connection stuff:
# Purpose: retry on ECONNREFUSED error
# default retry_econnrefused (FLAG on)
# Purpose: retry making connection even when link is down
# default retry_nolink (FLAG on)
# Purpose: originate connections from these ports
# default originate_port=512 1023 (STRING)
# Purpose: set the SO_LINGER socket option
# default socket_linger=10 (INTEGER)
Job Transmission Failure:
# Purpose: number of times to try sending job - 0 is infinite
# default send_try=3 (INTEGER)
# Purpose: failure action to take after send_try attempts failed
# default send_failure_action=remove (STRING)
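The '# default' lines above all share one shape: an option name, an optional '=value', and a type in parentheses (INTEGER, STRING, or FLAG with on/off). As a minimal sketch, assuming every entry looks like the ones shown, they can be read into typed values like this. parse_defaults is a hypothetical helper, not part of LPRng.

```python
import re

# Matches lines like "# default connect_interval=10 (INTEGER)" or
# "# default retry_nolink (FLAG on)". Assumes the format shown above.
DEFAULT_RE = re.compile(
    r"#\s*default\s+(\w+)(?:=(.*?))?\s*\((INTEGER|FLAG|STRING)(?:\s+(on|off))?\)\s*$"
)


def parse_defaults(text):
    """Return {option_name: value} for every '# default' line in text."""
    defaults = {}
    for line in text.splitlines():
        match = DEFAULT_RE.match(line.strip())
        if match is None:
            continue  # skip '# Purpose:' lines and anything else
        name, value, kind, flag = match.groups()
        if kind == "INTEGER":
            defaults[name] = int(value)
        elif kind == "FLAG":
            # Assumption: a FLAG without 'on'/'off' counts as on.
            defaults[name] = flag != "off"
        else:
            defaults[name] = value  # STRING, kept verbatim
    return defaults
```

For example, the originate_port line yields the string '512 1023', which still has to be split into a port range before use.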
When LPRng sends a job to a printer, it usually does the following
operations:
a) Opens a connection
First there is a pause of 'network_connect_grace' seconds before
it tries to open a connection. This accommodates some broken print
server boxes that require a few seconds between connections.
Then we try to open/connect to the printer.
We set up a timeout of 'connect_timeout' seconds and then
do the open/connect call. If the call succeeds within
the timeout we move on to the next step.
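The pause-then-connect step can be sketched with Python's standard socket module. This is an illustration under the stated defaults, not LPRng's code; open_printer_connection is a hypothetical name, and returning None on failure is an assumed convention.

```python
import socket
import time


def open_printer_connection(host, port, network_connect_grace=0, connect_timeout=10):
    """Pause, then connect with a timeout.

    Returns a connected socket, or None if the connect failed or timed out.
    """
    if network_connect_grace > 0:
        time.sleep(network_connect_grace)  # give a slow print server box time
    try:
        # create_connection() raises OSError (socket.timeout included)
        # if it cannot connect within connect_timeout seconds.
        return socket.create_connection((host, port), timeout=connect_timeout)
    except OSError:
        return None
```

Note that create_connection() also leaves connect_timeout set as the socket's timeout for later reads and writes, so a caller would reset it before sending data.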
If the open/connect call fails or times out and the
'retry_nolink' flag is set to TRUE (the default), then we wait
min( max_connect_interval, connect_interval*(2**(attempt-1)))
seconds before trying again. That is, we double the
'connect_interval' wait on each attempt, up to a maximum of
'max_connect_interval'. With the defaults the waits are 10, 20
and 40 seconds, and 60 seconds from then on. This has proven to
be satisfactory in almost all situations. Note that this loop
usually occurs when the printer is offline.
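The wait formula and the retry loop can be sketched as follows. connect_wait and connect_with_backoff are hypothetical names; try_connect stands in for the open/connect step and is assumed to return None on failure. The sleep function is injectable so the schedule can be checked without waiting.

```python
import time


def connect_wait(attempt, connect_interval=10, max_connect_interval=60):
    """Seconds to wait after failed attempt number `attempt` (1-based)."""
    return min(max_connect_interval, connect_interval * 2 ** (attempt - 1))


def connect_with_backoff(try_connect, retry_nolink=True,
                         connect_interval=10, max_connect_interval=60,
                         sleep=time.sleep):
    """Call try_connect() until it returns a connection (anything but None).

    With retry_nolink off, give up after the first failure and return None.
    """
    attempt = 0
    while True:
        attempt += 1
        conn = try_connect()
        if conn is not None:
            return conn
        if not retry_nolink:
            return None
        sleep(connect_wait(attempt, connect_interval, max_connect_interval))
```

With the default values, connect_wait(1) through connect_wait(5) give 10, 20, 40, 60 and 60 seconds.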
Why not always retry making a connection indefinitely? Well,
the answer is that this is the default behaviour. But there
are a couple of folks who want to use 'failover': when you
cannot make a connection to one printer, you make a
connection to another. So we do not always wait forever.
(Didn't think about that, did you?)
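The text does not say how failover is configured, so the following is only an assumed illustration of the idea: with retry_nolink off, each printer gets a single attempt, and a front end falls through to the next one. failover_connect and try_connect are hypothetical names.

```python
def failover_connect(printers, try_connect):
    """Try each printer once, in order, without waiting or retrying.

    try_connect(printer) is assumed to return a connection or None.
    Returns (printer, connection) for the first printer that accepts a
    connection, or (None, None) if none of them do.
    """
    for printer in printers:
        conn = try_connect(printer)
        if conn is not None:
            return printer, conn
    return None, None
```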
Now I must warn you that the network connect code is a horror
story dealing with broken and defective TCP/IP stacks, timeout
issues, and other details that you would never consider. Some
of this code would give the well-known SF writer Dean L. Koontz
a Nightmare Journey (Yes, I have a copy of this, and no, I will
not sell it to him so he can shred it. :-)).
First, when making a connection you either make it from a
'reserved' port (1-1023) or from a non-reserved port (1024-65535).
If you connect from the same port you used last, or from any port
you used within the TCP/IP stack's timeout period (usually 2 to 10
minutes), then the remote system will refuse the connection. Now
this is not a 'hard' refusal, but a 'soft' one: if you try to
connect from another port then most likely the connection will
succeed.
It is interesting to note that if you are doing this from a
'non-privileged' port, the OS usually takes care to make
sure that the port has not been reused within its timeout, and
usually assigns you the next 'unused' port by default. This
means that each time you try to make a connection you will get
port (say) 5000, 5001, ..., 65535, 1024, ... in sequence. This
almost always is reasonable. However, if you need to bind to
a privileged port then you are in trouble, as the range holds
only a few hundred ports and you may find that you wrap around
rather quickly. So you need to specify the port range that you
want to connect from. This is done with the 'originate_port'
option. It is usually set to '512 1023' (ports 512 through 1023),
which is pretty robust.
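Choosing ports from the originate_port range so that the most recently used one comes last can be sketched as below. This assumes the value is two whitespace-separated integers, as in the default shown earlier; parse_originate_port and candidate_ports are hypothetical helpers, and the round-robin order is an illustration rather than LPRng's exact policy.

```python
def parse_originate_port(value):
    """Parse an originate_port value such as '512 1023' into (low, high)."""
    low, high = (int(part) for part in value.split())
    if not 1 <= low <= high <= 65535:
        raise ValueError(f"bad port range: {value!r}")
    return low, high


def candidate_ports(low, high, last_used=None):
    """Yield each port in [low, high] once, starting just after last_used
    and wrapping around, so the most recently used port is tried last."""
    span = high - low + 1
    if last_used is not None and low <= last_used <= high:
        start = (last_used - low + 1) % span
    else:
        start = 0
    for i in range(span):
        yield low + (start + i) % span
```

For instance, list(candidate_ports(512, 515, last_used=513)) gives [514, 515, 512, 513].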
Now we come to some mind-boggling stuff. It turns out that
sometimes, on some operating systems, if you are making
connections to a lot of printers then you run out of ports to
use. You deal with this by setting a magic flag at connection
time that says 'reuse this port immediately'. Now you can let
the remote system handle the problems of accepting connections,
etc. Hopefully the TCP/IP stack on the server is not broken and
this actually works.
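The text does not name the 'magic flag'. On BSD-style socket APIs the usual candidate is SO_REUSEADDR, so this sketch assumes that flag; it is not taken from LPRng's source, and make_reusable_socket is a hypothetical name.

```python
import socket


def make_reusable_socket(local_port=0):
    """Create a TCP socket that may bind to a recently used local port.

    Assumption: the 'reuse this port immediately' flag is SO_REUSEADDR.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # The option must be set before bind() to have any effect on it.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", local_port))  # port 0 lets the OS pick one
    return sock
```

Binding a privileged port (below 1024) this way still needs root privileges.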
The 'retry_econnrefused' flag, when set, turns on the 'try to
connect from the other allowed ports' operation when a connection
is refused (ECONNREFUSED). Again, under various circumstances,
and with slightly bent TCP/IP stacks, you might NOT want to do
these round-robin attempts. If you do, then ALL of these ports
will be marked with a timeout AND nobody else will be able to
reuse them until the 2 minute timeout is over.
(I warned you about this.)
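A round-robin connect that binds each allowed local port in turn and moves on after a refusal only when retry_econnrefused is set might look like this. It is a sketch, not LPRng's code: connect_from_range is a hypothetical name, and skipping ports that are still busy locally (EADDRINUSE) is an added assumption. Binding ports 512-1023 would need root, so any range can be passed in.

```python
import errno
import socket


def connect_from_range(host, port, local_ports, retry_econnrefused=True, timeout=10):
    """Connect to (host, port), binding the local end to each port in
    local_ports in turn.

    A refusal (ECONNREFUSED) moves on to the next local port only when
    retry_econnrefused is set; otherwise it is raised immediately.
    """
    last_error = None
    for local_port in local_ports:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.settimeout(timeout)
        try:
            sock.bind(("", local_port))
            sock.connect((host, port))
            return sock
        except OSError as exc:
            sock.close()
            last_error = exc
            if exc.errno == errno.EADDRINUSE:
                continue  # local port still busy here: try the next one
            if exc.errno == errno.ECONNREFUSED and retry_econnrefused:
                continue  # 'soft' refusal: try again from the next port
            raise
    if last_error is None:
        raise ValueError("local_ports is empty")
    raise last_error
```

A usage such as connect_from_range("10.0.0.1", 9100, range(512, 1024)) would mirror the default originate_port setting.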
b) If the connection attempt fails, and fails really hard,
or if there is a problem sending the job after the connection
has been made, then the job is resent, up to 'send_try' attempts.
If the job still has not been sent after that, the
'send_failure_action' is carried out (usually the job is removed).
If you want infinite retries, then you can set 'send_try' to 0.
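The send_try / send_failure_action logic amounts to a bounded retry loop in which 0 means unbounded. In this sketch send_with_retries is a hypothetical name, send_job is assumed to return True on success, and returning the action name stands in for actually performing it.

```python
def send_with_retries(send_job, send_try=3, send_failure_action="remove"):
    """Call send_job() until it returns True.

    Gives up after send_try attempts (0 means try forever) and then
    returns send_failure_action; returns "sent" on success.
    """
    attempt = 0
    while send_try == 0 or attempt < send_try:
        attempt += 1
        if send_job():
            return "sent"
    return send_failure_action
```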
c) If there is an OF filter, the OF filter is started and
lpd sends the banner page/initialization to the printer
via the OF filter.
If there is no OF filter, lpd sends the banner/initialization
directly to the printer.
It waits 'send_job_rw_timeout' seconds for this operation to
succeed. Why? Ummm... because sometimes your printer goes bye-bye
and hangs up. If you resend the job, it will work fine. But
you usually need a timeout. In fact, all the 'write data'
operations are controlled by this timeout.
If you do not want to have a timeout for this operation,
you can set send_job_rw_timeout to 0.
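A write bounded by send_job_rw_timeout can be sketched with a socket timeout. One Python-specific detail matters: settimeout(0) makes a socket non-blocking rather than disabling the timeout, so the sketch maps 0 to None (block forever). write_with_timeout is a hypothetical name.

```python
import socket


def write_with_timeout(sock, data, send_job_rw_timeout):
    """Send all of data; return True on success, False on timeout.

    send_job_rw_timeout == 0 means "no timeout" and is mapped to None,
    because settimeout(0) would make the socket non-blocking instead.
    """
    sock.settimeout(send_job_rw_timeout if send_job_rw_timeout > 0 else None)
    try:
        # Since Python 3.5 the timeout bounds the whole sendall() call.
        sock.sendall(data)
        return True
    except socket.timeout:
        return False
```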
d) You now start the job filter, if there is one, and wait for
it to complete within 'send_job_rw_timeout' seconds.
If there is no filter, then you copy the file to the printer,
waiting 'send_job_rw_timeout' seconds for completion.
Now if you have a filter, the filter may be smart enough
to handle the timeouts. If this is the case, then you
may want to delegate this to the filter by setting
'send_job_rw_timeout' to 0.
This will now dump the problem on the filter, which can
use an entirely different set of timeouts and approaches to
the problem.
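Running a filter under the same timeout can be sketched with subprocess.run(). To keep the example self-contained, the filter's output is captured rather than sent to a printer, and its exit status is ignored; run_filter is a hypothetical name, and 0 again means no timeout.

```python
import subprocess


def run_filter(command, job_data, send_job_rw_timeout):
    """Run a filter on job_data; return its output, or None on timeout."""
    timeout = send_job_rw_timeout if send_job_rw_timeout > 0 else None
    try:
        result = subprocess.run(command, input=job_data,
                                stdout=subprocess.PIPE, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run() has already killed and reaped the filter.
        return None
    return result.stdout
```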
e) After sending the job, you close the connection. Now
you run into an interesting and rather ugly problem with
certain TCP/IP stacks and operating systems.
If you close the connection while data that has not yet been
transferred to the printer is still sitting in the host's TCP/IP
buffers, then the OS may THROW THE DATA AWAY AND CLOSE THE CONNECTION.
You can get around this by setting the 'socket_linger' time
to be pretty large. If you do not set this value then the
OS default is used, which is usually 10 seconds.
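Setting the linger time amounts to one setsockopt() call with SO_LINGER. This sketch assumes struct linger is two C ints (l_onoff, l_linger), the layout on Linux and BSD-derived systems; set_socket_linger is a hypothetical name.

```python
import socket
import struct


def set_socket_linger(sock, seconds):
    """Enable SO_LINGER so that close() on a blocking socket waits up to
    `seconds` for unsent data to be delivered."""
    # Assumed layout: struct linger { int l_onoff; int l_linger; }
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                    struct.pack("ii", 1, seconds))
```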
What is the implication of this? Suppose you have a printcap
entry such as the following, which simply tosses jobs at a
print server box with a parallel or serial port connection:
lp:
    :lp=10.0.0.1%9100
    :sd=/var/spool/lpd/%P
You discover that 'small jobs' work fine, but large jobs
have only the first couple of pages printed. As explained above,
this is due to the effects of the close. You really want
to do a 'shutdown(sock,1)' and then wait for the printer to
close the connection... but this fails to work on most of the
print spoolers that I have tried this with. So you have to set
the 'linger' value to something large, such as
socket_linger=600 (10 minutes).
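For printers that do handle it, the 'shutdown(sock,1)' approach (half-close, then wait for the other side to close) can be sketched as follows. As noted above, many print server boxes do not cooperate, which is why the linger setting is the practical fallback. close_after_drain is a hypothetical name.

```python
import socket


def close_after_drain(sock, timeout):
    """Half-close the connection, then wait for the peer to close it.

    Returns True if the peer closed within `timeout` seconds, False if it
    did not; the socket is closed either way.
    """
    try:
        sock.shutdown(socket.SHUT_WR)  # same as shutdown(sock, 1) in C
        sock.settimeout(timeout)
        while sock.recv(4096):
            pass  # discard anything the printer sends back
        return True  # recv() returned b"": the peer closed its side
    except socket.timeout:
        return False
    finally:
        sock.close()
```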
Patrick Powell
[EMAIL PROTECTED]
Network and System Consulting
Astart Technologies, 9475 Chesapeake Drive, Suite D, San Diego, CA 92123
858-874-6543 FAX 858-279-8424
LPRng - Print Spooler (http://www.astart.com)