LPRng: Tutorial and hints on timeouts

papowell Thu, 24 Aug 2000 15:00:39 -0700
One of the joyful things that we have to deal with is the problem
of misbehaving printers.  The main approach that I took was to assume
that we would use 'error codes' from filters and 'timeouts' to handle
these problems.  While this information is available in the LPRng and
ifhp HOWTOs,  I think it might be a good idea to put this in one place.
I may incorporate this into the LPRng HOWTO as well.

If you want to see what entries control this, look at the
timeout and interval flags in the lpd.conf file.  Note that all
of the timing values are in seconds.

The Really Important Stuff

# Purpose: connection control for remote printers
#   default connect_interval=10  (INTEGER)
# Purpose: connection timeout for remote printers
#   default connect_timeout=10  (INTEGER)
# Purpose: interval at which to check OF filter for error status
#   default filter_poll_interval=30  (INTEGER)
# Purpose: connection control for remote network printers
#   default network_connect_grace=0  (INTEGER)
# Purpose: maximum connection interval
#   default max_connect_interval=60  (INTEGER)
# Purpose: timeout for read/write lpr IO operations
#   default send_job_rw_timeout=6000  (INTEGER)
# Purpose: timeout for read/write status or control operations
#   default send_query_rw_timeout=30  (INTEGER)
# Purpose: stalled job timeout
#   default stalled_time=120  (INTEGER)

Network connection stuff:

# Purpose: retry on ECONNREFUSED error
#   default retry_econnrefused  (FLAG on)
# Purpose: retry making connection even when link is down
#   default retry_nolink  (FLAG on)
# Purpose: originate connections from these ports
#   default originate_port=512 1023  (STRING)
# Purpose: set the SO_LINGER socket option
#   default socket_linger=10  (INTEGER)


Job Transmission Failure:

# Purpose: number of times to try sending job - 0 is infinite
#   default send_try=3  (INTEGER)
# Purpose: failure action to take after send_try attempts failed
#   default send_failure_action=remove  (STRING)


When LPRng sends a job to a printer,  it usually does the following
operations:

a) Opens a connection

   First there is a pause of 'network_connect_grace' seconds before
   it tries to open a connection.  This accommodates some broken printer
   server boxes that require a few seconds between connections.

   Then we try to open/connect to the printer.
   We set up a timeout of 'connect_timeout' seconds and then
   do the open/connect call.  If the call succeeds within
   the timeout we move on to the next step.
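
As a rough illustration of the pause-then-connect step just described,
here is a minimal Python sketch.  It is not the LPRng code;  the function
name, the injectable 'sleep' argument, and the choice to treat a timeout
of 0 as 'no timeout' are assumptions made for the example.

```python
import socket
import time


def open_printer_connection(host, port, connect_timeout=10,
                            network_connect_grace=0, sleep=time.sleep):
    """Pause for the grace period, then try one connect with a timeout.

    Returns a connected socket, or None if the attempt failed or timed out.
    """
    if network_connect_grace > 0:
        # Give slow print server boxes a few seconds between connections.
        sleep(network_connect_grace)
    # Assumption for this sketch: 0 means "no timeout".
    timeout = connect_timeout if connect_timeout > 0 else None
    try:
        return socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return None
```

A caller would check for None and then decide whether to retry,  which
is the subject of the next paragraph.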

   If the open/connect call fails or times out and we have the
   'retry_nolink' flag set to TRUE (default)  then we wait
      min( max_connect_interval, connect_interval*(2**(attempt-1)))
   i.e., we do an exponential doubling of the 'connect_interval'
   time,  up to a maximum of 'max_connect_interval'.  Note that the
   defaults give a first wait of 10 seconds and a maximum wait
   time of 60 seconds.  This has proven to be satisfactory in
   almost all situations.  Note that this loop usually occurs when
   the printer is offline.
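
To make the wait formula concrete,  here is a small Python sketch that
evaluates it for the default values.  The function name 'retry_wait' is
made up for the example;  only the formula and the defaults come from
the text above.

```python
def retry_wait(attempt, connect_interval=10, max_connect_interval=60):
    """Seconds to wait after the given (1-based) failed connect attempt."""
    return min(max_connect_interval, connect_interval * 2 ** (attempt - 1))


# Waits after the first five failed attempts, using the defaults.
waits = [retry_wait(n) for n in range(1, 6)]
print(waits)  # [10, 20, 40, 60, 60]
```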

   Why not always retry making a connection indefinitely?  Well,
   the answer is that this is the default behaviour.  But there
   are a couple of folks who want to use 'failover'.  That is,
   when you cannot make a connection to one printer,  you make
   a connection to another.  So we do not always wait forever.
   (Didn't think about that, did you?)

   Now I must warn you that the network connect code is a horror
   story dealing with broken and defective TCP/IP stacks, timeout
   issues, and other details that you would never consider.  Some
   of this code would give the well-known SF writer Dean R. Koontz
   a Nightmare Journey (Yes, I have a copy of this,  and no, I will
   not sell it to him so he can shred it.  :-)).

   First,  when making a connection you either make it from a
   'reserved' port (1-1023) or from a non-reserved port (1024-65535).
   If there was a previous connection from the port that you are
   now using,  and that connection was within a timeout period
   used by the TCP/IP stack (usually 2 minutes - 10 minutes),
   then the remote system will refuse the connection.  Now this is
   not a 'hard' refusal, but a 'soft' one - if you try to connect
   from another port then most likely the connection will succeed.
   It is interesting to note that if you are doing this from a
   'non-privileged' port,  the OS usually takes care to make
   sure that the port has not been reused within its timeout, and
   usually assigns you the next 'unused' port by default.  This
   means that each time you try to make a connection you will get
   port (say) 5000, 5001, .... 65535, 1024, ... in sequence.  This
   almost always is reasonable.  However,  if you need to bind to
   a privileged port then you are in trouble,  as you may find that
   you wrap around rather quickly.  So you need to specify the port
   range that you want to connect from.  This is done by the
   'originate_port' option.  It usually is set to '512 1023',  which
   is pretty robust.
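
The idea of walking through a fixed port range,  starting just after the
port used last time,  can be sketched in Python as follows.  This is only
an illustration under assumptions:  the helper names 'candidate_ports'
and 'bind_first_free' are invented here,  and binding to ports below
1024 normally requires root privileges.

```python
import socket


def candidate_ports(low=512, high=1023, last_used=None):
    """Yield each port in [low, high] once, starting after last_used."""
    count = high - low + 1
    start = 0 if last_used is None else (last_used - low + 1) % count
    for offset in range(count):
        yield low + (start + offset) % count


def bind_first_free(sock, ports, address=""):
    """Bind sock to the first port that can be bound; return it or None."""
    for port in ports:
        try:
            sock.bind((address, port))
            return port
        except OSError:
            continue  # port in use or not permitted, try the next one
    return None
```

Remembering the port returned by 'bind_first_free' and passing it as
'last_used' on the next job gives the round robin behaviour.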

   Now we come to some mindboggling stuff.  It turns out that
   sometimes on some OS's if you are making connections to a lot
   of printers then you run out of ports to use.  You deal with
   this by setting a magic flag during connection time that says
   'reuse this port immediately'.  So now you can let the remote
   system handle the problems of accepting connections, etc.
   Hopefully the TCP/IP stack on the server is not broken and this
   actually works.
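
The text does not name the 'magic flag'.  On most systems the socket
option that behaves this way is SO_REUSEADDR,  so the following Python
sketch assumes that is what is meant.  The function name is invented
for the example.

```python
import socket


def make_reusable_socket():
    """Create a TCP socket that asks the OS to allow immediate port reuse."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Assumption: SO_REUSEADDR is the "reuse this port immediately" flag.
    # It has to be set before bind() is called to have any effect.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    return sock
```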

   The 'retry_econnrefused' flag,  when set,  controls the 'try to
   connect from the allowed ports' operation.  Again,  you might, under
   various circumstances,  and for slightly bent TCP/IP stacks,
   NOT want to do round robin attempts.  For if you do,  then ALL
   of these ports will be marked with a timeout AND nobody else
   will be able to reuse them until the 2 minute timeout is over.
   (I warned you about this).

b) If the connection attempt fails,  and fails really hard,
   or if there is a problem sending the job after the connection
   has been made,  then the job is resent,  for up to 'send_try'
   attempts.  If after this the job still has not been sent,  the
   'send_failure_action' will be done (usually the job is removed).

   If you want infinite retries, then you can set 'send_try' to 0.
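
The retry logic of this step can be sketched in Python as below.  The
function 'send_with_retries' and the 'send_once' callback are invented
for the illustration;  only the meaning of 'send_try' (0 is infinite)
and 'send_failure_action' comes from the text.

```python
def send_with_retries(send_once, send_try=3, send_failure_action="remove"):
    """Call send_once(attempt) until it succeeds or send_try is used up.

    Returns "sent" on success, otherwise the configured failure action.
    A send_try of 0 means retry forever.
    """
    attempt = 0
    while send_try == 0 or attempt < send_try:
        attempt += 1
        if send_once(attempt):
            return "sent"
    return send_failure_action
```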

c) If there is an OF filter, the OF filter is started and
   lpd sends the banner page/initialization to the printer
   via the OF filter.

   If there is not an OF filter,  lpd sends the banner/initialization
   directly to the printer.

   It waits 'send_job_rw_timeout' seconds for this operation to
   succeed.  Why?  Ummm... because sometimes your printer goes bye-bye
   and hangs up.  If you resend the job,  it will work fine.  But
   you usually need a timeout.  In fact,  all the 'write data'
   operations are controlled by this timeout.

   If you do not want to have a timeout for this operation,
   you can set send_job_rw_timeout to 0.
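
Here is a minimal Python sketch of a write that gives up after
'send_job_rw_timeout' seconds,  with 0 meaning 'wait forever'.  The
function name is made up,  and the sketch uses Python's socket timeout
rather than whatever mechanism lpd uses internally.

```python
import socket


def write_with_timeout(sock, data, send_job_rw_timeout=6000):
    """Write all of data to sock; return False if the write times out."""
    # A value of 0 disables the timeout, as described above.
    sock.settimeout(send_job_rw_timeout if send_job_rw_timeout > 0 else None)
    try:
        # In Python 3.5 and later the timeout covers the whole sendall().
        sock.sendall(data)
        return True
    except socket.timeout:
        return False
```

When this returns False the caller would treat the job as failed and
resend it,  as in step b).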

d) You now start the job filter,  if it has one, and wait for
   it to complete within 'send_job_rw_timeout' seconds.
   If there is no filter,  then you copy the file to the printer,
   waiting 'send_job_rw_timeout' seconds for completion.

   Now if you have a filter,  the filter may be smart enough
   to handle the timeouts.  If this is the case,  then you
   may want to delegate this to the filter by setting
   'send_job_rw_timeout' to 0.

   This will now dump the problem on the filter,  which can
   use an entirely different set of timeouts and approaches to
   the problem.
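
Waiting for a filter with a time limit,  or with no limit when the
timeout is 0,  might look like this in Python.  This is a sketch under
assumptions:  'run_filter' is an invented name,  and the filter is simply
any command that reads the job on stdin and writes to stdout.

```python
import subprocess
import sys


def run_filter(argv, job_data, send_job_rw_timeout=6000):
    """Run a filter on job_data; return (exit_code, output).

    Returns (None, None) if the filter did not finish in time.
    """
    # 0 means no timeout: the filter is trusted to handle its own timeouts.
    timeout = send_job_rw_timeout if send_job_rw_timeout > 0 else None
    try:
        done = subprocess.run(argv, input=job_data,
                              stdout=subprocess.PIPE, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before raising this.
        return None, None
    return done.returncode, done.stdout
```

For example,  'run_filter([sys.executable, "-c", "..."], data, 0)' would
wait for as long as the filter takes.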

e) After sending the job,  you close the connection.  Now
   you run into an interesting and rather ugly problem with
   certain TCP/IP stacks and operating systems.
   If you close the connection and there is data that has not
   been transferred to the printer sitting in the host's TCP/IP
   buffers, then it will THROW THE DATA AWAY AND CLOSE THE CONNECTION.
   You can get around this by setting the 'socket_linger' time
   to be pretty large.  If you do not set this value then the
   OS default is used, which is usually 10 seconds.

   What is the implication of this?  Suppose you have a printcap
   entry such as the following,  which simply tosses jobs at a
   print spooler box,  which has a parallel port or serial port
   connection:

   lp:
     :lp=10.0.0.1%9100
     :sd=/var/spool/lpd/%P

   You discover that 'small jobs' work fine,  but the large jobs
   have only the first couple of pages printed.  This,  as I
   explained,  is due to the effects of the close.  You really want
   to do a 'shutdown(sock,1)' and then wait for the printer to
   close the connection... but this fails to work on most of the
   print spoolers that I have tried this with.  So you have to set
   the 'linger' value to something large,  such as
   socket_linger=600   (10 minutes).
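
For readers who want to see the two approaches at the socket level,
here is a Python sketch.  The helper names are invented,  the packing
of the linger structure as two C ints is an assumption that holds on
Linux and most Unix systems,  and none of this is taken from the LPRng
source.

```python
import socket
import struct


def set_linger(sock, seconds):
    """Make close() wait up to 'seconds' for unsent data to be delivered."""
    # struct linger is (l_onoff, l_linger); assumed to be two C ints here.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                    struct.pack("ii", 1, seconds))


def get_linger(sock):
    """Return the current (l_onoff, l_linger) pair for the socket."""
    raw = sock.getsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                          struct.calcsize("ii"))
    return struct.unpack("ii", raw)


def half_close_and_wait(sock, timeout):
    """Do shutdown(sock, 1), then wait for the other end to close.

    Returns True if the peer closed within 'timeout' seconds.
    """
    sock.shutdown(socket.SHUT_WR)  # SHUT_WR has the value 1
    sock.settimeout(timeout)
    try:
        while sock.recv(4096):
            pass  # discard anything the printer sends back
        return True
    except socket.timeout:
        return False
```

The half close is the cleaner method when the print server supports it;
the linger setting is the fallback for the ones that do not.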


Patrick Powell                 Astart Technologies,
[EMAIL PROTECTED]            9475 Chesapeake Drive, Suite D,
Network and System             San Diego, CA 92123
  Consulting                   858-874-6543 FAX 858-279-8424 
LPRng - Print Spooler (http://www.astart.com)
