Summary:

Dr. Powell goes off the deep end again when he finds out that
there is yet another whacko condition for device errors.

Sigh...

And yet another version.

> From [EMAIL PROTECTED] Tue Mar 26 08:35:23 2002
> Date: Tue, 26 Mar 2002 08:35:01 -0800 (PST)
> From: Jim Trocki <[EMAIL PROTECTED]>
> To: Patrick Powell <[EMAIL PROTECTED]>
> cc: [EMAIL PROTECTED]
> Subject: ? from trockij at transmeta: ifhp of Read_status_timeout issue
>
>
> I just noticed the following during the job delivery to one of our
> HP color printers:
>
> : greenstone ~/lpd/redwood#; date; lpq -P redwood
> Tue Mar 26 08:28:07 PST 2002
> Printer: redwood@greenstone 'Bldg 2 - Copy Room, HP Color LaserJet 4500'
>  Queue: 2 printable jobs
>  Server: pid 27206 active
>  Unspooler: pid 27207 active
>  Status: waiting for subserver to exit at 07:57:41.857
>  Filter_status: (of) Rea
>  Rank   Owner/ID                  Class Job Files                 Size Time
> stalled(3046sec) rmiller@vesper+126 A   126 /tmp/AcroYUWVCx     247562 07:26:30
> 2      bilge@canasta+0              A     0 Samba job from bilg 608379 07:57:41
> done   curtisk                      A   700 Samba job from tlc   18449 19:02:15
>
> for job 126, ifhp got itself into this nasty loop:
>
> initial job type 'POSTSCRIPT' at 07:26:42.719
> decoded job type 'POSTSCRIPT' at 07:26:42.720
> job type 'POSTSCRIPT' at 07:26:42.720
> transferring 247562 bytes at 07:26:42.720
> 28 percent done at 07:26:42.771
> 57 percent done at 07:26:42.933
> 86 percent done at 07:26:43.078
> data sent at 07:26:43.165
> sent job file at 07:26:43.165
> getting end using 'pjl job/eoj' at 07:26:43.166
> no end response from printer, timeout 0 at 07:28:19.742
> (of) Process_job: OF process running at 07:28:19.743
> (of) Init_outbuf: Outbuf 0x8067278, Outmax 10240, Outlen 0 at 07:28:19.743
> (of) End_of_job: doing pjl at end at 07:28:19.743
> (of) End_of_job: 'pjl_term'='[ ustatus teoj ]' at 07:28:19.744
> (of) Read_status_timeout: timeout -1, count 0 at 07:28:19.744
> (of) Read_status_timeout: no status read, timeout -1 at 07:28:19.744
> (of) Write_read_timeout: Read_status_timeout returned 0 at 07:28:19.744
> (of) Read_status_timeout: timeout -1, count 0 at 07:28:19.744
> (of) Read_status_timeout: no status read, timeout -1 at 07:28:19.744
> (of) Write_read_timeout: Read_status_timeout returned 0 at 07:28:19.744
> (of) Read_status_timeout: timeout -1, count 0 at 07:28:19.745
> (of) Read_status_timeout: no status read, timeout -1 at 07:28:19.745
> (of) Write_read_timeout: Read_status_timeout returned 0 at 07:28:19.745
> (of) Read_status_timeout: timeout -1, count 0 at 07:28:19.745
> (of) Read_status_timeout: no status read, timeout -1 at 07:28:19.745
> (of) Write_read_timeout: Read_status_timeout returned 0 at 07:28:19.745
> (of) Read_status_timeout: timeout -1, count 0 at 07:28:19.745
>
> ...
>
> and so on until this:
>
> : greenstone ~/lpd/redwood#; ls -l status
> -rw-------   1 lp       lp       2147483647 Mar 26 08:17 status
>
> !!!
>
> that's the 2GB boundary.
>
> i took the liberty of truncating the status by doing ">status" from
> the shell, and it filled up to 2gb in under a minute. i stopped the
> madness with "lpc abort".
>
> i then connected to the jetdirect and did a "@PJL INFO ID" and it
> spit back the ustatus device stuff from the previously incomplete
> job:

OK OK OK,  it appears that under various conditions
Linux/Solaris/FreeBSD/BSDI/ etc... handle the 'broken device connection'
in different ways.  And of course, differently for different devices.
The read() man page is useless as a guide, of course:

From the read() man page:

RETURN VALUES
     If successful, the number of bytes actually read is returned.  Upon read-
     ing end-of-file, zero is returned.  Otherwise, a -1 is returned and the
     global variable errno is set to indicate the error.

--------- Note that it does not say that when 0 is returned you have
--------- no more data to read. Right. 

1. Serial Lines  - you NEVER should get an EOF, right?
   Ummm... unless the serial line driver is configured to
   recognize ^D (CTRL D) as EOF characters,  which,
   I might add, gets echoed back when you send a ^D to the
   PostScript printer at the end of a serial line.

   So reading a ^D results in a 0 length read... Sigh...

   Solution: Set Ignore_eof flag on serial connections.
   You can also run the serial line in RAW mode,
   which is probably the most useful way.

   Note that IFHP does not have any control on this,  but
   perhaps it should try to set 'RAW' mode.  Of course,
   this will screw up the other filters... Sigh...
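
   For the curious, here is a minimal sketch of what 'RAW' mode means
   at the termios level.  This is NOT something IFHP does; the helper
   name serial_set_raw() is made up for illustration.  It assumes a
   POSIX termios system and deliberately leaves flow control and baud
   rate alone, since the printer may depend on them.  With ICANON off,
   a ^D coming back from the printer is just another byte and not an
   EOF.

```c
#include <assert.h>
#include <termios.h>
#include <unistd.h>

/* Hypothetical helper, not part of ifhp: switch a serial line to
 * non-canonical input so that ^D (VEOF) arrives as ordinary data.
 * Returns 0 on success, -1 if fd is not a terminal or on error. */
int serial_set_raw(int fd)
{
    struct termios t;

    assert(fd >= 0);
    if (tcgetattr(fd, &t) == -1)
        return -1;              /* e.g. ENOTTY: not a serial line */

    /* no line editing, echo or signal characters: VEOF loses its meaning */
    t.c_lflag &= ~(ICANON | ECHO | ECHOE | ISIG | IEXTEN);
    /* no CR/NL translation and no stripping of the 8th bit on input */
    t.c_iflag &= ~(ICRNL | INLCR | IGNCR | ISTRIP);
    /* no output post-processing */
    t.c_oflag &= ~OPOST;
    /* read() returns as soon as at least one byte is available */
    t.c_cc[VMIN] = 1;
    t.c_cc[VTIME] = 0;

    return tcsetattr(fd, TCSANOW, &t);
}
```

   Anything else sharing the line (the other filters) sees the new
   settings too, which is exactly the problem mentioned above.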

2. Parallel ports.

   Never has an 'end of file' condition,  so you should
   NEVER get an EOF.  Right?

    WRONG!

   There are some
   parallel port driver implementations that will return a 0
   for a read operation (and a write operation as well)
   when the device goes offline,  or more precisely,  is
   powered off or disconnected.  Why does the
   driver do this?  I dunno.  But it kinda makes
   a convoluted sense, if you consider that this is an
   'event' that will affect the output/input stuff, and that
   the connection is in half duplex AND you need to try to read
   status when this happens.

   The question is:  will you block on the next read until the
   device gets put back on line or will you get an error?  Will
   a write return 0?

   The answer is: Yes. No. It depends.  Sigh...

   (froth froth froth)

3. USB ports.

   Fairly random results writing and reading USB
   devices.  Sometimes they return 0 bytes when a read
   is pending and they are disconnected.  Sometimes they
   return 0 bytes when they are turned offline, or
   RUN OUT OF PAPER ...  (froth froth froth) as for the
   parallel port.  However, when they are powered off,
   they will sometimes hang up the entire USB bus, requiring
   heroic actions... such as unplugging the device and replugging
   it back in,  which will REALLY cause an error condition
   for reading and writing.

   Hatums USB!  Hatums! Evil! Evil.

4. TCP/IP connections.

   Read will stay pending until the connection is
   closed.  When it is closed and a read is pending,
   then you get either the number of bytes left in
   the buffer or 0 (EOF) indicating a closed connection.

   Now, of course,  if you simply turn the device OFF, then it
   umm... well, it will do nothing.  If you are WAITING for status,
   you CANNOT prod the connection and have it respond unless you can
   whine at the OS and FORCE it to send a 'keepalive' packet out.
   This requires heroic actions,  such as using the sysctl command
   and setting the keepalive interval to something like a minute.

   Now, of course, all connections will have keepalives,  the network
   will sag under a hail of short packets,  and the network gods will
   have frowny faces.
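
   On systems that support it, keepalives can at least be requested
   for one socket at a time with SO_KEEPALIVE.  The sketch below is
   not taken from IFHP; enable_keepalive() is a made-up name.  The
   per-socket idle time (TCP_KEEPIDLE) exists on Linux but not
   everywhere, so it is guarded with #ifdef; where it is missing you
   are back to the system-wide interval and the sysctl command.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper, not part of ifhp: ask the OS to probe an idle
 * TCP connection so that a pending read() eventually fails when the
 * printer has been powered off.  idle_secs is only honored where the
 * platform provides TCP_KEEPIDLE.
 * Returns 0 on success, -1 on error with errno set. */
int enable_keepalive(int sock, int idle_secs)
{
    int on = 1;

    assert(idle_secs > 0);
    if (setsockopt(sock, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) == -1)
        return -1;
#ifdef TCP_KEEPIDLE
    /* seconds of idle time before the first probe is sent */
    if (setsockopt(sock, IPPROTO_TCP, TCP_KEEPIDLE,
                   &idle_secs, sizeof(idle_secs)) == -1)
        return -1;
#endif
    return 0;
}
```

   Something like enable_keepalive( sock, 60 ) right after connect()
   would match the 'one minute' figure above, assuming the platform
   honors it.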

OK, so now let's come to the joys of the Read_status_timeout() routine:

/*
 * int Read_status_timeout( int timeout )
 * Read status information from printer
 *
 *  if timeout == -1 then we do a quick poll
 *    we are called with this only if there is guaranteed
 *    data to read OR we will do a nonblocking read OR we will
 *    timeout REALLY quickly... 1 second.  This is ugly ugly ugly
 *    but the only way that you can make polling a parallel port
 *    under Linux work.
 *
 *  if timeout < 0  then we do not block
 *  if timeout == 0 then we block forever
 *  if timeout > 0 then block for timeout
 *
 *  Return:
 *    0   - status read
 *    -1  - error or EOF
 *    -2  - timeout
 */

Ummm... this looks reasonable, right...

we want to do a quick peek, no block for status,
we do:
  m = Read_status_timeout( -1 );
    m = 0, got status
            ... or 0 bytes read, but no error.
    m = -1  we have fatal condition, so we give up.
    we should, hopefully, never get a -1 condition.

  By the by, we need to put the file descriptor
  into non-blocking mode when we do the read, of course.

If we have, for some whacko reason, decided that there
really is,  or should be, some status, then we need
to do a blocking read AND we need to set up a timeout.

   m = Read_status_timeout( 0 );
    no timeout
    m = 0, got status, might even get more
     status if we try again later.  But no EOF.
     Definitely not an EOF condition.
     Ummm... or perhaps the &*()*(& printer was turned
     offline, and we DID NOT read any status BUT
     we did get a 0 returned from read, and it did NOT
     indicate a 'real' EOF condition.  In which
     case we should have set Ignore_eof.  And all
     will be well.  Maybe.

    m = -1  we have fatal condition, so we give up.
    m = -2  we timed out, so we give up.

This is sane.

Let's peek at the insanity we have to do to get this to work.

...
        int count = 0, err;
        char monitorbuff[SMALLBUFFER];

        monitorbuff[0] = 0;
        /* we read from stdout */
        
        err = 0;
        /* if we have timeout < 0, and are not polling then we
         * do nonblocking read.
         */
        Set_block_io( 1 );

^^^^^^^^^^^^^ make sure we have blocking ON.

        if( timeout < 0 ){
                errno = 0;
                Set_nonblock_io( 1 );
                count = read( 1, monitorbuff, sizeof(monitorbuff) - 1 );
                err = errno;
                Set_block_io( 1 );
            ^^^^^^^^^^^
               count = -1 - error or nothing there
               count =  0 - nothing there, with a very odd device
                            driver
               count > 0  - amount read
                /* if we read 0, then we have EOF */
                DEBUG2("Read_status_timeout: timeout %d, nonblocking read result %d",
                        timeout, count );
                if( count == 0 ){

                OK we handle the odd device drivers here

                        if( Poll_for_status ){
                                /* handles the status case with no timeout, when we want REALLY
                                 * short timeouts but do not want to spin wait and eat CPU time
                                 * doing non-blocking reads.
                                 */
                                DEBUG2("Read_status_timeout: waiting %d msec", Dev_sleep);
                                plp_usleep( Dev_sleep * 1000 );
                                count = 0;
                        } else if( !Ignore_eof ){
                                count = -1;
                                DEBUG2("Read_status_timeout: EOF, timeout %d", timeout );
                        }
                } else if( count == -1 && ( 0
                /* if we block, then we need to try again */
#if                              defined(EWOULDBLOCK)
                                || err == EWOULDBLOCK
#endif
#if                              defined(EAGAIN)
                                || err == EAGAIN
#endif
                )){

                    ^^^^^^^^^ and here we have a 'proper'
                        nonblocking device status which is reasonable

                        count = 0;
                }

  Note that we have to massage the status when we get
  a nonblocking read with no data.


        } else {
                Set_block_io( 1 );
                Set_timeout_break( timeout );
                count = read( 1, monitorbuff, sizeof(monitorbuff) - 1 );
                err = errno;
                Clear_timeout();

            ^^^^^^^^^^^
               count = -1 - error
               count =  0 - nothing there, with a very odd device
                            driver
               count > 0  - amount read

                DEBUG2("Read_status_timeout: timeout %d, blocking read result %d, alarm timeout %d",
                        timeout, count, Alarm_timed_out );
                if( count == 0 ){
                        if( Poll_for_status ){
                                /* handles the status case with no timeout, when we want REALLY
                                 * short timeouts but do not want to spin wait and eat CPU time
                                 * doing non-blocking reads.
                                 */
                                DEBUG2("Read_status_timeout: waiting %d msec", Dev_sleep);
                                plp_usleep( Dev_sleep * 1000 );
                                count = 0;
                        } else if( !Ignore_eof ){
                                count = -1;
                                DEBUG2("Read_status_timeout: EOF, timeout %d", timeout );
                        }
                } else if( count < 0 && Alarm_timed_out ){
                        count = -2;
                }
        }
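
  Stripped of the IFHP globals, the nonblocking branch boils down to
  something like the following.  This is a simplified sketch and NOT
  the actual routine: read_nonblocking() and its ignore_eof parameter
  are stand-ins for illustration, fcntl() replaces Set_nonblock_io(),
  and the Poll_for_status sleep is left out.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper, not the ifhp routine: one non-blocking read.
 * Returns  >0  number of bytes read
 *           0  nothing available right now (EAGAIN/EWOULDBLOCK), or a
 *              0-length read when ignore_eof is set (odd device driver)
 *          -1  real error, or a 0-length read treated as EOF */
ssize_t read_nonblocking(int fd, char *buf, size_t len, int ignore_eof)
{
    int flags, err;
    ssize_t n;

    assert(buf != NULL && len > 0);
    flags = fcntl(fd, F_GETFL);
    if (flags == -1 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
        return -1;

    n = read(fd, buf, len);
    err = errno;                        /* save before fcntl() clobbers it */
    (void)fcntl(fd, F_SETFL, flags);    /* restore the original mode */

    if (n > 0)
        return n;
    if (n == 0)
        return ignore_eof ? 0 : -1;
    if (err == EAGAIN || err == EWOULDBLOCK)
        return 0;                       /* the 'proper' no-data answer */
    errno = err;
    return -1;
}
```

  The point is that "no data yet" shows up in two different disguises
  (a 0-length read and an EAGAIN error) and both have to be folded
  into the same answer before the caller sees them.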

  The Set_timeout_break is, again, the deadly problem
  of setting an alarm and then doing a system call.
  Hopefully you will never be in the situation where
  the alarm is set and expires before you get to the read call,
  so that the read call never terminates.

  If you do,  you are REALLY screwed,  and will have to resort
  to using a hideous longjump solution.  But the longjump will also
  lose input chars...  So you better stick with this and hope that
  you never need to do this on a REALLY slow or heavily loaded system
  that is dealing with printers that do not return status.

  Sigh...
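
  For comparison, the usual way to dodge the alarm race is to let
  poll() do the waiting and only call read() once there is something
  to read.  This is a sketch of that general technique, not what IFHP
  does, and it only helps if the device driver implements poll()
  sanely, which, given everything above, is a large assumption for
  parallel and USB ports.  The name read_with_timeout() is made up.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: wait up to timeout_ms for input, then read.
 * Returns  >0  number of bytes read
 *           0  0-length read (EOF, or an odd device driver)
 *          -1  error
 *          -2  timeout, mirroring the -2 convention above */
ssize_t read_with_timeout(int fd, char *buf, size_t len, int timeout_ms)
{
    struct pollfd pfd;
    int rc;

    assert(buf != NULL && len > 0);
    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    do {
        /* note: a signal restarts the wait with the full timeout */
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc == -1 && errno == EINTR);

    if (rc == 0)
        return -2;              /* nothing arrived in time */
    if (rc == -1)
        return -1;
    return read(fd, buf, len);  /* readable, hung up, or in error */
}
```

  Because the timeout is part of the same system call that waits,
  there is no window in which the timer can fire before the wait
  starts.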

  Oh, yeah.  The plp_usleeps are there so that we do not
  hammer the system when we are dealing with those &*)(*&
  IO devices such as USB and Parallel ports that DO NOT BLOCK
  on read.  Yep. Don't.  Been there,  have the tattoos.

After looking into the depths of this stuff, is it any wonder
I have no hair left?

Patrick
