Linux-Development-Sys Digest #585, Volume #8     Fri, 23 Mar 01 10:13:13 EST

Contents:
  Re: How to Kill "unkillable" process (Josef Moellers)
  Re: kernel problem (Werner Kühnert)
  Re: bootsector (Rolf Magnus)
  Re: Bypassing login prompt? (jurriaan kalkman)
  Re: problem  debugging ("jbhuang")
  Re: bootsector ("Moe")
  Re: bootsector ("Moe")
  Re: How to Kill "unkillable" process (Kasper Dupont)
  Re: Wannabe -- Wrote LAN driver now want to install (Kasper Dupont)
  Re: How to Kill "unkillable" process ([EMAIL PROTECTED])
  Re: Too many open files error (Kasper Dupont)
  Re: How to Kill "unkillable" process ([EMAIL PROTECTED])
  Re: kernel panic , can someone help (Don Carroll)

----------------------------------------------------------------------------

From: Josef Moellers <[EMAIL PROTECTED]>
Subject: Re: How to Kill "unkillable" process
Date: Fri, 23 Mar 2001 11:54:29 +0100

Villy Kruse wrote:
>
> On 23 Mar 2001 13:56:37 +1100, Nick Andrew <[EMAIL PROTECTED]> wrote:
> >Josef Moellers <[EMAIL PROTECTED]> writes:
> >
> >>unless of course the task's state is not TASK_INTERRUPTIBLE:
> >>(from signal.c:)
> >>      if (t->state == TASK_INTERRUPTIBLE && signal_pending(t))
> >>              wake_up_process(t);
> >
> >>Tasks get their state set to TASK_UNINTERRUPTIBLE when they sleep_on().
> >
> >For tasks in 'D' state which get a SIGKILL is it reasonable to
> >wake them up anyway? They're just going to die from the do_exit().
> >Or will that leave the kernel in an undefined state?
> >
>
> That is exactly it.  For what is supposed to be a very short sleep in
> the kernel, the kernel device can choose to make the sleep uninterruptible
> so it would never have to clean up after catching a signal and before
> allowing the process to proceed to the exit processing.  In some cases
> the clean-up required can be quite complex, and the problem is often
> better handled by a timeout at a lower level in the kernel, which will

Thanks, I couldn't have said it better (the opposite would be more true
B-{)

> limit the time a device driver is allowed to sleep.  However, having a
> process sleep uninterruptibly while a tape device is rewinding
> is probably too long.

That's when I open the drive door and hope none of the cog wheels gets
damaged B-{)

-- 
Josef Möllers (Pinguinpfleger bei FSC)
        If failure had no penalty success would not be a prize
                                                -- T.  Pratchett

------------------------------

From: Werner Kühnert <[EMAIL PROTECTED]>
Subject: Re: kernel problem
Date: Fri, 23 Mar 2001 12:11:54 +0100

I made the changes you suggested and the output changed in just one way. I now get
"Attempted to kill the idle task (11)" but no additional message from do_exit or
sys_exit. I must mention that there is no hard disk connected at the moment.
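
A note on reading that number: do_exit() takes its argument in wait-status layout. sys_exit() passes the exit code shifted left by 8 bits (the `(error_code&0xff)<<8` in the quoted patch below), while a death by signal passes the signal number in the low bits. A minimal sketch that decodes a value on that assumption; the helper name `decode_exit_code` is made up for illustration, and the bit layout is the conventional Linux one rather than anything confirmed for this particular board:

```python
def decode_exit_code(code):
    """Split a do_exit()-style code into its conventional fields."""
    signal = code & 0x7f               # low 7 bits: terminating signal, 0 if none
    core_dumped = bool(code & 0x80)    # bit 7: core dump flag
    exit_status = (code >> 8) & 0xff   # bits 8-15: value given to exit()
    return {"signal": signal, "core_dumped": core_dumped,
            "exit_status": exit_status}


if __name__ == "__main__":
    # The value from the panic message above.
    print(decode_exit_code(11))
    # What sys_exit(11) would have passed instead: (11 & 0xff) << 8.
    print(decode_exit_code((11 & 0xff) << 8))
```

Read this way, 11 looks like a signal number (SIGSEGV on i386) rather than an exit(11) call, which would fit the missing sys_exit message. That is an inference from the encoding only.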

Kasper Dupont wrote:

> That is indeed very strange.
>
> At this very early stage of startup the idle process
> is not actually idle yet.
>
> The error message you get comes from the function
> do_exit() which is called by a process when it is
> about to terminate. do_exit() checks whether the call
> comes from the idle process, which is not only
> illegal but simply a disaster.
>
> The problem is how did do_exit() get called? Most
> of the execution paths leading to do_exit() do
> call printk() on the way. printk() does work
> because otherwise panic() could not print the error
> message. So the execution path must be one of the
> few without any printk().
>
> Calling do_exit() without printing anything can
> happen through sys_exit(), but the idle process
> should never do any system calls.
>
> It could also happen if the process received a
> signal which was not caught, but it should not be
> possible to send a signal to the idle process.
>
> A final possibility is if the IRET instruction on
> the return path from system call/interrupt fails.
> But the values on the stack should have been set up
> correctly on entry to interrupts.
>
> I have a couple of printk() statements you can put
> in the kernel to find out some more about what
> happens.
>
> In kernel/exit.c change the two lines:
>         if (!tsk->pid)
>                 panic("Attempted to kill the idle task!");
> into:
>         if (!tsk->pid)
>                 panic("Attempted to kill the idle task (%d)",code);
>
> And change the line:
>         do_exit((error_code&0xff)<<8);
> into:
>         printk("sys_exit(%d)\n",error_code);
>         do_exit((error_code&0xff)<<8);
>
> In arch/i386/kernel/signal.c change the line:
>       do_exit(exit_code);
> into:
>       printk("%d killed with signal %d\n",current->pid,signr);
>       do_exit(exit_code);
>
> Dipl. Ing. Werner Kühnert wrote:
> >
> > Unfortunately this is, besides the banner message, the very first message.
> > But beware that this is not on an ordinary PC. I am trying to run a 2.2.16
> > kernel on hardware that is "almost" a PC. It does not have a BIOS in the
> > usual PC sense. I verified that the kernel (a bzImage kernel) is loaded to the
> > correct address (0x100000) and the command line passed to the kernel is
> > something like "mem=64M root=/dev/sda1 single console=ttyS0,115200n8". My
> > system does not have a graphics adapter of any kind, so I _have_ to use the
> > serial line as console. Everything works fine until the moment interrupts are
> > enabled (this seems logical to me). Things that work are:
> > - printk(banner)
> > - setup_arch
> > - paging_init
> > - trap_init
> > - init_IRQ
> > - sched_init
> > - time_init
> > - parse_options
> > - console_init
> > - init_modules
> > - kmem_cache_init
> > - sti
> > When I write that these functions work, I mean they return and I can continue.
> > This one then gets me the previously mentioned error message:
> > - calibrate_delay
> >
> > TIA
> >
> > Werner Kühnert
> >
> > Kasper Dupont wrote:
> >
> > > Dipl. Ing. Werner Kühnert wrote:
> > > >
> > > > Hi all,
> > > >
> > > > Can anyone give me a hint what the reason for the following message can
> > > > be :
> > > >
> > > > Kernel panic: Attempted to kill the idle task!
> > > > In Swapper task - not syncing.
> > > >
> > > > TIA
> > > >
> > > >  Regards
> > > >   Werner Kuehnert
> > > >
> > > > --
> > > > Werner Kuehnert Siemens AG Oesterreich PSE ECT IPN 1
> > > > E-Mail: [EMAIL PROTECTED]
> > >
> > > We need some more context to explain what happens,
> > > most likely you must have got some other error
> > > messages before the two mentioned.
> > >
> > > What did you do to provoke this message?
> > >
> > > --
> > > Kasper Dupont
> >
> > --
> >  Regards
> >   Werner Kuehnert
> >
> > Werner Kuehnert Siemens AG Oesterreich PSE EZE PN PS
> > E-Mail: [EMAIL PROTECTED]
>
> --
> Kasper Dupont

Regards
    Werner

--

Werner Kuehnert Siemens AG Oesterreich PSE ECT IPN 1
E-Mail: [EMAIL PROTECTED]



------------------------------

From: Rolf Magnus <[EMAIL PROTECTED]>
Subject: Re: bootsector
Date: Fri, 23 Mar 2001 12:36:31 +0100

Moe wrote:

> I tried putting an image into the bootsector with:
> $dd if=boot1 of=/dev/fd0 bs=1k conv=sync
> It didn't work and destroyed the fs. I formatted the floppy, then same story.
> Am I doing something wrong, or have I got two bad floppies?
> (Boot1 is less than 8k)

A sector is normally 512 bytes long, not 8k
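
For reference, `conv=sync` pads every input block with NULs up to the block size, so with `bs=1k` the amount written is the image size rounded up to a multiple of 1024 bytes. A small sketch of that arithmetic; the function name and the example sizes are illustrative assumptions, not taken from the post:

```python
def bytes_written_with_sync(image_size, block_size=1024):
    """Bytes dd writes when each input block is padded to block_size (conv=sync)."""
    if image_size < 0 or block_size <= 0:
        raise ValueError("image_size must be >= 0 and block_size > 0")
    blocks = -(-image_size // block_size)   # ceiling division
    return blocks * block_size


if __name__ == "__main__":
    sector = 512
    for size in (446, 512, 7000):           # hypothetical image sizes in bytes
        written = bytes_written_with_sync(size)
        print(size, "->", written, "bytes =", written // sector, "sectors")
```

Anything beyond the first 512 bytes therefore lands on the sectors that follow the boot sector, whatever the file system keeps there.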

------------------------------

From: [EMAIL PROTECTED] (jurriaan kalkman)
Subject: Re: Bypassing login prompt?
Date: 23 Mar 2001 12:05:58 GMT
Reply-To: [EMAIL PROTECTED]

On Fri, 23 Mar 2001 08:28:25 +0100, Rolf Magnus
<[EMAIL PROTECTED]> wrote:
> Norm Dresner wrote:
> 
>> An autologin facility exists in SGI's IRIX.  Does something like that
>> exist in Linux?
> 

# /etc/inittab
#
3:2345:respawn:/sbin/mingetty --noclear --user jurriaan tty3

check out mingetty.

Good luck,
Jurriaan

-- 
When you stick your fingers in the mains, it's not the imaginary component
which you will feel.
From an EIST lecturer
GNU/Linux 2.4.2-ac22 SMP/ReiserFS 2x1743 bogomips load av: 0.06 0.08 0.06

------------------------------

From: "jbhuang" <[EMAIL PROTECTED]>
Subject: Re: problem  debugging
Date: Fri, 23 Mar 2001 20:37:24 +0800

Is it helpful if I am trying to port Linux to a "non-x86" platform?


"Josef Moellers" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]...
Rud wrote:
>
> Hi all, what is the proper way to debug a running process, find out
> where the application has stopped, and correct the problem?

Have you looked at gdb?

Also, strace is sometimes helpful to check roughly what an application
is doing.

--
Josef Möllers (Pinguinpfleger bei FSC)
If failure had no penalty success would not be a prize
-- T.  Pratchett



------------------------------

From: "Moe" <[EMAIL PROTECTED]>
Subject: Re: bootsector
Date: Fri, 23 Mar 2001 14:13:55 GMT

Sector is 512 bytes. Bootsector has 1st 8K reserved for itself.
It helps to know what you are talking about.

"Rolf Magnus" <[EMAIL PROTECTED]> wrote in message
news:99fcai$odu$[EMAIL PROTECTED]...
> Moe wrote:
>
> > I tried putting an image into the bootsector with:
> > $dd if=boot1 of=/dev/fd0 bs=1k conv=sync
> > It didn't work and destroyed the fs. I formatted the floppy, then same story.
> > Am I doing something wrong, or have I got two bad floppies?
> > (Boot1 is less than 8k)
>
> A sector is normally 512 bytes long, not 8k
>



------------------------------

From: "Moe" <[EMAIL PROTECTED]>
Subject: Re: bootsector
Date: Fri, 23 Mar 2001 14:15:12 GMT

> >I tried putting an image into the bootsector with:
> >$dd if=boot1 of=/dev/fd0 bs=1k conv=sync
> >It didn't work and destroyed the fs. I formatted the floppy, then same story.
> >Am I doing something wrong, or have I got two bad floppies?
> >(Boot1 is less than 8k)
>
> Yes, don't do that.  The boot sector of a floppy disk is an integral
> part of the file system, and if you overwrite this the floppy file
> system is destroyed.  An exception is if you have ext2fs file system
> on your floppy.  Another exception is if you intend to copy the entire
> kernel file to the floppy, in which case the floppy no longer has a file
> system, just a bootable kernel.

Do you know where the FS begins?
If not why say all this?




------------------------------

From: Kasper Dupont <[EMAIL PROTECTED]>
Subject: Re: How to Kill "unkillable" process
Date: Fri, 23 Mar 2001 14:38:07 +0000

[EMAIL PROTECTED] wrote:
> 
> Martin Collins <[EMAIL PROTECTED]> wrote:
> : just a comment. This behaviour seems a little strange. Issuing a kill -9
> : shouldn't result in the process being signaled. It should just be wiped
> : by the kernel. After all SIGKILL is not supposed to be "catchable" at all.
> 
> What about a zombie process?  IIRC, wouldn't it show up in the ps list,
> but not really be kill-able?
> 
> --
>     Jeff Gentry  [EMAIL PROTECTED]  [EMAIL PROTECTED]
>            SEX           DRUGS           UNIX

A process is a zombie from the time it dies until
the parent discovers that the process is dead. If
a process stays in the zombie state for a long
time, it is a problem with the parent. If you
want to get rid of a zombie process kill the
parent.
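
The lifetime described above can be observed directly. A minimal Linux-only sketch, assuming /proc is mounted; the function names are illustrative. It forks a child that exits at once, waits for it to show up in state Z, and then reaps it with waitpid(), after which the process table entry is gone:

```python
import os
import time


def process_state(pid):
    """Return the one-letter state from /proc/<pid>/stat (Linux only)."""
    with open("/proc/%d/stat" % pid) as f:
        data = f.read()
    # The state field follows the parenthesised command name.
    return data.rsplit(")", 1)[1].split()[0]


def make_and_reap_zombie():
    """Fork a child that exits at once; report its state before reaping."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)                  # child dies immediately
    state = process_state(pid)
    for _ in range(500):             # give the child up to ~5 s to finish exiting
        if state == "Z":
            break
        time.sleep(0.01)
        state = process_state(pid)
    os.waitpid(pid, 0)               # the parent "discovers" the death here
    still_listed = os.path.exists("/proc/%d" % pid)
    return state, still_listed
```

Calling `make_and_reap_zombie()` should return the state seen before the wait and whether the entry survived it. A parent that never calls waitpid() is what leaves zombies behind for good.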

-- 
Kasper Dupont

------------------------------

From: Kasper Dupont <[EMAIL PROTECTED]>
Subject: Re: Wannabe -- Wrote LAN driver now want to install
Date: Fri, 23 Mar 2001 14:57:17 +0000

QuasiCodo wrote:
> 
> Hi.  I'm new at Linux LAN driver writing and I have found the ne2k-pci.c
> module, modified it for my hw and now I just need to know if there is a
> standard way to install it.  I've written so much Windows junk, I'm thinking
> in INF file terms.  Is there something like that with Linux, or do I just
> tar it up and distribute it?  Forgive me for my ignorance, but I only have
> half a brain, and it is almost full of Windows crap.
> 
> ((&-<

RedHat and some other distributions use RPM packages.
There are also distributions using other package
systems.

I don't know enough about the different package
systems to say which one you should prefer.

You can also choose just to distribute a tar
archive, for sure that is the easiest way to
distribute your driver.

(I have tried making an RPM file; the process of
extracting a .srpm file, making a few modifications,
and then creating a new .srpm and .rpm file is quite
involved.)

-- 
Kasper Dupont

------------------------------

From: [EMAIL PROTECTED]
Subject: Re: How to Kill "unkillable" process
Date: Fri, 23 Mar 2001 15:00:20 -0000

On Thu, 22 Mar 2001 14:25:11 +0000 Kasper Dupont <[EMAIL PROTECTED]> wrote:

| A process catches all signals in kernel mode, the kernel
| code then chooses between a number of actions. It can
| choose to ignore the signal, deliver the signal, freeze
| the process, dump core or just quit by calling do_exit().
| With signal 9 the choice is always to call do_exit().

Then do_exit() fails to complete because the process is blocked or
otherwise cannot complete certain things like closing files or
devices.

Linux, like every other Unix system I know of, has process and
device layers too tightly bound, and too weak of an abstraction
between them, to make them fully autonomous.  Given Linux's
strong monolithic design, this isn't likely to be changed.
Perhaps some microkernel design would be able to pull this off.

Still, there are some things the Linux design could do to help
the situation.

One of the typical problems caused by a hung process is that while
it may be holding an open file descriptor to the hung device, it
also holds open descriptors and current directory to other things
as well, such as file systems.  If the uncatchable kill signal
were to cause the process to forcibly close every open file that
can be closed, the impact of the hung process can be lessened.

Revoking a file descriptor on a killed process shouldn't result in
that process trying to do I/O on a non-open fd, since the process
is killed and cannot run.

Suppose a process is hung writing a SCSI tape drive because the
cable was loose.  That process has a current directory, and an
open file, in a large filesystem.  What kill should try to do is
close everything.  It would probably fail on the tape drive, but
it should succeed on the open file.  It should also revoke the
current directory of the process.  I know this can be done because
I have deleted directories underneath processes and found them in
limbo as a result.  One way to accomplish this is to change the
current directory of the process to a dummy virtual inode just for
this purpose.

The process may not totally go away because the driver cannot
complete the close, but it will no longer prevent the graceful
unmounting of the filesystem it was backing up.  I may still have
to reboot, but perhaps now I can schedule the reboot at a time
when it is more convenient, and get on with other work in the
mean time.

Next, device drivers could do a better job.  They should be made
to better handle closing of devices even if the device itself does
appear to be hung.  If the timeout has expired for the last operation
then the device should be considered to be in an error condition.
A request to close should be honored with any appropriate status
showing that the device did not complete previous operations.  A
process doing non-blocking I/O should even be able to get this
done via the close() syscall, with no more delay than it takes to determine
that the device is indeed hung (wait for the timeout, then finish the close).

-- 
=================================================================
| Phil Howard - KA9WGN |   Dallas   | http://linuxhomepage.com/ |
| [EMAIL PROTECTED] | Texas, USA | http://phil.ipal.org/     |
=================================================================

------------------------------

From: Kasper Dupont <[EMAIL PROTECTED]>
Crossposted-To: comp.os.linux,comp.os.linux.development.apps,comp.os.linux.networking
Subject: Re: Too many open files error
Date: Fri, 23 Mar 2001 15:04:27 +0000

Victor wrote:
> 
> Hi, all
> 
> we encountered a problem where the system always reports:
> 
> "error in loading shared libraries: libxxxx.so.2: cannot open shared object
> file: Error 23"
>  or
> "socket: Too many open files in system"
> 
> whatever I try to do. I heard this is caused by exhaustion of the
> system's file descriptors.
> But how can I fix this problem?  The key point is that I can't do anything
> on that machine now!
> 
> E-mail: [EMAIL PROTECTED]

Look at the two files: /proc/sys/fs/file-max and
/proc/sys/fs/file-nr

The first file contains the maximum number of files
allowed to be opened on the system. This number can
be changed by root.

The second file contains three numbers. First the
maximum number of file descriptors that have been
in use at one time since the system was booted, the
second is the number in use now, and the last is
the maximum allowed.

Increasing the maximum will help if you just have
needs for lots of file descriptors, but if some
process is eating up file descriptors the problem
will show up again soon.

Also try looking in the directories /proc/<pid>/fd
to see how many file descriptors different processes
use.
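
The checks above can be scripted. A rough sketch, assuming a Linux /proc; the function names and dictionary keys are illustrative, and the fields of file-nr are labelled the way this post describes them (the exact meaning of the second number is an assumption here and may differ between kernel versions):

```python
import os


def parse_file_nr(text):
    """Split the three numbers of /proc/sys/fs/file-nr."""
    peak, in_use, maximum = (int(x) for x in text.split())
    # Labels follow the description in the post, not a kernel reference.
    return {"peak": peak, "in_use": in_use, "max": maximum}


def count_open_fds(pid="self"):
    """Count the entries under /proc/<pid>/fd."""
    return len(os.listdir("/proc/%s/fd" % pid))


def busiest_processes(limit=5):
    """Return up to `limit` (fd_count, pid) pairs, largest first."""
    counts = []
    for name in os.listdir("/proc"):
        if not name.isdigit():
            continue
        try:
            counts.append((count_open_fds(name), int(name)))
        except OSError:
            continue    # process already gone, or not ours to inspect
    return sorted(counts, reverse=True)[:limit]
```

Feeding the contents of /proc/sys/fs/file-nr to `parse_file_nr()` and running `busiest_processes()` as root gives a quick picture of how close the system is to the limit and which processes hold the most descriptors.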

-- 
Kasper Dupont

------------------------------

From: [EMAIL PROTECTED]
Subject: Re: How to Kill "unkillable" process
Date: Fri, 23 Mar 2001 15:04:37 -0000

On 23 Mar 2001 13:58:13 +1100 Nick Andrew <[EMAIL PROTECTED]> wrote:
| [EMAIL PROTECTED] writes:
|
|>What about a zombie process?  IIRC, wouldn't it show up in the ps list,
|>but not really be kill-able?
|
| A zombie is already dead, so it can't be killed. A zombie does show up
| in the process list because it occupies a process table slot but no
| other resources (no memory, no open files, etc...)

Maybe we need yet another kill signal (zkill) which further changes the
process's parent PID to init (much like the parent being killed, but not
actually killing the parent), and running through the kill again, maybe
even briefly waking init so it calls wait again to finish the status.

-- 
=================================================================
| Phil Howard - KA9WGN |   Dallas   | http://linuxhomepage.com/ |
| [EMAIL PROTECTED] | Texas, USA | http://phil.ipal.org/     |
=================================================================

------------------------------

From: Don Carroll <[EMAIL PROTECTED]>
Reply-To: [EMAIL PROTECTED]
Subject: Re: kernel panic , can someone help
Date: Fri, 23 Mar 2001 14:51:16 GMT

Here is the output from ksymoops. Has anyone got any idea what is going on?



>>EIP; c01dd88f <__scm_destroy+7/38>   <=====
Trace; c01d9ec2 <sock_recvmsg+6e/ac>
Trace; c01329b7 <sys_read+8f/c4>
Trace; c0108f57 <system_call+33/38>
Code;  c01dd88f <__scm_destroy+7/38>
00000000 <_EIP>:
Code;  c01dd88f <__scm_destroy+7/38>   <=====
   0:   8b 70 0c                  mov    0xc(%eax),%esi   <=====
Code;  c01dd892 <__scm_destroy+a/38>
   3:   85 f6                     test   %esi,%esi
Code;  c01dd894 <__scm_destroy+c/38>
   5:   74 23                     je     2a <_EIP+0x2a> c01dd8b9 <__scm_destroy+31/38>
Code;  c01dd896 <__scm_destroy+e/38>
   7:   c7 40 0c 00 00 00 00      movl   $0x0,0xc(%eax)
Code;  c01dd89d <__scm_destroy+15/38>
   e:   8b 1e                     mov    (%esi),%ebx
Code;  c01dd89f <__scm_destroy+17/38>
  10:   4b                        dec    %ebx
Code;  c01dd8a0 <__scm_destroy+18/38>
  11:   78 0e                     js     21 <_EIP+0x21> c01dd8b0 <__scm_destroy+28/38>
Code;  c01dd8a2 <__scm_destroy+1a/38>
  13:   8d 00                     lea    (%eax),%eax



tlin wrote:

> You should post what you did with the kernel.
>
> Don Carroll <[EMAIL PROTECTED]> wrote in message
> news:[EMAIL PROTECTED]...
> > stock 2.4.2 kernel
> >
> > sendmail 8.12beta , did this also with 8.11
> >
> > tyan motherbd , symbios scsi 160mb
> >
> > anyone know how to read this
> >
> > also did it with 2.2.17 mandrake kernel
> >
> >
> > Mar 22 05:34:09 sm3 kernel: Unable to handle kernel NULL pointer dereference at virtual address 0000021a
> > Mar 22 05:34:09 sm3 kernel:  printing eip:
> > Mar 22 05:34:09 sm3 kernel: c01dd88f
> > Mar 22 05:34:09 sm3 kernel: *pde = 00000000
> > Mar 22 05:34:09 sm3 kernel: Oops: 0000
> > Mar 22 05:34:09 sm3 kernel: CPU:    1
> > Mar 22 05:34:09 sm3 kernel: EIP:    0010:[<c01dd88f>]
> > Mar 22 05:34:09 sm3 kernel: EFLAGS: 00010286
> > Mar 22 05:34:09 sm3 kernel: eax: 0000020e   ebx: e618aac4   ecx: 00000000   edx: 000005b4
> > Mar 22 05:34:09 sm3 kernel: esi: 000005b4   edi: 0000020e   ebp: ccbfbf80   esp: ccbfbf20
> > Mar 22 05:34:09 sm3 kernel: ds: 0018   es: 0018   ss: 0018
> > Mar 22 05:34:09 sm3 kernel: Process sendmail (pid: 12764, stackpage=ccbfb000)
> > Mar 22 05:34:09 sm3 kernel: Stack: e618aac4 000005b4 0000020e c01d9ec2 0000020e e618aac4 00001000 00000000
> > Mar 22 05:34:09 sm3 kernel:        00000001 0000020e 000010a0 00001000 00000000 880c8400 00000080 0000021b
> > Mar 22 05:34:09 sm3 kernel:        e618aac4 ccbfbf80 00001000 00000000 e67e2260 ffffffea 08131d94 00000a4c
> > Mar 22 05:34:09 sm3 kernel: Call Trace: [<c01d9ec2>] [<c01329b7>] [<c0108f57>]
> > Mar 22 05:34:09 sm3 kernel:
> > Mar 22 05:34:09 sm3 kernel: Code: 8b 70 0c 85 f6 74 23 c7 40 0c 00 00 00 00 8b 1e 4b 78 0e 8d
> >


------------------------------


** FOR YOUR REFERENCE **

The service address, to which questions about the list itself and requests
to be added to or deleted from it should be directed, is:

    Internet: [EMAIL PROTECTED]

You can send mail to the entire list by posting to the
comp.os.linux.development.system newsgroup.

Linux may be obtained via one of these FTP sites:
    ftp.funet.fi                                pub/Linux
    tsx-11.mit.edu                              pub/linux
    sunsite.unc.edu                             pub/Linux

End of Linux-Development-System Digest
******************************
