Re: Fixing gave up waiting for init of module libcrc32c.

2010-03-30 Thread Kay Sievers
On Tue, Mar 30, 2010 at 18:50, Brandon Philips bran...@ifup.org wrote:
 Side note: Intially I thought that modprobe was forking for each
 module dependency but that was incorrect after looking at the
 code. udev is forking off two modprobes almost at the same time for
 each bnx2x device alias when udevadm trigger is called during the boot
 sequence.

Right, that's the standard behavior, and applies to all usual systems.

 FWIW, I don't think this is a SUSE thing as our udev is pretty lightly
 patched.

Yes, there is no SUSE patch at all. All usual systems ship the same stuff here.

Kay
--
To unsubscribe from this list: send the line unsubscribe linux-crypto in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Runaway loop with the current git.

2008-12-08 Thread Kay Sievers
On Tue, Dec 9, 2008 at 02:09, Theodore Tso [EMAIL PROTECTED] wrote:
 On Mon, Dec 08, 2008 at 04:35:20AM +0100, Kay Sievers wrote:
 Sure, if we can make userspace behave nicely, I'm all for doing it. On
 the other hand, I think it's a good thing to provide a sane
 environment by the kernel for any non-initramfs-optimized helper.
 Writing to /dev/console is a usual behavior for tools used in
 initramfs, to be able to debug bootup problems. In many cases it is
 just glibc's fallback LOG_CONS, if syslog is not available.

 Well, can we agree that (a) there is a generalized modprobe recursion
 detector already in the kernel, and (b) for some reason, Debian isn't
 triggering it?  That seems like a bug, and while it may not be 100%
 clear whether the bug is on the userspace side or the kernel side, it
 would seem that finding and fixing *that* would be a Good Thing, and
 it could be argued that a specific don't request char-major-5-1 would
 be papering over this bug (whether it is in Debian's initrd or in the
 kernel side code).  It would be good to make sure we understand what
 the root causes for while the modprobe recursion detector is
 apparently not triggering, since it could be that Debian's initrd
 might cause some other uncaught recursion loop if we don't drive this
 problem determination to root cause.

Right, it would be good to know for sure what's going on.

In my test script it was just a simple access to /dev/console, so it
might just be that modprobe wants to log something on Debian. At least
that would match the behavior I've seen here.

And we really rely on these tools to be able to print debug messages
sometimes to find other bugs (at least at the time it's theoretically
possible to log). And I don't see how modprobe could find out, that it
is not allowed to access /dev/console.

As an ideal solution, I would like to have /dev/console working before
_any_ userspace is called, and it would put all the stuff that is
written to it the kernel dmesg buffer as long as the console isn't
working, just to be able to see what's going on, if needed. As the
second choice, it would return -ENODEV. The current looping is
definitely not what we want. :)

 That's why I still think it's a good thing, to connect the core tty
 devices to their dev_t handler internally, before we init all the
 other drivers and run userspace.

 Um, how early?  (/me searches lkml.org for the original patch).  Ah,
 OK, you want to do it postcore  There may actually be a problem
 with that, because it looks like vtconsole_class_init (which is
 currently run as a postcore initcall) really wants to happen before
 the tty layer initializes itself.  So moving tty into postcore could
 potentially run into problems, depending on whether vt.c's or
 tty_io.c's initcalls are run first.

Sure, we might need to change that, if it is a problem. It was mainly
to find out what was going on, because people thought that some serial
driver is involved in the loop, which I didn't believe.

To be safer, maybe we just want to move the cdev_add(),
register_chrdev_region() call, which should prevent the userspace
driver search.

Thanks,
Kay
--
To unsubscribe from this list: send the line unsubscribe linux-crypto in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Runaway loop with the current git.

2008-12-07 Thread Kay Sievers
On Sun, Dec 7, 2008 at 16:55, Herbert Xu [EMAIL PROTECTED] wrote:
 On Sun, Dec 07, 2008 at 10:49:27PM +0800, Herbert Xu wrote:

 In any case the loop itself does not involve any crypto components
 so I don't think making changes in the crypto layer is going to
 make this go away forever as anyone calling request_module early
 enough will get into this loop.

 Having said that I think this patch should remove this particular
 trigger.  Unfortunately I couldn't find a more succinct way of
 expressing this relationship: if ALGAPI is built-in because it's
 selected by an algorithm, and CRYPTO_MANAGER is enabled then we
 require CRYPTO_MANAGER to be built-in as well.

 Evgeniy, please let me know whether this fixes the problem.

Even when this works around it, the problem that the kernel triggers
module loading on /dev/console access stays and needs to be fixed
instead.

It's a pure kernel bug, regardless of how many times Alan likes to
establish non-sensical hotplug rules to prevent this. The next
innocent driver change, like yours is, will run into the same problem.

Thanks,
Kay
--
To unsubscribe from this list: send the line unsubscribe linux-crypto in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Runaway loop with the current git.

2008-12-07 Thread Kay Sievers
On Sun, Dec 7, 2008 at 18:01, Alan Cox [EMAIL PROTECTED] wrote:
 Wrong. First at least because tty/console stuff is built in.

 Are you two people or just two names and email addresses for
 the same ?

Damn, now you found that out.

I suggest you get a few more mail addresses too, to back your
non-sensical claims, and try to establish silly userspace rules to
work around an obvious kernel bug. :)

Good luck, looking forward to your creativity,
Kay
--
To unsubscribe from this list: send the line unsubscribe linux-crypto in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Runaway loop with the current git.

2008-12-07 Thread Kay Sievers
On Sun, Dec 7, 2008 at 19:03, Alan Cox [EMAIL PROTECTED] wrote:
 On Sun, 7 Dec 2008 20:54:38 +0300
 Evgeniy Polyakov [EMAIL PROTECTED] wrote:

 On Sun, Dec 07, 2008 at 05:52:45PM +, Alan Cox ([EMAIL PROTECTED]) wrote:
   Alan, let's make some progress on this fingerpointing. If Herbert's
   patch fixes the crypto loading problem, it will find its way upstream
   for the current tree, and in the merge window Kay's patch may be applied
   and widely tested. Thoughts?
 
  I have no intention of applying Kay's patch because it is wrong and it
  will only break things not fix them.

 And you are sure it breaks something based on what?


 Firstly:

 You propose to implement

modprobe fails (due to crypto requirements)
open /dev/console
-ENODEV
log error to nowhere

Yes, log nowhere instead of running in a loop would be much better
than loading a 5:1 driver which will never exist as a module.

 Why is this useful - you now get failing module loads producing no
 diagnostics and in many case the setup just dying silently.

It's obviously more useful than not to boot up.

 Previously
 you got an attempt to recover and diagnostics which allowed the problem
 to be found (as Herbert did)

 Secondly:

 If I have a /dev/console on a PCI device and I have modprobe set to load
 8250_pci or a framebuffer driver when the console is opened you will
 break the current working behaviour.

No, the pci driver will never get loaded by modprobe 5:1, that's all
what this bug is about. It connects to the existing console device
when the driver is loaded, and at that moment /dev/console gets
working.

Kay
--
To unsubscribe from this list: send the line unsubscribe linux-crypto in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Runaway loop with the current git.

2008-12-07 Thread Kay Sievers
On Sun, Dec 7, 2008 at 19:15, Alan Cox [EMAIL PROTECTED] wrote:
 Yes, log nowhere instead of running in a loop would be much better
 than loading a 5:1 driver which will never exist as a module.

 The loop is detected and terminated.

No. Please back up what you are trying to talk about.

  Why is this useful - you now get failing module loads producing no
  diagnostics and in many case the setup just dying silently.

 It's obviously more useful than not to boot up.

 What makes you think it will now boot up. The loop is already detected
 and terminated. What will you do if it doesn't and you get no
 diagnostics. How will distributions debug those reports in bugzilla.

The boxes of the reporters hang! Read the bug! Please!

 No, the pci driver will never get loaded by modprobe 5:1,

 Why not ? You have no idea how the other millions of Linux users have
 their module loading rules configured. A change which breaks this
 behaviour is a regression.

There is no cheap way out of the problem, it's a kernel bug, and we
will fix it - you may just delay it with your zero arguments.

Kay
--
To unsubscribe from this list: send the line unsubscribe linux-crypto in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Runaway loop with the current git.

2008-12-07 Thread Kay Sievers
On Sun, Dec 7, 2008 at 18:51, Alan Cox [EMAIL PROTECTED] wrote:
 On Sun, 7 Dec 2008 18:39:48 +0100
 Kay Sievers [EMAIL PROTECTED] wrote:

 On Sun, Dec 7, 2008 at 18:28, Alan Cox [EMAIL PROTECTED] wrote:
   /dev/console is a logical mapping to a device which may well be
   different, loaded after PCI is initialised and dependant on PCI.
 
  So wrong. If no driver is associated, like early, in that case, we
  must return -ENODEV, instead of calling modprobe in a loop. It's a
  built-in device, and it's easy to fix.
 
  You've clearly no idea how initrd even works have you ?

 Not sure, if you understand the real problem. A kernel forked binary
 is allowed to access /dev/console, but it triggers a kernel bug.

 If there is a hotplug load for the console device then it cannot. Just as
 a request for the driver for /dev/hda cannot open /dev/hda* again.

So wrong. Modprobe 5:1 will never load anything, but kill the box in
such case at that time of bootup.

 Nonsense. The kernel calls /sbin/modprobe directly, no hotplug involved.

 Its up to you if the kernel calls modprobe and what your modprobe is

What are you saying? Your picture of hotplug is just wrong, regardless
of it's up to you. I talk about what people use, and what is
obviously broken currently.

 The kernel calls modprobe for something, modprobe tries to log an
 error, and the kernel calls modprobe again. Bug! No hotplug involved.

 A modprobe to load the console device shouln't open /dev/console. Very
 simple and always been true.

We are _not_ loading a console driver that way, we try to load the
console device itself, not the driver. There is no driver for 5:1 in
any module.

 Kernel issues hotplug message
 .
 Kernel detects this is stuck
 Kernel replies with -ENODEV/-ENXIO to try
  and rescue itself from buggy initrd scripts

 Totally wrong, It never was that way

 Funny but its been that way for many many years. Just as it cannot try and
 syslog an error when loading the AF_UNIX socket family.

And? Keep on the subject please. It's still a bug to run in a loop
trying something that will _never_ exist.

Kay
--
To unsubscribe from this list: send the line unsubscribe linux-crypto in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Runaway loop with the current git.

2008-12-07 Thread Kay Sievers
On Sun, Dec 7, 2008 at 19:31, Alan Cox [EMAIL PROTECTED] wrote:
  The loop is detected and terminated.

 No. Please back up what you are trying to talk about.

 Let me introduce you to.. drum roll.. the source code. Its a useful
 resource, why don't you use it for once.


max_modprobes = min(max_threads/2, MAX_KMOD_CONCURRENT);
atomic_inc(kmod_concurrent);
if (atomic_read(kmod_concurrent)  max_modprobes) {
/* We may be blaming an innocent here, but unlikely */
if (kmod_loop_msg++  5)
printk(KERN_ERR
   request_module: runaway loop modprobe
 %s\n, module_name);
atomic_dec(kmod_concurrent);
return -ENOMEM;
}

 Happy now. Print it out, share it with friends, find someone who can read
 C if you are stuck.

It does not work, that's all. Reproduce the bug and look at it for yourself.

It's still a bug, regardless of all the childish stuff you wrap around it.

 The boxes of the reporters hang! Read the bug! Please!

 They would still hang. As I repeatedly said for the benefit of two people
 who don't seem to be able to read source code, the loop is detected and
 terminated. So it already fails the open when it sees it has gotten five
 layers deep.

 There is no cheap way out of the problem, it's a kernel bug, and we
 will fix it - you may just delay it with your zero arguments.

 Oh I see. Allow me to explain your position in the words of some small
 children I know

ME!! ME!! ME!! ME!! ME!!

 I don't care about your obscure corner-case non bug that in fact was a
 crypto bug combined with a modprobe bug and where the crypto bug is
 now fixed. I do care about not breaking existing users systems. The fact
 we do this is why Linux doesn't suck.

It's not obscure, it's obviously broken. And it currently sucks.

I have no idea why are you repeatedly, with completely wrong arguments
trying, to explain hotplug stuff to me. I maintain it by the way,
and I didn't see you involved here in the last years, so I guess you
miss the background to understand the problem.

Kay
--
To unsubscribe from this list: send the line unsubscribe linux-crypto in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Runaway loop with the current git.

2008-12-07 Thread Kay Sievers
On Sun, Dec 7, 2008 at 21:00, Alan Cox [EMAIL PROTECTED] wrote:
 max_modprobes = min(max_threads/2, MAX_KMOD_CONCURRENT);
 atomic_inc(kmod_concurrent);
 if (atomic_read(kmod_concurrent)  max_modprobes) {
 /* We may be blaming an innocent here, but unlikely */
 if (kmod_loop_msg++  5)
 printk(KERN_ERR
request_module: runaway loop modprobe
  %s\n, module_name);
 atomic_dec(kmod_concurrent);
 return -ENOMEM;
 }
 
  Happy now. Print it out, share it with friends, find someone who can read
  C if you are stuck.

 It does not work, that's all.

 It works for me.

It runs in a loop here, just like the $subject says, and bko#12153 says.

 Reproduce the bug and look at it for yourself.

 Well since you've got a reproducer and this code works for me (I've tested
 it just fine), why don't you go and reproduce the problem then post a fix
 to that code I quoted instead of all this reordering rubbish. If you fix
 this code not only won't you risk all the mess from re-ordering
 initialisations around the kernel but you'll fix non console related
 looping which you imply is also broken as you claim that code doesn't
 work for you.

 If I deliberately break my module utils I see a sequence of modprobes
 which then hits kmod_concurrent limit then causes a -ENOMEM back to
 userspace which then fails the file open. The bug report also shows the
 printk is displayed so the runaway loop *was* detected and the code paths
 taken which stopped the loop.

Sure, I can try to find out why the limiter does does not work here.

 I get open - modprobe - open - modprobe - open - modprobe ... -
 open fail, then open fail, open fail, open fail, open fail back to the
 first modprobe exiting.

 Your proposal to keep the current recent modprobe parameter strings would
 shorten the amount of recursion but it wouldn't change the result that I
 can see. If I open /dev/console early and wrongly from a modprobe then I
 ultimately get a failing open just as I should do.

No, my proposal would prevent the kernel to call any modprobe for the
special built-in device 5:1, it shortens the recursion to zero, which
sounds like the right fix. It's really weird to fix the symptoms,
instead of the real problem.

The kernel forks a binary with a broken environment. Even the
in-kernel-tree cpio generator creates /dev/console, and accessing it
from a kernel forked binary makes it crash. The kernel must provide a
sane environment for stuff that it calls, I think, that is pretty
obvious.

Device 5:1 is a core device, which never makes sense to call modprobe
for it. No other later driver will ever register that dev_t, so we
should do it before calling out to other random userspace stuff which
triggers the kernel to go crazy with its own devices.

My proposal was to connect 5:1 to the kobject map at the same time as
we register the whole tty class. We allocate the class, create sysfs
entries, run /sbin/modprobe at that time, why shouldn't we just
register the dev_t that time too?

It properly prevents the needless userspace driver searching for a
well known, already created, but not properly registered core device.

It makes the process environment sane, and prevents the root cause of
the bug, and does not just limit the damage.

Thanks,
Kay
--
To unsubscribe from this list: send the line unsubscribe linux-crypto in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Runaway loop with the current git.

2008-12-07 Thread Kay Sievers
On Mon, Dec 8, 2008 at 04:23,  [EMAIL PROTECTED] wrote:
 On Sun, 07 Dec 2008 19:22:41 +0100, Kay Sievers said:

 We are _not_ loading a console driver that way, we try to load the
 console device itself, not the driver. There is no driver for 5:1 in
 any module.

[childish blurb removed]

 It doesn't really matter if it's a console driver or the console device
 itself.  If you're in modprobe loading *any* piece of all the stuff needed
 to make open(/dev/console) work, the last thing you want to be doing
 is opening /dev/console to complain about something not working.

Nothing in initramfs or userspace tries to load all the stuff needed
to make open(/dev/console) work. It's the kernel itself, that tries
to resolve its own requirements.

The kernel forked binary _writes_ to /dev/console, if we like it or
not, it seems to do that, and it's arguable why it should not, or how
it should detect that it is not allowed to do that. You may just
depend on the logging of binaries to find other bugs.

The helper was called in the first place for something else, in this
case the cryptomgr. The loop is caused entirely by the kernel
itself, if /dev/console is just accessed. There is no intentional
loading of any console driver from userspace happening here.

Kay
--
To unsubscribe from this list: send the line unsubscribe linux-crypto in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html