Re: Fixing gave up waiting for init of module libcrc32c.
On Tue, Mar 30, 2010 at 18:50, Brandon Philips bran...@ifup.org wrote: Side note: Intially I thought that modprobe was forking for each module dependency but that was incorrect after looking at the code. udev is forking off two modprobes almost at the same time for each bnx2x device alias when udevadm trigger is called during the boot sequence. Right, that's the standard behavior, and applies to all usual systems. FWIW, I don't think this is a SUSE thing as our udev is pretty lightly patched. Yes, there is no SUSE patch at all. All usual systems ship the same stuff here. Kay -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Runaway loop with the current git.
On Tue, Dec 9, 2008 at 02:09, Theodore Tso [EMAIL PROTECTED] wrote: On Mon, Dec 08, 2008 at 04:35:20AM +0100, Kay Sievers wrote: Sure, if we can make userspace behave nicely, I'm all for doing it. On the other hand, I think it's a good thing to provide a sane environment by the kernel for any non-initramfs-optimized helper. Writing to /dev/console is a usual behavior for tools used in initramfs, to be able to debug bootup problems. In many cases it is just glibc's fallback LOG_CONS, if syslog is not available. Well, can we agree that (a) there is a generalized modprobe recursion detector already in the kernel, and (b) for some reason, Debian isn't triggering it? That seems like a bug, and while it may not be 100% clear whether the bug is on the userspace side or the kernel side, it would seem that finding and fixing *that* would be a Good Thing, and it could be argued that a specific don't request char-major-5-1 would be papering over this bug (whether it is in Debian's initrd or in the kernel side code). It would be good to make sure we understand what the root causes for while the modprobe recursion detector is apparently not triggering, since it could be that Debian's initrd might cause some other uncaught recursion loop if we don't drive this problem determination to root cause. Right, it would be good to know for sure what's going on. In my test script it was just a simple access to /dev/console, so it might just be that modprobe wants to log something on Debian. At least that would match the behavior I've seen here. And we really rely on these tools to be able to print debug messages sometimes to find other bugs (at least at the time it's theoretically possible to log). And I don't see how modprobe could find out, that it is not allowed to access /dev/console. As an ideal solution, I would like to have /dev/console working before _any_ userspace is called, and it would put all the stuff that is written to it the kernel dmesg buffer as long as the console isn't working, just to be able to see what's going on, if needed. As the second choice, it would return -ENODEV. The current looping is definitely not what we want. :) That's why I still think it's a good thing, to connect the core tty devices to their dev_t handler internally, before we init all the other drivers and run userspace. Um, how early? (/me searches lkml.org for the original patch). Ah, OK, you want to do it postcore There may actually be a problem with that, because it looks like vtconsole_class_init (which is currently run as a postcore initcall) really wants to happen before the tty layer initializes itself. So moving tty into postcore could potentially run into problems, depending on whether vt.c's or tty_io.c's initcalls are run first. Sure, we might need to change that, if it is a problem. It was mainly to find out what was going on, because people thought that some serial driver is involved in the loop, which I didn't believe. To be safer, maybe we just want to move the cdev_add(), register_chrdev_region() call, which should prevent the userspace driver search. Thanks, Kay -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Runaway loop with the current git.
On Sun, Dec 7, 2008 at 16:55, Herbert Xu [EMAIL PROTECTED] wrote: On Sun, Dec 07, 2008 at 10:49:27PM +0800, Herbert Xu wrote: In any case the loop itself does not involve any crypto components so I don't think making changes in the crypto layer is going to make this go away forever as anyone calling request_module early enough will get into this loop. Having said that I think this patch should remove this particular trigger. Unfortunately I couldn't find a more succinct way of expressing this relationship: if ALGAPI is built-in because it's selected by an algorithm, and CRYPTO_MANAGER is enabled then we require CRYPTO_MANAGER to be built-in as well. Evgeniy, please let me know whether this fixes the problem. Even when this works around it, the problem that the kernel triggers module loading on /dev/console access stays and needs to be fixed instead. It's a pure kernel bug, regardless of how many times Alan likes to establish non-sensical hotplug rules to prevent this. The next innocent driver change, like yours is, will run into the same problem. Thanks, Kay -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Runaway loop with the current git.
On Sun, Dec 7, 2008 at 18:01, Alan Cox [EMAIL PROTECTED] wrote: Wrong. First at least because tty/console stuff is built in. Are you two people or just two names and email addresses for the same ? Damn, now you found that out. I suggest you get a few more mail addresses too, to back your non-sensical claims, and try to establish silly userspace rules to work around an obvious kernel bug. :) Good luck, looking forward to your creativity, Kay -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Runaway loop with the current git.
On Sun, Dec 7, 2008 at 19:03, Alan Cox [EMAIL PROTECTED] wrote: On Sun, 7 Dec 2008 20:54:38 +0300 Evgeniy Polyakov [EMAIL PROTECTED] wrote: On Sun, Dec 07, 2008 at 05:52:45PM +, Alan Cox ([EMAIL PROTECTED]) wrote: Alan, let's make some progress on this fingerpointing. If Herbert's patch fixes the crypto loading problem, it will find its way upstream for the current tree, and in the merge window Kay's patch may be applied and widely tested. Thoughts? I have no intention of applying Kay's patch because it is wrong and it will only break things not fix them. And you are sure it breaks something based on what? Firstly: You propose to implement modprobe fails (due to crypto requirements) open /dev/console -ENODEV log error to nowhere Yes, log nowhere instead of running in a loop would be much better than loading a 5:1 driver which will never exist as a module. Why is this useful - you now get failing module loads producing no diagnostics and in many case the setup just dying silently. It's obviously more useful than not to boot up. Previously you got an attempt to recover and diagnostics which allowed the problem to be found (as Herbert did) Secondly: If I have a /dev/console on a PCI device and I have modprobe set to load 8250_pci or a framebuffer driver when the console is opened you will break the current working behaviour. No, the pci driver will never get loaded by modprobe 5:1, that's all what this bug is about. It connects to the existing console device when the driver is loaded, and at that moment /dev/console gets working. Kay -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Runaway loop with the current git.
On Sun, Dec 7, 2008 at 19:15, Alan Cox [EMAIL PROTECTED] wrote: Yes, log nowhere instead of running in a loop would be much better than loading a 5:1 driver which will never exist as a module. The loop is detected and terminated. No. Please back up what you are trying to talk about. Why is this useful - you now get failing module loads producing no diagnostics and in many case the setup just dying silently. It's obviously more useful than not to boot up. What makes you think it will now boot up. The loop is already detected and terminated. What will you do if it doesn't and you get no diagnostics. How will distributions debug those reports in bugzilla. The boxes of the reporters hang! Read the bug! Please! No, the pci driver will never get loaded by modprobe 5:1, Why not ? You have no idea how the other millions of Linux users have their module loading rules configured. A change which breaks this behaviour is a regression. There is no cheap way out of the problem, it's a kernel bug, and we will fix it - you may just delay it with your zero arguments. Kay -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Runaway loop with the current git.
On Sun, Dec 7, 2008 at 18:51, Alan Cox [EMAIL PROTECTED] wrote: On Sun, 7 Dec 2008 18:39:48 +0100 Kay Sievers [EMAIL PROTECTED] wrote: On Sun, Dec 7, 2008 at 18:28, Alan Cox [EMAIL PROTECTED] wrote: /dev/console is a logical mapping to a device which may well be different, loaded after PCI is initialised and dependant on PCI. So wrong. If no driver is associated, like early, in that case, we must return -ENODEV, instead of calling modprobe in a loop. It's a built-in device, and it's easy to fix. You've clearly no idea how initrd even works have you ? Not sure, if you understand the real problem. A kernel forked binary is allowed to access /dev/console, but it triggers a kernel bug. If there is a hotplug load for the console device then it cannot. Just as a request for the driver for /dev/hda cannot open /dev/hda* again. So wrong. Modprobe 5:1 will never load anything, but kill the box in such case at that time of bootup. Nonsense. The kernel calls /sbin/modprobe directly, no hotplug involved. Its up to you if the kernel calls modprobe and what your modprobe is What are you saying? Your picture of hotplug is just wrong, regardless of it's up to you. I talk about what people use, and what is obviously broken currently. The kernel calls modprobe for something, modprobe tries to log an error, and the kernel calls modprobe again. Bug! No hotplug involved. A modprobe to load the console device shouln't open /dev/console. Very simple and always been true. We are _not_ loading a console driver that way, we try to load the console device itself, not the driver. There is no driver for 5:1 in any module. Kernel issues hotplug message . Kernel detects this is stuck Kernel replies with -ENODEV/-ENXIO to try and rescue itself from buggy initrd scripts Totally wrong, It never was that way Funny but its been that way for many many years. Just as it cannot try and syslog an error when loading the AF_UNIX socket family. And? Keep on the subject please. It's still a bug to run in a loop trying something that will _never_ exist. Kay -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Runaway loop with the current git.
On Sun, Dec 7, 2008 at 19:31, Alan Cox [EMAIL PROTECTED] wrote: The loop is detected and terminated. No. Please back up what you are trying to talk about. Let me introduce you to.. drum roll.. the source code. Its a useful resource, why don't you use it for once. max_modprobes = min(max_threads/2, MAX_KMOD_CONCURRENT); atomic_inc(kmod_concurrent); if (atomic_read(kmod_concurrent) max_modprobes) { /* We may be blaming an innocent here, but unlikely */ if (kmod_loop_msg++ 5) printk(KERN_ERR request_module: runaway loop modprobe %s\n, module_name); atomic_dec(kmod_concurrent); return -ENOMEM; } Happy now. Print it out, share it with friends, find someone who can read C if you are stuck. It does not work, that's all. Reproduce the bug and look at it for yourself. It's still a bug, regardless of all the childish stuff you wrap around it. The boxes of the reporters hang! Read the bug! Please! They would still hang. As I repeatedly said for the benefit of two people who don't seem to be able to read source code, the loop is detected and terminated. So it already fails the open when it sees it has gotten five layers deep. There is no cheap way out of the problem, it's a kernel bug, and we will fix it - you may just delay it with your zero arguments. Oh I see. Allow me to explain your position in the words of some small children I know ME!! ME!! ME!! ME!! ME!! I don't care about your obscure corner-case non bug that in fact was a crypto bug combined with a modprobe bug and where the crypto bug is now fixed. I do care about not breaking existing users systems. The fact we do this is why Linux doesn't suck. It's not obscure, it's obviously broken. And it currently sucks. I have no idea why are you repeatedly, with completely wrong arguments trying, to explain hotplug stuff to me. I maintain it by the way, and I didn't see you involved here in the last years, so I guess you miss the background to understand the problem. Kay -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Runaway loop with the current git.
On Sun, Dec 7, 2008 at 21:00, Alan Cox [EMAIL PROTECTED] wrote: max_modprobes = min(max_threads/2, MAX_KMOD_CONCURRENT); atomic_inc(kmod_concurrent); if (atomic_read(kmod_concurrent) max_modprobes) { /* We may be blaming an innocent here, but unlikely */ if (kmod_loop_msg++ 5) printk(KERN_ERR request_module: runaway loop modprobe %s\n, module_name); atomic_dec(kmod_concurrent); return -ENOMEM; } Happy now. Print it out, share it with friends, find someone who can read C if you are stuck. It does not work, that's all. It works for me. It runs in a loop here, just like the $subject says, and bko#12153 says. Reproduce the bug and look at it for yourself. Well since you've got a reproducer and this code works for me (I've tested it just fine), why don't you go and reproduce the problem then post a fix to that code I quoted instead of all this reordering rubbish. If you fix this code not only won't you risk all the mess from re-ordering initialisations around the kernel but you'll fix non console related looping which you imply is also broken as you claim that code doesn't work for you. If I deliberately break my module utils I see a sequence of modprobes which then hits kmod_concurrent limit then causes a -ENOMEM back to userspace which then fails the file open. The bug report also shows the printk is displayed so the runaway loop *was* detected and the code paths taken which stopped the loop. Sure, I can try to find out why the limiter does does not work here. I get open - modprobe - open - modprobe - open - modprobe ... - open fail, then open fail, open fail, open fail, open fail back to the first modprobe exiting. Your proposal to keep the current recent modprobe parameter strings would shorten the amount of recursion but it wouldn't change the result that I can see. If I open /dev/console early and wrongly from a modprobe then I ultimately get a failing open just as I should do. No, my proposal would prevent the kernel to call any modprobe for the special built-in device 5:1, it shortens the recursion to zero, which sounds like the right fix. It's really weird to fix the symptoms, instead of the real problem. The kernel forks a binary with a broken environment. Even the in-kernel-tree cpio generator creates /dev/console, and accessing it from a kernel forked binary makes it crash. The kernel must provide a sane environment for stuff that it calls, I think, that is pretty obvious. Device 5:1 is a core device, which never makes sense to call modprobe for it. No other later driver will ever register that dev_t, so we should do it before calling out to other random userspace stuff which triggers the kernel to go crazy with its own devices. My proposal was to connect 5:1 to the kobject map at the same time as we register the whole tty class. We allocate the class, create sysfs entries, run /sbin/modprobe at that time, why shouldn't we just register the dev_t that time too? It properly prevents the needless userspace driver searching for a well known, already created, but not properly registered core device. It makes the process environment sane, and prevents the root cause of the bug, and does not just limit the damage. Thanks, Kay -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Runaway loop with the current git.
On Mon, Dec 8, 2008 at 04:23, [EMAIL PROTECTED] wrote: On Sun, 07 Dec 2008 19:22:41 +0100, Kay Sievers said: We are _not_ loading a console driver that way, we try to load the console device itself, not the driver. There is no driver for 5:1 in any module. [childish blurb removed] It doesn't really matter if it's a console driver or the console device itself. If you're in modprobe loading *any* piece of all the stuff needed to make open(/dev/console) work, the last thing you want to be doing is opening /dev/console to complain about something not working. Nothing in initramfs or userspace tries to load all the stuff needed to make open(/dev/console) work. It's the kernel itself, that tries to resolve its own requirements. The kernel forked binary _writes_ to /dev/console, if we like it or not, it seems to do that, and it's arguable why it should not, or how it should detect that it is not allowed to do that. You may just depend on the logging of binaries to find other bugs. The helper was called in the first place for something else, in this case the cryptomgr. The loop is caused entirely by the kernel itself, if /dev/console is just accessed. There is no intentional loading of any console driver from userspace happening here. Kay -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html