On Sat, May 3, 2014 at 11:49 AM, Chapman Flack <[email protected]> wrote:
> Hi,
>
> I've had a PXE net-install setup that has worked very well for a while.
> You can plug any box into our network and boot from network, and you get
> a grub menu with a bunch of generally useful options (memtest86+ for
> anybody, no password, sysresccd with a password, etc.), plus if our DHCP
> server knows what OS and configuration are supposed to be on that box,
> there will also be a submenu with rescue, prompted install, and saved-answer
> install options custom to that box. All this is just background to give
> an indication of how simple/complicated our grub.cfg is. Really it's
> not bad. It's written by hand (not the usual assembled-from-pieces-in-/etc),
> and about 90 lines long. It defines a few functions, and it uses
> net_get_dhcp_option to query a site-local option the DHCP server sends
> to say what should be on the box. If that string is, say,
> "RHEL x86_64 6 5 workstation" then the main grub.cfg will source
> $prefix/RHEL and expect it to define a function RHEL that can be
> called with the remaining arguments, and that function creates the
> submenu with the right menuentries for booting the OS installer with
> the right cmdline arguments. The RHEL script, for example, is another
> 45 lines. Nothing very big at all.
>
> This whole setup works great on every other box I've used it on, but we
> just bought some brand new Dell Precision T5610 workstations, and the
> behavior is really crazy. Usually it loads grub and parses the scripts
> ok, and puts up the correct menu, but try to load any of the choices and
> it just either hangs or reboots on the first kernel load command (linux,
> knetbsd, whatever). Sometimes it won't accept the pbkdf2 password at all,
> or will say command not found even for something built-in like reboot.
>
> On occasions when it does accept the password and I can get to the command
> line, it will hang on even something simple like
> testload ($root)/memtest86+.elf
>
> It is acting very much like something in the grub-script code is stomping
> on memory somewhere. As a test, I moved most of grub.cfg into grub.normal,
> and made a very short grub.cfg:
>
> set superusers="..."
>
> password_pbkdf2 root grub.pbkdf2.sha512....
>
> echo -n 'm for minimal: '
> read min
> echo
> if [ 'm' != "$min" ]
> then
> source "$prefix/grub.normal"
> fi
>
> With that, I can boot and enter m and it skips all the rest of the script.
> At the command line I can directly enter the same lines that were otherwise
> hanging....
>
> testload ($root)/memtest86+.elf ...works fine
> knetbsd ($root)/memtest86+.elf
> boot ...gives me a memtest just fine
>
> and so on. So as long as it hasn't run my other 130 lines of script yet,
> apparently nothing has been stomped on yet. And there's really nothing
> at all fancy in the script - some function definitions, variable assignments
> and uses, uses of setparams and shift, and the one use of net_get_dhcp_option.
>
> And why am I seeing this problem on these 5610s and not on other boxes?
> Do different BIOSes leave grub with really different amounts of working
> memory or something? Are there any grub commands I can use to see the
> memory stats or anything else that might help pin down where the problem
> is?
>
> The 5610s have 32 GB of RAM each, and are set for PXE boot in legacy BIOS
> mode. I have played with some BIOS settings that seemed like they might
> affect the memory available to grub, but with no improvement. I even pulled
> some DIMMs just to see if it might work better with less total RAM (32 GB
> is more than most of our other boxes where I don't see this problem). Also
> no improvement.
>
> Any hints on how to further troubleshoot this? And is there a way to build
> grub to give errors if something in the script code is clobbering memory
> or whatever, instead of just seeming to work until the machine hangs or
> reboots later?
>
> Thanks,
> Chapman Flack

Please file a bug report about this with:

The exact version of grub you're using (including the distribution
package's version if you're using a binary distribution, or your build
log if you're building from source).

The full scripts that you're using, including a minimal grub.cfg that
will reproduce the problem (at least on the problematic hardware).

If you can, please also include a serial log with "debug=all", though
the developers will likely be able to give you a more targeted debug
value that will produce less output.

--
Jordan Uggla (Jordan_U on irc.freenode.net)
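
For the serial log with "debug=all" mentioned above, something along these lines near the top of the minimal grub.cfg might be a starting point. This is only a sketch: the serial unit and speed are assumptions (first serial port, 115200 baud) and would need to match the actual hardware and whatever is capturing the output on the other end.

```
# Assumed: first serial port at 115200 baud; adjust for the real machine.
serial --unit=0 --speed=115200

# Send input and output to both the serial line and the local console.
terminal_input serial console
terminal_output serial console

# Very verbose; the developers may suggest a narrower value.
set debug=all
```

Setting debug last keeps the serial setup itself out of the debug output, and the log can then be captured from the other end of the serial line with any terminal program.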
