My posting has not appeared on debian-{boot,kernel,user}. I think it is
because of the attachments. I have removed them. I'll send the screen images
to people individually if they request them.

-------------------------------------------------------------------------------

I am cross-posting this to debian-{boot,kernel,user}. I had replied to a reply
to my original post on debian-{boot,kernel} with a To: to the replier and a
Cc: to debian-{boot,kernel}, but apparently it didn't get posted. So I am
reposting this there. And I am posting this on debian-user to provide more
information to all of the responders to my post there. My original post was
short, just to raise the issue. This post is longer, to provide all of the
details that I have.

Thanks to everyone for your help.

Some background. I have 23 machines.

   11 Dell T5500            each has 4 disks
    4 HP DL165              each has 3 disks
    4 Dell PowerEdge R815   each has 6 disks
    4 Dell PowerEdge C6145  each has 4 disks

All were purchased around 2011. All have been running wheezy reliably for
years and running squeeze reliably for years before that. The initial install
about 5 years ago was squeeze, with the squeeze installer. And then a
dist-upgrade to wheezy a few years later.

All machines within a class have the same hardware and have their disks
partitioned identically. The disks were partitioned at the time of the initial
install of squeeze about five years ago by the squeeze installer. All the
machines have SATA disks but different classes of machines have different
numbers of disks of different sizes. The disks on the T5500s and C6145s are
the same.

   Dell T5500
      sd[a-d]1  md0  RAID1  ext4  /
      sd[a-d]2  md1  RAID5  ext4  /aux
      sd[a-d]3  swap
   DL165
      sd[a-c]1  md0  RAID1  ext3  /
      sd[a-c]2  md1  RAID5  ext3  /aux
      sd[a-c]3  swap
   R815
      sd[a-f]1  md0  RAID1  ext3  /
      sd[a-f]2  md1  RAID5  ext3  /aux
      sd[a-f]3  swap
   C6145
      sd[a-d]1  md0  RAID1  ext3  /
      sd[a-d]2  md1  RAID5  ext3  /aux
      sd[a-d]3  swap

The reason that the T5500s have ext4 and the others do not is that the
machines were purchased at slightly different times and ext4 became available.

I first tried to do a dist-upgrade from wheezy to jessie on one machine of
each class. But the dist-upgrade hung on 3 of the 4 machine types. I didn't
save the details from that. But what I decided to do was a fresh install on
one machine of each class. That fresh install succeeded on the T5500, the
DL165, and the C6145. So I upgraded all of the T5500s, all of the DL165s, and
all of the C6145s with a fresh install of jessie. That was successful. There
was (and still is) a minor issue with the C6145s. I will discuss that later.
But the attempted fresh install to one R815 has not been successful.

For the fresh installs, I am using the jessie installer on USB, built as
described below. I attempt to preserve the existing disk partitioning. I also
attempt to preserve the existing md1 /aux. These are my long-term data storage
and collectively have about 100 terabytes of data. I reformat md0 /, keeping
it as ext3 on the DL165s, R815s, and C6145s and keeping it as ext4 on the
T5500s.

On the R815, I first tried to do a fresh install from USB. (That was after the
unsuccessful attempt at a dist-upgrade from a wheezy installation that had
been running for years.) I tried that about 8 times, all unsuccessful. But it
fails in slightly different ways each time. That nondeterministic behavior,
described below, leads me to believe that there is a bug.

After that, I tried unsuccessfully to boot from a live wheezy. (See my other
posts to debian-user.)

After that, I was successful in doing a fresh install of wheezy. That install
was a minimal install. I did nothing but the fresh install from USB and I
deselected all of the options for additional software to install. After that
minimal install of wheezy, all I did was:

   nano /etc/apt/sources.list (change all wheezy to jessie)
   apt-get update
   apt-get dist-upgrade (answer default to all questions)
   /sbin/reboot

The dist-upgrade did not complain and did not give any errors. But upon
reboot, it entered the initramfs. A screen picture is enclosed below.

I am only posting the part below because it has not previously been posted.
To the readers of debian-user, there have been posts to debian-{boot,kernel}
that may answer some of your questions and provide more information. I am not
reposting those. Likewise, to the readers of debian-{boot,kernel}, there have
been posts to debian-user that may answer some of your questions and provide
more information. I am not reposting those.

   From: deloptes <delop...@gmail.com>

   I failed today to upgrade wheezy to jessie on raided system as well.

Please note that all of the above systems have / as md0 RAID1. The fresh
install of jessie was successful on all but the R815s.

--------------------------------------------------------------------------------

   > Then it fails to reboot and goes into the initramfs. I have a picture of
   > the screen if anybody wishes.

   Yes please. Also please use the 'rescue' boot option which enables more
   verbose logging to the screen.

Thanks for your help. Here is a screen picture. This is after (a) a fresh
install of wheezy followed by (b) an apt-get dist-upgrade to jessie followed
by (c) /sbin/reboot.

The above picture was taken before your email. I have since reinstalled a
fresh wheezy. I can redo the apt-get dist-upgrade to jessie and reboot with
the rescue boot option and take a new picture if you wish. But before I do so,
please let me know what else you would like me to do as part of the same
experiment. The experiment will take several hours (including the subsequent
reinstall of a fresh wheezy). So let's maximize the amount of information
gained from this experiment.

I conjecture that the jessie kernel has difficulty accessing the MD array on
disk. The same problem occurs when I attempt a direct fresh install of jessie
with the installer.

The machine has six disks, all ST9500530NS SATA. These have about 500GB each.
They all are partitioned identically with three partitions.

   sd[a-f]1 is RAID1 md0 ext3 mounted as /.
   sd[a-f]2 is RAID5 md1 ext3 mounted as /aux.
   sd[a-f]3 is swap.

Enclosed below is the output of fdisk on one disk. It is not from the
particular machine in question because that machine is not currently on the
net and I am offsite. But it is from another R815 purchased at the same time
that is running wheezy. All six disks on all four R815s are partitioned
identically. I partitioned them only once when I did a fresh install of
squeeze (with the squeeze installer) when I purchased the machines in about
2011.

When I fresh install either wheezy or jessie, I keep md1 and reformat md0.
When I apt-get dist-upgrade from wheezy to jessie, there is no reformat.

Here is what happens that is strange. When I do a fresh install of jessie, one
of the first things that the installer does is probe for hardware to try to
find the ISO. I have done this about 10 times. Sometimes (about 3 or 4) it
succeeds in finding the ISO. Sometimes (the rest) it comes up with a red
screen and claims that it can't find the ISO. In all cases, I am booting the
installer from the same USB dongle with the same data on it. I made the dongle
as follows:

   # cd /tmp
   # wget http://ftp.nl.debian.org/debian/dists/jessie/main/installer-amd64/current/images/hd-media/boot.img.gz
   # wget http://cdimage.debian.org/cdimage/unofficial/non-free/cd-including-firmware/8.5.0+nonfree/amd64/iso-cd/firmware-8.5.0-amd64-netinst.iso
   # zcat boot.img.gz >/dev/sdf
   # mount /dev/sdf /mnt
   # cp firmware-8.5.0-amd64-netinst.iso /mnt/.
   # umount /mnt

(I actually have two such dongles, identical brand and size, with identical
data installed on them by the above. Sometimes I use one and sometimes the
other.)

When it does find the ISO, it proceeds through the entire install without
issue until it gets to installing grub. Below are the answers that I give to
the installer. Somewhere in there, I forget exactly where but before the
network configuration, it asks which network device to use. The R815 has 4
identical ethernet ports. I select:

   eth0: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet

When it gets to installing grub, I switch to ctrl-alt-f2 and type

   cat /proc/mdstat

Every time so far, md1 has all 6 components. But md0 has only some of the
components, sometimes 5/6, sometimes 4/6, and sometimes 1/6. And every time it
is a different set of components. Even though, just a few minutes earlier, I
was running wheezy and md0 had all 6 components. I do

   mdadm /dev/md0 --add <each of the missing components one by one>

but it refuses.
I forget the error. If I redo a fresh wheezy install after a failed jessie
install, I get to the same place and do the same thing and it does
successfully add the missing components. I wait about half an hour and the
array is successfully rebuilt. I then do

   chroot target
   grub-install /dev/sda
   ...
   grub-install /dev/sdf

and it works. But if I attempt the grub-install in the jessie installer it
refuses. I forget the error. In the jessie installer, no matter what I try,
md0 has missing components, I can't add them, and I can't install grub.

If I go back to ctrl-alt-f1, it asks what device to install grub to. I select
sda. And I get a red screen that says something like

   Unable to install GRUB in /dev/sda
   Executing 'grub-install /dev/sda' failed.
   This is a fatal error.

If I look at ctrl-alt-f4, there are messages about being unable to read block
2048 or 2052 or 2056 on /dev/sd[a-f]. But there is no hardware problem,
because right after this, I redo a fresh reinstall of wheezy from USB, rebuild
md0 as part of the process, install grub on all 6 drives as part of the
process, and everything works.

It is not just the jessie installer. If I do a fresh install of wheezy and get
a fully working wheezy with all six components of md0 and grub installed on
all 6 drives, and all I do is an apt-get dist-upgrade to jessie, I get no
errors during the upgrade. And after the upgrade, before reboot, all 6
components of md0 are there. (That is still running the wheezy kernel.) All I
do is /sbin/reboot and then it comes up in the initramfs. And if I then do a
fresh reinstall of wheezy, I need to rebuild md0. So it seems to me that
something in the jessie kernel is broken, probably related to the disk driver.

Also note that I upgraded to the latest BIOS. But the same exact problems
occurred both before the BIOS upgrade and after.

   > booting jessie also takes hours to do systemd
   > configuration of the network

FYI, here is a screen picture where it takes minutes for systemd to bring up
the network.
Note that I am not using DHCP. As per the enclosed, each host has a fixed IPv4
address. There are fixed DNS servers. I am at a university and IT services
maintains the network for thousands of machines. I do not observe issues
bringing up the network when running wheezy.

    Jeff (http://engineering.purdue.edu/~qobi)

--------------------------------------------------------------------------------

default Install
default English
default United States
default American English
Go Back
default Configure network manually
128.46.115.211
default netmask
default gateway
128.210.11.57 128.210.11.5 128.46.154.76
default hostname
default domain name
root password
root password
Jeffrey Mark Siskind
qobi
password
password
default Eastern
Manual
RAID1 #1
Ext3 journaling file system
Format the partition: yes, format it
Mount point: /
Done setting up the partition
RAID5 #1
Ext3 journaling file system
default
Format the partition: no, keep existing data
Mount point: /aux
Done setting up the partition
Finish partitioning and write changes to disk
Yes
default United States
default ftp.us.debian.org
default blank
Yes
uncheck all
Yes
/dev/sda
Continue

-------------------------------------------------------------------------------

Disk /dev/sda: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders, total 976773168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000080

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *        2048    78319615    39158784   fd  Linux raid autodetect
/dev/sda2        78319616   859570175   390625280   fd  Linux raid autodetect
/dev/sda3       859570176   976771071    58600448   82  Linux swap / Solaris
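
-------------------------------------------------------------------------------

For anyone who wants to repeat the /proc/mdstat check described above (md1
complete, md0 missing a varying set of components), here is a minimal sketch
of how it could be scripted. It is not what was run on the R815. The function
name degraded_arrays and the sample text are made up for illustration, and the
block counts in the sample are not real output. It assumes the usual mdstat
layout, where the line after each "mdN :" line carries a status field of the
form [expected/active].

```shell
#!/bin/sh
# degraded_arrays (hypothetical helper): read /proc/mdstat-style text on
# stdin and print the name of every md array whose [expected/active]
# status field reports fewer active members than expected.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }          # remember which array is being described
        match($0, /\[[0-9]+\/[0-9]+\]/) {    # status field such as [6/4]
            split(substr($0, RSTART + 1, RLENGTH - 2), n, "/")
            if (n[2] + 0 < n[1] + 0) print name
        }
    '
}

# Illustrative sample only: md1 has all 6 members, md0 has 4 of 6.
degraded_arrays <<EOF
Personalities : [raid1] [raid6] [raid5] [raid4]
md1 : active raid5 sda2[0] sdf2[5] sde2[4] sdd2[3] sdc2[2] sdb2[1]
      1953126400 blocks level 5, 64k chunk, algorithm 2 [6/6] [UUUUUU]

md0 : active raid1 sda1[0] sdc1[2] sde1[4] sdf1[5]
      39158720 blocks [6/4] [U_U_UU]

unused devices: <none>
EOF
# prints: md0
```

On a running system or on the installer's second console the same function
could be fed the real file with "degraded_arrays < /proc/mdstat". It only
reads text and does not touch the arrays.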