Re: [rhelv5-list] Kernel panic after upgrade
> As I understand it, 2.6.39uek = 3.0.16. The versioning is for their tool compatibility, even though it should be 2.6.40.16 under the commonly accepted versioning for 2.6.x IIRC. And allegedly Xen went upstream early in 3.x. Was it 3.0.3? Or was that 3.3?

I think it was in 3.0, but it lacked some backend drivers (the xenblk backend). I've solved it by compiling 3.2.28 with the spec from kernel-ml (thanks, guys) and changing the kernel config file to include the Xen backend drivers. A few minor issues remain, like renaming it to 2.6.42 so that it stays in the 2.6 namespace, but the bottom line is that everything works now (TM).

> In any case, if you're looking for features, why don't you consider Fedora? If you're cycling updates in every year or quicker, which sounds like the case, it makes far more sense. Fedora is quite stable, but just rebases far more, instead of backporting. If it's just a virtualization platform, and not a run-time platform for applications, it completely makes far more sense to consider Fedora if you're focused on features. They already do the integration testing for you. Xen dom0 was added back to Fedora as of Fedora 16. [1]

Yes, we've considered that, but with Fedora we only get security updates for the two most recent releases, while EL gets them for several years. Also, our hypervisor/dom0 Linux is installed on a server, and an upgrade between major Fedora releases could be dangerous/problematic. To solve this we will probably go the way RHEV and oVirt are going: a squashfs filesystem image that can be swapped for a new one when updates or upgrades are released. In that scenario we could use Fedora. I still need a union filesystem that works on recent kernels and is stable (I've tried unionfs and aufs, and I'm still waiting on overlayfs).

Thanks for all the help,
Nuno Fernandes
___ rhelv5-list mailing list rhelv5-list@redhat.com https://www.redhat.com/mailman/listinfo/rhelv5-list
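Nuno doesn't list the exact options he changed. As a minimal sketch, assuming mainline 3.x Kconfig symbol names for Xen dom0 and backend support (for example CONFIG_XEN_DOM0, CONFIG_XEN_BACKEND, CONFIG_XEN_BLKDEV_BACKEND, CONFIG_XEN_NETDEV_BACKEND, CONFIG_XEN_GNTDEV) and a hypothetical helper called set_config, forcing options in a .config before rebuilding could look like this:

```shell
#!/bin/sh
# Hypothetical helper: set CONFIG_ symbols in a kernel .config file.
# Each argument after the file name is a SYM=VALUE pair (e.g. CONFIG_XEN_GNTDEV=m).
# Existing "SYM=..." and "# SYM is not set" lines are rewritten in place;
# symbols that do not appear at all are appended to the end of the file.
set_config() {
    cfg=$1
    shift
    for pair in "$@"; do
        sym=${pair%%=*}
        val=${pair#*=}
        if grep -q -e "^$sym=" -e "^# $sym is not set\$" "$cfg"; then
            # GNU sed in-place edit; only lines for this exact symbol change.
            sed -i -e "s/^$sym=.*/$sym=$val/" \
                   -e "s/^# $sym is not set\$/$sym=$val/" "$cfg"
        else
            printf '%s=%s\n' "$sym" "$val" >> "$cfg"
        fi
    done
}
```

For example, `set_config .config CONFIG_XEN_DOM0=y CONFIG_XEN_BACKEND=y CONFIG_XEN_BLKDEV_BACKEND=m`. Run `make oldconfig` afterwards: Kconfig re-checks dependencies and silently drops any option whose prerequisites are not enabled, so re-check the file before building.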
Re: [rhelv5-list] Kernel panic after upgrade
On Thu, Aug 30, 2012 at 4:55 AM, Nuno Fernandes npf-mli...@eurotux.com wrote:
> Hello, I have a server with centos5 installed and 2.6.18-308.4.1.el5xen kernel with the following modules ... The important part is the mptsas module that is being used because of the ... local disc of the server ... I've then upgraded to a more recent kernel (rebuild from source from http://au1.mirror.crc.id.au/repo/SRPMS/). The kernel is kernel-xen-2.6.32.57-2.

There is no guarantee that a more recent, unsupported kernel like 2.6.32 will work on a userspace built around 2.6.18, especially if you are making use of the Xen Hypervisor and other components. Some modules can break when you're running 2.6.32, 3.0, etc. on the EL5/2.6.18 kABI ;)

> The kernel boots fine and initrd kicks in. It loads mptsas module and detects the local harddrive but no partition appears on /proc/partitions. Because of that it can't mount root partition and panics. Does anyone know any issue with mptsas or a way to debug the problem?

How did you verify that the mpt* stack actually loads if the kernel panics? Did you see it scroll past? Did you try booting the old kernel and breaking out the initrd (e.g., gzip -cd initrd.img | cpio -imdv)? It probably wouldn't hurt to compare the init script and modules between your working 2.6.18 and the problematic 2.6.32.

--
Bryan J Smith - Professional, Technical Annoyance
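Bryan's suggestion can be scripted. The sketch below assumes both images are EL5-style gzip-compressed cpio archives (as his `gzip -cd | cpio` command implies); the function names are illustrative. It unpacks an image with that same pipeline and lists the kernel modules present in only one of two unpacked trees.

```shell
#!/bin/sh
# Unpack a gzip-compressed cpio initrd into a directory, using the same
# "gzip -cd initrd.img | cpio -imdv" pipeline suggested in the thread.
extract_initrd() {
    img=$1
    dest=$2
    # Make the image path absolute so it still resolves after the cd below.
    case $img in
        /*) ;;
        *) img=$(pwd)/$img ;;
    esac
    mkdir -p "$dest" && (cd "$dest" && gzip -cd "$img" | cpio -imdv)
}

# Print the sorted, de-duplicated basenames of all *.ko files under a tree.
list_modules() {
    find "$1" -name '*.ko' | sed 's,.*/,,' | sort -u
}

# Show modules that exist in only one of two unpacked initrds.
compare_modules() {
    old=$(mktemp)
    new=$(mktemp)
    list_modules "$1" > "$old"
    list_modules "$2" > "$new"
    comm -23 "$old" "$new" | sed 's/^/only-in-old: /'
    comm -13 "$old" "$new" | sed 's/^/only-in-new: /'
    rm -f "$old" "$new"
}
```

Typical use: extract the working 2.6.18 image and the new one into two scratch directories, run `compare_modules` on them, then `diff -u` the two top-level `init` scripts to see whether module load order or device-node setup changed.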
Re: [rhelv5-list] Kernel panic after upgrade
On Thursday 30 August 2012 06:25:52 Bryan J Smith wrote:
> There is no guarantee that a more recent, unsupported kernel like 2.6.32 will work on an userspace built around 2.6.18, especially if you are making use of the Xen Hypervisor and other components. There can be some breakable modules when you're running 2.6.32, 3.0, etc... on the EL5/2.6.18 kABI ;)

Yes, that could happen. I've also upgraded some packages (mkinitrd and udev), and they could be the origin of the problem. I'm still in the process of figuring out why.

> How did you verify the mpt* stack actually loads if it kernel panics? Did you see it scroll past? Did you try booting the old kernel, and breaking out the initrd (e.g., gzip -cd initrd.img | cpio -imdv) ? It probably wouldn't hurt to do some comparison of the init and modules between your working 2.6.18 and the problematic 2.6.32.

To find the problem, I've changed the initrd of 2.6.32 to include bash and a few other tools (mknod, dmesg, ls, cat, etc.). Because I don't have SSH access to it, you can find the images at:
http://troll.ws/image/b4003dd8
http://troll.ws/image/201f0253

The images are from the 2.6.39uek kernel (I was trying that kernel because Oracle builds packages for EL5, and to see whether a newer kernel worked). What can I do to debug this?

Thanks,
Nuno Fernandes
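To confirm the "disk detected, no partitions" symptom in a repeatable way from the debug shell, here is a minimal sketch that reads a /proc/partitions-style file and prints whole disks with no partition entries. It assumes the usual sdX/hdX/vdX/xvdX naming with numbered partitions (other schemes, such as cciss/c0d0p1, are ignored), the function name is hypothetical, and it needs awk, which is not among the tools listed above, so awk would have to be added to the debug initrd (or the file copied out).

```shell
#!/bin/sh
# Print whole disks listed in a /proc/partitions-style file that have no
# partitions. Data lines look like: "   8        0  488386584 sda".
disks_without_partitions() {
    awk '
        # Keep only data lines: 4 fields with a numeric major number.
        NF == 4 && $1 ~ /^[0-9]+$/ { names[$4] = 1 }
        END {
            for (disk in names) {
                # Only consider whole-disk names such as sda, hdb, vda, xvda.
                if (disk !~ /^(sd|hd|vd|xvd)[a-z]+$/) continue
                found = 0
                for (p in names)
                    if (p ~ ("^" disk "[0-9]+$")) found = 1
                if (!found) print disk
            }
        }' "${1:-/proc/partitions}" | sort
}
```

This only confirms the symptom; the cause still has to come from dmesg output around the mptsas probe.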
Re: [rhelv5-list] Kernel panic after upgrade
Nuno Fernandes npf-mli...@eurotux.com wrote:
> Yes.. that coud happen.

Maybe I should step back ... What are your _requirements_? I.e., what is driving you to do this? What is preventing you from going with an actual, full EL6 platform (with the Red Hat 2.6.32 kABI) instead of attempting to mix'n'match on an EL5 platform (with the Red Hat 2.6.18 kABI)?

> I've upgraded some packages also (mkinitrd and udev) and they can be the origin of the problem. I'm still on the process of figuring why.

And where did you get them from? Is this mix'n'match documented somewhere?

> To find the problem i've changed the initrd of 2.6.32 to include bash and a few other tools (mknod, dmesg, ls, cat etc). Because i don't have ssh to it you can find the images on: http://troll.ws/image/b4003dd8 http://troll.ws/image/201f0253

I'd look, but I have a much greater concern now. I.e., even if you get it to boot, what sort of unit-tested, non-EL5 kABI do we now have? That's before we even visit the EL5 integration and regression testing variables. And what happens when you attempt further EL5 platform updates?

> The images are from 2.6.39uek kernel (i was trying with that kernel because Oracle creates packages for el5 and to see if a newer kernel worked).

Wait. Now hold on ... You're trying to use a 2.6.39uek kernel? Toolset? With a 2.6.32 kernel? (Even before we consider that this is still EL5, a 2.6.18 kABI-designed platform.) And with which package replacements, from where? And is this documented anywhere?

Now, I cannot answer about the compatibility issues with community and/or commercial releases downstream from Red Hat Enterprise Linux that mix'n'match what they want and are not even built to be 1:1 ABI/API compatible. I.e., the last time I checked, Oracle does not publicly release the binary Unbreakable Enterprise Kernel (UEK) or any support packages that differ, but only their claimed Red Hat compatible kernels and support packages (that they claim to be 1:1 ABI/API compatible). Which is why I need a full accounting of where you are getting everything, from what, what versions, etc. Otherwise I'm shooting completely in the dark.

> What can i do to debug this?

You could start by giving _detailed_ answers on:
A) Your actual requirements (what is driving this), and
B) Where you are getting _everything_ (not just the kernel, but mkinitrd, udev, etc. replacements)

I have absolutely no benchmark to start from with this level of information (total lack thereof). I cannot even advise whether they are remotely compatible.

--
Bryan J Smith - Professional, Technical Annoyance
http://www.linkedin.com/in/bjsmith
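Part of the "full accounting" Bryan asks for can be pulled from the RPM database. A minimal sketch, assuming the RPM Vendor tag is good enough to tell package sources apart (rebuilt or third-party packages may carry an empty or misleading vendor) and that CentOS packages report the vendor string "CentOS"; the function name is hypothetical. Feed it the output of an rpm query and it prints every package whose vendor differs from the expected one.

```shell
#!/bin/sh
# Read "PACKAGE<TAB>VENDOR" lines on stdin and print the packages whose vendor
# differs from the one given as $1, as "PACKAGE (VENDOR)", sorted.
# Example input source (rpm query format tags):
#   rpm -qa --qf '%{NAME}-%{VERSION}-%{RELEASE}.%{ARCH}\t%{VENDOR}\n'
foreign_packages() {
    awk -F '\t' -v want="$1" '$2 != want { print $1 " (" $2 ")" }' | sort
}
```

Piping that rpm query into `foreign_packages CentOS` gives a starting list of everything that came from GITCO, Oracle, local rebuilds, and so on.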
Re: [rhelv5-list] Kernel panic after upgrade
On 08/30/2012 12:51 PM, Bryan J Smith wrote:
> You could start by giving _detailed_ answers on: A) Your actual requirements (what is driving this), and B) Where you are getting _everything_ (not just kernel, but mkinitrd, udev, etc... replacements)

Let me start from the beginning, then.

We are creating an open-source virtualization solution called Nuxis (http://nuxis.com/). It is based on CentOS 5 plus some of our own packages (mostly a web interface and a libvirt connector, GPLv2). We support KVM and Xen.

In our local setup we've upgraded Xen to a more recent version (xen-4.1.3-1.el5) because of some security issues; we were using Xen 4.1.2. We've also upgraded some userland packages; the kernel/kABI remains at 2.6.18. For the Xen packages we used the gitco repository (http://www.gitco.de/repo/), which provides Xen replacements and updates for CentOS 5.

After the recent upgrade we are having problems starting virtual machines (yes, I could ask on the xen-users mailing list), but I think the problem is in the Dom0 kernel, because we see log messages like this:

xen be core: xen be core: can't open gnttab device

and googling points to an old Dom0 kernel. To debug this problem I tried different newer kernels (2.6.32-xen and 2.6.39uek). I know that changing the kernel changes the kABI, but for debugging purposes I'm OK with that. To install those kernels I've added the Oracle Linux repo:

# cat /etc/yum.repos.d/uek.repo
[uek]
name=Oracle Linux $releasever - U6 - $basearch - base
baseurl=http://public-yum.oracle.com/repo/OracleLinux/OL5/UEK/latest/$basearch/
gpgkey=http://public-yum.oracle.com/RPM-GPG-KEY-oracle-el5
gpgcheck=1
enabled=1

[ol5]
name=Oracle Linux $releasever - U6 - $basearch - base
baseurl=http://public-yum.oracle.com/repo/OracleLinux/OL5/latest/$basearch/
gpgkey=http://public-yum.oracle.com/RPM-GPG-KEY-oracle-el5
gpgcheck=1
enabled=1

And installed kernel-uek:

Aug 22 22:26:46 Updated: kpartx-0.4.9-46.0.5.el5.x86_64
Aug 22 22:26:47 Installed: device-mapper-multipath-libs-0.4.9-46.0.5.el5.x86_64
Aug 22 22:26:47 Updated: device-mapper-multipath-0.4.9-46.0.5.el5.x86_64
Aug 22 22:26:47 Installed: oraclelinux-release-5-8.0.2.x86_64
Aug 22 22:26:48 Installed: kernel-uek-firmware-2.6.39-100.5.1.el5uek.noarch
Aug 22 22:26:48 Installed: 1:busybox-1.2.0-13.el5.centos.x86_64
Aug 22 22:26:48 Installed: kexec-tools-1.102pre-154.el5.x86_64
Aug 22
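The install log above is in yum's /var/log/yum.log format. As a small sketch (the function name is hypothetical), pulling out every change logged on a given day turns that log into the kind of inventory Bryan asked for, including the non-kernel packages (kpartx, device-mapper-multipath, busybox, ...) that came along with kernel-uek.

```shell
#!/bin/sh
# Print the yum actions logged on one day, e.g. "Updated: kpartx-0.4.9-...".
# Usage: yum_changes MONTH DAY < /var/log/yum.log
# Log lines look like: "Aug 22 22:26:46 Updated: kpartx-0.4.9-46.0.5.el5.x86_64".
yum_changes() {
    awk -v m="$1" -v d="$2" '
        # Compare the day numerically so "Aug 2" and "Aug 02" both match.
        $1 == m && $2 + 0 == d + 0 {
            $1 = $2 = $3 = ""      # drop the date and time fields
            sub(/^ +/, "")        # strip the separators left behind
            print
        }'
}
```

For example, `yum_changes Aug 22 < /var/log/yum.log` lists everything that changed on the day kernel-uek went in.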
Re: [rhelv5-list] Kernel panic after upgrade
Since you're playing with third-party kernels on top of RHEL 5, why not try the ones from ELRepo? I've used their kernels on a RHEL 6 system to get bleeding-edge USB 3.0 support, and they've been great on two systems (nothing production). I've not used their kernels for RHEL 5 (which have a few caveats), but they might help you out.

http://elrepo.org/tiki/kernel-ml

/Brian/
--
Brian Long | Corporate Security Programs Org. | Cisco
Re: [rhelv5-list] Kernel panic after upgrade
On Thu, Aug 30, 2012 at 8:20 AM, Nuno Fernandes npf-mli...@eurotux.com wrote:
> Let me then start from the beginning... We are creating an opensource virtualization solution called Nuxis (http://nuxis.com/). This solution is based on Centos5 + some of our packages (mostly web interface and libvirt connector (GPLv2)). We support KVM and Xen. I our local setup we've upgraded xen to a more recent version (xen-4.1.3-1.el5) because of some security issues. We were using xen 4.1.2.

But not security issues with Red Hat's EL5 Xen, which has backported fixes. These are security issues with the version you had already rebased ahead to. I.e., you're now having to provide your own sustaining engineering, and to decide between rebasing and backporting, because you're not using Red Hat Xen.

[ SIDE NOTE: And in that case, you're not really tied to an EL5 platform any more either, but that's another discussion. ]

> We've also upgrade some userland packages, so kernel/KABI remains in 2.6.18. [...] To debug this problem i tried different new kernels (2.6.32-xen and 2.6.39uek). I know that the change in kernel will change KABI but for debuging purposes i'm ok with that.

Since you mentioned debugging, what's the endgame here? I mean, if you're talking about backporting fixes ... ugh ... that's a lot of sustaining engineering. Or if you're going to rebase, at what point are you fighting platform changes with virtualization changes with unknown changes?

> To install those kernels i've installed the oracle linux repo that points to: # cat /etc/yum.repos.d/uek.repo ...

Yeah, ignore my prior. Seems it changed as of March, although I'd still read through their FAQ on integration/support.

> Best regards and thanks for any debug ideas.

Well, I mean, you're just starting to figure out boot differences. At what point does the cost of sustaining engineering (whether you rebase or backport), plus the periods during that engineering when you don't have a release with mitigated security vulnerabilities, end up costing more than going with a released platform that has a large engineering team behind it, keeping up with Xen and/or KVM, security, etc. changes? And I'm not even trying to suggest anyone (e.g., if you want to rebase Xen).

I mean, I could look at what your initrd issue is if you want to provide it. But that just gets you booting. Given your errors with the dom0, I think your issues might be much deeper than you realize.

--
Bryan J Smith - Professional, Technical Annoyance
http://www.linkedin.com/in/bjsmith
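On the "can't open gnttab device" error: as far as I know, Xen 4.1's userspace (libxc, used by qemu-dm) opens the grant-table device node /dev/xen/gntdev, which a pvops dom0 kernel provides through the xen-gntdev module (CONFIG_XEN_GNTDEV). Treat that path and module name as assumptions to verify against your kernel; the hypothetical check below only tests whether the node exists as a character device.

```shell
#!/bin/sh
# Check that the grant-table device node exists as a character device.
# Assumption: the default path /dev/xen/gntdev is what Xen 4.1 userspace opens.
check_gntdev() {
    dev=${1:-/dev/xen/gntdev}
    if [ -c "$dev" ]; then
        echo "ok: $dev"
        return 0
    fi
    echo "missing: $dev (is the xen-gntdev module built and loaded?)"
    return 1
}
```

If the node is missing under the 2.6.18 dom0 kernel, that would be consistent with the "old Dom0 kernel" explanation Nuno found, though it would not prove it.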
Re: [rhelv5-list] Kernel panic after upgrade
On 08/30/2012 01:49 PM, Bryan J Smith wrote:
> On Thu, Aug 30, 2012 at 8:20 AM, Nuno Fernandes npf-mli...@eurotux.com wrote:
>> I our local setup we've upgraded xen to a more recent version (xen-4.1.3-1.el5) because of some security issues. We were using xen 4.1.2.

We were using Xen 4.1.2 because of its features. We've upgraded to 4.1.3 because of security fixes.

> Since you mentioned debugging, what's the endgame here? I mean, if you're talking about backporting fixes ... ugh ... that's a lot of sustaining engineering. Or if you're going to rebase, at what point are you fighting platform changes with virtualization changes with unknown changes?

In most cases we rebase; only when rebasing leads to problems do we try to backport.

> I mean, I could look at what your initrd issue is if you want to provide it. But that just gets you booting. Given your errors with the dom0, I think your issues might be much deeper than you realize.

Yes, that's the only thing I want: to get it booting :) Both initrds (2.6.18xen and 2.6.39uek) are at http://downloads.clientes.eurotux.com/noc/nuxis/temp/

If it boots (even without Xen), I'll pursue the issue on the xen-users mailing list.

Best regards,
Nuno Fernandes
Re: [rhelv5-list] Kernel panic after upgrade
On 08/30/2012 01:41 PM, Brian Long wrote:
> Since you're playing with third-party kernels on top of RHEL 5, why not try the ones from ELRepo? I've used their kernels on a RHEL 6 system to get bleeding-edge USB 3.0 support and they've been great on two systems (nothing production). I've not used their kernels for RHEL 5 (which have a few caveats), but it might help you out. http://elrepo.org/tiki/kernel-ml /Brian/

Hello,

That kernel works and boots fine without Xen. With Xen it panics, because it doesn't have Dom0 support:

shell$ grep CONFIG_XEN /boot/config-3.0.42-1.el5.elrepo
# CONFIG_XEN is not set
# CONFIG_XEN_PRIVILEGED_GUEST is not set

I'll try to rebuild it with Xen dom0 support and test it.

Best regards,
Nuno Fernandes
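The grep above only shows the symbols that happen to be unset. A slightly broader check, sketched below with a hypothetical function name, reports which of a list of options are neither `=y` nor `=m`. The symbol list in the usage note assumes mainline 3.x names for dom0 and backend support.

```shell
#!/bin/sh
# Print "missing: SYM" for each CONFIG_ symbol that is not set to y or m in the
# given kernel config file; return non-zero if anything is missing.
missing_config() {
    cfg=$1
    shift
    rc=0
    for sym in "$@"; do
        if ! grep -q -e "^$sym=y\$" -e "^$sym=m\$" "$cfg"; then
            echo "missing: $sym"
            rc=1
        fi
    done
    return $rc
}
```

For example, `missing_config /boot/config-3.0.42-1.el5.elrepo CONFIG_XEN CONFIG_XEN_DOM0 CONFIG_XEN_BACKEND CONFIG_XEN_BLKDEV_BACKEND CONFIG_XEN_NETDEV_BACKEND CONFIG_XEN_GNTDEV`. Given the grep output above, that config would at least report CONFIG_XEN as missing.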
Re: [rhelv5-list] Kernel panic after upgrade
> That kernel works and it boots fine without xen. With xen it panics because it doesn't have support for Dom0:
> shell$ grep CONFIG_XEN /boot/config-3.0.42-1.el5.elrepo
> # CONFIG_XEN is not set
> # CONFIG_XEN_PRIVILEGED_GUEST is not set

Argh!! The 3.0 kernel lacks some Xen backend drivers needed for full dom0 support. kernel-2.6.39uek has those drivers backported because Oracle's Oracle VM product uses the same kernel.

Back to debug mode :(

Best regards,
Nuno Fernandes
Re: [rhelv5-list] Kernel panic after upgrade
On Thu, Aug 30, 2012 at 10:48 AM, Nuno Fernandes npf-mli...@eurotux.com wrote:
> Argh!! 3.0 kernel lacks some xen backend drivers needed by full dom0 support. kernel-2.6.39uek has backported those drivers because of their oracleVM product that uses the same kernel. Back to debug mode :(

As I understand it, 2.6.39uek = 3.0.16. The versioning is for their tool compatibility, even though it should be 2.6.40.16 under the commonly accepted versioning for 2.6.x IIRC. And allegedly Xen went upstream early in 3.x. Was it 3.0.3? Or was that 3.3?

In any case, if you're looking for features, why don't you consider Fedora? If you're cycling updates every year or quicker, which sounds like the case, it makes far more sense. Fedora is quite stable, but just rebases far more, instead of backporting. If it's just a virtualization platform, and not a run-time platform for applications, it makes far more sense to consider Fedora if you're focused on features. They already do the integration testing for you. Xen dom0 was added back to Fedora as of Fedora 16. [1]

-- bjs

[1] wiki.xen.org/wiki/Fedora_Host_Installation

--
Bryan J Smith - Professional, Technical Annoyance
http://www.linkedin.com/in/bjsmith
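The 2.6.40+ numbering Bryan mentions (and the 2.6.42 Nuno picked for 3.2.28) follows a simple rule: 3.x maps to 2.6.(40+x), keeping the rest of the version string, so 3.0.16 becomes 2.6.40.16 and 3.2.28 becomes 2.6.42.28. As far as I know, the mainline kernel's UNAME26 personality reports 3.x versions the same way. A small sketch of that arithmetic (function name hypothetical); note that Oracle's 2.6.39 label for its 3.0-based UEK does not follow it.

```shell
#!/bin/sh
# Map a 3.x kernel version to the 2.6.(40+x) scheme, e.g. 3.0.16 -> 2.6.40.16.
# Versions that do not start with "3." are printed unchanged.
compat_26_version() {
    printf '%s\n' "$1" | awk -F. '
        $1 == 3 {
            v = "2.6." (40 + $2)
            # Re-append every remaining dot-separated field (sublevel, suffixes).
            for (i = 3; i <= NF; i++) v = v "." $i
            print v
            next
        }
        { print }'
}
```

Suffixes survive the mapping, so a string like 3.0.16-1.el5 becomes 2.6.40.16-1.el5.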