Re: [CentOS] Journal Aborts in VMware ESX (Filesystem Corruption)
Hi,

So the 'fix' is applied directly to the host OS; is this the correct thing to do?

sysctl -w vm.min_free_kbytes = 8192

Keith

On 14 Feb 2011, at 10:36, Kwan Lowe wrote:

> On Sun, Feb 13, 2011 at 7:00 PM, Adam Tauno Williams awill...@whitemice.org wrote:
>>>> em and force a check with fsck -f and occasionally find errors.
>>> http://communities.vmware.com/message/245983
>>> The setting we used to resolve was vm.min_free_kbytes = 8192
>>> Previous to this we were seeing the error pop up every week or so.
>> You made this change to the *virtual machine* [not the host OS]?
>> This thread indicates this was with VMware Workstation and not ESX (correct)?
> This was done on the CentOS and RHEL guests on VMware ESX hosts.

___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos
Re: [CentOS] Journal Aborts in VMware ESX (Filesystem Corruption)
On Mon, 2011-02-14 at 12:08 +0000, Keith Beeby wrote:
> Hi, So the 'fix' is applied directly to the host os,

No, to the *guest* OS instances. [Please, do not top-post.]

> is this the correct thing to do?
> sysctl -w vm.min_free_kbytes = 8192

No space(s), I believe:

sysctl -w vm.min_free_kbytes=8192

I'm still not entirely clear as to why this setting should/will make a difference in maintaining filesystem integrity. On Jun 20, 2007, in the aforementioned thread, there is the comment:

"RHEL5 still needs a fix as well, and since it's not yet officially supported from VMware for ESX my guess is it won't get a formal fix until it is certified. I plan to post a patched driver for RHEL5 on my website in the next day or so."

But that comment is from *2007*, and RHEL5 is now certified.

http://communities.vmware.com/message/881727#881727 seems like an update that describes my issue, but even that is from 2008.

Reference: VMware KB#1001778 (Note: RHEL5U1 is long since released)

> On 14 Feb 2011, at 10:36, Kwan Lowe wrote:
>> On Sun, Feb 13, 2011 at 7:00 PM, Adam Tauno Williams awill...@whitemice.org wrote:
>>>>> em and force a check with fsck -f and occasionally find errors.
>>>> http://communities.vmware.com/message/245983
>>>> The setting we used to resolve was vm.min_free_kbytes = 8192
>>>> Previous to this we were seeing the error pop up every week or so.
>>> You made this change to the *virtual machine* [not the host OS]?
>>> This thread indicates this was with VMware Workstation and not ESX (correct)?
>> This was done on the CentOS and RHEL guests on VMware ESX hosts.
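A minimal sketch of the corrected syntax from the message above, for anyone applying it in a guest. `sysctl -w` takes `key=value` with no spaces around the `=`. The helper names `apply_min_free` and `persist_sysctl` are hypothetical (not from the thread), and the file passed to `persist_sysctl` is assumed to be a sysctl.conf-style file such as /etc/sysctl.conf. The value 8192 is the one reported in the thread; whether it actually cures the journal aborts is not confirmed here.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative, not from the thread.

# Apply the setting at runtime (needs root). Note: no spaces around '='.
apply_min_free() {
    sysctl -w "vm.min_free_kbytes=$1"
}

# Record "key = value" in a sysctl.conf-style file, replacing any earlier
# assignment of the same key so that repeated runs leave a single line.
persist_sysctl() {
    conf=$1
    key=$2
    value=$3
    tmp=$(mktemp) || return 1
    if [ -f "$conf" ]; then
        # Keep every line whose key (text before '=', blanks removed)
        # differs from the one being set.
        awk -F= -v k="$key" '{ name = $1; gsub(/[ \t]/, "", name); if (name != k) print }' "$conf" > "$tmp"
    fi
    printf '%s = %s\n' "$key" "$value" >> "$tmp"
    # Write back through the original file so its permissions are kept.
    cat "$tmp" > "$conf" && rm -f "$tmp"
}
```

Usage, as root inside the guest, would be along the lines of `apply_min_free 8192` followed by `persist_sysctl /etc/sysctl.conf vm.min_free_kbytes 8192`, so the value survives a reboot.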
Re: [CentOS] Journal Aborts in VMware ESX (Filesystem Corruption)
On Mon, Feb 14, 2011 at 8:00 AM, Adam Tauno Williams awill...@whitemice.org wrote:
> On Mon, 2011-02-14 at 12:08 +0000, Keith Beeby wrote:
>> Hi, So the 'fix' is applied directly to the host os,
> no, to the *guest* OS instances. [please, do not top-post].
>> is this the correct thing to do?
>> sysctl -w vm.min_free_kbytes = 8192
> No space(s) I believe.
> sysctl -w vm.min_free_kbytes=8192
> I'm still not entirely clear as to why this setting should/will make a difference in maintaining filesystem integrity.

It's certainly possible that the error I was receiving had a different cause, though with similar symptoms. We started seeing filesystems go read-only, and only rebooting would clear it up.
Re: [CentOS] Journal Aborts in VMware ESX (Filesystem Corruption)
On 02/14/2011 07:31 AM, Kwan Lowe wrote:
> On Mon, Feb 14, 2011 at 8:00 AM, Adam Tauno Williams awill...@whitemice.org wrote:
>> On Mon, 2011-02-14 at 12:08 +0000, Keith Beeby wrote:
>>> Hi, So the 'fix' is applied directly to the host os,
>> no, to the *guest* OS instances. [please, do not top-post].
>>> is this the correct thing to do?
>>> sysctl -w vm.min_free_kbytes = 8192
>> No space(s) I believe.
>> sysctl -w vm.min_free_kbytes=8192
>> I'm still not entirely clear as to why this setting should/will make a difference in maintaining filesystem integrity.
> It's certainly possible that the error I was receiving was a different reason, though similar symptoms. We started seeing filesystems go read-only, and only rebooting would clear it up.

I use that setting on the host OS for VMware to prevent a whole VM from getting killed.

That setting maintains a minimum amount of free memory, so that a large program that requests memory quickly cannot deplete all available memory and cause the kernel's out-of-memory (OOM) killer to kill the process using the most RAM. On a host OS box, the biggest memory processes are your VMs, and getting one killed off because free memory reaches zero is not good.

I don't have any idea how it would fix journal errors on a drive, but I guess it could.

I set it much higher than 8192 on the host machines: I set it to 131072.
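For a sense of scale, vm.min_free_kbytes is expressed in kB, so the two values mentioned in this thread are 8 MiB (8192) and 128 MiB (131072). The sketch below is a rough illustration only: the function names are hypothetical, and `reserve_percent` assumes a /proc/meminfo-style file with a `MemTotal:` line in kB.

```shell
#!/bin/sh
# Hypothetical helpers for reasoning about vm.min_free_kbytes values.

# Convert a kB count (the unit of vm.min_free_kbytes) to whole MiB.
kb_to_mib() {
    echo $(( $1 / 1024 ))
}

# Print the reserve as a percentage of MemTotal, read from a
# /proc/meminfo-style file (second argument, default /proc/meminfo).
reserve_percent() {
    awk -v r="$1" '/^MemTotal:/ { printf "%.2f\n", r * 100 / $2 }' "${2:-/proc/meminfo}"
}
```

On a running system, `reserve_percent "$(cat /proc/sys/vm/min_free_kbytes)"` would show how much of total RAM the current reserve represents.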
Re: [CentOS] Journal Aborts in VMware ESX (Filesystem Corruption)
>> It's certainly possible that the error I was receiving was a different reason, though similar symptoms. We started seeing filesystems go read-only, and only rebooting would clear it up.
> I use that setting on the Host OS for VMWare to prevent a whole vm from getting killed. That setting will maintain a minimum amount of free memory available to prevent a large program that requests memory quick from depleting all available memory and causing the program killer from killing the highest RAM process. If you are on a Host OS box, the biggest Memory processes are your VMs, and getting one killed off because memory reaches zero is not good.
> I don't have any idea how it would fix journal errors on a drive, but I guess it could.

It's been a few years since I put in the tuning, but here's some info that might be useful:
http://communities.vmware.com/thread/20690?start=0&tstart=0

In particular, others had reported seeing this error:

kernel: journal_get_undo_access: No memory for committed data.

I don't recall that error in my case, but it might explain why the tuning fixed the problem. There's a bugzilla for this:
https://bugzilla.redhat.com/show_bug.cgi?id=179605
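To check whether a guest is hitting the memory-related error quoted above or the journal aborts reported elsewhere in this thread, the kernel log can be searched for the message strings. This is a minimal sketch: the function names are hypothetical, and /var/log/messages is assumed as the default syslog location (adjust for your setup).

```shell
#!/bin/sh
# Message strings taken from this thread; the helper names are hypothetical.
JOURNAL_PATTERN='journal_get_undo_access: No memory|Aborting journal on device|Remounting filesystem read-only'

# Print matching log lines with their line numbers.
journal_lines() {
    grep -n -E "$JOURNAL_PATTERN" "${1:-/var/log/messages}"
}

# Print only the number of matching lines.
journal_events() {
    grep -c -E "$JOURNAL_PATTERN" "${1:-/var/log/messages}"
}
```

A non-zero count alongside `journal_get_undo_access` lines would point toward the low-memory condition discussed in the Bugzilla entry rather than a storage fault, though that remains a hint and not a diagnosis.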
Re: [CentOS] Journal Aborts in VMware ESX (Filesystem Corruption)
On Sun, Feb 13, 2011 at 4:03 PM, Adam Tauno Williams awill...@whitemice.org wrote:
> On Sun, 2011-02-13 at 20:28 +0000, Keith Beeby wrote:
>> Also seeing this issue with CentOS 5.4 and 5.5 with NFS shared storage; according to the VMware knowledge base article this should have been resolved in the v5.1 update??
>> Does changing the vm.min_free_kbytes value apply to CentOS v5.4 and 5.5 as well to resolve the issue?
> I guess we'll see [this issue has become extremely frustrating]. I suppose it is 'good' to see that someone else sees the issue as well. One issue with virtualization is that debugging these types of issues is an order-of-magnitude more difficult [virtualized OS, virtualized storage, virtualization platform, or some interaction of all the above... ugh].

I am experiencing the same issue.

CentOS: current
ESXi: v3.5 Update 5
storage: NFS

I am in the process of rebuilding the virtual server using a different OS, thinking it was just file system errors.

-bazooka
Re: [CentOS] Journal Aborts in VMware ESX (Filesystem Corruption)
On Mon, 2011-02-14 at 13:01 -0800, Bazooka Joe wrote:
> On Sun, Feb 13, 2011 at 4:03 PM, Adam Tauno Williams awill...@whitemice.org wrote:
>> On Sun, 2011-02-13 at 20:28 +0000, Keith Beeby wrote:
>>> Also seeing this issue with CentOS 5.4 and 5.5 with NFS shared storage; according to the VMware knowledge base article this should have been resolved in the v5.1 update??
>>> Does changing the vm.min_free_kbytes value apply to CentOS v5.4 and 5.5 as well to resolve the issue?
>> I guess we'll see [this issue has become extremely frustrating]. I suppose it is 'good' to see that someone else sees the issue as well. One issue with virtualization is that debugging these types of issues is an order-of-magnitude more difficult [virtualized OS, virtualized storage, virtualization platform, or some interaction of all the above... ugh].
> I am experiencing the same issue.
> cent: current
> exsi v3.5 update 5
> storage nfs
> I am in the process of rebuilding the virtual server using a different os thinking it was just file system errors.

What other OS? I've experienced one [possibly unrelated] corruption of the /tmp filesystem on an openSUSE 11.1 VM. So far Windows VMs seem immune to the issue.
[CentOS] Journal Aborts in VMware ESX (Filesystem Corruption)
I have several CentOS5 hosts in a VMware ESX 3.5.0 226117 environment using iSCSI storage. Recently we've begun to experience journal aborts resulting in remounted-read-only filesystems as well as other filesystem issues - I can unmount a filesystem and force a check with fsck -f and occasionally find errors.

I've found -
https://bugzilla.redhat.com/show_bug.cgi?id=228108
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=51306
- which seem related, but I believe I am running a kernel that contains these fixes. My kernel is 2.6.18-194.32.1.el5 on one of the most affected hosts.

Does anyone else have experience with similar issues or know the status of this Bug/KB?

I can install, boot, run, and then at some seemingly random moment -

init_special_inode: bogus i_mode (50632)
init_special_inode: bogus i_mode (137147)
init_special_inode: bogus i_mode (172036)
init_special_inode: bogus i_mode (175720)
init_special_inode: bogus i_mode (72350)
init_special_inode: bogus i_mode (174751)
EXT3-fs error (device sdb2): ext3_lookup: unlinked inode 19698169 in dir #19696695
Aborting journal on device sdb2.
init_special_inode: bogus i_mode (165661)
EXT3-fs error (device sdb2): ext3_lookup: unlinked inode 19698131 in dir #19696695
init_special_inode: bogus i_mode (76763)
init_special_inode: bogus i_mode (3116)
init_special_inode: bogus i_mode (75363)
init_special_inode: bogus i_mode (77034)
init_special_inode: bogus i_mode (132237)
EXT3-fs error (device sdb2): ext3_lookup: unlinked inode 19698139 in dir #19696695
init_special_inode: bogus i_mode (53031)
init_special_inode: bogus i_mode (33361)
init_special_inode: bogus i_mode (77546)
init_special_inode: bogus i_mode (6516)
EXT3-fs error (device sdb2): ext3_lookup: unlinked inode 19698143 in dir #19696695
init_special_inode: bogus i_mode (6442)
init_special_inode: bogus i_mode (72554)
EXT3-fs error (device sdb2): ext3_lookup: unlinked inode 19698142 in dir #19696695
EXT3-fs error (device sdb2): ext3_lookup: unlinked inode 19698164 in dir #19696695
init_special_inode: bogus i_mode (73171)
init_special_inode: bogus i_mode (154432)
init_special_inode: bogus i_mode (34302)
init_special_inode: bogus i_mode (131733)
init_special_inode: bogus i_mode (30773)
ext3_abort called.
EXT3-fs error (device sdb2): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only
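Since the visible symptom in the report above is an ext3 filesystem silently flipping to read-only, one way to spot affected guests is to look for ext3 entries mounted `ro` in /proc/mounts. This is a minimal sketch; the function name `ro_ext3_mounts` is hypothetical, and the input is assumed to follow the usual "device mountpoint fstype options ..." layout.

```shell
#!/bin/sh
# Hypothetical helper: list ext3 filesystems currently mounted read-only.
# Reads a /proc/mounts-style file (first argument, default /proc/mounts)
# and prints "device mountpoint" for each match.
ro_ext3_mounts() {
    awk '$3 == "ext3" && $4 ~ /(^|,)ro(,|$)/ { print $1, $2 }' "${1:-/proc/mounts}"
}

# The manual check described in the message would then be, as root and
# with the filesystem unmounted (sdb2 is the device from the log above):
#   umount /dev/sdb2
#   fsck -f /dev/sdb2
```

A filesystem listed here that is configured read-write in /etc/fstab is a candidate for the forced check.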
Re: [CentOS] Journal Aborts in VMware ESX (Filesystem Corruption)
On Sun, Feb 13, 2011 at 9:09 AM, Adam Tauno Williams awill...@whitemice.org wrote:
> I have several CentOS5 hosts in a VMware ESX 3.5.0 226117 environment using iSCSI storage. Recently we've begun to experience journal aborts resulting in remounted-read-only filesystems as well as other filesystem issues - I can unmount a filesystem and force a check with fsck -f and occasionally find errors.
> I've found -
> https://bugzilla.redhat.com/show_bug.cgi?id=228108
> http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=51306
> - which seem related but I believe I am running a kernel that contains these fixes.

I ran into a similar problem, but it was not specifically iSCSI. We ended up adding a setting to sysctl.conf. Give me a few and I will find the setting.
Re: [CentOS] Journal Aborts in VMware ESX (Filesystem Corruption)
On Sun, Feb 13, 2011 at 9:09 AM, Adam Tauno Williams awill...@whitemice.org wrote:
> I have several CentOS5 hosts in a VMware ESX 3.5.0 226117 environment using iSCSI storage. Recently we've begun to experience journal aborts resulting in remounted-read-only filesystems as well as other filesystem issues - I can unmount a filesystem and force a check with fsck -f and occasionally find errors.

http://communities.vmware.com/message/245983

The setting we used to resolve was:

vm.min_free_kbytes = 8192

Previous to this we were seeing the error pop up every week or so.
Re: [CentOS] Journal Aborts in VMware ESX (Filesystem Corruption)
Hi,

Also seeing this issue with CentOS 5.4 and 5.5 with NFS shared storage; according to the VMware knowledge base article this should have been resolved in the v5.1 update??

Does changing the vm.min_free_kbytes value apply to CentOS v5.4 and 5.5 as well to resolve the issue?

On 13 Feb 2011, at 14:40, Kwan Lowe kwan.l...@gmail.com wrote:

> On Sun, Feb 13, 2011 at 9:09 AM, Adam Tauno Williams awill...@whitemice.org wrote:
>> I have several CentOS5 hosts in a VMware ESX 3.5.0 226117 environment using iSCSI storage. Recently we've begun to experience journal aborts resulting in remounted-read-only filesystems as well as other filesystem issues - I can unmount a filesystem and force a check with fsck -f and occasionally find errors.
> http://communities.vmware.com/message/245983
> The setting we used to resolve was vm.min_free_kbytes = 8192
> Previous to this we were seeing the error pop up every week or so.
Re: [CentOS] Journal Aborts in VMware ESX (Filesystem Corruption)
On Sun, 2011-02-13 at 09:40 -0500, Kwan Lowe wrote:
> On Sun, Feb 13, 2011 at 9:09 AM, Adam Tauno Williams awill...@whitemice.org wrote:
>> I have several CentOS5 hosts in a VMware ESX 3.5.0 226117 environment using iSCSI storage. Recently we've begun to experience journal aborts resulting in remounted-read-only filesystems as well as other filesystem issues - I can unmount a filesystem and force a check with fsck -f and occasionally find errors.
> http://communities.vmware.com/message/245983
> The setting we used to resolve was vm.min_free_kbytes = 8192
> Previous to this we were seeing the error pop up every week or so.

You made this change to the *virtual machine* [not the host OS]?

This thread indicates this was with VMware Workstation and not ESX (correct)?
Re: [CentOS] Journal Aborts in VMware ESX (Filesystem Corruption)
On Sun, 2011-02-13 at 20:28 +0000, Keith Beeby wrote:
> Also seeing this issue with CentOS 5.4 and 5.5 with NFS shared storage; according to the VMware knowledge base article this should have been resolved in the v5.1 update??
> Does changing the vm.min_free_kbytes value apply to CentOS v5.4 and 5.5 as well to resolve the issue?

I guess we'll see [this issue has become extremely frustrating]. I suppose it is 'good' to see that someone else sees the issue as well. One issue with virtualization is that debugging these types of issues is an order-of-magnitude more difficult [virtualized OS, virtualized storage, virtualization platform, or some interaction of all the above... ugh].

> On 13 Feb 2011, at 14:40, Kwan Lowe kwan.l...@gmail.com wrote:
>> On Sun, Feb 13, 2011 at 9:09 AM, Adam Tauno Williams awill...@whitemice.org wrote:
>>> I have several CentOS5 hosts in a VMware ESX 3.5.0 226117 environment using iSCSI storage. Recently we've begun to experience journal aborts resulting in remounted-read-only filesystems as well as other filesystem issues - I can unmount a filesystem and force a check with fsck -f and occasionally find errors.
>> http://communities.vmware.com/message/245983
>> The setting we used to resolve was vm.min_free_kbytes = 8192
>> Previous to this we were seeing the error pop up every week or so.