Re: [CentOS] Journal Aborts in VMware ESX (Filesystem Corruption)

2011-02-14 Thread Keith Beeby
Hi,

So the 'fix' is applied directly to the host os, is this the correct thing to 
do?

sysctl -w vm.min_free_kbytes = 8192

Keith




On 14 Feb 2011, at 10:36, Kwan Lowe wrote:

 On Sun, Feb 13, 2011 at 7:00 PM, Adam Tauno Williams
 awill...@whitemice.org wrote:
 issues - I can unmount a filesystem and force a check with fsck -f and
 occasionally find errors.
 http://communities.vmware.com/message/245983
 The setting we used to resolve was vm.min_free_kbytes = 8192
 Previous to this we were seeing the error pop up every week or so.
 
 You made this change to the *virtual machine* [not the host OS]?
 
 This thread indicates this was with VMware Workstation and not ESX
 (correct)?
 
 This was done on the CentOS and RHEL guests on VMware ESX hosts.
 ___
 CentOS mailing list
 CentOS@centos.org
 http://lists.centos.org/mailman/listinfo/centos



Re: [CentOS] Journal Aborts in VMware ESX (Filesystem Corruption)

2011-02-14 Thread Adam Tauno Williams
On Mon, 2011-02-14 at 12:08 +, Keith Beeby wrote: 
 Hi,
 So the 'fix' is applied directly to the host os,

no, to the *guest* OS instances.  [please, do not top-post].

 is this the correct thing to do?
 sysctl -w vm.min_free_kbytes = 8192

No space(s) I believe.

sysctl -w vm.min_free_kbytes=8192
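
For reference, a minimal sketch of making the value survive a reboot inside a guest. `sysctl -w` only changes the running kernel, so the setting also has to go into `/etc/sysctl.conf` (where, unlike on the command line, spaces around `=` are accepted). The helper name `set_sysctl_conf` and its file argument are hypothetical, added here so the edit can be tried on a copy of the file first.

```shell
#!/bin/sh
# Hypothetical helper (not part of any package): set KEY = VALUE in a
# sysctl.conf-style file, replacing an existing entry or appending one.
set_sysctl_conf() {
    key=$1; value=$2; file=$3
    [ -f "$file" ] || : > "$file"
    tmp=$(mktemp) || return 1
    # Keep every line except an existing assignment to this key.
    # (The dots in the key act as regex wildcards; harmless for this sketch.)
    grep -v "^[[:space:]]*${key}[[:space:]]*=" "$file" > "$tmp" || true
    printf '%s = %s\n' "$key" "$value" >> "$tmp"
    # Overwrite in place so the file keeps its owner and permissions.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Usage (as root, inside the guest):
#   sysctl vm.min_free_kbytes                 # show the current value
#   sysctl -w vm.min_free_kbytes=8192         # apply now, lost at reboot
#   set_sysctl_conf vm.min_free_kbytes 8192 /etc/sysctl.conf
#   sysctl -p                                 # reload /etc/sysctl.conf
```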

I'm still not entirely clear as to why this setting should/will make a
difference in maintaining filesystem integrity.

On Jun 20, 2007 in the aforementioned thread there is the comment:
"RHEL5 still needs a fix as well, and since it's not yet officially
supported from VMware for ESX my guess is it won't get a formal fix
until it is certified.  I plan to post a patched driver for RHEL5 on my
website in the next day or so." - but the comment is from *2007* and
RHEL5 is now certified.

http://communities.vmware.com/message/881727#881727 seems like an
update that describes my issue; but even that is from 2008.

Reference: VMware KB#1001778 (Note: RHEL5U1 is long since released)

 On 14 Feb 2011, at 10:36, Kwan Lowe wrote:
  On Sun, Feb 13, 2011 at 7:00 PM, Adam Tauno Williams
  awill...@whitemice.org wrote:
  issues - I can unmount a filesystem and force a check with fsck -f and
  occasionally find errors.
  http://communities.vmware.com/message/245983
  The setting we used to resolve was vm.min_free_kbytes = 8192
  Previous to this we were seeing the error pop up every week or so.
  You made this change to the *virtual machine* [not the host OS]?
  This thread indicates this was with VMware Workstation and not ESX
  (correct)?
  This was done on the CentOS and RHEL guests on VMware ESX hosts.



Re: [CentOS] Journal Aborts in VMware ESX (Filesystem Corruption)

2011-02-14 Thread Kwan Lowe
On Mon, Feb 14, 2011 at 8:00 AM, Adam Tauno Williams
awill...@whitemice.org wrote:
 On Mon, 2011-02-14 at 12:08 +, Keith Beeby wrote:
 Hi,
 So the 'fix' is applied directly to the host os,

 no, to the *guest* OS instances.  [please, do not top-post].

 is this the correct thing to do?
 sysctl -w vm.min_free_kbytes = 8192

 No space(s) I believe.

 sysctl -w vm.min_free_kbytes=8192

 I'm still not entirely clear as to why this setting should/will make a
 difference in maintaining filesystem integrity.

It's certainly possible that the error I was receiving had a different
cause, though with similar symptoms. We started seeing filesystems go
read-only, and only rebooting would clear it up.
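
A quick way to spot the read-only symptom on a running guest is to look for ext3 entries in `/proc/mounts` whose options include `ro`. This is only a sketch; the helper name `list_ro_ext3` is hypothetical, and it accepts an alternate file in the same format so it can be tried without a broken filesystem at hand.

```shell
#!/bin/sh
# Hypothetical helper: print "device mountpoint" for every ext3 filesystem
# mounted read-only. Input is /proc/mounts or a file in the same format.
list_ro_ext3() {
    awk '$3 == "ext3" {
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] == "ro") { print $1, $2; break }
    }' "${1:-/proc/mounts}"
}

# Usage:
#   list_ro_ext3            # inspect the live system
```

Note that a filesystem deliberately mounted read-only shows up as well, so the output is a starting point rather than proof of a journal abort.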


Re: [CentOS] Journal Aborts in VMware ESX (Filesystem Corruption)

2011-02-14 Thread Johnny Hughes
On 02/14/2011 07:31 AM, Kwan Lowe wrote:
 On Mon, Feb 14, 2011 at 8:00 AM, Adam Tauno Williams
 awill...@whitemice.org wrote:
 On Mon, 2011-02-14 at 12:08 +, Keith Beeby wrote:
 Hi,
 So the 'fix' is applied directly to the host os,

 no, to the *guest* OS instances.  [please, do not top-post].

 is this the correct thing to do?
 sysctl -w vm.min_free_kbytes = 8192

 No space(s) I believe.

 sysctl -w vm.min_free_kbytes=8192

 I'm still not entirely clear as to why this setting should/will make a
 difference in maintaining filesystem integrity.
 
 It's certainly possible that the error I was receiving was a different
 reason, though similar symptoms. We started seeing filesystems go
 read-only, and only rebooting would clear it up.

I use that setting on the host OS for VMware to prevent a whole VM
from getting killed.

That setting maintains a minimum amount of free memory, so that a large
program that requests memory quickly cannot deplete all available memory
and cause the OOM killer to kill the process using the most RAM.

If you are on a Host OS box, the biggest Memory processes are your VMs,
and getting one killed off because memory reaches zero is not good.

I don't have any idea how it would fix journal errors on a drive, but I
guess it could.

I set it much higher than 8192 on the host machines ... I set it to 131072.
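
To put the two values in perspective: 8192 kB is 8 MiB and 131072 kB is 128 MiB. A small sketch for relating a chosen value to the RAM of a machine follows; the helper name `min_free_summary` is hypothetical, and both arguments are plain integers in kB.

```shell
#!/bin/sh
# Hypothetical helper: print a min_free_kbytes value in MiB and as a
# percentage of total memory. Both arguments are integers in kB.
min_free_summary() {
    min_kb=$1; total_kb=$2
    awk -v m="$min_kb" -v t="$total_kb" \
        'BEGIN { printf "%d MiB (%.2f%% of RAM)\n", m / 1024, 100 * m / t }'
}

# Usage on a live system:
#   min_free_summary "$(cat /proc/sys/vm/min_free_kbytes)" \
#       "$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)"
```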





Re: [CentOS] Journal Aborts in VMware ESX (Filesystem Corruption)

2011-02-14 Thread Kwan Lowe
 It's certainly possible that the error I was receiving was a different
 reason, though similar symptoms. We started seeing filesystems go
 read-only, and only rebooting would clear it up.

 I use that setting on the host OS for VMware to prevent a whole VM
 from getting killed.

 That setting maintains a minimum amount of free memory, so that a large
 program that requests memory quickly cannot deplete all available memory
 and cause the OOM killer to kill the process using the most RAM.

 If you are on a Host OS box, the biggest Memory processes are your VMs,
 and getting one killed off because memory reaches zero is not good.

 I don't have any idea how it would fix journal errors on a drive, but I
 guess it could.


It's been a few years since I put in the tuning, but here's some info
that might be useful:

http://communities.vmware.com/thread/20690?start=0&tstart=0

In particular, others had reported seeing this error:

kernel: journal_get_undo_access: No memory for committed data.

I don't recall that error in my case, but might explain why the tuning
fixed the problem. There's a bugzilla for this:

https://bugzilla.redhat.com/show_bug.cgi?id=179605
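
To check whether a guest has logged the memory-related message or the journal aborts described in this thread, something like the following sketch may help. The helper name `count_journal_errors` is hypothetical, and `/var/log/messages` as the default log location is an assumption about the guest's syslog setup.

```shell
#!/bin/sh
# Hypothetical helper: count kernel log lines matching the messages quoted
# in this thread. The default path /var/log/messages is an assumption.
count_journal_errors() {
    grep -c -E 'journal_get_undo_access: No memory|Aborting journal on device|Remounting filesystem read-only' \
        "${1:-/var/log/messages}"
}

# Usage:
#   count_journal_errors                      # current log
#   count_journal_errors /var/log/messages.1  # a rotated log
```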


Re: [CentOS] Journal Aborts in VMware ESX (Filesystem Corruption)

2011-02-14 Thread Bazooka Joe
On Sun, Feb 13, 2011 at 4:03 PM, Adam Tauno Williams
awill...@whitemice.org wrote:
 On Sun, 2011-02-13 at 20:28 +, Keith Beeby wrote:
 Also seeing this issue with CentOS 5.4 and 5.5 with NFS shared
 storage; according to the VMware knowledge base article this should
 have been resolved in the v5.1 update?
 Does changing the vm.min_free_kbytes value apply to CentOS 5.4 and 5.5
 as well to resolve the issue?

 I guess we'll see [this issue has become extremely frustrating].

 I suppose it is 'good' to see that someone else sees the issue as well.
 One issue with virtualization is that debugging these types of issues is
 an order-of-magnitude more difficult [virtualized OS, virtualized
 storage, virtualization platform, or some interaction of all the
 above... ugh].



I am experiencing the same issue.

cent: current
ESXi v3.5 update 5
storage nfs

I am in the process of rebuilding the virtual server using a different
os thinking it was just file system errors.

-bazooka


Re: [CentOS] Journal Aborts in VMware ESX (Filesystem Corruption)

2011-02-14 Thread Adam Tauno Williams
On Mon, 2011-02-14 at 13:01 -0800, Bazooka Joe wrote: 
 On Sun, Feb 13, 2011 at 4:03 PM, Adam Tauno Williams
 awill...@whitemice.org wrote:
  On Sun, 2011-02-13 at 20:28 +, Keith Beeby wrote:
  Also seeing this issue with CentOS 5.4 and 5.5 with NFS shared
  storage; according to the VMware knowledge base article this should
  have been resolved in the v5.1 update?
  Does changing the vm.min_free_kbytes value apply to CentOS 5.4 and 5.5
  as well to resolve the issue?
  I guess we'll see [this issue has become extremely frustrating].
  I suppose it is 'good' to see that someone else sees the issue as well.
  One issue with virtualization is that debugging these types of issues is
  an order-of-magnitude more difficult [virtualized OS, virtualized
  storage, virtualization platform, or some interaction of all the
  above... ugh].
 I am experiencing the same issue.
 cent: current
 ESXi v3.5 update 5
 storage nfs
 I am in the process of rebuilding the virtual server using a different
 os thinking it was just file system errors.

What other OS?

I've experienced one [possibly unrelated] corruption of the /tmp
filesystem on an openSUSE 11.1 VM.  So far Windows VMs seem immune to
the issue.



[CentOS] Journal Aborts in VMware ESX (Filesystem Corruption)

2011-02-13 Thread Adam Tauno Williams
I have several CentOS5 hosts in a VMware ESX 3.5.0 226117 environment
using iSCSI storage.  Recently we've begun to experience journal aborts
resulting in remounted-read-only filesystems as well as other filesystem
issues - I can unmount a filesystem and force a check with fsck -f and
occasionally find errors.

I've found -
https://bugzilla.redhat.com/show_bug.cgi?id=228108
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=51306
- which seem related but I believe I am running a kernel that contains
these fixes.

My kernel is 2.6.18-194.32.1.el5 on one of the most affected hosts.

Does anyone else have experience with similar issues or know of the
status of this Bug/KB?

I can install, boot, run, and then at some seemingly random moment -

init_special_inode: bogus i_mode (50632)
init_special_inode: bogus i_mode (137147)
init_special_inode: bogus i_mode (172036)
init_special_inode: bogus i_mode (175720)
init_special_inode: bogus i_mode (72350)
init_special_inode: bogus i_mode (174751)
EXT3-fs error (device sdb2): ext3_lookup: unlinked inode 19698169 in dir
#19696695
Aborting journal on device sdb2.
init_special_inode: bogus i_mode (165661)
EXT3-fs error (device sdb2): ext3_lookup: unlinked inode 19698131 in dir
#19696695
init_special_inode: bogus i_mode (76763)
init_special_inode: bogus i_mode (3116)
init_special_inode: bogus i_mode (75363)
init_special_inode: bogus i_mode (77034)
init_special_inode: bogus i_mode (132237)
EXT3-fs error (device sdb2): ext3_lookup: unlinked inode 19698139 in dir
#19696695
init_special_inode: bogus i_mode (53031)
init_special_inode: bogus i_mode (33361)
init_special_inode: bogus i_mode (77546)
init_special_inode: bogus i_mode (6516)
EXT3-fs error (device sdb2): ext3_lookup: unlinked inode 19698143 in dir
#19696695
init_special_inode: bogus i_mode (6442)
init_special_inode: bogus i_mode (72554)
EXT3-fs error (device sdb2): ext3_lookup: unlinked inode 19698142 in dir
#19696695
EXT3-fs error (device sdb2): ext3_lookup: unlinked inode 19698164 in dir
#19696695
init_special_inode: bogus i_mode (73171)
init_special_inode: bogus i_mode (154432)
init_special_inode: bogus i_mode (34302)
init_special_inode: bogus i_mode (131733)
init_special_inode: bogus i_mode (30773)
ext3_abort called.
EXT3-fs error (device sdb2): ext3_journal_start_sb: Detected aborted
journal
Remounting filesystem read-only
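
When several guests or disks are involved, it can help to see which devices the EXT3 errors name. A minimal sketch, assuming the messages have been saved to a file in the format shown above; the helper name `ext3_errors_by_device` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count "EXT3-fs error (device X)" lines per device
# in a saved kernel log and print "device count", sorted by device name.
ext3_errors_by_device() {
    sed -n 's/.*EXT3-fs error (device \([^)]*\)).*/\1/p' "$1" \
        | sort | uniq -c | awk '{ print $2, $1 }'
}

# Usage:
#   dmesg > /tmp/dmesg.txt
#   ext3_errors_by_device /tmp/dmesg.txt
```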



Re: [CentOS] Journal Aborts in VMware ESX (Filesystem Corruption)

2011-02-13 Thread Kwan Lowe
On Sun, Feb 13, 2011 at 9:09 AM, Adam Tauno Williams
awill...@whitemice.org wrote:
 I have several CentOS5 hosts in a VMware ESX 3.5.0 226117 environment
 using iSCSI storage.  Recently we've begun to experience journal aborts
 resulting in remounted-read-only filesystems as well as other filesystem
 issues - I can unmount a filesystem and force a check with fsck -f and
 occasionally find errors.

 I've found -
 https://bugzilla.redhat.com/show_bug.cgi?id=228108
 http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=51306
 - which seem related but I believe I am running a kernel that contains
 these fixes.

I ran into a similar problem, but it was not specifically iSCSI.  We
ended up adding a setting to sysctl.conf.  Give me a few minutes and I
will find the setting.


Re: [CentOS] Journal Aborts in VMware ESX (Filesystem Corruption)

2011-02-13 Thread Kwan Lowe
On Sun, Feb 13, 2011 at 9:09 AM, Adam Tauno Williams
awill...@whitemice.org wrote:
 I have several CentOS5 hosts in a VMware ESX 3.5.0 226117 environment
 using iSCSI storage.  Recently we've begun to experience journal aborts
 resulting in remounted-read-only filesystems as well as other filesystem
 issues - I can unmount a filesystem and force a check with fsck -f and
 occasionally find errors.

http://communities.vmware.com/message/245983

The setting we used to resolve was vm.min_free_kbytes = 8192

Previous to this we were seeing the error pop up every week or so.


Re: [CentOS] Journal Aborts in VMware ESX (Filesystem Corruption)

2011-02-13 Thread Keith Beeby
Hi

Also seeing this issue with CentOS 5.4 and 5.5 with NFS shared storage;
according to the VMware knowledge base article this should have been resolved
in the v5.1 update?

Does changing the vm.min_free_kbytes value apply to CentOS 5.4 and 5.5 as well
to resolve the issue?

On 13 Feb 2011, at 14:40, Kwan Lowe kwan.l...@gmail.com wrote:

 On Sun, Feb 13, 2011 at 9:09 AM, Adam Tauno Williams
 awill...@whitemice.org wrote:
 I have several CentOS5 hosts in a VMware ESX 3.5.0 226117 environment
 using iSCSI storage.  Recently we've begun to experience journal aborts
 resulting in remounted-read-only filesystems as well as other filesystem
 issues - I can unmount a filesystem and force a check with fsck -f and
 occasionally find errors.
 
 http://communities.vmware.com/message/245983
 
 The setting we used to resolve was vm.min_free_kbytes = 8192
 
 Previous to this we were seeing the error pop up every week or so.


Re: [CentOS] Journal Aborts in VMware ESX (Filesystem Corruption)

2011-02-13 Thread Adam Tauno Williams
On Sun, 2011-02-13 at 09:40 -0500, Kwan Lowe wrote: 
 On Sun, Feb 13, 2011 at 9:09 AM, Adam Tauno Williams
 awill...@whitemice.org wrote:
  I have several CentOS5 hosts in a VMware ESX 3.5.0 226117 environment
  using iSCSI storage.  Recently we've begun to experience journal aborts
  resulting in remounted-read-only filesystems as well as other filesystem
  issues - I can unmount a filesystem and force a check with fsck -f and
  occasionally find errors.
 http://communities.vmware.com/message/245983
 The setting we used to resolve was vm.min_free_kbytes = 8192
 Previous to this we were seeing the error pop up every week or so.

You made this change to the *virtual machine* [not the host OS]?   

This thread indicates this was with VMware Workstation and not ESX
(correct)?



Re: [CentOS] Journal Aborts in VMware ESX (Filesystem Corruption)

2011-02-13 Thread Adam Tauno Williams
On Sun, 2011-02-13 at 20:28 +, Keith Beeby wrote:
 Also seeing this issue with CentOS 5.4 and 5.5 with NFS shared
 storage; according to the VMware knowledge base article this should
 have been resolved in the v5.1 update?
 Does changing the vm.min_free_kbytes value apply to CentOS 5.4 and 5.5
 as well to resolve the issue?

I guess we'll see [this issue has become extremely frustrating].

I suppose it is 'good' to see that someone else sees the issue as well.
One issue with virtualization is that debugging these types of issues is
an order-of-magnitude more difficult [virtualized OS, virtualized
storage, virtualization platform, or some interaction of all the
above... ugh].

 On 13 Feb 2011, at 14:40, Kwan Lowe kwan.l...@gmail.com wrote:
  On Sun, Feb 13, 2011 at 9:09 AM, Adam Tauno Williams
  awill...@whitemice.org wrote:
  I have several CentOS5 hosts in a VMware ESX 3.5.0 226117 environment
  using iSCSI storage.  Recently we've begun to experience journal aborts
  resulting in remounted-read-only filesystems as well as other filesystem
  issues - I can unmount a filesystem and force a check with fsck -f and
  occasionally find errors.
  http://communities.vmware.com/message/245983 
  The setting we used to resolve was vm.min_free_kbytes = 8192
  Previous to this we were seeing the error pop up every week or so.
