[CentOS] weird XFS problem

2012-01-22 Thread Boris Epstein
Hello all, I have a CentOS 5.7 machine hosting a 16 TB XFS partition used to house backups. The backups are run via rsync/rsnapshot and are large in terms of the number of files: over 10 million each. Now the machine is not particularly powerful: it is a 64-bit machine, dual core CPU, 3 GB RAM. So

Re: [CentOS] weird XFS problem

2012-01-22 Thread Boris Epstein
On Sun, Jan 22, 2012 at 9:06 AM, Boris Epstein borepst...@gmail.com wrote: Hello all, I have a CentOS 5.7 machine hosting a 16 TB XFS partition used to house backups. The backups are run via rsync/rsnapshot and are large in terms of the number of files: over 10 million each. Now the

Re: [CentOS] weird XFS problem

2012-01-22 Thread Miguel Medalha
Now the machine is not particularly powerful: it is a 64-bit machine, dual core CPU, 3 GB RAM. So perhaps this is a factor in why I am having the following problem: once in a while that XFS partition starts generating multiple I/O errors, files that had content become 0 byte, directories

Re: [CentOS] weird XFS problem

2012-01-22 Thread Miguel Medalha
Correction to the above: the XFS partition is 26TB, not 16 TB (not that it should matter in the context of this particular situation). Yes, it does matter: Read this: *[CentOS] 32-bit kernel+XFS+16.xTB filesystem = potential disaster*
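For context on the 16 TB figure in the linked subject line, here is a rough arithmetic sketch. It assumes the commonly cited explanation: a 32-bit kernel indexes pages with a 32-bit value, and pages are 4 KiB on x86. The names `PAGE_SIZE`, `INDEX_BITS`, and `max_addressable_bytes` are made up for illustration, and the thread's "26TB" is treated as TiB for simplicity.

```python
# Sketch only: assumes 4 KiB pages and a 32-bit page index,
# which is the usual explanation for the ~16 TiB ceiling on 32-bit kernels.
PAGE_SIZE = 4096   # bytes per page (assumption: x86 default)
INDEX_BITS = 32    # width of the page index on a 32-bit kernel (assumption)
TIB = 2 ** 40      # bytes in one tebibyte


def max_addressable_bytes(index_bits=INDEX_BITS, page_size=PAGE_SIZE):
    """Largest byte range reachable with the given page index width."""
    return (2 ** index_bits) * page_size


limit = max_addressable_bytes()
print("32-bit limit in TiB:", limit // TIB)
print("26 TiB filesystem fits under that limit:", 26 * TIB <= limit)
```

Under these assumptions a 26 TB filesystem would be well past the 32-bit ceiling, which is why the kernel word size is worth confirming before ruling it out.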

Re: [CentOS] weird XFS problem

2012-01-22 Thread Boris Epstein
On Sun, Jan 22, 2012 at 2:27 PM, Miguel Medalha miguelmeda...@sapo.pt wrote: Correction to the above: the XFS partition is 26TB, not 16 TB (not that it should matter in the context of this particular situation). Yes, it does matter: Read this: *[CentOS] 32-bit kernel+XFS+16.xTB filesystem

Re: [CentOS] weird XFS problem

2012-01-22 Thread Miguel Medalha
uname -a Linux nrims-bs 2.6.18-274.12.1.el5xen #1 SMP Tue Nov 29 14:18:21 EST 2011 x86_64 x86_64 x86_64 GNU/Linux this is clearly a 64-bit OS so the 32-bit limitations ought not to apply. Ok! Since you didn't inform us in your initial post, I thought I should ask you in order to

Re: [CentOS] weird XFS problem

2012-01-22 Thread Miguel Medalha
Nevertheless, it seems to me that you should have more than 3GB of RAM on a 64 bit system... Since the width of the binary word is 64 bit in this case, 3GB correspond to 1.5GB on a 32 bit system... If you have a 64 bit system you should give it space to work properly.

Re: [CentOS] weird XFS problem

2012-01-22 Thread Miguel Medalha
Nevertheless, it seems to me that you should have more than 3GB of RAM on a 64 bit system... Since the width of the binary word is 64 bit in this case, 3GB correspond to 1.5GB on a 32 bit system... If you have a 64 bit system you should give it space to work properly. ... and the fact that

Re: [CentOS] weird XFS problem

2012-01-22 Thread Boris Epstein
On Sun, Jan 22, 2012 at 2:35 PM, Miguel Medalha miguelmeda...@sapo.pt wrote: Nevertheless, it seems to me that you should have more than 3GB of RAM on a 64 bit system... Since the width of the binary word is 64 bit in this case, 3GB correspond to 1.5GB on a 32 bit system... If you have a 64

Re: [CentOS] weird XFS problem

2012-01-22 Thread Boris Epstein
On Sun, Jan 22, 2012 at 2:37 PM, Miguel Medalha miguelmeda...@sapo.pt wrote: Nevertheless, it seems to me that you should have more than 3GB of RAM on a 64 bit system... Since the width of the binary word is 64 bit in this case, 3GB correspond to 1.5GB on a 32 bit system... If you have a 64

Re: [CentOS] weird XFS problem

2012-01-22 Thread Miguel Medalha
You are right - it would indeed be desirable to have more than 3 GB of RAM on that system. However it is not obvious to me that having that little RAM should cause I/O failure? Why? That it would make the machine slow is to be expected - and especially so given that I had to jack the

Re: [CentOS] weird XFS problem

2012-01-22 Thread Boris Epstein
On Sun, Jan 22, 2012 at 2:43 PM, Miguel Medalha miguelmeda...@sapo.pt wrote: You are right - it would indeed be desirable to have more than 3 GB of RAM on that system. However it is not obvious to me that having that little RAM should cause I/O failure? Why? That it would make the machine

Re: [CentOS] weird XFS problem

2012-01-22 Thread Joseph L. Casale
I have a CentOS 5.7 machine hosting a 16 TB XFS partition used to house backups. The backups are run via rsync/rsnapshot and are large in terms of the number of files: over 10 million each. Now the machine is not particularly powerful: it is a 64-bit machine, dual core CPU, 3 GB RAM. So perhaps

Re: [CentOS] weird XFS problem

2012-01-22 Thread Boris Epstein
On Sun, Jan 22, 2012 at 2:56 PM, Joseph L. Casale jcas...@activenetwerx.com wrote: I have a CentOS 5.7 machine hosting a 16 TB XFS partition used to house backups. The backups are run via rsync/rsnapshot and are large in terms of the number of files: over 10 million each. Now the machine

Re: [CentOS] weird XFS problem

2012-01-22 Thread Ross Walker
On Jan 22, 2012, at 10:00 AM, Boris Epstein borepst...@gmail.com wrote: Jan 22 09:17:53 nrims-bs kernel: 3w-9xxx: scsi6: AEN: ERROR (0x04:0x0026): Drive ECC error reported:port=4, unit=0. Jan 22 09:17:53 nrims-bs kernel: 3w-9xxx: scsi6: AEN: ERROR (0x04:0x002D): Source drive error
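To spot which ports the controller is complaining about, the AEN lines quoted above can be pulled apart with a small script. This is only a sketch based on the single line format visible in this thread; the function name `parse_aen` and the field names are made up for illustration, and other 3w-9xxx messages may be laid out differently.

```python
import re

# Pattern derived only from the log line quoted in this thread (assumption:
# other AEN messages follow the same "3w-9xxx: scsiN: AEN: SEVERITY (code): text" shape).
HEADER = re.compile(
    r"3w-9xxx: scsi(\d+): AEN: (\w+) \((0x[0-9A-Fa-f]+:0x[0-9A-Fa-f]+)\): (.*)"
)
PORT = re.compile(r"port=(\d+)")
UNIT = re.compile(r"unit=(\d+)")


def parse_aen(line):
    """Return a dict of fields from one AEN log line, or None if it does not match."""
    match = HEADER.search(line)
    if match is None:
        return None
    host, severity, code, message = match.groups()
    port = PORT.search(message)
    unit = UNIT.search(message)
    return {
        "host": int(host),
        "severity": severity,
        "code": code,  # kept as the raw string; its meaning is not decoded here
        "message": message,
        "port": int(port.group(1)) if port else None,
        "unit": int(unit.group(1)) if unit else None,
    }


sample = (
    "Jan 22 09:17:53 nrims-bs kernel: 3w-9xxx: scsi6: AEN: ERROR "
    "(0x04:0x0026): Drive ECC error reported:port=4, unit=0."
)
print(parse_aen(sample))
```

Feeding /var/log/messages through something like this and counting entries per port would show whether the errors cluster on one drive.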

Re: [CentOS] weird XFS problem

2012-01-22 Thread Ross Walker
On Jan 22, 2012, at 4:41 PM, Ross Walker rswwal...@gmail.com wrote: On Jan 22, 2012, at 10:00 AM, Boris Epstein borepst...@gmail.com wrote: Jan 22 09:17:53 nrims-bs kernel: 3w-9xxx: scsi6: AEN: ERROR (0x04:0x0026): Drive ECC error reported:port=4, unit=0. Jan 22 09:17:53 nrims-bs kernel:

Re: [CentOS] weird XFS problem

2012-01-22 Thread Keith Keller
On 2012-01-22, Boris Epstein borepst...@gmail.com wrote: Also, here's something else I have discovered. Apparently there is potential intermittent RAID disk trouble. At least I found the following in the system log: Jan 22 09:17:53 nrims-bs kernel: 3w-9xxx: scsi6: AEN: ERROR

Re: [CentOS] weird XFS problem

2012-01-22 Thread Boris Epstein
On Sun, Jan 22, 2012 at 1:34 PM, Keith Keller kkel...@wombat.san-francisco.ca.us wrote: On 2012-01-22, Boris Epstein borepst...@gmail.com wrote: Also, here's something else I have discovered. Apparently there is potential intermittent RAID disk trouble. At least I found the following

Re: [CentOS] weird XFS problem

2012-01-22 Thread Keith Keller
On 2012-01-22, Boris Epstein borepst...@gmail.com wrote: The RAID is on the controller level. Yes, I believe the controller is a 3Ware 9xxx series - I don't recall the details right now. The details are important in this context--the 9550 is the problematic one (at least for me, though I've