Re: [CentOS] 40TB File System Recommendations

2011-04-18 Thread Ross Walker
On Apr 17, 2011, at 3:05 AM, Charles Polisher cpol...@surewest.net wrote:

 On Wed, Apr 13, 2011 at 11:55:08PM -0400, Ross Walker wrote:
 On Apr 13, 2011, at 9:40 PM, Brandon Ooi brand...@gmail.com wrote:
 
 On Wed, Apr 13, 2011 at 6:04 PM, Ross Walker rswwal...@gmail.com wrote:
 
 One was a hardware raid over fibre channel, which silently corrupted
 itself. System checked out fine, raid array checked out fine, xfs was
 replaced with ext3, and the system ran without issue.
 
 Second was multiple hardware arrays over linux md raid0, also over fibre
 channel. This was not so silent corruption, as in xfs would detect it
 and lock the filesystem into read-only before it, pardon the pun, truly
 fscked itself. Happened two or three times, before we gave up, split up
 the raid, and went ext3, Again, no issues.
 
 Every now and then I hear these XFS horror stories. They seem almost
 impossible to believe.
 
 Nothing breaks for absolutely no reason, and failure to know where
 the breakage was shows that maybe there weren't adequately skilled
 technicians for the technology deployed.
 
 XFS if run in a properly configured environment will run flawlessly.
 
 Here's some deconstruction of your argument:
 
... and failure to know where the breakage was shows that maybe there
 weren't adequately skilled technicians for the technology deployed
 
 This is blaming the victim. One must have the time, skills and
 often other resources to do root cause analysis.
 
XFS if run in a properly configured environment will run flawlessly. 
 
 I think a more narrowly qualified opinion is appropriate: XFS,
 properly configured, running on perfect hardware atop a perfect
 kernel, will have fewer serious bugs than it had on Jan 1, 2009.
 Here's a summary of XFS bugzilla data from 2009 through today:

I already apologized for those comments last week. No need to keep flogging a 
dead horse here.


 Bug Status
 Severity      NEW  ASSIGNED  REOPENED  Total
 blocker         3         .         .      3
 critical       10         2         .     12
 major          48         2         .     50
 normal        118        46         3    167
 minor          26         3         .     29
 trivial         7         .         .      7
 enhancement    39         9         1     49
 Total         251        62         4    317
 
 See also the XFS mailing list for a big dose of reality. "Flawlessly"
 is not the label I would use for XFS. /Maybe/ for Ext2.

Basically it comes down to this: all file systems, like all software, have bugs 
and edge cases, and thinking that one can find a file system that is bug free is 
naive.

Test, test, test.

-Ross



Re: [CentOS] 40TB File System Recommendations

2011-04-17 Thread Charles Polisher
On Wed, Apr 13, 2011 at 11:55:08PM -0400, Ross Walker wrote:
 On Apr 13, 2011, at 9:40 PM, Brandon Ooi brand...@gmail.com wrote:
 
  On Wed, Apr 13, 2011 at 6:04 PM, Ross Walker rswwal...@gmail.com wrote:
  
   One was a hardware raid over fibre channel, which silently corrupted
   itself. System checked out fine, raid array checked out fine, xfs was
   replaced with ext3, and the system ran without issue.
  
   Second was multiple hardware arrays over linux md raid0, also over fibre
   channel. This was not so silent corruption, as in xfs would detect it
   and lock the filesystem into read-only before it, pardon the pun, truly
   fscked itself. Happened two or three times, before we gave up, split up
   the raid, and went ext3, Again, no issues.
  
  Every now and then I hear these XFS horror stories. They seem almost
  impossible to believe.
  
  Nothing breaks for absolutely no reason, and failure to know where
  the breakage was shows that maybe there weren't adequately skilled
  technicians for the technology deployed.
  
  XFS if run in a properly configured environment will run flawlessly.

Here's some deconstruction of your argument:

... and failure to know where the breakage was shows that maybe there
 weren't adequately skilled technicians for the technology deployed

This is blaming the victim. One must have the time, skills and
often other resources to do root cause analysis.

XFS if run in a properly configured environment will run flawlessly. 

I think a more narrowly qualified opinion is appropriate: XFS,
properly configured, running on perfect hardware atop a perfect
kernel, will have fewer serious bugs than it had on Jan 1, 2009.
Here's a summary of XFS bugzilla data from 2009 through today:

 Bug Status
 Severity      NEW  ASSIGNED  REOPENED  Total
 blocker         3         .         .      3
 critical       10         2         .     12
 major          48         2         .     50
 normal        118        46         3    167
 minor          26         3         .     29
 trivial         7         .         .      7
 enhancement    39         9         1     49
 Total         251        62         4    317

See also the XFS mailing list for a big dose of reality. "Flawlessly"
is not the label I would use for XFS. /Maybe/ for Ext2.
-- 
Charles Polisher




Re: [CentOS] 40TB File System Recommendations

2011-04-15 Thread Christopher Chan
On Thursday, April 14, 2011 11:26 PM, Benjamin Franz wrote:
 On 04/14/2011 08:04 AM, Christopher Chan wrote:

 Then try both for your use case and your hardware. We have wide raid6 setups
 that do well over 500 MB/s write (that is: not all raid6 writes suck...).

 /me replaces all of Peter's cache with 64MB modules.

 Let's try again.

 If you are trying to imply that RAID6 can't go fast when write size is
 larger than the cache, you are simply wrong. Even with just an 8 x RAID6,
 I've tested a system at sustained sequential (not burst) rates of 156 MB/s out
 and 387 MB/s in using 7200 rpm 1.5 TB drives. Bonnie++ results
 attached. Bonnie++ by default uses twice as much data as your available
 RAM to make sure you aren't just seeing cache. IOW: That machine only
 had 4GB of RAM and 256 MB of controller cache during the test but wrote
 and read 8 GB of data for the tests.

Wanna try that again with 64MB of cache only and tell us whether there 
is a difference in performance?

There is a reason why 3ware 85xx cards were complete rubbish when used 
for raid5, and it led to the 95xx/96xx series.


Re: [CentOS] 40TB File System Recommendations

2011-04-15 Thread Christopher Chan
On Thursday, April 14, 2011 11:30 PM, Les Mikesell wrote:
 On 4/14/2011 7:32 AM, Christopher Chan wrote:

 HAHAHAAAAHA

 The XFS codebase is the biggest pile of mess in the Linux kernel and you
 expect it to be not run into mysterious problems? Remember, XFS was
 PORTED over to Linux. It is not a 'native' thing to Linux.

 Well yeah, but the way I remember it, SGI was using it for real work
 like video editing and storing zillions of files back when Linux was a
 toy with a 2 gig file size limit and linear directory scans as the only
 option.   If you mean that the Linux side had a not-invented-here
 attitude about it and did the port badly you might be right...


No, the XFS guys had to work around the differences between the Linux vm 
and IRIX's, and that eventually led to what we have today - a big messy 
pile of code. It would be no surprise for there to be stuff that gets 
triggered, imho.

I am not saying that XFS itself is bad. Just that the implementation on 
Linux was not quite the same quality as it is on IRIX.


Re: [CentOS] 40TB File System Recommendations

2011-04-15 Thread Benjamin Franz
On 04/14/2011 09:00 PM, Christopher Chan wrote:

 Wanna try that again with 64MB of cache only and tell us whether there
 is a difference in performance?

 There is a reason why 3ware 85xx cards were complete rubbish when used
 for raid5 and which led to the 95xx/96xx series.
 _

I don't happen to have any systems I can test with the 1.5TB drives 
without controller cache right now, but I have a system with some old 
500GB drives (which are about half as fast as the 1.5TB drives in 
individual sustained I/O throughput) attached directly to onboard SATA 
ports in an 8 x RAID6 with *no* controller cache at all. The machine has 
16GB of RAM and bonnie++ therefore used 32GB of data for the test.

Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
pbox3        32160M   389  98 76709  22 91071  26  2209  95 264892  26 590.5  11
Latency             24190us    1244ms    1580ms   60411us   69901us   42586us
Version  1.96       ------Sequential Create------ --------Random Create--------
pbox3               -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 10910  31 +++++ +++ +++++ +++ 29293  80 +++++ +++ +++++ +++
Latency               775us     610us     979us     740us     370us     380us

Given that the underlying drives are effectively something like half as 
fast as the drives in the other test, the results are quite comparable.

Cache doesn't make a lot of difference when you quickly write a lot more 
data than the cache can hold. The limiting factor becomes the slowest 
component - usually the drives themselves. Cache isn't magic performance 
pixie dust. It helps in certain use cases and is nearly irrelevant in 
others.
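
To put a rough number on why, here is a quick back-of-the-envelope sketch 
(illustrative only, using the figures from the earlier 256 MB-cache test 
quoted above):

# Rough, illustrative-only sketch: how little of a bonnie++-sized sequential
# write a 256 MB controller cache can absorb (figures from the earlier test).
cache_mb = 256            # controller cache in the earlier 8 x RAID6 test
data_mb = 8 * 1024        # bonnie++ wrote ~8 GB there (2x the 4 GB of RAM)

print(f"cache covers only {cache_mb / data_mb:.1%} of the data written")  # ~3.1%
# Once the cache fills, sustained throughput drops to what the drives behind
# it can sink, which is why the cache-less md array above posts comparable
# numbers for large sequential I/O.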

-- 
Benjamin Franz


Re: [CentOS] 40TB File System Recommendations

2011-04-15 Thread Peter Kjellström
On Thursday, April 14, 2011 05:26:41 PM Ross Walker wrote:
 2011/4/14 Peter Kjellström c...@nsc.liu.se:
...
  While I do concede the obvious point regarding rebuild time (raid6 takes
  from long to very long to rebuild) I'd like to point out:
  
   * If you do the math for a 12 drive raid10 vs raid6 then (using actual
  data from ~500 1T drives on HP cciss controllers during two years)
  raid10 is ~3x more likely to cause hard data loss than raid6.
  
   * mtbf is not everything there's also the thing called unrecoverable
  read errors. If you hit one while rebuilding your raid10 you're toast
  while in the raid6 case you'll use your 2nd parity and continue the
  rebuild.
 
 You mean if the other side of the mirror fails while rebuilding it.

No, the drive (unrecoverably) failing to read a sector is not the same thing 
as a drive failure. Drive failure frequency expressed in mtbf is around 1M 
hours (even though including predictive fail we see more like 250K hours). 
Unrecoverable read error rate (per sector) was quite recently on the order of 
1x to 10x of the drive size (a drive I looked up now was spec'd a lot higher 
at ~1000x drive size). If we assume a raid10 rebuild time of 12h and an 
unrecoverable read error once every 10x of drive size, then the effective mean 
time between read errors is 120h (two to ten thousand times worse than the 
drive mtbf). Admittedly these numbers are hard to get and equally hard to 
trust (or double check).
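
As a rough illustration of that arithmetic (the figures below are the assumed 
ones from this paragraph, not measurements):

# Illustrative sketch of the URE arithmetic above, using the assumed figures
# from this post (1 TB drives, one URE per ~10x the drive size read, a 12h
# raid10 rebuild); these are not measured values.
drive_tb = 1.0
tb_read_per_ure = 10 * drive_tb     # assumed unrecoverable-read-error rate
rebuild_hours = 12.0                # assumed raid10 rebuild window

# A raid10 rebuild must read the surviving mirror end to end.
p_ure = drive_tb / tb_read_per_ure
print(f"chance of a URE while re-mirroring one drive: ~{p_ure:.0%}")

# Reading one drive every 12h at that error rate gives an effective mean time
# between read errors far below the quoted drive mtbf.
print(f"effective mean time between read errors: ~{rebuild_hours * tb_read_per_ure / drive_tb:.0f}h")
# raid6 survives this case: the second parity reconstructs the unreadable sector.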

What it all comes down to is that raid10 (assuming just double, not triple, 
copies) stores your data with one extra copy/parity and in a single drive 
failure scenario you have zero extra data left (on that part of the array). 
That is, you depend on each and every bit of that (meaning the degraded part) 
data being correctly read. This means you very much want both:

 1) Very fast rebuilds (= you need hot-spare)
 2) An unrecoverable read error rate much larger than your drive size

or as you suggest below:

 3) Triple copy

 Yes this is true, of course if this happens with RAID6 it will rebuild
 from parity IF there is a second hotspare available,

This is wrong, hot-spares are not that necessary when using raid6. This has to 
do with the fact that rebuild times (the time from when you start being vulnerable 
to when the rebuild completes) are already long. An added 12h for a tech to swap 
in the spare only marginally increases your risks.

 cause remember
 the first failure wasn't cleared before the second failure occurred.
 Now your RAID6 is in severe degraded state, one more failure before
 either of these disks is rebuilt will mean toast for the array.

All of this was taken into account in my original example above. In the end 
(with my data) raid10 was around 3x more likely to cause ultimate data loss 
than raid6.

 Now
 the performance of the array is practically unusable and the load on
 the disks is high as it does a full recalculation rebuild, and if they
 are large it will be high for a very long time, now if any other disk
 in the very large RAID6 array is near failure, or has a bad sector,
 this taxing load could very well push it over the edge

In my example a 12 drive raid6 rebuild takes 6-7 days; this works out to < 5 
MB/s of sequential read per drive. This added load is not very noticeable in our 
environment (taking into account normal patrol reads and user data traffic).
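
That per-drive rate is just arithmetic (assuming 1 TB drives as elsewhere in 
this thread):

# Quick check of the per-drive rebuild read rate quoted above (1 TB drives).
drive_bytes = 1e12
rebuild_seconds = 6.5 * 24 * 3600          # "6-7 days"
print(f"~{drive_bytes / rebuild_seconds / 1e6:.1f} MB/s per drive")  # ~1.8 MB/s, i.e. < 5 MB/s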

Either way, the general problem of [rebuild stress] pushing drives over the 
edge is a larger threat to raid10 than raid6 (it being fatal in the first 
case...).

 and the risk of
 such an event occurring increases with the size of the array and the
 size of the disk surface.
 
 I think this is where the mdraid raid10 shines because it can have 3
 copies (or more) of the data instead of just two,

I think we've now moved into what most people would call unreasonable. Let's 
see what we have for a 12 drive box (quite common 2U size):

 raid6: 12x on raid6 no hot spare (see argument above) = 10 data drives
 raid10: 11x triple store on raid10 + one spare = 3.66 data drives

or (if your raid's not odd-drive capable):

 raid10: 9x triple store on raid10 + one to three spares = 3 data drives

(ok, yes you could get 4 data drives out of it if you skipped hot-spare)

That is almost a 2.7x-3.3x diff! My users sure care if their X $ results in 
1/3 the space (or cost = 3x for the same space if you prefer).
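
The capacity comparison boils down to this arithmetic (illustrative sketch, 
drive counts as in the example above):

# Sketch of the usable-capacity arithmetic above for a 12-bay box.
bays = 12
raid6_data = bays - 2                    # raid6, no hot spare: 10 data drives
raid10_3copy = (bays - 1) / 3            # 11 drives, 3 copies, 1 spare: 3.66
raid10_3copy_even = (bays - 3) / 3       # 9 drives, 3 copies, 1-3 spares: 3

print(f"raid6:                   {raid6_data} data drives")
print(f"raid10, 3 copies:        {raid10_3copy:.2f} data drives")
print(f"raid10, 3 copies (even): {raid10_3copy_even:.0f} data drives")
print(f"ratio: {raid6_data / raid10_3copy:.1f}x to {raid6_data / raid10_3copy_even:.1f}x in favour of raid6")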

On top of this, most raid10 implementations lack triple copy 
functionality.

Also note that raid10 that allows an odd number of drives is more vulnerable 
to 2nd drive failures, resulting in an even larger than 3x improvement using 
raid6 (vs a double-copy, odd-drive-handling raid10).

/Peter

 of course a three
 times (or more) the cost. It also allows for uneven number of disks as
 it just saves copies on different spindles rather then mirrors. This
 I think provides the best protection against failure and the best
 performance, but at the worst cost, but with 2TB and 4TB disks coming
 out
...



Re: [CentOS] 40TB File System Recommendations

2011-04-15 Thread Christopher Chan
On Friday, April 15, 2011 07:24 PM, Benjamin Franz wrote:
 On 04/14/2011 09:00 PM, Christopher Chan wrote:

 Wanna try that again with 64MB of cache only and tell us whether there
 is a difference in performance?

 There is a reason why 3ware 85xx cards were complete rubbish when used
 for raid5 and which led to the 95xx/96xx series.
 _

 I don't happen to have any systems I can test with the 1.5TB drives
 without controller cache right now, but I have a system with some old
 500GB drives  (which are about half as fast as the 1.5TB drives in
 individual sustained I/O throughput) attached directly to onboard SATA
 ports in a 8 x RAID6 with *no* controller cache at all. The machine has
 16GB of RAM and bonnie++ therefore used 32GB of data for the test.

 Version  1.96   --Sequential Output-- --Sequential Input-
 --Random-
 Concurrency   1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block--
 --Seeks--
 MachineSize K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP
 /sec %CP
 pbox332160M   389  98 76709  22 91071  26  2209  95 264892  26
 590.5  11
 Latency 24190us1244ms1580ms   60411us   69901us
 42586us
 Version  1.96   --Sequential Create-- Random
 Create
 pbox3   -Create-- --Read--- -Delete-- -Create-- --Read---
 -Delete--
 files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
 /sec %CP
16 10910  31 + +++ + +++ 29293  80 + +++
 + +++
 Latency   775us 610us 979us 740us 370us
 380us

 Given that the underlaying drives are effectively something like half as
 fast as the drives in the other test, the results are quite comparable.

Woohoo, next we will be seeing md raid6 also giving comparable results 
if that is the case. I am not the only person on this list who thinks 
cache is king for raid5/6 on hardware raid boards, and using hardware 
raid + bbu cache for better performance is one of the two reasons why we 
don't do md raid5/6.



 Cache doesn't make a lot of difference when you quickly write a lot more
 data than the cache can hold. The limiting factor becomes the slowest
 component - usually the drives themselves. Cache isn't magic performance
 pixie dust. It helps in certain use cases and is nearly irrelevant in
 others.


Yeah, you are right - but cache is primarily to buffer the writes for 
performance. Why else go through the expense of getting bbu cache? So 
what happens when you tweak bonnie a bit?


Re: [CentOS] 40TB File System Recommendations

2011-04-15 Thread Rudi Ahlers
On Fri, Apr 15, 2011 at 3:05 PM, Christopher Chan 
christopher.c...@bradbury.edu.hk wrote:

 On Friday, April 15, 2011 07:24 PM, Benjamin Franz wrote:
  On 04/14/2011 09:00 PM, Christopher Chan wrote:
 
  Wanna try that again with 64MB of cache only and tell us whether there
  is a difference in performance?
 
  There is a reason why 3ware 85xx cards were complete rubbish when used
  for raid5 and which led to the 95xx/96xx series.
  _
 
  I don't happen to have any systems I can test with the 1.5TB drives
  without controller cache right now, but I have a system with some old
  500GB drives  (which are about half as fast as the 1.5TB drives in
  individual sustained I/O throughput) attached directly to onboard SATA
  ports in a 8 x RAID6 with *no* controller cache at all. The machine has
  16GB of RAM and bonnie++ therefore used 32GB of data for the test.
 
  Version  1.96   --Sequential Output-- --Sequential Input-
  --Random-
  Concurrency   1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block--
  --Seeks--
  MachineSize K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP
  /sec %CP
  pbox332160M   389  98 76709  22 91071  26  2209  95 264892  26
  590.5  11
  Latency 24190us1244ms1580ms   60411us   69901us
  42586us
  Version  1.96   --Sequential Create-- Random
  Create
  pbox3   -Create-- --Read--- -Delete-- -Create-- --Read---
  -Delete--
  files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
  /sec %CP
 16 10910  31 + +++ + +++ 29293  80 + +++
  + +++
  Latency   775us 610us 979us 740us 370us
  380us
 
  Given that the underlaying drives are effectively something like half as
  fast as the drives in the other test, the results are quite comparable.

 Woohoo, next we will be seeing md raid6 also giving comparable results
 if that is the case. I am not the only person on this list that thinks
 cache is king for raid5/6 on hardware raid boards and the using hardware
 raid + bbu cache for better performance one of the two reasons why we
 don't do md raid5/6.


 
  Cache doesn't make a lot of difference when you quickly write a lot more
  data than the cache can hold. The limiting factor becomes the slowest
  component - usually the drives themselves. Cache isn't magic performance
  pixie dust. It helps in certain use cases and is nearly irrelevant in
  others.
 

 Yeah, you are right - but cache is primarily to buffer the writes for
 performance. Why else go through the expense of getting bbu cache? So
 what happens when you tweak bonnie a bit?
 ___



As a matter of interest, does anyone know how to use an SSD drive for cache
purposes on Linux software RAID drives? ZFS has this feature and it makes a
helluva difference to a storage server's performance.



-- 
Kind Regards
Rudi Ahlers
SoftDux

Website: http://www.SoftDux.com
Technical Blog: http://Blog.SoftDux.com
Office: 087 805 9573
Cell: 082 554 7532


Re: [CentOS] 40TB File System Recommendations

2011-04-15 Thread Jerry Franz
On 04/15/2011 06:05 AM, Christopher Chan wrote:

 Woohoo, next we will be seeing md raid6 also giving comparable results
 if that is the case. I am not the only person on this list that thinks
 cache is king for raid5/6 on hardware raid boards and the using hardware
 raid + bbu cache for better performance one of the two reasons why we
 don't do md raid5/6.



That *is* md RAID6. Sorry I didn't make that clear. I don't use anyone's 
hardware RAID6 right now because I haven't found a board so far that was 
as fast as using md. I get better performance from even a BBU backed 95X 
series 3ware board by using it to serve the drives as JBOD and then 
using md to do the actual raid.

 Yeah, you are right - but cache is primarily to buffer the writes for
 performance. Why else go through the expense of getting bbu cache? So
 what happens when you tweak bonnie a bit?

For smaller writes. When writes *do* fit in the cache you get a big 
bump. As I said: Helps some cases, not all cases. BBU backed cache helps 
if you have lots of small writes. Not so much if you are writing 
gigabytes of stuff more sequentially.

-- 
Benjamin Franz


Re: [CentOS] 40TB File System Recommendations

2011-04-15 Thread Ross Walker
On Apr 15, 2011, at 9:17 AM, Rudi Ahlers r...@softdux.com wrote:

 
 
 On Fri, Apr 15, 2011 at 3:05 PM, Christopher Chan 
 christopher.c...@bradbury.edu.hk wrote:
 On Friday, April 15, 2011 07:24 PM, Benjamin Franz wrote:
  On 04/14/2011 09:00 PM, Christopher Chan wrote:
 
  Wanna try that again with 64MB of cache only and tell us whether there
  is a difference in performance?
 
  There is a reason why 3ware 85xx cards were complete rubbish when used
  for raid5 and which led to the 95xx/96xx series.
  _
 
  I don't happen to have any systems I can test with the 1.5TB drives
  without controller cache right now, but I have a system with some old
  500GB drives  (which are about half as fast as the 1.5TB drives in
  individual sustained I/O throughput) attached directly to onboard SATA
  ports in a 8 x RAID6 with *no* controller cache at all. The machine has
  16GB of RAM and bonnie++ therefore used 32GB of data for the test.
 
  Version  1.96   --Sequential Output-- --Sequential Input-
  --Random-
  Concurrency   1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block--
  --Seeks--
  MachineSize K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP
  /sec %CP
  pbox332160M   389  98 76709  22 91071  26  2209  95 264892  26
  590.5  11
  Latency 24190us1244ms1580ms   60411us   69901us
  42586us
  Version  1.96   --Sequential Create-- Random
  Create
  pbox3   -Create-- --Read--- -Delete-- -Create-- --Read---
  -Delete--
  files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
  /sec %CP
 16 10910  31 + +++ + +++ 29293  80 + +++
  + +++
  Latency   775us 610us 979us 740us 370us
  380us
 
  Given that the underlaying drives are effectively something like half as
  fast as the drives in the other test, the results are quite comparable.
 
 Woohoo, next we will be seeing md raid6 also giving comparable results
 if that is the case. I am not the only person on this list that thinks
 cache is king for raid5/6 on hardware raid boards and the using hardware
 raid + bbu cache for better performance one of the two reasons why we
 don't do md raid5/6.
 
 
 
  Cache doesn't make a lot of difference when you quickly write a lot more
  data than the cache can hold. The limiting factor becomes the slowest
  component - usually the drives themselves. Cache isn't magic performance
  pixie dust. It helps in certain use cases and is nearly irrelevant in
  others.
 
 
 Yeah, you are right - but cache is primarily to buffer the writes for
 performance. Why else go through the expense of getting bbu cache? So
 what happens when you tweak bonnie a bit?
 ___
 
 
 
 As matter of interest, does anyone know how to use an SSD drive for cach 
 purposes on Linux software RAID  drives? ZFS has this feature and it makes a 
 helluva difference to a storage server's performance. 

Put the file system's log device on it.

-Ross



Re: [CentOS] 40TB File System Recommendations

2011-04-15 Thread Rudi Ahlers
On Fri, Apr 15, 2011 at 6:26 PM, Ross Walker rswwal...@gmail.com wrote:

 On Apr 15, 2011, at 9:17 AM, Rudi Ahlers r...@softdux.com wrote:



 On Fri, Apr 15, 2011 at 3:05 PM, Christopher Chan 
 christopher.c...@bradbury.edu.hk
 christopher.c...@bradbury.edu.hk wrote:

 On Friday, April 15, 2011 07:24 PM, Benjamin Franz wrote:
  On 04/14/2011 09:00 PM, Christopher Chan wrote:
 
  Wanna try that again with 64MB of cache only and tell us whether there
  is a difference in performance?
 
  There is a reason why 3ware 85xx cards were complete rubbish when used
  for raid5 and which led to the 95xx/96xx series.
  _
 
  I don't happen to have any systems I can test with the 1.5TB drives
  without controller cache right now, but I have a system with some old
  500GB drives  (which are about half as fast as the 1.5TB drives in
  individual sustained I/O throughput) attached directly to onboard SATA
  ports in a 8 x RAID6 with *no* controller cache at all. The machine has
  16GB of RAM and bonnie++ therefore used 32GB of data for the test.
 
  Version  1.96   --Sequential Output-- --Sequential Input-
  --Random-
  Concurrency   1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block--
  --Seeks--
  MachineSize K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP
  /sec %CP
  pbox332160M   389  98 76709  22 91071  26  2209  95 264892  26
  590.5  11
  Latency 24190us1244ms1580ms   60411us   69901us
  42586us
  Version  1.96   --Sequential Create-- Random
  Create
  pbox3   -Create-- --Read--- -Delete-- -Create-- --Read---
  -Delete--
  files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
  /sec %CP
 16 10910  31 + +++ + +++ 29293  80 + +++
  + +++
  Latency   775us 610us 979us 740us 370us
  380us
 
  Given that the underlaying drives are effectively something like half as
  fast as the drives in the other test, the results are quite comparable.

 Woohoo, next we will be seeing md raid6 also giving comparable results
 if that is the case. I am not the only person on this list that thinks
 cache is king for raid5/6 on hardware raid boards and the using hardware
 raid + bbu cache for better performance one of the two reasons why we
 don't do md raid5/6.


 
  Cache doesn't make a lot of difference when you quickly write a lot more
  data than the cache can hold. The limiting factor becomes the slowest
  component - usually the drives themselves. Cache isn't magic performance
  pixie dust. It helps in certain use cases and is nearly irrelevant in
  others.
 

 Yeah, you are right - but cache is primarily to buffer the writes for
 performance. Why else go through the expense of getting bbu cache? So
 what happens when you tweak bonnie a bit?
 ___



 As matter of interest, does anyone know how to use an SSD drive for cach
 purposes on Linux software RAID  drives? ZFS has this feature and it makes a
 helluva difference to a storage server's performance.


 Put the file system's log device on it.

 -Ross


 ___




Well, ZFS has a separate ZIL for that purpose, and the ZIL adds extra
protection / redundancy to the whole pool.

But the Cache / L2ARC drive caches all common reads & writes (simply put)
onto SSD to improve overall system performance.

So I was wondering if one could do this with mdraid or even just EXT3 /
EXT4?



-- 
Kind Regards
Rudi Ahlers
SoftDux

Website: http://www.SoftDux.com
Technical Blog: http://Blog.SoftDux.com
Office: 087 805 9573
Cell: 082 554 7532


Re: [CentOS] 40TB File System Recommendations

2011-04-15 Thread Ross Walker
On Apr 15, 2011, at 12:32 PM, Rudi Ahlers r...@softdux.com wrote:

 
 
 On Fri, Apr 15, 2011 at 6:26 PM, Ross Walker rswwal...@gmail.com wrote:
 On Apr 15, 2011, at 9:17 AM, Rudi Ahlers r...@softdux.com wrote:
 
 
 
 On Fri, Apr 15, 2011 at 3:05 PM, Christopher Chan 
 christopher.c...@bradbury.edu.hk wrote:
 On Friday, April 15, 2011 07:24 PM, Benjamin Franz wrote:
  On 04/14/2011 09:00 PM, Christopher Chan wrote:
 
  Wanna try that again with 64MB of cache only and tell us whether there
  is a difference in performance?
 
  There is a reason why 3ware 85xx cards were complete rubbish when used
  for raid5 and which led to the 95xx/96xx series.
  _
 
  I don't happen to have any systems I can test with the 1.5TB drives
  without controller cache right now, but I have a system with some old
  500GB drives  (which are about half as fast as the 1.5TB drives in
  individual sustained I/O throughput) attached directly to onboard SATA
  ports in a 8 x RAID6 with *no* controller cache at all. The machine has
  16GB of RAM and bonnie++ therefore used 32GB of data for the test.
 
  Version  1.96   --Sequential Output-- --Sequential Input-
  --Random-
  Concurrency   1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block--
  --Seeks--
  MachineSize K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP
  /sec %CP
  pbox332160M   389  98 76709  22 91071  26  2209  95 264892  26
  590.5  11
  Latency 24190us1244ms1580ms   60411us   69901us
  42586us
  Version  1.96   --Sequential Create-- Random
  Create
  pbox3   -Create-- --Read--- -Delete-- -Create-- --Read---
  -Delete--
  files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
  /sec %CP
 16 10910  31 + +++ + +++ 29293  80 + +++
  + +++
  Latency   775us 610us 979us 740us 370us
  380us
 
  Given that the underlaying drives are effectively something like half as
  fast as the drives in the other test, the results are quite comparable.
 
 Woohoo, next we will be seeing md raid6 also giving comparable results
 if that is the case. I am not the only person on this list that thinks
 cache is king for raid5/6 on hardware raid boards and the using hardware
 raid + bbu cache for better performance one of the two reasons why we
 don't do md raid5/6.
 
 
 
  Cache doesn't make a lot of difference when you quickly write a lot more
  data than the cache can hold. The limiting factor becomes the slowest
  component - usually the drives themselves. Cache isn't magic performance
  pixie dust. It helps in certain use cases and is nearly irrelevant in
  others.
 
 
 Yeah, you are right - but cache is primarily to buffer the writes for
 performance. Why else go through the expense of getting bbu cache? So
 what happens when you tweak bonnie a bit?
 ___
 
 
 
 As matter of interest, does anyone know how to use an SSD drive for cach 
 purposes on Linux software RAID  drives? ZFS has this feature and it makes a 
 helluva difference to a storage server's performance. 
 
 Put the file system's log device on it.
 
 -Ross
 
 
 ___
 
 
 
 Well, ZFS has a separate ZIL for that purpose, and the ZIL adds extra 
 protection / redundancy to the whole pool. 
 
 But the Cache / L2ARC drive caches all common reads  writes (simply put) 
 onto SSD to improve overall system performance. 
 
 So I was wondering if one could do this with mdraid or even just EXT3 / EXT4?

Ext3/4 and XFS allow specifying an external log device which, if it is an SSD, can 
speed up writes. All these file systems aggressively use the page cache for 
read/write caching. The only thing you don't get is L2ARC-type cache, but I heard 
of a dm-cache project that might provide that type of cache.
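
For what it's worth, here is a hypothetical sketch of what that setup can look 
like; the device names (/dev/sdb1, /dev/sdb2, /dev/md0) are invented and the 
options should be double-checked against your mkfs/mount man pages before use:

# Hypothetical sketch only: device names are made up; verify the options
# before running anything like this as root.
import subprocess

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# ext4 with its journal on an SSD partition
run(["mke2fs", "-O", "journal_dev", "/dev/sdb1"])          # create the external journal
run(["mkfs.ext4", "-J", "device=/dev/sdb1", "/dev/md0"])   # point the fs at it

# XFS with an external log device (must also be given at mount time)
run(["mkfs.xfs", "-l", "logdev=/dev/sdb2,size=128m", "/dev/md0"])
run(["mount", "-o", "logdev=/dev/sdb2", "/dev/md0", "/mnt/data"])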

-Ross



Re: [CentOS] 40TB File System Recommendations

2011-04-15 Thread Christopher Chan


 As matter of interest, does anyone know how to use an SSD drive for cach
 purposes on Linux software RAID  drives? ZFS has this feature and it
 makes a helluva difference to a storage server's performance.

You cannot. You can however use one for the external journal of ext3/4 
in full journaling mode for something similar.


Re: [CentOS] 40TB File System Recommendations

2011-04-14 Thread Sorin Srbu
-Original Message-
From: centos-boun...@centos.org [mailto:centos-boun...@centos.org] On Behalf
Of Christopher Chan
Sent: Wednesday, April 13, 2011 4:49 PM
To: centos@centos.org
Subject: Re: [CentOS] 40TB File System Recommendations

 While we are at it, disks being directly connected to the raid card will
 mean there won't be bus contention from nics and what not whereas
 software raid 5/6 would have to deal with that.

 Could that really be an issue as well? What kind of traffic levels are we
 speaking of now? Approximately? That is to say, in as much this can be
 quantified at all.

 I've never really seen this problem.

Oh yeah, we are on PCIe and NUMA architectures now. I guess this point
no longer applies just like hardware raid being crap no longer applies
because they are not underpowered i960/tiny cache boards anymore.

I'm sorry, I can't quite read you. Is your reply meant to be sarcastic? If I
misunderstood it, I apologize.

Anyway, what I meant before was that I haven't really seen the problem with smaller
systems, like e.g. department backups. Maybe up to 10TB file systems, with not
too many users' home folders, in the hundreds maybe, but still a lot of data
being transferred each day.
-- 
/Sorin




Re: [CentOS] 40TB File System Recommendations

2011-04-14 Thread John Jasen
On 04/13/2011 09:04 PM, Ross Walker wrote:
 On Apr 13, 2011, at 7:26 PM, John Jasen jja...@realityfailure.org wrote:

snipped my stuff


 Every now and then I hear these XFS horror stories. They seem almost impossible 
 to believe.
 
 Nothing breaks for absolutely no reason, and failure to know where the 
 breakage was shows that maybe there weren't adequately skilled technicians 
 for the technology deployed.

Waving your hands and insulting the people who went through XFS failures
doesn't make me feel any better or make the problems not have occurred.

I would presume that we were lucky enough to have technicians on-site
skilled enough to track the problems down to XFS itself.

-- 
-- John E. Jasen (jja...@realityfailure.org)
-- Deserve Victory. -- Terry Goodkind, Naked Empire


Re: [CentOS] 40TB File System Recommendations

2011-04-14 Thread Peter Kjellström
On Wednesday, April 13, 2011 04:54:01 AM Ross Walker wrote:
 On Apr 12, 2011, at 8:53 AM, Rudi Ahlers r...@softdux.com wrote:
...
  As matter of interest, what hardware do you use? i.e. what CPU's, size
  of RAM and RAID cards do you use on this size system?
  
  Everyone always recommends to use smaller RAID arrays than one big fat
  one. So, I'm interested to know what you use, and how effective it
  works. i.e. if that 30TB was actively used by many hosts how does it
  cope? Or is it just archival storage?
 
 I would never create a RAID5/6 greater than 8 disks. Usually I create a 6
 or 7 disk RAID5 which means I can fit 2 in a 15 disk enclosure and have a
 hot spare and stripe them.

Personal preference here, personal preference there. Here's a datapoint: We 
run PB of data on 12 drive raid6 using sata (no hot spare). Are we happy with 
that config: yes, would it be faster to use 15K sas in raid10: yes *shrug*

/Peter
 
 The more RAID5 sets you have the greater the write IOPS you can achieve.
 
 Though for max IOPS nothing beats RAID10.
 
 -Ross




Re: [CentOS] 40TB File System Recommendations

2011-04-14 Thread Christopher Chan
On Thursday, April 14, 2011 09:04 AM, Ross Walker wrote:
 On Apr 13, 2011, at 7:26 PM, John Jasenjja...@realityfailure.org  wrote:

 On 04/12/2011 08:19 PM, Christopher Chan wrote:
 On Tuesday, April 12, 2011 10:36 PM, John Jasen wrote:
 On 04/12/2011 10:21 AM, Boris Epstein wrote:
 On Tue, Apr 12, 2011 at 3:36 AM, Alain Péan
 alain.p...@lpp.polytechnique.fr
 mailto:alain.p...@lpp.polytechnique.fr   wrote:

 snipped: two recommendations for XFS

 I would chime in with a dis-commendation for XFS. At my previous
 employer, two cases involving XFS resulted in irrecoverable data
 corruption. These were on RAID systems running from 4 to 20 TB.



 What were those circumstances? Crash? Power outage? What are the
 components of the RAID systems?

 One was a hardware raid over fibre channel, which silently corrupted
 itself. System checked out fine, raid array checked out fine, xfs was
 replaced with ext3, and the system ran without issue.

 Second was multiple hardware arrays over linux md raid0, also over fibre
 channel. This was not so silent corruption, as in xfs would detect it
 and lock the filesystem into read-only before it, pardon the pun, truly
 fscked itself. Happened two or three times, before we gave up, split up
 the raid, and went ext3, Again, no issues.

 Every now and then I hear these XFS horror stories. They seem almost impossible 
 to believe.

 Nothing breaks for absolutely no reason, and failure to know where the 
 breakage was shows that maybe there weren't adequately skilled technicians 
 for the technology deployed.

 XFS if run in a properly configured environment will run flawlessly.


HAHAHAAAAHA

The XFS codebase is the biggest pile of mess in the Linux kernel and you 
expect it not to run into mysterious problems? Remember, XFS was 
PORTED over to Linux. It is not a 'native' thing to Linux.


Re: [CentOS] 40TB File System Recommendations

2011-04-14 Thread Christopher Chan
On Thursday, April 14, 2011 07:26 AM, John Jasen wrote:
 On 04/12/2011 08:19 PM, Christopher Chan wrote:
 On Tuesday, April 12, 2011 10:36 PM, John Jasen wrote:
 On 04/12/2011 10:21 AM, Boris Epstein wrote:
 On Tue, Apr 12, 2011 at 3:36 AM, Alain Péan
 alain.p...@lpp.polytechnique.fr
 mailto:alain.p...@lpp.polytechnique.fr   wrote:

 snipped: two recommendations for XFS

 I would chime in with a dis-commendation for XFS. At my previous
 employer, two cases involving XFS resulted in irrecoverable data
 corruption. These were on RAID systems running from 4 to 20 TB.



 What were those circumstances? Crash? Power outage? What are the
 components of the RAID systems?

 One was a hardware raid over fibre channel, which silently corrupted
 itself. System checked out fine, raid array checked out fine, xfs was
 replaced with ext3, and the system ran without issue.

 Second was multiple hardware arrays over linux md raid0, also over fibre
 channel. This was not so silent corruption, as in xfs would detect it
 and lock the filesystem into read-only before it, pardon the pun, truly
 fscked itself. Happened two or three times, before we gave up, split up
 the raid, and went ext3, Again, no issues.

32-bit kernel by any chance?


Re: [CentOS] 40TB File System Recommendations

2011-04-14 Thread Christopher Chan
On Thursday, April 14, 2011 02:54 PM, Sorin Srbu wrote:
 -Original Message-
 From: centos-boun...@centos.org [mailto:centos-boun...@centos.org] On Behalf
 Of Christopher Chan
 Sent: Wednesday, April 13, 2011 4:49 PM
 To: centos@centos.org
 Subject: Re: [CentOS] 40TB File System Recommendations

 While we are at it, disks being directly connected to the raid card will
 mean there won't be bus contention from nics and what not whereas
 software raid 5/6 would have to deal with that.

 Could that really be an issue as well? What kind of traffic levels are we
 speaking of now? Approximately? That is to say, in as much this can be
 quantified at all.

 I've never really seen this problem.

 Oh yeah, we are on PCIe and NUMA architectures now. I guess this point
 no longer applies just like hardware raid being crap no longer applies
 because they are not underpowered i960/tiny cache boards anymore.

 I'm sorry, I can't quite read you. Is your reply meant to be sarcastic? If I
 misunderstood it, I apologize.

 Anyway, what I meant before was that I haven't really the problem with smaller
 systems, like for eg department backups. Maybe up to 10TB-file systems, with 
 not
 too many user's homefolders, in the hundreds maybe, but still a lot of data
 being transferred each day.

I know what you meant...and yes, the bus has plenty of bandwidth to not 
have to worry unless you are sticking it on a 1x lane slot.
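
Rough numbers, assuming roughly 250 MB/s per lane per direction for PCIe 1.x 
(about double that for 2.0):

# Rough PCIe bandwidth figures (assumed: ~250 MB/s per lane each way for
# PCIe 1.x) to show why only a x1 slot is worth worrying about.
per_lane_mb_s = 250
for lanes in (1, 4, 8):
    print(f"PCIe 1.x x{lanes}: ~{lanes * per_lane_mb_s} MB/s per direction")
# A x1 slot (~250 MB/s) can be saturated by a handful of modern disks, while
# a x8 slot (~2 GB/s) has headroom for a wide RAID card plus NIC traffic.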


Re: [CentOS] 40TB File System Recommendations

2011-04-14 Thread Sorin Srbu
-Original Message-
From: centos-boun...@centos.org [mailto:centos-boun...@centos.org] On Behalf
Of Christopher Chan
Sent: Thursday, April 14, 2011 2:34 PM
To: centos@centos.org
Subject: Re: [CentOS] 40TB File System Recommendations

 Oh yeah, we are on PCIe and NUMA architectures now. I guess this point
 no longer applies just like hardware raid being crap no longer applies
 because they are not underpowered i960/tiny cache boards anymore.

 I'm sorry, I can't quite read you. Is your reply meant to be sarcastic? If
I
 misunderstood it, I apologize.

 Anyway, what I meant before was that I haven't really the problem with
smaller
 systems, like for eg department backups. Maybe up to 10TB-file systems, with
not
 too many user's homefolders, in the hundreds maybe, but still a lot of data
 being transferred each day.

I know what you meant...and yes, the bus has plenty of bandwidth to not
have to worry unless you are sticking it on a 1x lane slot.

Gotcha'. Thanks.

-- 
/Sorin




Re: [CentOS] 40TB File System Recommendations

2011-04-14 Thread Simon Matter
 On Thursday, April 14, 2011 09:04 AM, Ross Walker wrote:
 On Apr 13, 2011, at 7:26 PM, John Jasenjja...@realityfailure.org
 wrote:

 On 04/12/2011 08:19 PM, Christopher Chan wrote:
 On Tuesday, April 12, 2011 10:36 PM, John Jasen wrote:
 On 04/12/2011 10:21 AM, Boris Epstein wrote:
 On Tue, Apr 12, 2011 at 3:36 AM, Alain Péan
 alain.p...@lpp.polytechnique.fr
 mailto:alain.p...@lpp.polytechnique.fr   wrote:

 snipped: two recommendations for XFS

 I would chime in with a dis-commendation for XFS. At my previous
 employer, two cases involving XFS resulted in irrecoverable data
 corruption. These were on RAID systems running from 4 to 20 TB.



 What were those circumstances? Crash? Power outage? What are the
 components of the RAID systems?

 One was a hardware raid over fibre channel, which silently corrupted
 itself. System checked out fine, raid array checked out fine, xfs was
 replaced with ext3, and the system ran without issue.

 Second was multiple hardware arrays over linux md raid0, also over
 fibre
 channel. This was not so silent corruption, as in xfs would detect it
 and lock the filesystem into read-only before it, pardon the pun, truly
 fscked itself. Happened two or three times, before we gave up, split up
 the raid, and went ext3, Again, no issues.

 Every now and then I hear these XFS horror stories. They seem almost
 impossible to believe.

 Nothing breaks for absolutely no reason, and failure to know where the
 breakage was shows that maybe there weren't adequately skilled
 technicians for the technology deployed.

 XFS if run in a properly configured environment will run flawlessly.


 HAHAHAAAAHA

 The XFS codebase is the biggest pile of mess in the Linux kernel and you
 expect it to be not run into mysterious problems? Remember, XFS was
 PORTED over to Linux. It is not a 'native' thing to Linux.

You're confusing me, I always thought Linux has been ported to XFS :)

There were some issues with XFS and maybe there still are. But you cannot
say there are no environments where it works very stably. I've started
using XFS back in the RH7.2 days and I can also tell some stories, but not
all of them were XFS's fault. The only real problem was the fact that
RedHat didn't choose XFS as their FS of choice, which meant that just a few
resources were put into the XFS code and just a few people actually used
it. That's the only thing where ext2/3/4 was better IMHO.

Simon



Re: [CentOS] 40TB File System Recommendations

2011-04-14 Thread Ross Walker
On Apr 14, 2011, at 6:54 AM, John Jasen jja...@realityfailure.org wrote:

 On 04/13/2011 09:04 PM, Ross Walker wrote:
 On Apr 13, 2011, at 7:26 PM, John Jasen jja...@realityfailure.org wrote:
 
 snipped my stuff
 
 
 Every now and then I hear these XFS horror stories. They seem almost impossible 
 to believe.
 
 Nothing breaks for absolutely no reason, and failure to know where the 
 breakage was shows that maybe there weren't adequately skilled technicians 
 for the technology deployed.
 
 Waving your hands and insulting the people who went through XFS failures
 doesn't make me feel any better or make the problems not have occurred.
You are correct, it came across as rude and condescending; I apologize.

It was a knee-jerk reaction that came from reading many such posts saying XFS is 
no good because it caused X, where X came about because people didn't know how 
to implement XFS safely or correctly.

Of course I'm not trying to make any legitimately bad experiences any less 
legitimate. We all have them, and over a long enough period of time, with most 
file systems.

 I would presume that we were lucky enough to have technicians on-site
 skilled enough to track the problems down to XFS itself.

Yes, it is always better to catch these through testing than in production.

-Ross



Re: [CentOS] 40TB File System Recommendations

2011-04-14 Thread Peter Kjellström
On Tuesday, April 12, 2011 03:10:33 PM Lars Hecking wrote:
  OTOH, gparted doesn't see my software raid array either. Gparted is
  rather practical for regular plain vanilla partitions, but for more
  advanced stuff and filesystems, fdisk is probably better.
 
  For filesystems > 2TB, you're better off grabbing a copy of GPT fdisk.

Even better, use LVM and stay away from partitioning completely.
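
A minimal sketch of that approach (hypothetical device and volume names, 
adjust to taste):

# Minimal sketch (hypothetical names): LVM straight on the md device, no
# partition table involved. Run as root, adjust to your setup.
import subprocess

for cmd in (
    ["pvcreate", "/dev/md0"],                              # whole device as a PV
    ["vgcreate", "vg_data", "/dev/md0"],
    ["lvcreate", "-n", "lv_data", "-l", "100%FREE", "vg_data"],
    ["mkfs.xfs", "/dev/vg_data/lv_data"],                  # or mkfs.ext4
):
    subprocess.run(cmd, check=True)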

/Peter




Re: [CentOS] 40TB File System Recommendations

2011-04-14 Thread Steve Brooks

On Thu, 14 Apr 2011, Peter Kjellström wrote:


On Tuesday, April 12, 2011 03:10:33 PM Lars Hecking wrote:

OTOH, gparted doesn't see my software raid array either. Gparted is
rather practical for regular plain vanilla partitions, but for more
advanced stuff and filesystems, fdisk is probably better.


 For filesystems > 2TB, you're better off grabbing a copy of GPT fdisk.


Even better, use LVM and stay away from partitioning completely.


Is it not ok to build the filesystem straight onto the device and not 
bother with partitioning at all?


Steve


Re: [CentOS] 40TB File System Recommendations

2011-04-14 Thread Sorin Srbu
-Original Message-
From: centos-boun...@centos.org [mailto:centos-boun...@centos.org] On Behalf
Of Peter Kjellström
Sent: Thursday, April 14, 2011 3:31 PM
To: centos@centos.org
Subject: Re: [CentOS] 40TB File System Recommendations

On Tuesday, April 12, 2011 03:10:33 PM Lars Hecking wrote:
  OTOH, gparted doesn't see my software raid array either. Gparted is
  rather practical for regular plain vanilla partitions, but for more
  advanced stuff and filesystems, fdisk is probably better.

  For filesystems > 2TB, you're better off grabbing a copy of GPT fdisk.

Even better, use LVM and stay away from partitioning completely.

Weren't there some gotchas regarding LVM and ext4? I might be imagining
things...
-- 
/Sorin




Re: [CentOS] 40TB File System Recommendations

2011-04-14 Thread Christopher Chan
On Thursday, April 14, 2011 08:55 PM, Simon Matter wrote:
 On Thursday, April 14, 2011 09:04 AM, Ross Walker wrote:
 On Apr 13, 2011, at 7:26 PM, John Jasenjja...@realityfailure.org
 wrote:

 On 04/12/2011 08:19 PM, Christopher Chan wrote:
 On Tuesday, April 12, 2011 10:36 PM, John Jasen wrote:
 On 04/12/2011 10:21 AM, Boris Epstein wrote:
 On Tue, Apr 12, 2011 at 3:36 AM, Alain Péan
 alain.p...@lpp.polytechnique.fr
 mailto:alain.p...@lpp.polytechnique.frwrote:

 snipped: two recommendations for XFS

 I would chime in with a dis-commendation for XFS. At my previous
 employer, two cases involving XFS resulted in irrecoverable data
 corruption. These were on RAID systems running from 4 to 20 TB.



 What were those circumstances? Crash? Power outage? What are the
 components of the RAID systems?

 One was a hardware raid over fibre channel, which silently corrupted
 itself. System checked out fine, raid array checked out fine, xfs was
 replaced with ext3, and the system ran without issue.

 Second was multiple hardware arrays over linux md raid0, also over
 fibre
 channel. This was not so silent corruption, as in xfs would detect it
 and lock the filesystem into read-only before it, pardon the pun, truly
 fscked itself. Happened two or three times, before we gave up, split up
 the raid, and went ext3, Again, no issues.

 Every now and then I hear these XFS horror stories. They seem almost
 impossible to believe.

 Nothing breaks for absolutely no reason, and failure to know where the
 breakage was shows that maybe there weren't adequately skilled
 technicians for the technology deployed.

 XFS if run in a properly configured environment will run flawlessly.


 HAHAHAAAAHA

 The XFS codebase is the biggest pile of mess in the Linux kernel and you
 expect it to be not run into mysterious problems? Remember, XFS was
 PORTED over to Linux. It is not a 'native' thing to Linux.

 You're confusing me, I always thought Linux has been ported to XFS :)

 There were some issues with XFS and maybe there still are. But you cannot
 say there are no environments where it works very stably. I've started
 using XFS back in the RH7.2 days and I can also tell some stories, but not
 all of them were XFS's fault. The only real problem was the fact that
 RedHat didn't choose XFS as their FS of choice, which meant that just a few
 resources were put into the XFS code and just a few people actually used
 it. That's the only thing where ext2/3/4 was better IMHO.


Where did I say that there are no environments where it works very 
stably? I used XFS extensively when I was running mail server farms for 
the mail queue filesystem and I only remember one or two incidents when 
the filesystem was marked read-only for no reason (seemingly - never had 
the time to find out why) but a reboot fixed those. XFS was better 
performing then but less reliable (yoohoo, hi Linux fake 
fsync/fdatasync) than ext3. So I personally have not had MAJOR problems 
with XFS but you bet that I don't think it's 100% safe in a properly 
configured environment. But that does not mean I am saying one must 
always encounter issues with it.

Redhat not choosing XFS is because the thing's code base is a quagmire 
and they had no developer familiar with it. Only Suse supported it 
because they could since they had XFS developers on their payroll and 
those developers were kept busy if you ask me.


Re: [CentOS] 40TB File System Recommendations

2011-04-14 Thread Peter Kjellström
On Tuesday, April 12, 2011 02:56:54 PM rai...@ultra-secure.de wrote:
...
  Steve,
  I'm managing machines with 30TB of storage for more than two years. And
  with good reporting and reaction we have never had to run fsck.
 
 That's not the issue.
 The issue is rebuild-time.
 The longer it takes, the more likely is another failure in the array.
 With RAID6, this does not instantly kill your RAID, as with RAID5 - but I
 assume it will further decrease overall performance and the rebuild time
 will go up significantly - adding to the risk.

While I do concede the obvious point regarding rebuild time (raid6 takes from 
long to very long to rebuild) I'd like to point out:

 * If you do the math for a 12 drive raid10 vs raid6 then (using actual data 
from ~500 1T drives on HP cciss controllers during two years) raid10 is ~3x 
more likely to cause hard data loss than raid6.

 * mtbf is not everything there's also the thing called unrecoverable read 
errors. If you hit one while rebuilding your raid10 you're toast while in the 
raid6 case you'll use your 2nd parity and continue the rebuild.
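
As a small illustrative sketch of what each layout has to survive once a first 
drive has failed (assumed 12 x 1 TB drives; this is not the cciss-based 
calculation itself):

# Illustrative only (assumed 12 x 1 TB drives): what each layout must survive
# after one drive has already failed.
drives, size_tb = 12, 1.0

# raid10: the dead drive's single mirror partner is read end to end, and no
# further error on that partner is survivable.
raid10_read_tb, raid10_errors_tolerated = size_tb, 0

# raid6: every surviving drive is read end to end, but one more failure or
# unrecoverable read error anywhere is still covered by the second parity.
raid6_read_tb, raid6_errors_tolerated = (drives - 1) * size_tb, 1

print(f"raid10 rebuild: read {raid10_read_tb} TB, tolerates {raid10_errors_tolerated} further errors")
print(f"raid6  rebuild: read {raid6_read_tb} TB, tolerates {raid6_errors_tolerated} further error")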

/Peter (who runs many 12 drive raid6 systems just fine)

 Thus, it's generally advisable to just use RAID10 (in this case, a
 thin-striped array of RAID1-arrays).




Re: [CentOS] 40TB File System Recommendations

2011-04-14 Thread Peter Kjellström
On Wednesday, April 13, 2011 09:29:29 AM Matthew Feinberg wrote:
 Thank you everyone for the advice and great information. From what I am
 gathering XFS is the way to go.
 
 A couple more questions.
 What partitioning utility is suggested? parted and fdisk do not seem to
 be doing the job.

My suggestion is don't partition at all, use LVM.

 Raid Level. I am considering moving away from the raid6 due to possible
 write performance issues. The array is 22 disks. I am not opposed to
 going with raid10 but I am looking for a good balance of
 performance/capacity.

Then try both for your use case and your hardware. We have wide raid6 setups 
that do well over 500 MB/s write (that is: not all raid6 writes suck...).

/Peter
 
 Hardware or software raid. Is there an advantage either way on such a
 large array?




Re: [CentOS] 40TB File System Recommendations

2011-04-14 Thread Lamar Owen
On Thursday, April 14, 2011 10:37:15 AM Christopher Chan wrote:
 I used XFS extensively when I was running mail server farms for 
 the mail queue filesystem and I only remember one or two incidents when 
 the filesystem was marked read-only for no reason (seemingly - never had 
 the time to find out why) but a reboot fixed those. 

I've had that happen, recently, with ext3 on CentOS 4.

FWIW.


Re: [CentOS] 40TB File System Recommendations

2011-04-14 Thread Christopher Chan
On Thursday, April 14, 2011 10:54 PM, Lamar Owen wrote:
 On Thursday, April 14, 2011 10:37:15 AM Christopher Chan wrote:
 I used XFS extensively when I was running mail server farms for
 the mail queue filesystem and I only remember one or two incidents when
 the filesystem was marked read-only for no reason (seemingly - never had
 the time to find out why) but a reboot fixed those.

 I've had that happen, recently, with ext3 on CentOS 4.

 FWIW.

I wonder if there were any changes to the ext3 code in the CentOS 4 
kernel lately...


Re: [CentOS] 40TB File System Recommendations

2011-04-14 Thread Christopher Chan
On Thursday, April 14, 2011 10:47 PM, Peter Kjellström wrote:
 On Wednesday, April 13, 2011 09:29:29 AM Matthew Feinberg wrote:
 Thank you everyone for the advice and great information. From what I am
 gathering XFS is the way to go.

 A couple more questions.
 What partitioning utility is suggested? parted and fdisk do not seem to
 be doing the job.

 My suggestion is don't partition at all, use LVM.

 Raid Level. I am considering moving away from the raid6 due to possible
 write performance issues. The array is 22 disks. I am not opposed to
 going with raid10 but I am looking for a good balance of
 performance/capacity.

 Then try both for your use case and your hardware. We have wide raid6 setups
 that do well over 500 MB/s write (that is: not all raid6 writes suck...).


/me replaces all of Peter's cache with 64MB modules.

Let's try again.


Re: [CentOS] 40TB File System Recommendations

2011-04-14 Thread Les Mikesell
On 4/14/2011 9:54 AM, Lamar Owen wrote:
 On Thursday, April 14, 2011 10:37:15 AM Christopher Chan wrote:
 I used XFS extensively when I was running mail server farms for
 the mail queue filesystem and I only remember one or two incidents when
 the filesystem was marked read-only for no reason (seemingly - never had
 the time to find out why) but a reboot fixed those.

 I've had that happen, recently, with ext3 on CentOS 4.

Same here, CentOS5 and ext3.  Rare and random across identical hardware. 
  So far I've blamed the hardware.

-- 
   Les Mikesell
lesmikes...@gmail.com


___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] 40TB File System Recommendations

2011-04-14 Thread Ross Walker
2011/4/14 Peter Kjellström c...@nsc.liu.se:
 On Tuesday, April 12, 2011 02:56:54 PM rai...@ultra-secure.de wrote:
 ...
  Steve,
  I'm managing machines with 30TB of storage for more then two years. And
  with
  good reporting and reaction we have never had to run fsck.

 That's not the issue.
 The issue is rebuild-time.
 The longer it takes, the more likely is another failure in the array.
 With RAID6, this does not instantly kill your RAID, as with RAID5 - but I
 assume it will further decrease overall-performance and the rebuild-time
 will go up significantly - adding the the risk.

 While I do concede the obvious point regarding rebuild time (raid6 takes from
 long to very long to rebuild) I'd like to point out:

  * If you do the math for a 12 drive raid10 vs raid6 then (using actual data
 from ~500 1T drives on HP cciss controllers during two years) raid10 is ~3x
 more likely to cause hard data loss than raid6.

  * MTBF is not everything; there are also unrecoverable read errors. If you
 hit one while rebuilding your raid10 you're toast, while in the raid6 case
 you'll use your 2nd parity and continue the rebuild.

You mean if the other side of the mirror fails while rebuilding it.
Yes, this is true. Of course, if this happens with RAID6 it will rebuild
from parity IF there is a second hotspare available, because remember
the first failure wasn't cleared before the second failure occurred.
Now your RAID6 is in a severely degraded state: one more failure before
either of these disks is rebuilt will mean toast for the array. The
performance of the array is practically unusable and the load on the
disks is high while it does a full recalculation rebuild, and if the
disks are large it will stay high for a very long time. If any other
disk in the very large RAID6 array is near failure, or has a bad sector,
this taxing load could very well push it over the edge, and the risk of
such an event increases with the size of the array and the size of the
disk surface.

I think this is where the mdraid raid10 shines, because it can keep 3
copies (or more) of the data instead of just two, at three times (or
more) the cost. It also allows for an uneven number of disks, as it
just saves copies on different spindles rather than in fixed mirrors.
This, I think, provides the best protection against failure and the
best performance, but at the worst cost. With 2TB and 4TB disks coming
out it may very well be worth it as the cost per GB drives lower and
lower and one can get 12TB of raw storage out of only 4 platters.
Imagine 12 platters: I wouldn't mind getting 16TB out of 48TB of raw
if it costs me less than what 16TB of raw cost me just 2 years ago,
especially if it means I get both performance and reliability.
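
For what it's worth, a minimal sketch of such a three-copy md raid10
(device names, disk count and chunk size are made-up examples, not a
recommendation):

  # --layout=n3 keeps three "near" copies of every block across the members
  mdadm --create /dev/md0 --level=10 --layout=n3 --raid-devices=6 \
        --chunk=512 /dev/sd[b-g]

Usable space is the raw capacity divided by the number of copies, which is
exactly the cost trade-off described above.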

 /Peter (who runs many 12 drive raid6 systems just fine)

 Thus, it's generally advisable to do just use RAID10 (in this case, a
 thin-striped array of RAID1-arrays).

No single RAID level is advisable across the board.

The RAID level is determined by the needs of the application vs the
risks of the RAID level vs the risks of the storage technology.

-Ross
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] 40TB File System Recommendations

2011-04-14 Thread Benjamin Franz
On 04/14/2011 08:04 AM, Christopher Chan wrote:

 Then try both for your use case and your hardware. We have wide raid6 setups
 that does well over 500 MB/s write (that is: not all raid6 writes suck...).

 /me replaces all of Peter's cache with 64MB modules.

 Let's try again.

If you are trying to imply that RAID6 can't go fast when the write size is 
larger than the cache, you are simply wrong. Even with just an 8 x RAID6, 
I've measured a system at sustained sequential (not burst) 156 Mbytes/s out 
and 387 Mbytes/s in using 7200 rpm 1.5 TB drives. Bonnie++ results are 
attached. Bonnie++ by default uses twice as much data as your available 
RAM to make sure you aren't just seeing cache. IOW: that machine only 
had 4GB of RAM and 256 MB of controller cache during the test but wrote 
and read 8 GB of data for the tests.
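
For reference, output like the table below typically comes from an
invocation along these lines (the path and user are only illustrative;
left to its defaults bonnie++ picks the 2x-RAM file size on its own):

  bonnie++ -d /mnt/raid6/test -s 8g -r 4096 -u nobody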

Version  1.96       --Sequential Output-- --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
                 8G   248  99 155996  74 85600  42   961  99 386900  62 628.3  29
Latency             33323us    224ms   1105ms  19047us   77599us    113ms
Version  1.96       --Sequential Create-- --Random Create--
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 17395  56 +++++ +++ 23951  61 27125  84 +++++ +++ 32154  84
Latency               330us     993us     980us     344us      64us      80us

-- 
Benjamin Franz
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] 40TB File System Recommendations

2011-04-14 Thread Les Mikesell
On 4/14/2011 7:32 AM, Christopher Chan wrote:

 HAHAHAAAAHA

 The XFS codebase is the biggest pile of mess in the Linux kernel and you
 expect it to be not run into mysterious problems? Remember, XFS was
 PORTED over to Linux. It is not a 'native' thing to Linux.

Well yeah, but the way I remember it, SGI was using it for real work 
like video editing and storing zillions of files back when Linux was a 
toy with a 2 gig file size limit and linear directory scans as the only 
option.   If you mean that the Linux side had a not-invented-here 
attitude about it and did the port badly you might be right...

-- 
   Les Mikesell
lesmikes...@gmail.com
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] 40TB File System Recommendations

2011-04-14 Thread Peter Kjellström
On Thursday, April 14, 2011 04:13:19 PM Steve Brooks wrote:
 On Thu, 14 Apr 2011, Peter Kjellström wrote:
  On Tuesday, April 12, 2011 03:10:33 PM Lars Hecking wrote:
  OTOH, gparted doesn't see my software raid array either. Gparted is
  rather practical for regular plain vanilla partitions, but for more
  advanced stuff and filesystems, fdisk is probably better.
  
   For filesystems > 2TB, you're better off grabbing a copy of GPT fdisk.
  
  Even better, use LVM and stay away from partitioning completely.
 
 Is it not ok to build the filesystem straight onto the device and not
 bother with partitioning at all?

Of course it is ok. The default install will put your filesystem directly on 
LVs (of course). If you're referring to PVs directly on a SCSI device then 
that is fine too.

/Peter


___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] 40TB File System Recommendations

2011-04-14 Thread Peter Kjellström
On Thursday, April 14, 2011 04:15:10 PM Sorin Srbu wrote:
 -Original Message-
 From: centos-boun...@centos.org [mailto:centos-boun...@centos.org] On
 Behalf Of Peter Kjellström
 Sent: Thursday, April 14, 2011 3:31 PM
 To: centos@centos.org
 Subject: Re: [CentOS] 40TB File System Recommendations
 
 On Tuesday, April 12, 2011 03:10:33 PM Lars Hecking wrote:
    OTOH, gparted doesn't see my software raid array either. Gparted is
    rather practical for regular plain vanilla partitions, but for more
    advanced stuff and filesystems, fdisk is probably better.
    
    For filesystems > 2TB, you're better off grabbing a copy of GPT fdisk.
 
 Even better, use LVM and stay away from partitioning completely.
 
 Weren't there some gotchas regarding LVM and ext4? I might be imagining
 things...

If you can't find anything better than "I might be imagining things" then 
maybe, for the sake of signal to noise, either don't contribute or chase down 
some additional data first...

Then again, the thread is about 40T fs size and ext4 tops out at 16T at the 
moment so...

/Peter


___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] 40TB File System Recommendations

2011-04-14 Thread Ross Walker
2011/4/14 Peter Kjellström c...@nsc.liu.se:
 On Thursday, April 14, 2011 04:13:19 PM Steve Brooks wrote:
 On Thu, 14 Apr 2011, Peter Kjellström wrote:
  On Tuesday, April 12, 2011 03:10:33 PM Lars Hecking wrote:
   OTOH, gparted doesn't see my software raid array either. Gparted is
   rather practical for regular plain vanilla partitions, but for more
   advanced stuff and filesystems, fdisk is probably better.
  
    For filesystems > 2TB, you're better off grabbing a copy of GPT fdisk.
 
  Even better, use LVM and stay away from partitioning completely.

 Is it not ok to build the filesystem straight onto the device and not
 bother with partitioning at all?

 Of course it is ok. The default install will put your filesystem directly on
 LVs (of course). If you're referring to PVs directly on a SCSI device then
 that is fine too.

The only real reason to put PVs within partitions is to prevent
accidental clobbering of data as fdisk/sfdisk don't see the LVM
metadata, but if you have a partition marked as type LVM then it will
see that.

If the disk is really big though I don't think you need to worry too
much as it's hard to mistake your 20TB disk for another and most
fdisk/sfdisk implementations will refuse to operate on disks that
large.

If it's a concern, though, use gparted and a GPT partition table. Check
your LV alignments in any case, whole disk or partition, and muck with
the PV metadata to get that first LV on sector 2048.
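
A couple of commands that make that check concrete (the device name and the
1 MiB figure are only examples; newer lvm2 also lets you force the alignment
at creation time):

  pvs -o +pe_start /dev/sdb            # where the first extent actually starts
  pvcreate --dataalignment 1m /dev/sdb # force 1 MiB (sector 2048) alignment

If pe_start comes back as 1.00m you already have the sector-2048 alignment
mentioned above.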

-Ross
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] 40TB File System Recommendations

2011-04-14 Thread Peter Kjellström
On Thursday, April 14, 2011 04:54:34 PM Lamar Owen wrote:
 On Thursday, April 14, 2011 10:37:15 AM Christopher Chan wrote:
  I used XFS extensively when I was running mail server farms for
  the mail queue filesystem and I only remember one or two incidents when
  the filesystem was marked read-only for no reason (seemingly - never had
  the time to find out why) but a reboot fixed those.
 
 I've had that happen, recently, with ext3 on CentOS 4.

The default behaviour for ext3 on CentOS-5 is to remount read-only, as a 
safety measure, when something goes wrong beneath it (see the "errors" mount 
option in man mount). The root cause can be any of a long list of hardware 
or software (kernel) problems (typically not ext3's fault though).
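
A quick way to see and set that behaviour on a given ext3 filesystem (the
device name is just an example):

  tune2fs -l /dev/sdb1 | grep -i 'errors behavior'   # continue / remount-ro / panic
  tune2fs -e remount-ro /dev/sdb1                    # make remount-ro the default
  # or per mount, e.g. in /etc/fstab:  ... ext3  defaults,errors=remount-ro  1 2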

/Peter


___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] 40TB File System Recommendations

2011-04-14 Thread Lamar Owen
On Thursday, April 14, 2011 11:20:23 AM Les Mikesell wrote:
 Same here, CentOS5 and ext3.  Rare and random across identical hardware. 
   So far I've blamed the hardware.

I don't have that luxury.  This is one VM on a VMware ESX 3.5U5 host, and the 
storage is EMC Clariion fibre-channel, with the VMware VMFS3 in between.  Same 
storage RAID groups serve other VMs that haven't shown the problem.  Happened 
regardless of the ESX host on which the guest was running; I even svmotioned 
the vmx/vmdk over to a different RAID group, and after roughly two weeks it did 
it again.

I haven't had the issue since the 4.9 update and since transitioning from the 
all-in-one vmware-tools package to the OSP stuff at packages.vmware.com (did 
that for a different reason, that of the 'can't reboot/restart vmxnet if IPv6 
enabled' issue on ESX 3.5).

Only the one VM guest had the problem; there are several C4 VMs on the same 
storage, too.  This one has the Scalix mailstore on it.  Reboot into single 
user, disable the journal, fsck, re-enable the journal, and things are ok.  
Well, the last time it happened I didn't disable the journal before the 
fsck/reboot, but didn't suffer any data loss even then (journal replay in the 
'fs went read-only, journal stopped' case isn't something you want to have 
happen in the general case).
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] 40TB File System Recommendations

2011-04-14 Thread m . roth
Lamar Owen wrote:
 On Thursday, April 14, 2011 10:37:15 AM Christopher Chan wrote:
 I used XFS extensively when I was running mail server farms for the
mail queue filesystem and I only remember one or two incidents when the
filesystem was marked read-only for no reason (seemingly - never had
the time to find out why) but a reboot fixed those.

 I've had that happen, recently, with ext3 on CentOS 4.

I've had that happen, also. It usually indicates a drive dying.

 mark



___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] 40TB File System Recommendations

2011-04-14 Thread aurfalien
On Apr 14, 2011, at 6:43 AM, Ross Walker wrote:

 On Apr 14, 2011, at 6:54 AM, John Jasen jja...@realityfailure.org  
 wrote:

 On 04/13/2011 09:04 PM, Ross Walker wrote:
 On Apr 13, 2011, at 7:26 PM, John Jasen  
 jja...@realityfailure.org wrote:

 snipped my stuff


 Every now and then I hear these XFS horror stories. They seem too  
 impossible to believe.

 Nothing breaks for absolutely no reason and failure to know where  
 the breakage was shows that maybe there wasn't adequately skilled  
 techinicians for the technology deployed.

 Waving your hands and insulting the people who went through XFS  
 failures
 doesn't make me feel any better or make the problems not have  
 occurred.
 W
 You are correct it came across as rude and condescending, I apologize.

 It was a knee jerk reaction that came from reading many such posts  
 that XFS is no good because it caused X where X came about because  
 people didn't know how to implement XFS safely or correctly.

Well, while a fan of anything IRIX, I've had issues with XFS in the  
past as with all filesystems.

I still use it but not in all cases.

A good fs, fast, reliable for the most part but by no means a fan boi  
of it.

You did come across as a serious fan though.

However, if you like XFS, I'll assume you like IRIX, so check out the 5dwm  
project, which is the IRIX desktop for Linux.

- aurf
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] 40TB File System Recommendations

2011-04-14 Thread Lamar Owen
On Thursday, April 14, 2011 02:17:41 PM aurfal...@gmail.com wrote:
 However, if you like XFS, I'll assume you like IRIX, so check out the 5dwm
 project, which is the IRIX desktop for Linux.

Cool.  Now if they ported the Audio DAT ripping program for IRIX to Linux, I'd 
be able to get rid of my O2. (SGI got special DAT tape drive firmware made 
by Seagate that can read and write Audio DAT tapes in a particular 
Seagate/Archive Python DDS-1 drive; SGI also put the software to work with 
Audio DATs in IRIX.  I use that program occasionally on my O2, and previously 
on my Indigo2/IMPACT, to 'rip' Audio DATs for my professional audio production 
side business.  I can also master to Audio DAT with the same program, making 
it quite nice indeed.)
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] 40TB File System Recommendations

2011-04-14 Thread aurfalien
On Apr 14, 2011, at 12:43 PM, Lamar Owen wrote:

 On Thursday, April 14, 2011 02:17:41 PM aurfal...@gmail.com wrote:
 However, if you like XFS, I'll assume you like IRIX, so check out the 5dwm
 project, which is the IRIX desktop for Linux.

 Cool.  Now if they ported the Audio DAT ripping program for IRIX to  
 Linux, I'd be able to get rid of my O2. (SGI got special DAT  
 tape drive firmware made by Seagate that can read and write Audio  
 DAT tapes in a particular Seagate/Archive Python DDS-1 drive; SGI  
 also put the software to work with Audio DATs in IRIX.  I use that  
 program occasionally on my O2, and previously on my Indigo2/IMPACT,  
 to 'rip' Audio DAT's for my professional audio production side  
 business.  I can also master to Audio DAT with the same program,  
 making it quite nice indeed.).

Dude, that's killer.

I miss the SGI/Irix dayz.

Solid hardware/OS for sure.

Can you believe that in the early 90s they had the market share for  
desktop Unix boxes?

- aurf
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] 40TB File System Recommendations

2011-04-14 Thread m . roth
aurfal...@gmail.com wrote:
 On Apr 14, 2011, at 12:43 PM, Lamar Owen wrote:

 On Thursday, April 14, 2011 02:17:41 PM aurfal...@gmail.com wrote:
 However, if you like XFS, I'll assume you like IRIX, so check out the 5dwm
 project, which is the IRIX desktop for Linux.

 Cool.  Now if they ported the Audio DAT ripping program for IRIX to
 Linux, I'd be able to get rid of my O2. (SGI got special DAT
snip
 Dude, thats killer.

 I miss the SGI/Irix dayz.

 Solid hardware/OS for sure.

 Can you believe that in the early 90s they had market share for
 desktop Unix boxes.

Yeah, I liked SGI's and Irix; liked Suns and Solaris. Sun, er, Oracle,
now? HELL, NO!!!

 mark

___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] 40TB File System Recommendations

2011-04-14 Thread Pasi Kärkkäinen
On Wed, Apr 13, 2011 at 07:18:23PM -0400, John Jasen wrote:
 On 04/12/2011 11:30 AM, Les Mikesell wrote:
  On 4/12/2011 9:36 AM, John Jasen wrote:
 
  snipped: two recommendations for XFS
 
  I would chime in with a dis-commendation for XFS. At my previous
  employer, two cases involving XFS resulted in irrecoverable data
  corruption. These were on RAID systems running from 4 to 20 TB.
  
  Was this on a 32 or 64 bit system?
  
 
 Yes. IE: both.
 

XFS is known to be broken on 32bit Linux..

XFS was originally developed on 64bit IRIX (iirc),
so it also requires 64bit Linux.

32bit Linux has too small a stack for XFS.
Redhat only supports XFS on x86_64 RHEL.

-- Pasi

___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] 40TB File System Recommendations

2011-04-14 Thread John Jasen
One was 32 bit, the other 64 bit.



Christopher Chan christopher.c...@bradbury.edu.hk wrote:

On Thursday, April 14, 2011 07:26 AM, John Jasen wrote:
 On 04/12/2011 08:19 PM, Christopher Chan wrote:
 On Tuesday, April 12, 2011 10:36 PM, John Jasen wrote:
 On 04/12/2011 10:21 AM, Boris Epstein wrote:
 On Tue, Apr 12, 2011 at 3:36 AM, Alain Péan
 alain.p...@lpp.polytechnique.fr
 mailto:alain.p...@lpp.polytechnique.fr   wrote:

 snipped: two recommendations for XFS

 I would chime in with a dis-commendation for XFS. At my previous
 employer, two cases involving XFS resulted in irrecoverable data
 corruption. These were on RAID systems running from 4 to 20 TB.



 What were those circumstances? Crash? Power outage? What are the
 components of the RAID systems?

 One was a hardware raid over fibre channel, which silently corrupted
 itself. System checked out fine, raid array checked out fine, xfs was
 replaced with ext3, and the system ran without issue.

 Second was multiple hardware arrays over linux md raid0, also over fibre
 channel. This was not so silent corruption, as in xfs would detect it
 and lock the filesystem into read-only before it, pardon the pun, truly
 fscked itself. Happened two or three times, before we gave up, split up
 the raid, and went ext3, Again, no issues.

32-bit kernel by any chance?
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] 40TB File System Recommendations

2011-04-13 Thread Emmanuel Noobadmin
On 4/13/11, Brandon Ooi brand...@gmail.com wrote:

 centos 5 can expand raid 0/1/5. just not 6. 10 is just layered 0/1 so you
 can expand it.
 centos 6 will be able to expand raid6 as it was a feature in 2.6.20 or
 something.

This is where I'm getting confused. I had been reading up on mdadm,
torn between using RAID 5/6 for the ability to grow the array with
more disks and RAID 10 for better IOPS. The man page itself says that
"Currently supported growth options include changing the active size
of component devices and changing the number of active devices
in RAID levels 1/4/5/6".

This, along with other internet sources, seems to imply that growing RAID 0
is not supported, and therefore by extension neither is RAID 10.
Furthermore, I read on Neil Brown's blog that reshaping RAID 10 was a
planned but not yet implemented feature.

Is the difference here between using mdadm to directly create a RAID
10 vs manually layering RAID 0 on top of RAID 1 devices?

Or is the expansion here limited to replacing the existing component
drives with larger ones, e.g. replacing four 1TB drives with four 2TB
drives, thus going from a 2TB to a 4TB array?
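
To make the two kinds of growth concrete, roughly (array and device names
are placeholders):

  # grow by adding a member and reshaping (levels 1/4/5/6 per the man page):
  mdadm --add /dev/md0 /dev/sde1
  mdadm --grow /dev/md0 --raid-devices=5

  # or, after every member has been swapped for a bigger drive:
  mdadm --grow /dev/md0 --size=max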
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] 40TB File System Recommendations

2011-04-13 Thread Rudi Ahlers
On Wed, Apr 13, 2011 at 6:35 AM, Emmanuel Noobadmin
centos.ad...@gmail.comwrote:

 On 4/12/11, Rudi Ahlers r...@softdux.com wrote:
  But, our RAID10 is set up as a stripe of mirrors, i.e. sda1 + sdb1 -> md0,
  sdc1 + sdd1 -> md1, then sde1 + sdf1 -> md2, and finally md0 + md1 + md2 are
  striped. The advantage of this is that we can add more disks to the whole
  RAID set with no downtime

 Off-topic, but when you say add more disks, do you mean for the
 purpose of replacing failing disks or for expanding the array? I'm
 curious because on initial reading I read it to mean expanding the
 storage capacity of the array but thought it was currently not
 possible to expand a mdadm RAID 0 non-destructively.
 ___

 to expand the array :)

I haven't had problems doing it this way yet.

The other way is to run LVM on top of the three md's, i.e. pvcreate each md
and then vgcreate volume01 /dev/md0 /dev/md1 /dev/md2, etc. LVM expands very
easily with no downtime either.
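
The growth path with that layout is then roughly (names follow the volume01
example above; the LV name is made up):

  pvcreate /dev/md3                        # the newly added mirror pair
  vgextend volume01 /dev/md3
  lvextend -l +100%FREE /dev/volume01/data # then grow the filesystem on top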




-- 
Kind Regards
Rudi Ahlers
SoftDux

Website: http://www.SoftDux.com
Technical Blog: http://Blog.SoftDux.com
Office: 087 805 9573
Cell: 082 554 7532
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] 40TB File System Recommendations

2011-04-13 Thread Matthew Feinberg

Thank you everyone for the advice and great information. From what I am 
gathering XFS is the way to go.

A couple more questions.
What partitioning utility is suggested? parted and fdisk do not seem to 
be doing the job.

Raid Level. I am considering moving away from the raid6 due to possible 
write performance issues. The array is 22 disks. I am not opposed to 
going with raid10 but I am looking for a good balance of 
performance/capacity.

Hardware or software raid. Is there an advantage either way on such a 
large array?



-- 
Matthew Feinberg
matt...@choopa.com
AIM: matthewchoopa

___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] 40TB File System Recommendations

2011-04-13 Thread Sorin Srbu
-Original Message-
From: centos-boun...@centos.org [mailto:centos-boun...@centos.org] On Behalf
Of Matthew Feinberg
Sent: Wednesday, April 13, 2011 9:29 AM
To: CentOS mailing list
Subject: Re: [CentOS] 40TB File System Recommendations

Hardware or software raid. Is there an advantage either way on such a
large array?

Doesn't that depend on what sort of backup solution you're planning, and the
level of criticality of the backups saved?

Some say that for more serious raid solutions, hardware is the way to go, while
software raids are sort of a middle-road.

Me, I usually go with software raid. I've had one too many hardware raid
failures where I haven't been able to restore the data contained. With software
raid a restore has always worked fine for me, especially broken raids in
Windows. While raid in Windows isn't overly performance-inclined, I've come to
appreciate the software ditto in linux - both performance and stability are
top-notch IMHO.

With today's CPU performance and the RAM available, software raids are not a problem
to drive.

-- 
/Sorin


___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] 40TB File System Recommendations

2011-04-13 Thread Emmanuel Noobadmin
On 4/13/11, Rudi Ahlers r...@softdux.com wrote:
 I haven't had problems doing it this way yet.

Thanks for the confirmation. Could you please outline the general
steps to expand an existing RAID 10 with another RAID 1 device?

I'm trying to test this out but unfortunately being the noob that I
am, all I have managed so far is a couple of /dev/loop raid 1 arrays
that cannot be deleted nor combined into a raid 0 array.
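
In case it helps with the stuck test setup, a rough sketch (all device names
below are placeholders): striping two existing RAID1 test arrays is

  mdadm --create /dev/md12 --level=0 --raid-devices=2 /dev/md10 /dev/md11

and tearing one down so its loop devices can be reused is

  mdadm --stop /dev/md10
  mdadm --zero-superblock /dev/loop0 /dev/loop1
  losetup -d /dev/loop0
  losetup -d /dev/loop1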
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] 40TB File System Recommendations

2011-04-13 Thread Christopher Chan
On Wednesday, April 13, 2011 04:00 PM, Sorin Srbu wrote:

 With today's CPU-performance and RAM available, software raids are not a 
 problem
 to power.


That depends. Software raid is fine for raid1 and raid0. If you want 
raid5 or raid6, you have to use hardware raid with a BBU cache that 
matches the size of the array, notwithstanding the 'limit it to 10 disks 
max' consideration.

cpu performance/amount of RAM available is a non-issue and has been for 
a decade.
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] 40TB File System Recommendations

2011-04-13 Thread Ross Walker
On Apr 13, 2011, at 8:45 AM, Christopher Chan 
christopher.c...@bradbury.edu.hk wrote:

 On Wednesday, April 13, 2011 04:00 PM, Sorin Srbu wrote:
 
 With today's CPU-performance and RAM available, software raids are not a 
 problem
 to power.
 
 
 That depends. Software raid is fine for raid1 and raid0. If you want 
 raid5 or raid6, you have to use hardware raid with bbu cache that 
 matches the size of the array notwithstanding the limiting to 10 disks 
 max. consideration.
 
 cpu performance/amount of RAM available is a non-issue and has been for 
 a decade.

The battery-backed cache is essential in avoiding the parity write hole as well 
as avoiding the performance penalty of short writes, those less than the stripe 
width, where the remaining chunks need to be read to calculate the new parity, 
as the cache can hold the write until it gets a full stripe width 
and/or cache future writes until the read-calc-write is completed.
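
To put a number on "short": with the 22-disk RAID6 mentioned earlier in this 
thread and, say, a 64 KiB chunk (an assumed figure, purely for illustration), a 
full stripe is 20 data chunks x 64 KiB = 1.25 MiB, so any write smaller than 
that per stripe forces the read-calc-write cycle unless the cache can coalesce 
it.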

-Ross

___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] 40TB File System Recommendations

2011-04-13 Thread Lamar Owen
On Tuesday, April 12, 2011 06:49:08 PM Drew wrote:
  Where can I get an enterprise-class 2TB drive for $100?  Commodity SATA 
  isn't enterprise-class. 

 I can get Seagate's Constellation ES series SATA drives in 1TB for
 $125. 2TB will run me around $225.

Yeah, those are reasonable near-line drives for archival storage, or when you 
have a very small number of servers accessing the storage, and large amounts of 
cache.

EMC used Barracuda ES SATA drives in their Clariion CX3 boxes for a while; used 
a dual attach 4G FC bridge controller to go from the DAE backplane to the SATA 
port, and emulated the dual attach functionality of FC with it.  I'm not 100% 
sure, but I think the SATA drive itself got EMC-specific firmware.
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] 40TB File System Recommendations

2011-04-13 Thread Christopher Chan
On Wednesday, April 13, 2011 09:18 PM, Ross Walker wrote:
 On Apr 13, 2011, at 8:45 AM, Christopher 
 Chanchristopher.c...@bradbury.edu.hk  wrote:

 On Wednesday, April 13, 2011 04:00 PM, Sorin Srbu wrote:

 With today's CPU-performance and RAM available, software raids are not a 
 problem
 to power.


 That depends. Software raid is fine for raid1 and raid0. If you want
 raid5 or raid6, you have to use hardware raid with bbu cache that
 matches the size of the array notwithstanding the limiting to 10 disks
 max. consideration.

 cpu performance/amount of RAM available is a non-issue and has been for
 a decade.

 The battery backed cache is essential in avoiding the parity write hole as 
 well as avoiding the performance penalty of short writes, those less then the 
 stripe width  where the remaining chunks need to be read to calculate the new 
 parity, as the cache can attempt to cache the write until it gets a full 
 stripe width and/or cache future writes until the read-calc-write is 
 completed.


While we are at it, disks being directly connected to the raid card 
means there won't be bus contention from NICs and whatnot, whereas 
software raid 5/6 would have to deal with that.
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] 40TB File System Recommendations

2011-04-13 Thread Lamar Owen
On Tuesday, April 12, 2011 07:00:26 PM compdoc wrote:
 I've had good luck with green, 5400 rpm Samsung drives. They don't spin down
 automatically and work fine in my raid 5 arrays. The cost is about $80 for
 2TB drives.

And that's a good price point for a commodity drive; not something I would 
count on for long-term use, but still a good price point.

 I also have a few 5900 rpm Seagate ST32000542AS drives, but not currently in
 raids. They don't spin down, so I'm sure they would be fine in a raid.

The biggest issue isn't the spindown.  Google 'WDTLER' and see the other, 
bigger, issue.  In a nutshell, TLER (Time-Limited Error Recovery; see 
https://secure.wikimedia.org/wikipedia/en/wiki/TLER ) allows the drive to not 
try to recover soft errors quite as long.  The error recovery time can cause 
the drive to drop out of RAID sets and be marked as faulted.
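
On drives that expose it, the same timeout can be inspected and capped with a
recent smartctl (values are in tenths of a second, so 70 = 7 seconds; whether
the setting survives a power cycle is drive-dependent):

  smartctl -l scterc /dev/sda          # show the current SCT ERC timeouts
  smartctl -l scterc,70,70 /dev/sda    # cap read/write recovery at 7 seconds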

 Just because they are so tiny on the outside, 2.5 inch drives like the
 Seagate Constellation and WD Raptors are great. Unfortunately, they don't
 come any larger than 1TB, so I use them in special situations.

FWIW, EMC's new VNX storage systems are at the 2.5 inch form factor, with SSD 
and mechanical platter drives as options, using 6G SAS interfaces.
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] 40TB File System Recommendations

2011-04-13 Thread Sorin Srbu
-Original Message-
From: centos-boun...@centos.org [mailto:centos-boun...@centos.org] On Behalf
Of Christopher Chan
Sent: Wednesday, April 13, 2011 3:45 PM
To: centos@centos.org
Subject: Re: [CentOS] 40TB File System Recommendations

While we are at it, disks being directly connected to the raid card will
mean there won't be bus contention from nics and what not whereas
software raid 5/6 would have to deal with that.

Could that really be an issue as well? What kind of traffic levels are we
speaking of now? Approximately? That is to say, in as much this can be
quantified at all.

I've never really seen this problem.
-- 
/Sorin


___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] 40TB File System Recommendations

2011-04-13 Thread Christopher Chan
On Wednesday, April 13, 2011 10:32 PM, Sorin Srbu wrote:
 -Original Message-
 From: centos-boun...@centos.org [mailto:centos-boun...@centos.org] On Behalf
 Of Christopher Chan
 Sent: Wednesday, April 13, 2011 3:45 PM
 To: centos@centos.org
 Subject: Re: [CentOS] 40TB File System Recommendations

 While we are at it, disks being directly connected to the raid card will
 mean there won't be bus contention from nics and what not whereas
 software raid 5/6 would have to deal with that.

 Could that really be an issue as well? What kind of traffic levels are we
 speaking of now? Approximately? That is to say, in as much this can be
 quantified at all.

 I've never really seen this problem.

Oh yeah, we are on PCIe and NUMA architectures now. I guess this point 
no longer applies just like hardware raid being crap no longer applies 
because they are not underpowered i960/tiny cache boards anymore.
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] 40TB File System Recommendations

2011-04-13 Thread compdoc
 The biggest issue isn't the spindown.  Google 'WDTLER' and see the other,
bigger, issue.  In a nutshell, TLER (Time-Limited Error Recovery; see
https://secure.wikimedia.org/wikipedia/en/wiki/TLER ) allows the drive to
not try to recover soft errors quite as long.  The error recovery time can
cause the drive to drop out of RAID sets and be marked as faulted.

Yes, I'm aware of that and it's the reason I have to replace drives
developing reallocated sectors: they get dropped by my 3ware controllers.
There's a penalty for using cheap drives, but there's also a benefit from
the low heat and power savings.

To me, drives and power supplies are a consumable item - something you're
going to have to replace from time to time. I'm used to it since I service
computers for a living. I've seen enterprise drives fail too, although
probably not as often.

By the way, I'm seeing too many people with failing SSDs to start relying on
those yet. I own one so far, but it's not used much.


___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] 40TB File System Recommendations

2011-04-13 Thread John Jasen
On 04/12/2011 11:30 AM, Les Mikesell wrote:
 On 4/12/2011 9:36 AM, John Jasen wrote:

 snipped: two recommendations for XFS

 I would chime in with a dis-commendation for XFS. At my previous
 employer, two cases involving XFS resulted in irrecoverable data
 corruption. These were on RAID systems running from 4 to 20 TB.
 
 Was this on a 32 or 64 bit system?
 

Yes. IE: both.

-- 
-- John E. Jasen (jja...@realityfailure.org)
-- Deserve Victory. -- Terry Goodkind, Naked Empire
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] 40TB File System Recommendations

2011-04-13 Thread John Jasen
On 04/12/2011 08:19 PM, Christopher Chan wrote:
 On Tuesday, April 12, 2011 10:36 PM, John Jasen wrote:
 On 04/12/2011 10:21 AM, Boris Epstein wrote:
 On Tue, Apr 12, 2011 at 3:36 AM, Alain Péan
 alain.p...@lpp.polytechnique.fr
 mailto:alain.p...@lpp.polytechnique.fr  wrote:

 snipped: two recommendations for XFS

 I would chime in with a dis-commendation for XFS. At my previous
 employer, two cases involving XFS resulted in irrecoverable data
 corruption. These were on RAID systems running from 4 to 20 TB.


 
 What were those circumstances? Crash? Power outage? What are the 
 components of the RAID systems?

One was a hardware raid over fibre channel, which silently corrupted
itself. System checked out fine, raid array checked out fine, xfs was
replaced with ext3, and the system ran without issue.

Second was multiple hardware arrays over linux md raid0, also over fibre
channel. This was not so silent corruption, as in xfs would detect it
and lock the filesystem into read-only before it, pardon the pun, truly
fscked itself. Happened two or three times, before we gave up, split up
the raid, and went ext3, Again, no issues.
-- 
-- John E. Jasen (jja...@realityfailure.org)
-- Deserve Victory. -- Terry Goodkind, Naked Empire
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] 40TB File System Recommendations

2011-04-13 Thread Ross Walker
On Apr 13, 2011, at 7:26 PM, John Jasen jja...@realityfailure.org wrote:

 On 04/12/2011 08:19 PM, Christopher Chan wrote:
 On Tuesday, April 12, 2011 10:36 PM, John Jasen wrote:
 On 04/12/2011 10:21 AM, Boris Epstein wrote:
 On Tue, Apr 12, 2011 at 3:36 AM, Alain Péan
 alain.p...@lpp.polytechnique.fr
 mailto:alain.p...@lpp.polytechnique.fr  wrote:
 
 snipped: two recommendations for XFS
 
 I would chime in with a dis-commendation for XFS. At my previous
 employer, two cases involving XFS resulted in irrecoverable data
 corruption. These were on RAID systems running from 4 to 20 TB.
 
 
 
 What were those circumstances? Crash? Power outage? What are the 
 components of the RAID systems?
 
 One was a hardware raid over fibre channel, which silently corrupted
 itself. System checked out fine, raid array checked out fine, xfs was
 replaced with ext3, and the system ran without issue.
 
 Second was multiple hardware arrays over linux md raid0, also over fibre
 channel. This was not so silent corruption, as in xfs would detect it
 and lock the filesystem into read-only before it, pardon the pun, truly
 fscked itself. Happened two or three times, before we gave up, split up
 the raid, and went ext3, Again, no issues.

Every now and then I hear these XFS horror stories. They seem too impossible to 
believe.

Nothing breaks for absolutely no reason and failure to know where the breakage 
was shows that maybe there wasn't adequately skilled techinicians for the 
technology deployed.

XFS if run in a properly configured environment will run flawlessly.

-Ross

___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] 40TB File System Recommendations

2011-04-13 Thread Brandon Ooi
On Wed, Apr 13, 2011 at 6:04 PM, Ross Walker rswwal...@gmail.com wrote:

 
  One was a hardware raid over fibre channel, which silently corrupted
  itself. System checked out fine, raid array checked out fine, xfs was
  replaced with ext3, and the system ran without issue.
 
  Second was multiple hardware arrays over linux md raid0, also over fibre
  channel. This was not so silent corruption, as in xfs would detect it
  and lock the filesystem into read-only before it, pardon the pun, truly
  fscked itself. Happened two or three times, before we gave up, split up
  the raid, and went ext3, Again, no issues.

 Every now and then I hear these XFS horror stories. They seem too
 impossible to believe.

 Nothing breaks for absolutely no reason and failure to know where the
 breakage was shows that maybe there wasn't adequately skilled techinicians
 for the technology deployed.

 XFS if run in a properly configured environment will run flawlessly.


That's not entirely true. Even in CentOS 5.3(?), we ran into an issue where XFS
running on an md array would lock up for seemingly no reason due to possible
corruption. I've even bookmarked the relevant bug thread for posterity's sake
since it caused us so much grief.

https://bugzilla.redhat.com/show_bug.cgi?id=512552
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] 40TB File System Recommendations

2011-04-13 Thread Ross Walker
On Apr 13, 2011, at 9:40 PM, Brandon Ooi brand...@gmail.com wrote:

 On Wed, Apr 13, 2011 at 6:04 PM, Ross Walker rswwal...@gmail.com wrote:
 
  One was a hardware raid over fibre channel, which silently corrupted
  itself. System checked out fine, raid array checked out fine, xfs was
  replaced with ext3, and the system ran without issue.
 
  Second was multiple hardware arrays over linux md raid0, also over fibre
  channel. This was not so silent corruption, as in xfs would detect it
  and lock the filesystem into read-only before it, pardon the pun, truly
  fscked itself. Happened two or three times, before we gave up, split up
  the raid, and went ext3, Again, no issues.
 
 Every now and then I hear these XFS horror stories. They seem too impossible 
 to believe.
 
 Nothing breaks for absolutely no reason and failure to know where the 
 breakage was shows that maybe there wasn't adequately skilled techinicians 
 for the technology deployed.
 
 XFS if run in a properly configured environment will run flawlessly.
 
 
 That's not entirely true. Even in Centos 5.3(?), we ran into an issue of XFS 
 running on an md array would lock up for seemingly no reason due to possible 
 corruption. I've even bookmarked the relevant bug thread for posterity sake 
 since it caused us so much grief. 
 
 https://bugzilla.redhat.com/show_bug.cgi?id=512552

Once I had ext3 corrupt on a NAS box with a bad controller. Can I not recommend 
using it? Should it have detected or prevented this corruption from occurring? 
Maybe it isn't safe?

For every one bad experience with a given technology there are thousands of 
success stories. All software has bugs, and advocacy really shouldn't play a 
part in determining the proper technology; it should be picked for the 
application and on its merits and, as with anything, thoroughly tested before 
being put into production.

-Ross


___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] 40TB File System Recommendations

2011-04-12 Thread John R Pierce
On 04/12/11 12:23 AM, Matthew Feinberg wrote:
 Hello All

 I have a brand spanking new 40TB Hardware Raid6 array

never mind file systems... is that one raid set?  Do you have any idea 
how LONG rebuilding that is going to take when there are any drive 
hiccups?  Or how painfully slow writes will be until it's rebuilt?  Is 
that something like 22 x 2TB or 16 x 3TB?  I'll bet a raid rebuild 
takes nearly a WEEK, maybe even longer...

I am very strongly NOT in favor of raid6, even for nearline bulk backup 
storage.  I would sacrifice the space and format that as raid10, and 
have at LEAST a couple of hot spares too.




___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] 40TB File System Recommendations

2011-04-12 Thread Alain Péan
Le 12/04/2011 09:23, Matthew Feinberg a écrit :
 Hello All

 I have a brand spanking new 40TB Hardware Raid6 array to play around
 with. I am looking for recommendations for which filesystem to use. I am
 trying not to break this up into multiple file systems as we are going
 to use it for backups. Other factors is performance and reliability.

 CentOS 5.6

 array is /dev/sdb

 So here is what I have tried so far
 reiserfs is limited to 16TB
 ext4 does not seem to be fully baked in 5.6 yet. parted 1.8 does not
 support creating ext4 (strange)

 Anyone work with large filesystems like this that have any
 suggestions/recommendations?

Hi Matthew,

I would go for xfs, which is now supported in CentOS. This is what I use 
for a 16 TB storage array, with CentOS 5.3 (Rocks Cluster), and it works fine. 
No problem with lengthy fsck, as with ext3 (which does not support such 
capacities). I have not tried ext4 yet...
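
If it helps, a minimal sketch of creating it with the stripe geometry passed
in explicitly; the su/sw figures below assume a 64 KiB chunk and 20 data
disks purely as an example and must be matched to the real array:

  mkfs.xfs -d su=64k,sw=20 /dev/sdb
  mount -o inode64 /dev/sdb /backup    # inode64 helps on filesystems this large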

Alain

-- 
==
Alain Péan - LPP/CNRS
Administrateur Système/Réseau
Laboratoire de Physique des Plasmas - UMR 7648
Observatoire de Saint-Maur
4, av de Neptune, Bat. A
94100 Saint-Maur des Fossés
Tel : 01-45-11-42-39 - Fax : 01-48-89-44-33
==

___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] 40TB File System Recommendations

2011-04-12 Thread Bent Terp
On Tue, Apr 12, 2011 at 9:23 AM, Matthew Feinberg matt...@choopa.com wrote:
 Hello All

 I have a brand spanking new 40TB Hardware Raid6 array to play around
 with. I am looking for recommendations for which filesystem to use. I am
 trying not to break this up into multiple file systems as we are going
 to use it for backups. Other factors is performance and reliability.

We've been very happy with XFS, as it allows us to add disk space
through LVM and grow the filesystem online - we've had to reboot the
server when we add new disk enclosures, but that's not XFS's fault...
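
The grow itself is just something like (names illustrative):

  lvextend -L +10T /dev/vg_storage/lv_backup   # hand more VG space to the LV
  xfs_growfs /srv/backup                       # grow XFS while it is mounted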

BR Bent


 CentOS 5.6

 array is /dev/sdb

 So here is what I have tried so far
 reiserfs is limited to 16TB
 ext4 does not seem to be fully baked in 5.6 yet. parted 1.8 does not
 support creating ext4 (strange)

 Anyone work with large filesystems like this that have any
 suggestions/recommendations?

 --
 Matthew Feinberg
 matt...@choopa.com
 AIM: matthewchoopa

 ___
 CentOS mailing list
 CentOS@centos.org
 http://lists.centos.org/mailman/listinfo/centos

___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] 40TB File System Recommendations

2011-04-12 Thread Marian Marinov
On Tuesday 12 April 2011 10:36:54 Alain Péan wrote:
 Le 12/04/2011 09:23, Matthew Feinberg a écrit :
  Hello All
  
  I have a brand spanking new 40TB Hardware Raid6 array to play around
  with. I am looking for recommendations for which filesystem to use. I am
  trying not to break this up into multiple file systems as we are going
  to use it for backups. Other factors is performance and reliability.
  
  CentOS 5.6
  
  array is /dev/sdb
  
  So here is what I have tried so far
  reiserfs is limited to 16TB
  ext4 does not seem to be fully baked in 5.6 yet. parted 1.8 does not
  support creating ext4 (strange)
  
  Anyone work with large filesystems like this that have any
  suggestions/recommendations?
 
 Hi Matthew,
 
 I would go for xfs, which is now supported in CentOS. This is what I use
 for a 16 TB storage, with CentOS 5.3 (Rocks Cluster), and it woks fine.
 No problem with lengthy fsck, as with ext3 (which does not support such
 capacities). I did not try yet ext4...
 
 Alain

I have Raid6 arrays with 30TB. We have tested XFS and its write performance 
was really disappointing. So we looked at Ext4. It is really good for our 
workloads, but it lacks the ability to grow over 16TB. So we created two 
partitions on the raid with ext4. 

The RAID rebuild time is around 2 days, max 3 if the workload is higher. So I 
presume that for 40TB it will be around 4 days.
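
As a rough sanity check on those figures: a rebuild is ultimately bounded by
rewriting one member drive, so assuming (only as an illustration) 2 TB members
and an effective 15-30 MB/s rebuild rate under production load, that is about
18-37 hours per drive; controller verification passes and competing backup I/O
easily stretch that into the 2-4 day range quoted here.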

Marian
-- 
Best regards,
Marian Marinov


___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] 40TB File System Recommendations

2011-04-12 Thread Steve Brooks

On Tue, 12 Apr 2011, Marian Marinov wrote:


On Tuesday 12 April 2011 10:36:54 Alain Péan wrote:

Le 12/04/2011 09:23, Matthew Feinberg a écrit :

Hello All

I have a brand spanking new 40TB Hardware Raid6 array to play around
with. I am looking for recommendations for which filesystem to use. I am
trying not to break this up into multiple file systems as we are going
to use it for backups. Other factors is performance and reliability.

CentOS 5.6

array is /dev/sdb

So here is what I have tried so far
reiserfs is limited to 16TB
ext4 does not seem to be fully baked in 5.6 yet. parted 1.8 does not
support creating ext4 (strange)

Anyone work with large filesystems like this that have any
suggestions/recommendations?


Hi Matthew,

I would go for xfs, which is now supported in CentOS. This is what I use
for a 16 TB storage, with CentOS 5.3 (Rocks Cluster), and it woks fine.
No problem with lengthy fsck, as with ext3 (which does not support such
capacities). I did not try yet ext4...

Alain


I have Raid6 Arrays with 30TB. We have tested XFS and its write performance
was really dissapointing. So we looked at Ext4. It is really good for our
workloads, but it lacks the ability to grow over 16TB. So we crated two
partitions on the raid with ext4.

The RAID rebuild time is around 2 days, max 3 if the workload is higher. So I
presume that for 40TB it will be around 4 days.

Marian



Out of interest, how much *memory* would you need in your raid management node 
to support fsck on a 40TB array? I imagine it would be very high.


Steve
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] 40TB File System Recommendations

2011-04-12 Thread Torres, Giovanni (NIH/NINDS) [C]
On Apr 12, 2011, at 3:23 AM, Matthew Feinberg wrote:

ext4 does not seem to be fully baked in 5.6 yet. parted 1.8 does not
support creating ext4 (strange)

The CentOS homepage states that ext4 is now a fully supported filesystem in 5.6.
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] 40TB File System Recommendations

2011-04-12 Thread Marian Marinov
On Tuesday 12 April 2011 15:34:21 Torres, Giovanni (NIH/NINDS) [C] wrote:
 On Apr 12, 2011, at 3:23 AM, Matthew Feinberg wrote:
 
 ext4 does not seem to be fully baked in 5.6 yet. parted 1.8 does not
 support creating ext4 (strange)
 
 The CentOS homepage states that ext4 is now a fully supported filesystem in
 5.6. ___
 CentOS mailing list
 CentOS@centos.org
 http://lists.centos.org/mailman/listinfo/centos

Steve,
I'm managing machines with 30TB of storage for more than two years, and with 
good reporting and reaction we have never had to run fsck.

However, I'm sure that if you have to run fsck on such big file systems, it will 
be faster to rebuild the array from other storage than to wait a few weeks 
for it to finish.

On machines like that I use CentOS, but I'm partitioning them before the 
install with a rescue live CD that I have created myself.

Marian


___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] 40TB File System Recommendations

2011-04-12 Thread Sorin Srbu
-Original Message-
From: centos-boun...@centos.org [mailto:centos-boun...@centos.org] On Behalf
Of Torres, Giovanni (NIH/NINDS) [C]
Sent: Tuesday, April 12, 2011 2:34 PM
To: CentOS mailing list
Subject: Re: [CentOS] 40TB File System Recommendations

On Apr 12, 2011, at 3:23 AM, Matthew Feinberg wrote:
ext4 does not seem to be fully baked in 5.6 yet. parted 1.8 does not
support creating ext4 (strange)

The CentOS homepage states that ext4 is now a fully supported filesystem in
5.6.

I finalized an install with CentOS 5.6 yesterday on a machine that will be our
department fileserver. Ext4 seems to work fine on this raid-array.

In what way is ext4 not fully baked on CentOS 5.6?

IIRC, gparted won't be able to manipulate e.g. ext4 partitions if you don't have
the appropriate ext4 fs-utils installed. I might be wrong though.

OTOH, gparted doesn't see my software raid array either. Gparted is rather
practical for regular plain vanilla partitions, but for more advanced stuff and
filesystems, fdisk is probably better.

My two öre.
-- 
/Sorin


___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] 40TB File System Recommendations

2011-04-12 Thread Rudi Ahlers
On Tue, Apr 12, 2011 at 2:47 PM, Marian Marinov m...@yuhu.biz wrote:

 Steve,
 I'm managing machines with 30TB of storage for more then two years. And with
 good reporting and reaction we have never had to run fsck.

 However I'm sure that if you have to run fsck on so big file systems, it will
 be fater to rebuild the array from other storage then waiting for a few weeks
 to finish.

 On machines like that I use CentOS but I'm pratitioning them before the
 install with a rescue live cd that I have created for me.

 Marian

 ___

As a matter of interest, what hardware do you use? i.e. what CPUs, how much
RAM and which RAID cards do you use on a system this size?

Everyone always recommends using smaller RAID arrays rather than one big fat
one. So, I'm interested to know what you use, and how effectively it
works, i.e. if that 30TB was actively used by many hosts how does it
cope? Or is it just archival storage?



-- 
Kind Regards
Rudi Ahlers
SoftDux

Website: http://www.SoftDux.com
Technical Blog: http://Blog.SoftDux.com
Office: 087 805 9573
Cell: 082 554 7532
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] 40TB File System Recommendations

2011-04-12 Thread rainer
 On Tuesday 12 April 2011 15:34:21 Torres, Giovanni (NIH/NINDS) [C] wrote:
 On Apr 12, 2011, at 3:23 AM, Matthew Feinberg wrote:

 ext4 does not seem to be fully baked in 5.6 yet. parted 1.8 does not
 support creating ext4 (strange)

 The CentOS homepage states that ext4 is now a fully supported filesystem
 in
 5.6. ___
 CentOS mailing list
 CentOS@centos.org
 http://lists.centos.org/mailman/listinfo/centos

 Steve,
 I'm managing machines with 30TB of storage for more then two years. And
 with
 good reporting and reaction we have never had to run fsck.

That's not the issue.
The issue is rebuild time.
The longer it takes, the more likely another failure in the array becomes.
With RAID6, this does not instantly kill your RAID, as with RAID5 - but I
assume it will further decrease overall performance and the rebuild time
will go up significantly - adding to the risk.
Thus, it's generally advisable to just use RAID10 (in this case, a
thin-striped array of RAID1 arrays).



___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] 40TB File System Recommendations

2011-04-12 Thread Marian Marinov
On Tuesday 12 April 2011 15:56:54 rai...@ultra-secure.de wrote:
  On Tuesday 12 April 2011 15:34:21 Torres, Giovanni (NIH/NINDS) [C] wrote:
  On Apr 12, 2011, at 3:23 AM, Matthew Feinberg wrote:
  
  ext4 does not seem to be fully baked in 5.6 yet. parted 1.8 does not
  support creating ext4 (strange)
  
  The CentOS homepage states that ext4 is now a fully supported filesystem
  in
  5.6. ___
  CentOS mailing list
  CentOS@centos.org
  http://lists.centos.org/mailman/listinfo/centos
  
  Steve,
  I'm managing machines with 30TB of storage for more then two years. And
  with
  good reporting and reaction we have never had to run fsck.
 
 That's not the issue.
 The issue is rebuild-time.
 The longer it takes, the more likely is another failure in the array.
 With RAID6, this does not instantly kill your RAID, as with RAID5 - but I
 assume it will further decrease overall-performance and the rebuild-time
 will go up significantly - adding the the risk.
 Thus, it's generally advisable to do just use RAID10 (in this case, a
 thin-striped array of RAID1-arrays).
 

Yes... but with such a RAID10 solution you get only half of the disk space... so 
from 10 2TB drives you get only 10TB instead of 16TB with RAID6.

Some of us really need the space. Rebuild time (while it is less than 4 days) 
is considered good enough. In my case I'm using these servers for backups and 
the raid rebuilds haven't made any changes to the performance of the backups.

I'm sure that if you use such storage with RAID6 for VMs it won't perform very 
well.

Marian


___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] 40TB File System Recommendations

2011-04-12 Thread Lars Hecking

 OTOH, gparted doesn't see my software raid array either. Gparted is rather
 practical for regular plain vanilla partitions, but for more advanced stuff and
 filesystems, fdisk is probably better.
 
  For filesystems > 2TB, you're better off grabbing a copy of GPT fdisk.


___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] 40TB File System Recommendations

2011-04-12 Thread m . roth
Rudi Ahlers wrote:
 On Tue, Apr 12, 2011 at 2:47 PM, Marian Marinov m...@yuhu.biz wrote:

 I'm managing machines with 30TB of storage for more then two years. And
 with good reporting and reaction we have never had to run fsck.

 However I'm sure that if you have to run fsck on so big file systems, it
 will be fater to rebuild the array from other storage then waiting for
a few
 weeks to finish.
snip
Here's a question: which would be faster on that huge a filesystem: fsck,
or having a second 30TB filesystem, and rsyncing everything over?

  mark

___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] 40TB File System Recommendations

2011-04-12 Thread Sorin Srbu
-Original Message-
From: centos-boun...@centos.org [mailto:centos-boun...@centos.org] On Behalf
Of Lars Hecking
Sent: Tuesday, April 12, 2011 3:11 PM
To: centos@centos.org
Subject: Re: [CentOS] 40TB File System Recommendations


 OTOH, gparted doesn't see my software raid array either. Gparted is rather
  practical for regular plain vanilla partitions, but for more advanced stuff and
  filesystems, fdisk is probably better.

  For filesystems > 2TB, you're better off grabbing a copy of GPT fdisk.

Oh, there are two flavours of fdisk? Didn't know. Thanks.

-- 
/Sorin


___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] 40TB File System Recommendations

2011-04-12 Thread Marian Marinov
On Tuesday 12 April 2011 16:20:22 m.r...@5-cent.us wrote:
 Rudi Ahlers wrote:
  On Tue, Apr 12, 2011 at 2:47 PM, Marian Marinov m...@yuhu.biz wrote:
  I'm managing machines with 30TB of storage for more than two years. And
  with good reporting and reaction we have never had to run fsck.
  
  However I'm sure that if you have to run fsck on such big file systems, it
  will be faster to rebuild the array from other storage than to wait a few
  weeks for it to finish.
 
 snip
 Here's a question: which would be faster on that huge a filesystem: fsck,
 or having a second 30TB filesystem, and rsyncing everything over?

For us, it was faster to transfer the information again. At least this was 
during the tests. We have never had to do it for real. 

I guess the time for the fsck depends on the number of errors that you have. 
If it only has to replay the journal, the fsck will not take long. But if it
has to do a full check of the FS... an rsync may be faster.
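
As a rough sketch of the re-copy route (the paths are just placeholders; add
-A/-X for ACLs and xattrs if your rsync is 3.x):

    # first pass while the old filesystem is still in use
    rsync -aH --numeric-ids --delete /data/ /mnt/spare/
    # second, much shorter pass after stopping writers, to catch stragglers
    rsync -aH --numeric-ids --delete /data/ /mnt/spare/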

Marian


___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] 40TB File System Recommendations

2011-04-12 Thread Markus Falb
On 12.4.2011 15:02, Marian Marinov wrote:
 On Tuesday 12 April 2011 15:56:54 
 rainer-rnrd0m5o0maboiyizis...@public.gmane.org wrote:

 Yes... but with such a RAID10 solution you get only half of the disk space, so
 from 10 2TB drives you get only 10TB instead of 16TB with RAID6.

From a somewhat theoretical view, this is true for standard raid10, but
Linux md raid10 is much more flexible as I understand it. You could do 2
copies over 2 disks, that's like standard 10. Or you could do 2 copies over
2 or 3 or ... x disks. Or you could do 3 copies over 3 or 4 or ... x
disks. Do the math. See the manpage for md(4) and
http://en.wikipedia.org/wiki/Non-standard_RAID_levels#Linux_MD_RAID_10

However, I have to admit that I have no experience with that but would
like to hear about any disadvantages or if I am misled. I am just
interested.
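
A minimal mdadm sketch of what that flexibility looks like (device names are
placeholders):

    # 2 copies over 4 disks - behaves like a classic nested 1+0
    mdadm --create /dev/md0 --level=10 --layout=n2 --raid-devices=4 /dev/sd[bcde]1

    # the md-specific trick: 2 copies spread over an odd number of disks
    mdadm --create /dev/md1 --level=10 --layout=n2 --raid-devices=3 /dev/sd[fgh]1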

-- 
Kind Regards, Markus Falb



___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] 40TB File System Recommendations

2011-04-12 Thread Rudi Ahlers
On Tue, Apr 12, 2011 at 3:48 PM, Markus Falb markus.f...@fasel.at wrote:

 On 12.4.2011 15:02, Marian Marinov wrote:
  On Tuesday 12 April 2011 15:56:54
 rainer-rnrd0m5o0maboiyizis...@public.gmane.org wrote:

  Yes... but with such RAID10 solution you get only half of the disk
 space... so
  from 10 2TB drives you get only 10TB instead of 16TB with RAID6.

 From a somewhat theoretical view, this is true for standard raid10, but
 Linux md raid10 is much more flexible as I understand it. You could do 2
 copies over 2 disks, that's like standard 10. Or you could do 2 copies over
 2 or 3 or ... x disks. Or you could do 3 copies over 3 or 4 or ... x
 disks. Do the math. See the manpage for md(4) and
 http://en.wikipedia.org/wiki/Non-standard_RAID_levels#Linux_MD_RAID_10

 However, I have to admit that I have no experience with that but would
 like to hear about any disadvantages or if I am misled. I am just
 interested.

 --





We only use RAID 10 (rather 1+0) and never even bothered with RAID6. And
we've had no data loss with it in the past 3 years, on hundreds of servers.

But our RAID10 is set up as a stripe of mirrors, i.e. sda1 + sdb1 -> md0,
sdc1 + sdd1 -> md1, then sde1 + sdf1 -> md2, and finally md0 + md1 + md2 are
striped. The advantage of this is that we can add more disks to the whole
RAID set with no downtime (all servers have hot-swap HDD cages) and very
little performance degradation, since the 2 new drives have to be mirrored
on their own first (which takes very little CPU / RAM) and then added to the
RAID set. Rebuild is generally quick since it only rebuilds the broken
mirror.
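
A minimal mdadm sketch of that layout (device names are placeholders; how the
extra mirrors get added later depends on what sits on top of the stripe, e.g.
LVM, so that step is left out):

    # three mirrors...
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
    mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdc1 /dev/sdd1
    mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/sde1 /dev/sdf1
    # ...striped together into one RAID 1+0 volume
    mdadm --create /dev/md3 --level=0 --raid-devices=3 /dev/md0 /dev/md1 /dev/md2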


-- 
Kind Regards
Rudi Ahlers
SoftDux

Website: http://www.SoftDux.com
Technical Blog: http://Blog.SoftDux.com
Office: 087 805 9573
Cell: 082 554 7532
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] 40TB File System Recommendations

2011-04-12 Thread Marian Marinov
On Tuesday 12 April 2011 16:48:14 Markus Falb wrote:
 On 12.4.2011 15:02, Marian Marinov wrote:
  On Tuesday 12 April 2011 15:56:54
  rainer-rnrd0m5o0maboiyizis...@public.gmane.org wrote:
  
  Yes... but with such RAID10 solution you get only half of the disk
  space... so from 10 2TB drives you get only 10TB instead of 16TB with
  RAID6.
 
 From a somewhat theoretical view, this is true for standard raid10, but
 Linux md raid10 is much more flexible as I understand it. You could do 2
 copies over 2 disks, that's like standard 10. Or you could do 2 copies over
 2 or 3 or ... x disks. Or you could do 3 copies over 3 or 4 or ... x
 disks. Do the math. See the manpage for md(4) and
 http://en.wikipedia.org/wiki/Non-standard_RAID_levels#Linux_MD_RAID_10
 
 However, I have to admit that I have no experience with that but would
 like to hear about any disadvantages or if I am misled. I am just
 interested.
It's like doing RAID50 or RAID60... Again, the cheapest solution is RAID6.
I really like the software RAID in Linux; it has good performance. But I have
never tested it on such big volumes. And usually it is really hard to put 10
or more drives in a machine without buying a SATA controller.

Marian



___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] 40TB File System Recommendations

2011-04-12 Thread Boris Epstein
On Tue, Apr 12, 2011 at 3:36 AM, Alain Péan alain.p...@lpp.polytechnique.fr
 wrote:

 On 12/04/2011 09:23, Matthew Feinberg wrote:
  Hello All
 
  I have a brand spanking new 40TB Hardware Raid6 array to play around
  with. I am looking for recommendations for which filesystem to use. I am
  trying not to break this up into multiple file systems as we are going
  to use it for backups. Other factors is performance and reliability.
 
  CentOS 5.6
 
  array is /dev/sdb
 
  So here is what I have tried so far
  reiserfs is limited to 16TB
  ext4 does not seem to be fully baked in 5.6 yet. parted 1.8 does not
  support creating ext4 (strange)
 
  Anyone work with large filesystems like this that have any
  suggestions/recommendations?

 Hi Matthew,

 I would go for xfs, which is now supported in CentOS. This is what I use
 for 16 TB of storage, with CentOS 5.3 (Rocks Cluster), and it works fine.
 No problems with lengthy fscks, as with ext3 (which does not support such
 capacities). I have not tried ext4 yet...

 Alain

 --
 ==
 Alain Péan - LPP/CNRS
 Administrateur Système/Réseau
 Laboratoire de Physique des Plasmas - UMR 7648
 Observatoire de Saint-Maur
 4, av de Neptune, Bat. A
 94100 Saint-Maur des Fossés
 Tel : 01-45-11-42-39 - Fax : 01-48-89-44-33
 ==

 ___
 CentOS mailing list
 CentOS@centos.org
 http://lists.centos.org/mailman/listinfo/centos


I fully second Alain's opinion. An fsck on a 6 TB RAID6 containing about 30
million files takes over 10 hours.

As for XFS, we are running it on a 25 TB array and so far there has been no
trouble.
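
For reference, a minimal sketch of such a setup (the label and mount point are
made up; a 64-bit kernel is assumed, and inode64 is only one commonly suggested
option for very large XFS filesystems):

    mkfs.xfs -L backup /dev/sdb
    mount -o noatime,inode64 /dev/sdb /srv/backup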

Boris.
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] 40TB File System Recommendations

2011-04-12 Thread David Miller
On Tue, Apr 12, 2011 at 8:56 AM, rai...@ultra-secure.de wrote:

 That's not the issue.
 The issue is rebuild-time.
 The longer it takes, the more likely another failure in the array becomes.
 With RAID6, this does not instantly kill your RAID, as with RAID5 - but I
 assume it will further decrease overall performance and the rebuild time
 will go up significantly - adding to the risk.
 Thus, it's generally advisable to just use RAID10 (in this case, a
 thin-striped array of RAID1-arrays).


Statistically speaking that risk isn't there.  RAID6 arrays have a slightly
higher mean time to data loss than RAID10 arrays, but the difference here is
very small.  So if you need the capacity and don't mind the performance
difference between these two RAID levels, then RAID6 is perfectly fine in my
opinion.

Here's a great blog post on calculating mean time to data loss, and they
have a spreadsheet that you can download to play with:
http://info.zetta.net/blog/bid/45661/Calculating-Mean-Time-To-Data-Loss-and-probability-of-silent-data-corruption

In my configuration, which is 12 drives, the chance of a data-loss event over
a 10-year period with RAID10 is 2.51% and with RAID6 is 1.31%.  I would
expect those numbers to go up a bit with a 16-drive configuration.
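
For a rough feel of where such figures come from, the classic first-order
approximations can be plugged into awk. The MTBF and rebuild times below are
made-up inputs, and the spreadsheet's model also accounts for unrecoverable
read errors, so only the relative ordering - not the absolute percentages - is
comparable with the numbers above:

    awk 'BEGIN {
      mtbf = 1.0e6           # assumed per-drive MTBF, hours
      n    = 12              # drives in the array
      t    = 10 * 8760       # 10 years, in hours
      r10  = 24; r6 = 72     # assumed rebuild times, hours
      mttdl10 = mtbf^2 / (n * r10)                   # striped mirrors
      mttdl6  = mtbf^3 / (n * (n-1) * (n-2) * r6^2)  # raid6
      printf "P(loss over 10y): raid10 %.4f%%, raid6 %.6f%%\n",
             100 * (1 - exp(-t/mttdl10)), 100 * (1 - exp(-t/mttdl6))
    }'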

--
David
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] 40TB File System Recommendations

2011-04-12 Thread John Jasen
On 04/12/2011 10:21 AM, Boris Epstein wrote:
 On Tue, Apr 12, 2011 at 3:36 AM, Alain Péan
 alain.p...@lpp.polytechnique.fr wrote:

snipped: two recommendations for XFS

I would chime in with a dis-commendation for XFS. At my previous
employer, two cases involving XFS resulted in irrecoverable data
corruption. These were on RAID systems running from 4 to 20 TB.


-- 
-- John E. Jasen (jja...@realityfailure.org)
-- Deserve Victory. -- Terry Goodkind, Naked Empire
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] 40TB File System Recommendations

2011-04-12 Thread Marian Marinov
On Tuesday 12 April 2011 17:36:39 John Jasen wrote:
 On 04/12/2011 10:21 AM, Boris Epstein wrote:
  On Tue, Apr 12, 2011 at 3:36 AM, Alain Péan
  alain.p...@lpp.polytechnique.fr wrote:
 snipped: two recommendations for XFS
 
 I would chime in with a dis-commendation for XFS. At my previous
 employer, two cases involving XFS resulted in irrecoverable data
 corruption. These were on RAID systems running from 4 to 20 TB.

Can someone (who actually knows) share with us what the state of xfs-utils is,
and how stable and usable they are for recovering broken XFS filesystems?

Marian


___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] 40TB File System Recommendations

2011-04-12 Thread James A. Peltier
- Original Message -
| On Tuesday 12 April 2011 17:36:39 John Jasen wrote:
|  On 04/12/2011 10:21 AM, Boris Epstein wrote:
|   On Tue, Apr 12, 2011 at 3:36 AM, Alain Péan
|   alain.p...@lpp.polytechnique.fr wrote:
|  snipped: two recommendations for XFS
| 
|  I would chime in with a dis-commendation for XFS. At my previous
|  employer, two cases involving XFS resulted in irrecoverable data
|  corruption. These were on RAID systems running from 4 to 20 TB.
| 
| Can someone (who actually knows) share with us what the state of xfs-utils
| is, and how stable and usable they are for recovering broken XFS
| filesystems?
| 
| Marian
| 
| ___
| CentOS mailing list
| CentOS@centos.org
| http://lists.centos.org/mailman/listinfo/centos

On 64-bit platforms the tools are totally stable, but it does depend on the
degree of broken state that the file system is in.  I've had xfs_check runs
go for days and eat up 96GB of memory because of various degrees of
brokenness.  These were on 35 and 45TB file systems.  Be prepared to throw
memory at the problem, or lots of swap files, if you get really buggered up.
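
A rough sketch of the "throw swap at it" part (size and paths are placeholders):

    # add a big temporary swap file before starting the repair
    dd if=/dev/zero of=/var/tmp/repair.swap bs=1M count=65536
    mkswap /var/tmp/repair.swap
    swapon /var/tmp/repair.swap

    # the filesystem has to be unmounted for the repair
    umount /dev/sdb1
    xfs_repair /dev/sdb1

    # drop the swap file again afterwards
    swapoff /var/tmp/repair.swap
    rm /var/tmp/repair.swap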

-- 
James A. Peltier
IT Services - Research Computing Group
Simon Fraser University - Burnaby Campus
Phone   : 778-782-6573
Fax : 778-782-3045
E-Mail  : jpelt...@sfu.ca
Website : http://www.sfu.ca/itservices
  http://blogs.sfu.ca/people/jpeltier


___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] 40TB File System Recommendations

2011-04-12 Thread Les Mikesell
On 4/12/2011 9:36 AM, John Jasen wrote:

 snipped: two recommendations for XFS

 I would chime in with a dis-commendation for XFS. At my previous
 employer, two cases involving XFS resulted in irrecoverable data
 corruption. These were on RAID systems running from 4 to 20 TB.

Was this on a 32 or 64 bit system?

-- 
   Les Mikesell
lesmikes...@gmail.com
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] 40TB File System Recommendations

2011-04-12 Thread Keith Keller
On Tue, Apr 12, 2011 at 06:00:57PM +0300, Marian Marinov wrote:
 
 Can someone(who actually knows) share with us, what is the state of 
 xfs-utils, 
 how stable and usable are they for recovery of broken XFS filesystems?

I have done an XFS repair once or twice on a real filesystem (~4TB) on a
64-bit kernel.  It worked fine, but I don't think the filesystem was too
badly trashed.

As another poster noted, be ready to throw memory or swap at the XFS
check and repair tools.  (I read that it's slightly better memory-wise
to run xfs_repair -n than xfs_check, but I believe that's mainly for
32-bit systems, and that may have been fixed anyway.)
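
In other words, a dry run first (device name is a placeholder):

    umount /dev/sdc1
    xfs_repair -n /dev/sdc1   # -n: check only, report problems, change nothing
    xfs_repair /dev/sdc1      # the actual repair, once you are ready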

--keith


-- 
kkel...@wombat.san-francisco.ca.us



___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] 40TB File System Recommendations

2011-04-12 Thread Pasi Kärkkäinen
On Tue, Apr 12, 2011 at 10:36:39AM -0400, John Jasen wrote:
 On 04/12/2011 10:21 AM, Boris Epstein wrote:
  On Tue, Apr 12, 2011 at 3:36 AM, Alain Péan
  alain.p...@lpp.polytechnique.fr wrote:
 
 snipped: two recommendations for XFS
 
 I would chime in with a dis-commendation for XFS. At my previous
 employer, two cases involving XFS resulted in irrecoverable data
 corruption. These were on RAID systems running from 4 to 20 TB.
 
 
Did you have these problems with XFS on 32bit Linux? 

-- Pasi

___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] 40TB File System Recommendations

2011-04-12 Thread aurfalien
On Apr 12, 2011, at 12:31 AM, John R Pierce wrote:

 On 04/12/11 12:23 AM, Matthew Feinberg wrote:
 Hello All

 I have a brand spanking new 40TB Hardware Raid6 array

 never mind file systems... is that one raid set?  do you have any idea
 how LONG rebuilding that is going to take when there are any drive
 hiccups?  or how painfully slow writes will be until it's rebuilt?  is
 that something like 22 x 2TB or 16 x 3TB?  I'll bet a raid rebuild
 takes nearly a WEEK, maybe even longer..

 I am very strongly NOT in favor of raid6, even for nearline bulk backup
 storage.  I would sacrifice the space and format that as raid10, and
 have at LEAST a couple hot spares too.

+1 for the 1+0 and a few hot spares.

RAID6 + spare ran great, but rebuilds took 2 days.  The likelihood of
2+ failed drives is lower than that of 1 failed drive, but I actually had 2
failed drives, so RAID6 + spare saved me.

Hence why I switched to RAID 1+0 + spares.

A tuned XFS fs will work great.

I run my large RAID XFS fs with logbufs=8, noatime and nodiratime.

I also run iozone to test my tuned options for optimum performance
in my env.
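
Presumably that boils down to an fstab line along these lines (device and
mount point are placeholders):

    /dev/md3  /srv/backup  xfs  noatime,nodiratime,logbufs=8  0 0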

- aurf
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] 40TB File System Recommendations

2011-04-12 Thread John R Pierce
On 04/12/11 6:02 AM, Marian Marinov wrote:

 Yes... but with such RAID10 solution you get only half of the disk space... so
 from 10 2TB drives you get only 10TB instead of 16TB with RAID6.

those disks are $100 each.  what's your data worth?

The rebuild time goes way up as the number of drives in the raid stripe 
goes up.

in this case, the OP is talking about a 40TB array, so that's a TWENTY-TWO
drive raid.  NO ONE I know in the storage business will use larger than an
8 or 10 drive raid set.  If you really need such a massive volume, you
stripe several smaller raid sets, so the raid6 version would be 2 x 12 x
2TB, or 24 drives for raid6+0 == 40TB.
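
If one were building that with Linux md rather than a hardware controller, a
rough sketch would be (drive names are placeholders):

    # two 12-drive raid6 sets, 20TB usable each...
    mdadm --create /dev/md0 --level=6 --raid-devices=12 /dev/sd[b-m]1
    mdadm --create /dev/md1 --level=6 --raid-devices=12 /dev/sd[n-y]1
    # ...striped together into a 40TB raid6+0 volume
    mdadm --create /dev/md2 --level=0 --raid-devices=2 /dev/md0 /dev/md1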

but the OP's application is backup.  for backup, it really doesn't matter
what the volume size is; more, smaller file systems are fine, so you can
partition your backups by date interval or whatever.

let me throw out another thing.  I assume this 40TB backup server is not
just ONE backup of the current state, but an archive of point-in-time
backups?  you had better have more than one of them, where you back up the
backup on the second.  there's any number of scenarios the raid6 won't
protect against, including file system corruption, raid controller failure
where it dumps across a whole stripe, etc.




___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] 40TB File System Recommendations

2011-04-12 Thread Lamar Owen
On Tuesday, April 12, 2011 02:51:45 PM John R Pierce wrote:
 On 04/12/11 6:02 AM, Marian Marinov wrote:
 
  Yes... but with such RAID10 solution you get only half of the disk space... 
  so
  from 10 2TB drives you get only 10TB instead of 16TB with RAID6.
 
 those disks are $100 each.   whats your data worth?

Where can I get an enterprise-class 2TB drive for $100?  Commodity SATA isn't
enterprise-class.  SAS is; FC is; SCSI is.  A 500GB FC drive with EMC firmware
new is going to set you back ten times that, at least.  What's your data worth
indeed, putting it on commodity disk :-)

 in this case, the OP is talking about a 40TB array, so thats a TWENTY 
 TWO drive raid.  NOONE I know in the storage business will use larger 
 than a 8 or 10 drive raid set.   

EMC allows RAID groups up to 16 drives on Clariion storage.  I've been doing 
this with EMC stuff for a while, with RAID6 plus a hotspare per DAE; that's a 
14 drive RAID group plus the hotspare on one DAE.  On some systems I forgo the
dedicated per-DAE hotspare and spread a 16 drive RAID6 group and a 14 drive
RAID6 group across two DAEs with hotspares on other DAEs.  Works OK, and I've
had double drive soft failures on a single RAID6 group that successfully 
hotspared (and back).  This is partially due to the custom EMC firmware on the 
drives, and the interaction with the storage processor.

Rebuild time is several hours, but with more smaller drives it's not too bad.

 If you really need such a massive 
 volume, you stripe several smaller raidsets, so the raid6 version would 
 be 2 x 12 x 2TB or 24 drives for raid6+0 == 40TB.

Or you do metaLUNs, or similar using LVM.
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] 40TB File System Recommendations

2011-04-12 Thread aurfalien
On Apr 12, 2011, at 1:54 PM, Lamar Owen wrote:

 On Tuesday, April 12, 2011 02:51:45 PM John R Pierce wrote:
 On 04/12/11 6:02 AM, Marian Marinov wrote:

 Yes... but with such RAID10 solution you get only half of the disk  
 space... so
 from 10 2TB drives you get only 10TB instead of 16TB with RAID6.

 those disks are $100 each.   whats your data worth?

 Where can I get an enterprise-class 2TB drive for $100?

This is a good point.

The cheapies are so-called green because they spin down often, which is not
what you want in a RAID setup.

While I've been able to tweak this in OS X, I haven't yet tried to see
what to do in Linux or Winblowz, which I will eventually do as some
turd nugget bought a bunch of these for pro use.

  Commodity SATA isn't enterprise-class.  SAS is; FC is, SCSI is. A  
 500GB FC drive with EMC firmware new is going to set you back ten  
 times that, at least.  What's youre data worth indeed, putting it on  
 commodity disk :-)

 in this case, the OP is talking about a 40TB array, so thats a TWENTY
 TWO drive raid.  NOONE I know in the storage business will use larger
 than a 8 or 10 drive raid set.

 EMC allows RAID groups up to 16 drives on Clariion storage.

Yea, as does BlueArc, unsure of the rest but agreed.

- aurf
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] 40TB File System Recommendations

2011-04-12 Thread Keith Keller
On Tue, Apr 12, 2011 at 02:01:42PM -0700, aurfal...@gmail.com wrote:
 
 The cheapies are so called green as they spin down often which is not  
 what you want in a RAID setup.

The WD RE4-GP is a so-called ''green'' disk that's suitable for RAID
arrays.  It's marketed and priced as an enterprise drive.

--keith

-- 
kkel...@wombat.san-francisco.ca.us



___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] 40TB File System Recommendations

2011-04-12 Thread aurfalien
On Apr 12, 2011, at 3:02 PM, Keith Keller wrote:

 On Tue, Apr 12, 2011 at 02:01:42PM -0700, aurfal...@gmail.com wrote:

 The cheapies are so called green as they spin down often which is not
 what you want in a RAID setup.

 The WD RE4-GP is a so-called ''green'' disk that's suitable for RAID
 arrays.  It's marketed and priced as an enterprise drive.

Well, it may either be BS marketing, or it is so-called green for a different
reason and not the frequent spin-downs.

I'm finding green can mean many things, from the fact that their
product packaging is made from recycled material, to their
manufacturing plant no longer using mercury, to their power consumption
being lower than previous models, etc...

I would say that the $100 price tag would be a caution but then again  
one doesn't always get what one pays for.

Either way, it makes our jobs more challenging for sure.

- aurf


___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] 40TB File System Recommendations

2011-04-12 Thread Drew
 Where can I get an enterprise-class 2TB drive for $100?  Commodity SATA isn't 
 enterprise-class.  SAS is; FC is, SCSI is. A 500GB FC drive with EMC firmware 
 new is going to set you back ten times that, at least.  What's your data
 worth indeed, putting it on commodity disk :-)

I can get Seagate's Constellation ES series SATA drives in 1TB for
$125. 2TB will run me around $225.

They're not something I'd run my database off; I have 15k SAS drives
for that. But for large amounts of storage on the cheap, like our
backup system, they're just fine.

-- 
Drew

Nothing in life is to be feared. It is only to be understood.
--Marie Curie
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] 40TB File System Recommendations

2011-04-12 Thread compdoc
The WD RE4-GP is a so-called ''green'' disk that's suitable for RAID
arrays.  It's marketed and priced as an enterprise drive.

I've had good luck with green, 5400 rpm Samsung drives. They don't spin down
automatically and work fine in my raid 5 arrays. The cost is about $80 for
2TB drives.

I also have a few 5900 rpm Seagate ST32000542AS drives, but not currently in
raids. They don't spin down, so I'm sure they would be fine in a raid.

None of the drives in the raids have failed, although I've replaced a couple
that developed reallocated sectors as reported by SMART.
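
For reference, that sort of thing is easy to watch with smartctl (device name
is a placeholder):

    smartctl -A /dev/sda | grep -i -e reallocated -e pending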

Because they are so tiny on the outside, 2.5 inch drives like the
Seagate Constellation and WD Raptors are great. Unfortunately, they don't
come any larger than 1TB, so I use them in special situations.




___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] 40TB File System Recommendations

2011-04-12 Thread Christopher Chan
On Tuesday, April 12, 2011 10:36 PM, John Jasen wrote:
 On 04/12/2011 10:21 AM, Boris Epstein wrote:
 On Tue, Apr 12, 2011 at 3:36 AM, Alain Péan
 alain.p...@lpp.polytechnique.fr wrote:

 snipped: two recommendations for XFS

 I would chime in with a dis-commendation for XFS. At my previous
 employer, two cases involving XFS resulted in irrecoverable data
 corruption. These were on RAID systems running from 4 to 20 TB.



What were those circumstances? Crash? Power outage? What are the 
components of the RAID systems?
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos

