Re: [CentOS] reboot - is there a timeout on filesystem flush?

2015-02-09 Thread Les Mikesell
On Mon, Jan 19, 2015 at 4:53 PM, Charles Polisher cpol...@surewest.net wrote:
 On Jan 07, 2015 at 01:47:53PM -0600, Les Mikesell wrote:

 I see a bunch of entries like:
 ioatdma :00:08.0: Channel halted, chanerr = 2
 ioatdma :00:08.0: Channel halted, chanerr = 0
 in the logs and one of these:
 hrtimer: interrupt took 258633 ns

 Not sure what those mean.   We do have considerably more systems
 running windows than linux on this hardware and I don't think anyone
 has noticed a systemic problem there.

 Was this resolved? The ioatdma messages are from ioat_dma.c, a
 driver for Intel's I/OAT DMA engine typically used on high-end
 server hardware to accelerate network I/O. chanerr = 2 might be
 an issue with the DMA channel being in a suspended state when
 the driver isn't expecting it to be. Maybe a network driver bug.

No, reboots are rare on these servers and file corruption is rare even
within those, so I don't anticipate seeing enough instances to find a
pattern.

-- 
   Les Mikesell
 lesmikes...@gmail.com
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] reboot - is there a timeout on filesystem flush?

2015-01-19 Thread Charles Polisher
On Jan 07, 2015 at 01:47:53PM -0600, Les Mikesell wrote:
 
 I see a bunch of entries like:
 ioatdma :00:08.0: Channel halted, chanerr = 2
 ioatdma :00:08.0: Channel halted, chanerr = 0
 in the logs and one of these:
 hrtimer: interrupt took 258633 ns
 
 Not sure what those mean.   We do have considerably more systems
 running windows than linux on this hardware and I don't think anyone
 has noticed a systemic problem there.

Was this resolved? The ioatdma messages are from ioat_dma.c, a
driver for Intel's I/OAT DMA engine typically used on high-end
server hardware to accelerate network I/O. chanerr = 2 might be
an issue with the DMA channel being in a suspended state when
the driver isn't expecting it to be. Maybe a network driver bug.



Re: [CentOS] reboot - is there a timeout on filesystem flush?

2015-01-07 Thread Les Mikesell
On Wed, Jan 7, 2015 at 12:10 AM, Keith Keller
kkel...@wombat.san-francisco.ca.us wrote:
 On 2015-01-07, Gordon Messmer gordon.mess...@gmail.com wrote:

 Of course, the other possibility is simply that you've formatted your
 own filesystems, and they have a maximum mount count or a check
 interval.

 If Les is having to run fsck manually, as he wrote in his OP, then this
 is unlikely to be the cause of the issues he described in that post.
 There must be some sort of errors on the filesystem that caused the
 unattended fsck to exit nonzero.


Yes - the unattended fsck fails.   Personally, I'd prefer for the
default run to use '-y' in the first place.  It's not like I'm more
likely than fsck to know how to fix it and it is very inconvenient on
remote machines.   The recent case was an opennms system updating a
lot of rrd files, but I've also seen it on backuppc archives with lots
of files and lots of hard links.  Some of these have been on VMware
ESXi hosts where the physical host wasn't rebooted and the
controller/power not involved at all.  Eventually these will be
replaced with CentOS7 systems, probably using XFS but I don't know if
that will be better or worse.   It is mostly on aging hardware, so it
is possible that there are underlying controller issues.  I also see
some rare cases on similar machines where a filesystem will go
read-only with some scsi errors logged, but didn't look for that yet
in this case.

-- 
   Les Mikesell
 lesmikes...@gmail.com


Re: [CentOS] reboot - is there a timeout on filesystem flush?

2015-01-07 Thread Les Mikesell
On Wed, Jan 7, 2015 at 1:37 PM, Gary Greene ggre...@minervanetworks.com wrote:

 Problem is, Gordon, the layer I’m talking about is _below_ the logical layer
 that filesystems live at: the block layer, at the mercy of drivers and
 firmware that the kernel has zero control over. In a perfect world the
 controller would do strictly what the kernel tells it, but that hasn’t been
 true for a while now, given the large caches that drives and controllers have.

 In most cases this should never trigger. With buggy drivers, or controllers
 that have buggy firmware, however, writes to disk can be seriously delayed,
 and the data may never make it to the platter.


I'd have to shut one down and get into the bios config to see, but I
think these default to write-through if they aren't battery backed -
caching may not even be an option.   This one might have a battery
going bad, though.

I see a bunch of entries like:
ioatdma :00:08.0: Channel halted, chanerr = 2
ioatdma :00:08.0: Channel halted, chanerr = 0
in the logs and one of these:
hrtimer: interrupt took 258633 ns

Not sure what those mean.   We do have considerably more systems
running windows than linux on this hardware and I don't think anyone
has noticed a systemic problem there.

-- 
   Les Mikesell
 lesmikes...@gmail.com


Re: [CentOS] reboot - is there a timeout on filesystem flush?

2015-01-07 Thread John R Pierce

On 1/7/2015 11:30 AM, Gary Greene wrote:

During the reboot, most cards' drivers will, on init, invalidate the cache on
the card so that dirty pages of data don’t get flushed to disk, to prevent
scribbling junk data to the platters. From what I recall, this is true of both
the megaraid and adaptec based cards.


Presumably, this cache invalidation is only on cards that don't have 
battery (or flash) backed write cache? Doing that on a BB/FBWC 
system would negate the usefulness of said battery backed cache entirely.


IMHO, an even bigger problem is using cheap desktop class SATA drives 
for server storage.  These FREQUENTLY lie about write commits.  This
sort of behavior is a VERY good reason to stick with vendor qualified 
and branded server drives that have been tested to work with the 
specific controller + backplane configurations they are sold with.   And 
yes, those drives cost 2-3X more than your Newegg/Amazon elcheapo 
desktop stuff.


All of this controller and drive behavior is a VERY good argument for 
the use of end to end checksumming like ZFS does...  a ZFS 'scrub' 
operation WILL detect any data corruption on the file system and raid, 
whatever the source, and many inconsistencies can be corrected, such as 
one disk of a mirror having a stale block.
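
To make the idea concrete, here is a minimal Python sketch of checksum-based scrubbing of a two-way mirror. It is only an illustration of the principle and not ZFS's actual on-disk format or code; the function names `checksum` and `scrub_mirror`, and the choice of SHA-256, are assumptions made for the example.

```python
import hashlib


def checksum(block: bytes) -> str:
    """Checksum stored separately from the data block (SHA-256 is an assumed choice)."""
    return hashlib.sha256(block).hexdigest()


def scrub_mirror(copies, expected):
    """Verify every mirror copy against the stored checksum.

    Copies that do not match are overwritten from a copy that does.
    Returns the indexes of the copies that were repaired.
    """
    good = next((c for c in copies if checksum(c) == expected), None)
    if good is None:
        # Corruption is detected, but there is nothing valid to repair from.
        raise ValueError("no copy matches the stored checksum")
    repaired = []
    for i, copy in enumerate(copies):
        if checksum(copy) != expected:
            copies[i] = good
            repaired.append(i)
    return repaired


if __name__ == "__main__":
    stored = checksum(b"current block")
    mirror = [b"current block", b"stale block"]  # second disk missed a write
    print("repaired copies:", scrub_mirror(mirror, stored))
```

The point of keeping the checksum apart from the block is that a disk returning old but internally valid data is still caught, which a per-sector ECC on the drive itself cannot do.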


--
john r pierce  37N 122W
somewhere on the middle of the left coast



Re: [CentOS] reboot - is there a timeout on filesystem flush?

2015-01-07 Thread John R Pierce

On 1/7/2015 12:15 PM, m.r...@5-cent.us wrote:

Actually, the WD Reds and similar are just fine.


those are specifically sold for use in small NAS (raid) environments, so 
yeah, they are configured 'correctly'.




--
john r pierce  37N 122W
somewhere on the middle of the left coast



Re: [CentOS] reboot - is there a timeout on filesystem flush?

2015-01-07 Thread Gordon Messmer

On 01/07/2015 05:53 AM, Les Mikesell wrote:


Yes - the unattended fsck fails.


In that case, there should be logs indicating the cause of the error 
when it was detected by the kernel.  There's probably something wrong 
with your controller or other hardware.



Personally, I'd prefer for the
default run to use '-y' in the first place.  It's not like I'm more
likely than fsck to know how to fix it and it is very inconvenient on
remote machines.   The recent case was an opennms system updating a
lot of rrd files, but I've also seen it on backuppc archives with lots
of files and lots of hard links.


Every regular file's directory entry on your system is a hard link. 
There's nothing particular about links (files) that make a filesystem 
fragile.



It is mostly on aging hardware, so it
is possible that there are underlying controller issues.  I also see
some rare cases on similar machines where a filesystem will go
read-only with some scsi errors logged, but didn't look for that yet
in this case.


It's probably a similar cause in all cases.  I don't know how many times 
I've seen you on this list defending running old hardware / obsolete 
hardware.  Corruption and failure are more or less what I'd expect if 
your hardware is junk.



Re: [CentOS] reboot - is there a timeout on filesystem flush?

2015-01-07 Thread Les Mikesell
On Wed, Jan 7, 2015 at 3:30 PM, John R Pierce pie...@hogranch.com wrote:

 Right... but only cost 133% (about) more than consumer drives, as opposed
 to the 300% that the server/enterprise grade drives' cost.


 well, those $$$ drives are likely SAS rather than SATA, and that has other
 advantages...  10k or 15k RPM gives you up to double the IOPS per spindle of
 a 7200rpm SATA drive (and WD Reds are only 5900 RPM, I believe?)...   2.5"
 enterprise disks let you have more smaller spindles in the same space (24-25
 per 2U vs 12 for 3.5") for higher IO concurrency, and SAS supports
 multipathing (dual porting) for higher IO bandwidth, also SAS has tagged
 command queueing which often performs better than SATA NCQ under high IO
 concurrency workloads, like database servers.

These particular drives are enterprise SAS versions, but about as old
as they made them.

-- 
  Les Mikesell
lesmikes...@gmail.com


Re: [CentOS] reboot - is there a timeout on filesystem flush?

2015-01-07 Thread Gary Greene
 On Jan 7, 2015, at 12:08 PM, John R Pierce pie...@hogranch.com wrote:
 
 On 1/7/2015 11:30 AM, Gary Greene wrote:
 During the reboot, most cards' drivers will, on init, invalidate the cache on
 the card so that dirty pages of data don’t get flushed to disk, to prevent
 scribbling junk data to the platters. From what I recall, this is true of
 both the megaraid and adaptec based cards.
 
 Presumably, this cache invalidation is only on cards that don't have battery 
 (or flash) backed write cache? Doing that on a BB/FBWC system would 
 negate the usefulness of said battery backed cache entirely.
 


The ones with batteries will try to properly write the content of the cache to 
the disk right before the cache invalidate occurs. This is one of the few times 
when they aren’t lazy in their write patterns.

Regarding cheap vs. enterprise drives, agreed. You should absolutely never 
trust the disks to do the “right” thing with cheap models.

--
Gary L. Greene, Jr.
Sr. Systems Administrator
IT Operations
Minerva Networks, Inc.
Cell: +1 (650) 704-6633



Re: [CentOS] reboot - is there a timeout on filesystem flush?

2015-01-07 Thread John R Pierce

On 1/7/2015 12:50 PM, m.r...@5-cent.us wrote:

Right... but only cost 133% (about) more than consumer drives, as opposed
to the 300% that the server/enterprise grade drives' cost.


well, those $$$ drives are likely SAS rather than SATA, and that has 
other advantages...  10k or 15k RPM gives you up to double the IOPS per 
spindle of a 7200rpm SATA drive (and WD Reds are only 5900 RPM, I 
believe?)...   2.5" enterprise disks let you have more smaller spindles 
in the same space (24-25 per 2U vs 12 for 3.5") for higher IO 
concurrency, and SAS supports multipathing (dual porting) for higher IO 
bandwidth, also SAS has tagged command queueing which often performs 
better than SATA NCQ under high IO concurrency workloads, like database 
servers.
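
The "up to double the IOPS" figure can be sanity-checked with the usual back-of-the-envelope model, where one random I/O costs an average seek plus half a rotation. A rough Python sketch follows; the seek times are assumed round numbers for illustration, not taken from any datasheet, and the model ignores queueing and caching entirely.

```python
def rotational_latency_ms(rpm):
    """Average rotational latency: half a revolution, in milliseconds."""
    return 0.5 * 60_000 / rpm


def random_iops(rpm, avg_seek_ms):
    """Approximate random IOPS for one spindle: 1 / (seek + rotational latency)."""
    return 1000 / (avg_seek_ms + rotational_latency_ms(rpm))


if __name__ == "__main__":
    # (label, rpm, assumed average seek in ms) -- illustrative values only
    drives = [
        ("7200 rpm SATA", 7200, 8.5),
        ("10k rpm SAS", 10_000, 4.5),
        ("15k rpm SAS", 15_000, 3.5),
    ]
    for label, rpm, seek in drives:
        print(f"{label}: about {random_iops(rpm, seek):.0f} random IOPS")
```

With those assumed seek times the 15k drive comes out at a bit more than twice the 7200 rpm one, which is in line with the claim above.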



--
john r pierce  37N 122W
somewhere on the middle of the left coast



Re: [CentOS] reboot - is there a timeout on filesystem flush?

2015-01-07 Thread Les Mikesell
On Wed, Jan 7, 2015 at 10:43 AM, Valeri Galtsev
galt...@kicp.uchicago.edu wrote:

 Not junk - these are mostly IBM 3550/3650 boxes - pretty much top of
 the line in their day (before the M2/3/4 versions).  They have
 Adaptec raid controllers,

 I never had Adaptec in _my_ list of good RAID hardware... But certainly I
 cannot be the one to offer judgement on hardware I avoid to the best of
 my ability. If you can afford it, I would do the test: replace Adaptec with
 something else (in my list it would be either 3ware or LSI or areca),
 leaving the rest of the hardware as it is. And see if the problems continue. I
 do realize that there is more to it than just pulling one card and
 sticking another in its place (that's why I said "if you can afford it",
 meaning in a more general sense, not just monetary).

It's not something happening as a repeatable thing or that I could
consider better/worse after replacing something.  Maybe 3 times a year
across a few hundred machines and generally not repeating on the same
ones. But if there is anything in common it is on very 'active'
filesystems.

-- 
Les Mikesell
   lesmikes...@gmail.com


Re: [CentOS] reboot - is there a timeout on filesystem flush?

2015-01-07 Thread Les Mikesell
On Wed, Jan 7, 2015 at 10:15 AM,  m.r...@5-cent.us wrote:

 Yes - the unattended fsck fails.   Personally, I'd prefer for the
 default run to use '-y' in the first place.  It's not like I'm more
 likely than fsck to know how to fix it and it is very inconvenient on
 remote machines.   The recent case was an opennms system updating a
 snip

 In some ways, I prefer the fsck run by reboot to fail - that way, I see
 it, and it most probably tells me that it's time to look at replacing the
 disk.

Seems random to me - not repeating on the same box, and rare enough
that it is hard to make any generalization except that it is painful
to talk some remote helper through the recovery process - usually
involving emailing some cell phone photos of the console to figure out
which partition has the problem.

-- 
   Les Mikesell
 lesmikes...@gmail.com


Re: [CentOS] reboot - is there a timeout on filesystem flush?

2015-01-07 Thread Valeri Galtsev

On Wed, January 7, 2015 10:54 am, Les Mikesell wrote:
 On Wed, Jan 7, 2015 at 10:43 AM, Valeri Galtsev
 galt...@kicp.uchicago.edu wrote:

 Not junk - these are mostly IBM 3550/3650 boxes - pretty much top of
 the line in their day (before the M2/3/4 versions).  They have
 Adaptec raid controllers,

 I never had Adaptec in _my_ list of good RAID hardware... But certainly I
 cannot be the one to offer judgement on hardware I avoid to the best of
 my ability. If you can afford it, I would do the test: replace Adaptec with
 something else (in my list it would be either 3ware or LSI or areca),
 leaving the rest of the hardware as it is. And see if the problems continue. I
 do realize that there is more to it than just pulling one card and
 sticking another in its place (that's why I said "if you can afford it",
 meaning in a more general sense, not just monetary).

 It's not something happening as a repeatable thing or that I could
 consider better/worse after replacing something.  Maybe 3 times a year
 across a few hundred machines and generally not repeating on the same
 ones. But if there is anything in common it is on very 'active'
 filesystems.


Too bad... Reminds me of one of my 32 node clusters, in which one of the nodes
crashed about once a month (always a different node, so any given node would
run some 32 months between crashes) ;-( Too bad for troubleshooting. Only
after 6 months did I pinpoint a particular brand of RAM mixed into each
node - when I got rid of it, the trouble ended... I would bet on the Adaptec
cards in your case... though ideally I shouldn't be offering judgement on
hardware of a brand I almost never use. Good luck!

Valeri


Valeri Galtsev
Sr System Administrator
Department of Astronomy and Astrophysics
Kavli Institute for Cosmological Physics
University of Chicago
Phone: 773-702-4247



Re: [CentOS] reboot - is there a timeout on filesystem flush?

2015-01-07 Thread Steve Clark

On 01/07/2015 08:53 AM, Les Mikesell wrote:

On Wed, Jan 7, 2015 at 12:10 AM, Keith Keller
kkel...@wombat.san-francisco.ca.us wrote:

On 2015-01-07, Gordon Messmer gordon.mess...@gmail.com wrote:

Of course, the other possibility is simply that you've formatted your
own filesystems, and they have a maximum mount count or a check
interval.

If Les is having to run fsck manually, as he wrote in his OP, then this
is unlikely to be the cause of the issues he described in that post.
There must be some sort of errors on the filesystem that caused the
unattended fsck to exit nonzero.


Yes - the unattended fsck fails.   Personally, I'd prefer for the
default run to use '-y' in the first place.  It's not like I'm more
likely than fsck to know how to fix it and it is very inconvenient on
remote machines.   The recent case was an opennms system updating a
lot of rrd files, but I've also seen it on backuppc archives with lots
of files and lots of hard links.  Some of these have been on VMware
ESXi hosts where the physical host wasn't rebooted and the
controller/power not involved at all.  Eventually these will be
replaced with CentOS7 systems, probably using XFS but I don't know if
that will be better or worse.   It is mostly on aging hardware, so it
is possible that there are underlying controller issues.  I also see
some rare cases on similar machines where a filesystem will go
read-only with some scsi errors logged, but didn't look for that yet
in this case.


I know that I have seen it take 10 to 15 minutes to sync a 7200 rpm 3 TB WD
drive that had over 2 million rrd files being updated by ntopng when the
system had 32GB of ram. The system is an Intel(R) Core(TM) i7-3770 CPU @
3.40GHz, but one cpu will sit in a constant IO wait state until the sync
finishes. I have never tried shutting it down while it was syncing, though.
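
One way to see how much data is still waiting to be written before a reboot is to look at the `Dirty` and `Writeback` lines in `/proc/meminfo`. A minimal Python sketch, assuming the usual "Name: value kB" layout; the function name and the sample numbers are made up for the example.

```python
def pending_writeback_kb(meminfo_text):
    """Return Dirty + Writeback in kB from /proc/meminfo-style text."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields.get("Dirty", 0) + fields.get("Writeback", 0)


# Illustrative sample only, not captured from the system described above.
SAMPLE_MEMINFO = """\
MemTotal:       32880000 kB
MemFree:         1200000 kB
Dirty:           1843200 kB
Writeback:         10240 kB
"""

if __name__ == "__main__":
    kb = pending_writeback_kb(SAMPLE_MEMINFO)
    print(f"about {kb / 1024:.0f} MiB still to be written")
```

On a live Linux box the same function could be fed `open("/proc/meminfo").read()`; watching the number fall after a `sync` gives a feel for how long a flush really takes on a busy filesystem.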

--
Stephen Clark
*NetWolves Managed Services, LLC.*
Director of Technology
Phone: 813-579-3200
Fax: 813-882-0209
Email: steve.cl...@netwolves.com
http://www.netwolves.com


Re: [CentOS] reboot - is there a timeout on filesystem flush?

2015-01-07 Thread Les Mikesell
On Wed, Jan 7, 2015 at 9:52 AM, Gordon Messmer gordon.mess...@gmail.com wrote:

 Every regular file's directory entry on your system is a hard link. There's
 nothing particular about links (files) that make a filesystem fragile.

Agreed, although when there are millions, the fsck fixing it is somewhat slow.

 It is mostly on aging hardware, so it
 is possible that there are underlying controller issues.  I also see
 some rare cases on similar machines where a filesystem will go
 read-only with some scsi errors logged, but didn't look for that yet
 in this case.


 It's probably a similar cause in all cases.  I don't know how many times
 I've seen you on this list defending running old hardware / obsolete
 hardware.  Corruption and failure are more or less what I'd expect if your
 hardware is junk.

Not junk - these are mostly IBM 3550/3650 boxes - pretty much top of
the line in their day (before the M2/3/4 versions).  They have
Adaptec raid controllers, SAS drives, mostly configured as RAID1
mirrors.  I realize that hardware isn't perfect and this is not
happening on a large percentage of them.   But, I don't see anything
that looks like scsi errors in this log and I'm surprised that after
running apparently error-free there would be problems detected after a
software reboot.

I think the newer M2 and later models went to a different RAID
controller, though.   Maybe there was a reason.

-- 
   Les Mikesell
  lesmikes...@gmail.com


Re: [CentOS] reboot - is there a timeout on filesystem flush?

2015-01-07 Thread Valeri Galtsev

On Wed, January 7, 2015 10:33 am, Les Mikesell wrote:
 On Wed, Jan 7, 2015 at 9:52 AM, Gordon Messmer gordon.mess...@gmail.com
 wrote:

 Every regular file's directory entry on your system is a hard link.
 There's
 nothing particular about links (files) that make a filesystem fragile.

 Agreed, although when there are millions, the fsck fixing it is somewhat
 slow.

 It is mostly on aging hardware, so it
 is possible that there are underlying controller issues.  I also see
 some rare cases on similar machines where a filesystem will go
 read-only with some scsi errors logged, but didn't look for that yet
 in this case.


 It's probably a similar cause in all cases.  I don't know how many times
 I've seen you on this list defending running old hardware / obsolete
 hardware.  Corruption and failure are more or less what I'd expect if
 your
 hardware is junk.

 Not junk - these are mostly IBM 3550/3650 boxes - pretty much top of
 the line in their day (before the M2/3/4 versions).  They have
 Adaptec raid controllers,

I never had Adaptec in _my_ list of good RAID hardware... But certainly I
cannot be the one to offer judgement on hardware I avoid to the best of
my ability. If you can afford it, I would do the test: replace Adaptec with
something else (in my list it would be either 3ware or LSI or areca),
leaving the rest of the hardware as it is. And see if the problems continue. I
do realize that there is more to it than just pulling one card and
sticking another in its place (that's why I said "if you can afford it",
meaning in a more general sense, not just monetary).

Valeri

 SAS drives, mostly configured as RAID1
 mirrors.  I realize that hardware isn't perfect and this is not
 happening on a large percentage of them.   But, I don't see anything
 that looks like scsi errors in this log and I'm surprised that after
 running apparently error-free there would be problems detected after a
 software reboot.

 I think the newer M2 and later models went to a different RAID
 controller, though.   Maybe there was a reason.

 --
Les Mikesell
   lesmikes...@gmail.com




Valeri Galtsev
Sr System Administrator
Department of Astronomy and Astrophysics
Kavli Institute for Cosmological Physics
University of Chicago
Phone: 773-702-4247



Re: [CentOS] reboot - is there a timeout on filesystem flush?

2015-01-07 Thread Gary Greene
 On Jan 6, 2015, at 5:50 PM, Les Mikesell lesmikes...@gmail.com wrote:
 
 On Tue, Jan 6, 2015 at 6:37 PM, Gary Greene ggre...@minervanetworks.com 
 wrote:
 
 
 Almost every controller and drive out there now lies about what is and isn’t 
 flushed to disk, making it nigh on impossible for the Kernel to reliably 
 know 100% of the time that the data HAS been flushed to disk. This is part 
 of the reason why it is always a Good Idea™ to have some sort of pause in 
 the shut down to ensure that it IS flushed.
 
 This is also why server grade gear uses battery backed buffers, etc. which 
 are supposed to allow drives to properly flush the data to disk. There is 
 still a slim chance in these cases that the data still will not reach the 
 platter before power off or reboot, especially in catastrophic cases.
 
 
 This was a reboot from software, not a power drop.  Does that do
 something to kill the disk cache if anything happened to still be
 there?

In most cases intentional reboots _shouldn’t_ trigger this, but I cannot say 
that with a 100% certainty since, again, controllers CAN and DO lie. If the 
controller is not battery backed, the certainty is even more shaky, since the 
card's firmware can be in the process of lazy writing the content to disk when 
the main board drops power to the card's slot on the main board during the 
reboot, which without the extra battery would cause the data to be lost.

During the reboot, most cards' drivers will, on init, invalidate the cache on
the card so that dirty pages of data don’t get flushed to disk, to prevent
scribbling junk data to the platters. From what I recall, this is true of both
the megaraid and adaptec based cards.

--
Gary L. Greene, Jr.
Sr. Systems Administrator
IT Operations
Minerva Networks, Inc.
Cell: +1 (650) 704-6633







Re: [CentOS] reboot - is there a timeout on filesystem flush?

2015-01-07 Thread Gary Greene
 On Jan 6, 2015, at 9:23 PM, Gordon Messmer gordon.mess...@gmail.com wrote:
 
 On 01/06/2015 04:37 PM, Gary Greene wrote:
 This has been discussed to death on various lists, including the
 LKML...
 
 Almost every controller and drive out there now lies about what is
 and isn’t flushed to disk, making it nigh on impossible for the
 Kernel to reliably know 100% of the time that the data HAS been
 flushed to disk. This is part of the reason why it is always a Good
 Idea™ to have some sort of pause in the shut down to ensure that it
 IS flushed.
 
 That's pretty much entirely irrelevant to the original question.
 
 (Feel free to correct me if I'm wrong in the following)
 
 A filesystem has three states: Clean, Dirty, and Dirty with errors.
 
 When a filesystem is unmounted, the cache is flushed and it is marked clean 
 last.  This is the expected state when a filesystem is mounted.
 
 Once a filesystem is mounted read/write, then it is marked dirty.  If a 
 filesystem is dirty when it is mounted, then it wasn't unmounted properly.  
 In the case of a journaled filesystem, typically the journal will be replayed 
 and the filesystem will then be mounted.
 
 The last case, dirty with errors indicates that the kernel found invalid data 
 while the filesystem was mounted, and recorded that fact in the filesystem 
 metadata.  This will normally be the only condition that will force an fsck 
 on boot.  It will also normally result in logs being generated when the 
 errors are encountered.  If your filesystems are force-checked on boot, then 
 the logs should usually tell you why.  It's not a matter of a timeout or some 
 device not flushing its cache.
 
 Of course, the other possibility is simply that you've formatted your own 
 filesystems, and they have a maximum mount count or a check interval.  Use 
 'tune2fs -l' to check those two values.  If either of them are set, then 
 there is no problem with your system.  It is behaving as designed, and 
 forcing a periodic check because that is the default behavior.
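
For reference, the check described in the quoted message can be scripted. The Python sketch below parses `tune2fs -l` style output and reports the recorded filesystem state and whether a periodic check is configured. The sample text is illustrative only (not taken from any of the machines in this thread), and the helper names are made up.

```python
def parse_tune2fs(text):
    """Turn 'Key:   value' lines from `tune2fs -l` output into a dict of strings."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def periodic_check_enabled(info):
    """True if a maximum mount count or a check interval is set."""
    max_mounts = int(info.get("Maximum mount count", "-1"))
    interval_s = int(info.get("Check interval", "0").split()[0])
    return max_mounts > 0 or interval_s > 0


def has_recorded_errors(info):
    """True if the superblock state is anything other than plain 'clean'."""
    return info.get("Filesystem state", "clean") != "clean"


# Illustrative sample output (assumed values).
SAMPLE_TUNE2FS = """\
Filesystem state:         clean with errors
Mount count:              12
Maximum mount count:      -1
Check interval:           0 (<none>)
"""

if __name__ == "__main__":
    info = parse_tune2fs(SAMPLE_TUNE2FS)
    print("periodic check configured:", periodic_check_enabled(info))
    print("errors recorded in superblock:", has_recorded_errors(info))
```

In the sample, no periodic check is configured but the state is not clean, which matches the "dirty with errors" case: the forced fsck comes from errors the kernel recorded, not from a mount count or interval.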

Problem is, Gordon, the layer I’m talking about is _below_ the logical layer
that filesystems live at: the block layer, at the mercy of drivers and
firmware that the kernel has zero control over. In a perfect world the
controller would do strictly what the kernel tells it, but that hasn’t been
true for a while now, given the large caches that drives and controllers have.

In most cases this should never trigger. With buggy drivers, or controllers
that have buggy firmware, however, writes to disk can be seriously delayed,
and the data may never make it to the platter.

--
Gary L. Greene, Jr.
Sr. Systems Administrator
IT Operations
Minerva Networks, Inc.
Cell: +1 (650) 704-6633







Re: [CentOS] reboot - is there a timeout on filesystem flush?

2015-01-06 Thread Fran Garcia
On Tue, Jan 6, 2015 at 6:12 PM, Les Mikesell  wrote:
 I've had a few systems with a lot of RAM and very busy filesystems
 come up with filesystem errors that took a manual 'fsck -y' after what
 should have been a clean reboot.  This is particularly annoying on
 remote systems where I have to talk someone else through the recovery.

 Is there some time limit on the cache write with a 'reboot' (no
 options) command or is ext4 that fragile?

I'd say there's no limit on the amount of time the kernel waits until
the blocks have been written to disk; it is driven by these parameters:

vm.dirty_background_bytes = 0
vm.dirty_background_ratio = 10
vm.dirty_bytes = 0
vm.dirty_expire_centisecs = 3000
vm.dirty_ratio = 20
vm.dirty_writeback_centisecs = 500

i.e., if the data cached in RAM is older than 30s or larger than 10% of
available RAM, the kernel will try to flush it to disk. Depending on how
much data needs to be flushed at poweroff/reboot time, this could have
a significant effect on the time taken.
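
As a rough illustration of what those settings work out to, here is a minimal Python sketch that parses `sysctl`-style text like the listing above and estimates the thresholds for a given amount of memory. The function names and the 64 GiB figure are assumptions for the example, and the kernel's real accounting (it works from "dirtyable" memory, not total RAM) is more involved than this.

```python
def parse_sysctl(text):
    """Parse 'name = value' lines (as printed by sysctl) into a dict of ints."""
    settings = {}
    for line in text.splitlines():
        if "=" not in line:
            continue
        name, value = line.split("=", 1)
        settings[name.strip()] = int(value.strip())
    return settings


def dirty_thresholds(settings, dirtyable_bytes):
    """Estimate writeback thresholds in bytes.

    A non-zero *_bytes setting takes precedence over the matching *_ratio.
    """
    def limit(bytes_key, ratio_key):
        if settings.get(bytes_key, 0):
            return settings[bytes_key]
        return dirtyable_bytes * settings.get(ratio_key, 0) // 100

    return {
        # background flusher threads start writing at this level
        "background": limit("vm.dirty_background_bytes", "vm.dirty_background_ratio"),
        # processes that dirty pages are throttled at this level
        "blocking": limit("vm.dirty_bytes", "vm.dirty_ratio"),
        # dirty data older than this is eligible for writeback
        "expire_seconds": settings.get("vm.dirty_expire_centisecs", 0) / 100,
    }


SAMPLE_SYSCTL = """\
vm.dirty_background_bytes = 0
vm.dirty_background_ratio = 10
vm.dirty_bytes = 0
vm.dirty_expire_centisecs = 3000
vm.dirty_ratio = 20
vm.dirty_writeback_centisecs = 500
"""

if __name__ == "__main__":
    gib = 1024 ** 3
    t = dirty_thresholds(parse_sysctl(SAMPLE_SYSCTL), 64 * gib)  # assumed 64 GiB
    print(f"background flush starts near {t['background'] / gib:.1f} GiB of dirty data")
    print(f"writers are throttled near {t['blocking'] / gib:.1f} GiB")
    print(f"dirty data expires after {t['expire_seconds']:.0f} s")
```

The takeaway for big-memory boxes is that a percentage-based limit can leave several GiB of dirty data in RAM, all of which has to reach the disks during shutdown.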

Regarding systems with lots of RAM, I've never seen such a behaviour
on a few 192 GB RAM servers I administer. Granted, your system could
be tuned in a different way or have some other configuration.

TBH I'm not confident enough to give a definitive answer re the data not
being totally flushed before reboot. I'd investigate:

- Whether this happens on every reboot or just on some.
- Whether your RAM is OK (the FS errors could come from that!).
- Whether your disks/SAN are caching writes.  (Maybe they are and the
OS thinks the data has been flushed to disk, but it hasn't)
- filesystem mount options that might interfere  (nobarrier, commit, data...)
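
The last item in the list above is easy to check mechanically. A small Python sketch that scans `/proc/mounts`-style text for the options mentioned; the device names and mount points in the sample are hypothetical, and which options count as worth a second look is my own assumption.

```python
# Options worth a second look when chasing lost writes (assumed list).
NOTABLE_OPTIONS = {"nobarrier", "barrier=0", "data=writeback"}


def notable_mounts(mounts_text):
    """Return {mount_point: [flagged options]} from /proc/mounts-style text."""
    flagged = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        mount_point, options = fields[1], fields[3]
        hits = [
            opt for opt in options.split(",")
            if opt in NOTABLE_OPTIONS or opt.startswith("commit=")
        ]
        if hits:
            flagged[mount_point] = hits
    return flagged


# Hypothetical sample; not taken from any system in this thread.
SAMPLE_MOUNTS = """\
/dev/sda1 / ext4 rw,relatime,data=ordered 0 0
/dev/sdb1 /var/opennms ext4 rw,noatime,nobarrier,commit=60 0 0
"""

if __name__ == "__main__":
    for mount_point, opts in notable_mounts(SAMPLE_MOUNTS).items():
        print(f"{mount_point}: {', '.join(opts)}")
```

On a live system the function could be given `open("/proc/mounts").read()` instead of the sample text.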


HTH

~f


Re: [CentOS] reboot - is there a timeout on filesystem flush?

2015-01-06 Thread Gary Greene
On Jan 6, 2015, at 4:28 PM, Fran Garcia franchu.gar...@gmail.com wrote:
 
 On Tue, Jan 6, 2015 at 6:12 PM, Les Mikesell  wrote:
 I've had a few systems with a lot of RAM and very busy filesystems
 come up with filesystem errors that took a manual 'fsck -y' after what
 should have been a clean reboot.  This is particularly annoying on
 remote systems where I have to talk someone else through the recovery.
 
 Is there some time limit on the cache write with a 'reboot' (no
 options) command or is ext4 that fragile?
 
 I'd say there's no limit on the amount of time the kernel waits until
 the blocks have been written to disk; it is driven by these parameters:
 
 vm.dirty_background_bytes = 0
 vm.dirty_background_ratio = 10
 vm.dirty_bytes = 0
 vm.dirty_expire_centisecs = 3000
 vm.dirty_ratio = 20
 vm.dirty_writeback_centisecs = 500
 
 i.e., if the data cached in RAM is older than 30s or larger than 10% of
 available RAM, the kernel will try to flush it to disk. Depending on how
 much data needs to be flushed at poweroff/reboot time, this could have
 a significant effect on the time taken.
 
 Regarding systems with lots of RAM, I've never seen such a behaviour
 on a few 192 GB RAM servers I administer. Granted, your system could
 be tuned in a different way or have some other configuration.
 
 TBH I'm not confident enough to give a definitive answer re the data not
 being totally flushed before reboot. I'd investigate:
 
 - Whether this happens on every reboot or just on some.
 - Whether your RAM is OK (the FS errors could come from that!).
 - Whether your disks/SAN are caching writes.  (Maybe they are and the
 OS thinks the data has been flushed to disk, but it hasn't)
 - filesystem mount options that might interfere  (nobarrier, commit, data...)

This has been discussed to death on various lists, including the LKML...

Almost every controller and drive out there now lies about what is and isn’t 
flushed to disk, making it nigh on impossible for the Kernel to reliably know 
100% of the time that the data HAS been flushed to disk. This is part of the 
reason why it is always a Good Idea™ to have some sort of pause in the shut 
down to ensure that it IS flushed.

This is also why server grade gear uses battery backed buffers, etc. which are 
supposed to allow drives to properly flush the data to disk. There is still a 
slim chance in these cases that the data still will not reach the platter 
before power off or reboot, especially in catastrophic cases.

--
Gary L. Greene, Jr.
Sr. Systems Administrator
IT Operations
Minerva Networks, Inc.
Cell: +1 (650) 704-6633






Re: [CentOS] reboot - is there a timeout on filesystem flush?

2015-01-06 Thread Les Mikesell
On Tue, Jan 6, 2015 at 6:37 PM, Gary Greene ggre...@minervanetworks.com wrote:


 Almost every controller and drive out there now lies about what is and isn’t 
 flushed to disk, making it nigh on impossible for the Kernel to reliably know 
 100% of the time that the data HAS been flushed to disk. This is part of the 
 reason why it is always a Good Idea™ to have some sort of pause in the shut 
 down to ensure that it IS flushed.

 This is also why server grade gear uses battery backed buffers, etc. which 
 are supposed to allow drives to properly flush the data to disk. There is 
 still a slim chance in these cases that the data still will not reach the 
 platter before power off or reboot, especially in catastrophic cases.


This was a reboot from software, not a power drop.  Does that do
something to kill the disk cache if anything happened to still be
there?

-- 
   Les Mikesell
  lesmikes...@gmail.com


Re: [CentOS] reboot - is there a timeout on filesystem flush?

2015-01-06 Thread Gordon Messmer

On 01/06/2015 04:37 PM, Gary Greene wrote:

This has been discussed to death on various lists, including the
LKML...

Almost every controller and drive out there now lies about what is
and isn’t flushed to disk, making it nigh on impossible for the
Kernel to reliably know 100% of the time that the data HAS been
flushed to disk. This is part of the reason why it is always a Good
Idea™ to have some sort of pause in the shut down to ensure that it
IS flushed.


That's pretty much entirely irrelevant to the original question.

(Feel free to correct me if I'm wrong in the following)

A filesystem has three states: Clean, Dirty, and Dirty with errors.

When a filesystem is unmounted, the cache is flushed and it is marked 
clean last.  This is the expected state when a filesystem is mounted.


Once a filesystem is mounted read/write, then it is marked dirty.  If a 
filesystem is dirty when it is mounted, then it wasn't unmounted 
properly.  In the case of a journaled filesystem, typically the journal 
will be replayed and the filesystem will then be mounted.


The last case, dirty with errors, indicates that the kernel found invalid 
data while the filesystem was mounted, and recorded that fact in the 
filesystem metadata.  This will normally be the only condition that will 
force an fsck on boot.  It will also normally result in logs being 
generated when the errors are encountered.  If your filesystems are 
force-checked on boot, then the logs should usually tell you why.  It's 
not a matter of a timeout or some device not flushing its cache.
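
The three states described above can be written down as a toy model. This is only a sketch of the description in this message, assuming a journaled filesystem; the names are hypothetical and it is not how the kernel or e2fsprogs represent the state internally.

```python
from enum import Enum


class FsState(Enum):
    CLEAN = "clean"
    DIRTY = "dirty"
    DIRTY_WITH_ERRORS = "dirty with errors"


def action_at_mount(state):
    """Return what is expected to happen at boot for a journaled filesystem."""
    if state is FsState.CLEAN:
        # Unmounted properly: nothing to do.
        return "mount"
    if state is FsState.DIRTY:
        # Not unmounted properly: the journal is typically replayed first.
        return "replay journal, then mount"
    # The kernel recorded errors in the metadata while the filesystem was mounted.
    return "force fsck"


if __name__ == "__main__":
    for state in FsState:
        print(state.value, "->", action_at_mount(state))
```

In this model, only the third state leads to a forced check, which is the point being made: an unflushed cache on its own would leave the filesystem dirty, not dirty with errors.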


Of course, the other possibility is simply that you've formatted your 
own filesystems, and they have a maximum mount count or a check 
interval.  Use 'tune2fs -l' to check those two values.  If either of 
them is set, then there is no problem with your system.  It is behaving 
as designed, and forcing a periodic check because that is the default 
behavior.
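
A small sketch of that check, assuming the output of 'tune2fs -l' has been captured as text. It reads the "Maximum mount count" and "Check interval" fields and reports whether a periodic check is enabled. The function name and both sample outputs are made up for illustration.

```python
def periodic_check_settings(tune2fs_output):
    """Extract the two periodic-check settings from 'tune2fs -l' style text."""
    fields = {}
    for line in tune2fs_output.splitlines():
        # Split at the first colon only; values such as dates contain colons too.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    max_mounts = int(fields["Maximum mount count"])
    # The interval is printed as seconds followed by a human-readable form,
    # for example "15552000 (6 months)" or "0 (<none>)".
    interval_seconds = int(fields["Check interval"].split()[0])
    return {
        "max_mount_count": max_mounts,
        "check_interval_seconds": interval_seconds,
        "periodic_check_enabled": max_mounts > 0 or interval_seconds > 0,
    }


# Illustrative excerpts only, not output from the systems discussed here.
SAMPLE_DISABLED = """\
Filesystem state:         clean
Mount count:              12
Maximum mount count:      -1
Check interval:           0 (<none>)
"""

SAMPLE_ENABLED = """\
Filesystem state:         clean
Mount count:              29
Maximum mount count:      30
Check interval:           15552000 (6 months)
"""

if __name__ == "__main__":
    print(periodic_check_settings(SAMPLE_DISABLED))
    print(periodic_check_settings(SAMPLE_ENABLED))
```

To use it against a real device, the text could be captured with something like subprocess.run(["tune2fs", "-l", device], capture_output=True, text=True), run as root.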



Re: [CentOS] reboot - is there a timeout on filesystem flush?

2015-01-06 Thread Keith Keller
On 2015-01-07, Gordon Messmer gordon.mess...@gmail.com wrote:

 Of course, the other possibility is simply that you've formatted your 
 own filesystems, and they have a maximum mount count or a check 
 interval.

If Les is having to run fsck manually, as he wrote in his OP, then this
is unlikely to be the cause of the issues he described in that post.
There must be some sort of errors on the filesystem that caused the
unattended fsck to exit nonzero.
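
For reference, the fsck exit status is a bitmask, and a short sketch can decode it using the values documented in the fsck(8) man page. The function name is hypothetical.

```python
# Exit status bits as documented in fsck(8); the status is the sum of
# the conditions that apply.
FSCK_EXIT_BITS = {
    1: "filesystem errors corrected",
    2: "system should be rebooted",
    4: "filesystem errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "checking canceled by user request",
    128: "shared-library error",
}


def describe_fsck_exit(code):
    """Return the list of conditions encoded in an fsck exit status."""
    if code == 0:
        return ["no errors"]
    return [text for bit, text in sorted(FSCK_EXIT_BITS.items()) if code & bit]


if __name__ == "__main__":
    for code in (0, 1, 4, 5):
        print(code, describe_fsck_exit(code))
```

A status with bit 4 set means errors were left uncorrected, which is the kind of result that ends in a manual 'fsck -y'.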

--keith


-- 
kkel...@wombat.san-francisco.ca.us




[CentOS] reboot - is there a timeout on filesystem flush?

2015-01-06 Thread Les Mikesell
I've had a few systems with a lot of RAM and very busy filesystems
come up with filesystem errors that took a manual 'fsck -y' after what
should have been a clean reboot.  This is particularly annoying on
remote systems where I have to talk someone else through the recovery.

Is there some time limit on the cache write with a 'reboot' (no
options) command or is ext4 that fragile?

-- 
   Les Mikesell
 lesmikes...@gmail.com