> How is it possible that i am using ext3 in my production systems and face
> stuff like:
> 1. Corrupted FS during normal work that needs to be fixed with fsck or worse
> restore from a backup

If you are getting corrupted filesystems during normal operation and with normal shutdowns (i.e., you don't just force the Linux guests down or let them die horribly with CP SHUTDOWN), then something in your configuration is actively clobbering data behind Linux's back. Ext3 is a journaling filesystem and will survive most user abuse with reasonable care. What other systems have access to those disks? If z/OS does, do the volumes have valid VOL1 labels and VTOCs, so that z/OS thinks they're full and keeps its grubby mitts off them?

> 2. Resizing a FS requires me to fsck before I resize (as if the FS does not
> trust itself to be valid forcing me to umount the FS before a resize)

An unmounted filesystem is not guaranteed to be error free or consistent. The fsck requirement is a safety measure to ensure the filesystem is error free and consistent on disk before you do major surgery -- and resizes ARE major surgery. If you don't want to do the fsck, you can explicitly override it, but then it's your gun, your foot.

> 3. Resizing a FS offline actually corrupts the FS

See #1. Something is happening outside Linux's control.

> 4. The fstab parameters, that states that it is normal to fsck your FS every
> boot or every several mounts...

This is another "err on the side of safety" item, and probably dates from before Linux made the transition to reliable hardware. Most systems don't have the hardware reliability guarantees that System z has, so you do this to adapt to unreliable hardware on smaller systems. You are free to change this to suit your risk tolerance, and most Unix shops (of any sort) with decent iron do change these parameters. Having a "cautious" default seems right to me -- especially since it's telling you about a serious problem.

> 5. FS is busy although it is not mounted or in use by anyone...

An example would help here. You can also try lsof, which (IMHO) tends to be more reliable at finding who has what open.

> 6. fuser command will not always show the using processes

Lsof again.

> 7.
> open files can be removed without any warning from the rm command.

Working as designed. Remember (unlike z/OS), Unix is "if you have permission to do it, you asked for it, you got it. Your gun, your foot. Locking is an application issue." If you want the safety, check out SELinux configuration. It's almost as ugly as RACF, but you can stop root from doing stupid stuff.

> 8. removing files from the FS will not free up space in the FS

Unix filesystems such as ext3 reserve some space in a filesystem as a soft quota to halt runaway processes and let admins fix stuff before everything goes casters up. Is the filesystem showing 100% or more than 100% usage in df? If so, then you've probably hit that, and the number won't really change until you drop below that threshold AND all open files are closed. The filesystem won't release a deleted file's space until nobody has it open.

> I am a z/OS system programmer and maybe i am expecting for too much, but
> even windows don't have this kind of stuff anymore...

No, they have other variations of other problems.

> I am using redhat V5.2 (not too old) and recently was asked from my local
> redhat representative to upgrade my kernel to V5.6 (2.6.18-238).
> To my huge surprise i am still seeing this kind of issues even with the new
> kernel...

Sounds like someone is easter-egging the problem. Kernel upgrades are unlikely to fix this kind of problem. Something else is corrupting your data outside Linux's control. The Linux kernel can't compensate for what it doesn't know about.

> Am i alone here? how can this be? Why are we all using linux if it is still
> not ready for production?

Production Linux (or any Unix) requires different techniques to manage it. Linux is learning a lot from z/OS (finally), but this is pretty much the state of the world for *any* Unix implementation at scale. Your data corruption problems are something else, though -- something unique to your configuration.

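To make points 7 and 8 concrete, here is a minimal Python sketch of the two behaviours described above. It is an illustration only, not anything from Red Hat: the function names `unlink_while_open` and `reserved_bytes` are made up for this example, and it assumes a Linux guest with a local filesystem.

```python
import os
import tempfile


def unlink_while_open(payload: bytes = b"still here"):
    """Remove a file that is still open and show its data stays reachable."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, payload)
        os.unlink(path)                  # what rm does: drop the directory entry
        gone = not os.path.exists(path)  # the name is gone immediately
        # Typically 0 on a local Linux filesystem: no name left, inode still live.
        links = os.fstat(fd).st_nlink
        os.lseek(fd, 0, os.SEEK_SET)
        data = os.read(fd, len(payload))  # the blocks are still allocated
    finally:
        os.close(fd)                     # only after the last close can space be freed
    return gone, links, data


def reserved_bytes(path: str = "/") -> int:
    """Bytes that are free but not available to unprivileged users."""
    st = os.statvfs(path)
    # f_bfree counts all free blocks; f_bavail excludes the reserved ones.
    return (st.f_bfree - st.f_bavail) * st.f_frsize


if __name__ == "__main__":
    gone, links, data = unlink_while_open()
    print("name removed:", gone, "| link count:", links, "| data:", data)
    print("reserved bytes on /:", reserved_bytes("/"))
```

The first function is why rm gives no warning and why space only comes back once the last process closes the file. The second shows the gap between "free" and "available" that makes df report a full filesystem while blocks are technically still free.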
> will ext4 fix that or is it just bigger, faster but based on the same unstable
> technology?

Ext4 isn't going to fix your problem (in fact, it has a whole new set of issues).

----------------------------------------------------------------------
For LINUX-390 subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO LINUX-390 or visit
http://www.marist.edu/htbin/wlvindex?LINUX-390
----------------------------------------------------------------------
For more information on Linux on System z, visit
http://wiki.linuxvm.org/
