I just noticed a peculiar problem, which has happened at least twice in the last 10 weeks---a file that -was- in the filesystem apparently vanished, then reappeared. I'm looking for advice on how to debug this or research it. [Also, see PS for a slightly different problem that might or might not be related.]
I'm using the JFS that comes out of the box with Ubuntu Breezy, which is kernel 2.6.12-10-386. The filesystem is in a normal partition, NOT mounted over LVM, and it's not the root filesystem. It's exported rw via NFS, but both times a file vanished & reappeared, nothing but the local host was accessing that filesystem (and that had been true for many minutes or hours; only one other host can touch that filesystem anyway).

I found out about it because I have some automation that's actually a large tcsh script. It always runs with echo & verbose, and logs all its output to a file. It tried to remove a file (call it FOO) simply via "rm /blah/biff/FOO" and got

    rm: cannot remove `/blah/biff/FOO': No such file or directory

I only noticed the problem several days later, but FOO is actually there. To ensure I wasn't misreading something, I marked the entire pathname (the actual rm used a rooted pathname, e.g., "rm /blah/biff/FOO") in Emacs and, in a shell buffer, did "ls -alF /blah/biff/FOO" by yanking the marked pathname. It shows the file. "head -c 5 /blah/biff/FOO" gives me the first five bytes. Thus, I'm 100% sure that the pathname the script tried to use and failed with, and the pathname I'm now using, are the same pathname.

So I started reviewing my logs of that script and found another instance of the same problem from late November, on a different file. That file has since been deleted in some other way (I probably noticed it hanging around and nuked it without realizing why it was still there). The most recent file, OTOH, is still there. (It's large and I need the space, so I may delete it soon anyway, in the hopes that I'm not about to destroy debugging data.)

These files get created by a script running on the machine with the JFS filesystem that pulls them via NFS from another machine (also running JFS) via cp.
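[An aside on catching this in the act next time: since the only evidence so far is a single rm error line in a log, I'm considering wrapping the rm in something like the rough sketch below. It's plain sh rather than tcsh, so it would have to live in its own script or be translated. The names rm_logged and RM_DIAG_LOG are made up for illustration, and the one-second recheck is an arbitrary guess at how long the file might stay "missing".]

```shell
#!/bin/sh
# Hypothetical helper (not part of the existing automation): remove a file
# and, if rm fails, log what the filesystem reports right then and again a
# moment later. rm_logged and RM_DIAG_LOG are made-up names.
rm_logged() {
    target=$1
    log=${RM_DIAG_LOG:-/tmp/rm-diag.log}

    # Capture rm's exit status; its error message goes to the log.
    status=0
    rm -- "$target" 2>>"$log" || status=$?
    if [ "$status" -eq 0 ]; then
        return 0
    fi

    dir=$(dirname -- "$target")
    {
        echo "=== $(date): rm failed (exit $status) on $target"
        echo "--- immediately after the failure:"
        ls -ld -- "$target" "$dir"
        sleep 1
        echo "--- one second later:"
        ls -ld -- "$target"
    } >>"$log" 2>&1

    return "$status"
}

# Example use (pathname as in the description above):
#   rm_logged /blah/biff/FOO
```

If the file really does flicker out of existence and back, the two ls results in the log should disagree with each other, which would at least narrow down how long the window is.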
Since it's a pull, I'm reasonably sure that NFS can't be responsible, and since it's cp, you'd think there'd be nothing unusual going on. Certainly the files don't seem to have any problems getting -used-: it's just that twice in the last 1000 files or so, JFS has claimed a file wasn't there at the instant that rm tried to delete it.

I have -not- tried running "fsck.jfs -n -v" yet, nor am I sure that's even the thing to try. I may try that in a few hours, when I have a window during which I can unmount it for a few minutes. Given that I haven't noticed any -other- problems with this FS, I'm loath to try a non -n version of fsck. (Although note that this machine gets booted every few days on average anyway, so the normal fsck on boot has been run fairly frequently.) I can't swear that a reboot didn't happen somewhere between the file's creation, its use, and the subsequent deletion attempt, but I could research that if it's important. The machine has certainly been rebooted between the failed deletion (about a week ago) and today, when I noticed that the file was there when it shouldn't have been, checked my logs, and found the rm failure.

Running "grep -i jfs /var/log/*" yields pretty much only

    JFS: nTxBlock = 8098, nTxLock = 64790

in kern.log from when the machine comes up. My /etc/fstab does have a nonzero "pass" for this filesystem, so I'm reasonably sure fsck is running.

Has anyone seen anything like this?

P.S. This machine occasionally has to have its reset button pushed due to issues with a PCI card that hangs. (The machine itself very rarely hangs even when that happens, but sometimes an attempt to reboot it causes the shutdown to wedge partway through, and -that- requires hitting the reset button. It runs headless, so I can't easily tell exactly where the shutdown hangs.)
Some small fraction of the time that -that- happens, I notice that a large file in its JFS that -had- been around for many minutes (long enough to get written to a DVD -and- to have cmp compare the entire file to what got written to the DVD) has vanished by the time the machine comes back up. I find this quite surprising and worrisome---surely syncs are happening every few seconds, or at the very least every few minutes, and yet the file is simply -gone- after the reset.

I've now taken to having my automation actually call sync when it's done, so a subsequent reset (which never happens in the middle of the script---I can and do always wait until the script finishes before either attempting to reboot or hitting the reset button) is guaranteed to happen after a sync. I'm now keeping an eye on this to see if it ever recurs---but do I misunderstand how often JFS actually manages to commit its metadata?

[I can't swear that the vanishing file was always after a reset instead of a controlled shutdown/reboot, but I'm guessing that it was.]

_______________________________________________
Jfs-discussion mailing list
Jfs-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/jfs-discussion