Hi Peter,

It's been a while since we last exchanged emails. I'm getting some weirdness from my multi-threaded Java program. I'm running 10 threads that update 24,000+ rrd files. Every once in a while I can't close the program because one of my threads is hanging on a lock of some kind, and I'm not exactly sure why.

First off, I'm updating the rrd files over an NFS mount, and this really only started happening after I switched from local disk to NFS. NFS is a must, so I need to get this working. While I was encountering this problem, I noticed that lsof listed the same file multiple times, anywhere from 3 to 8 times during the same period.

The way I programmed the Java side is to assign all the updates for one rrd file to one thread. That is, each rrd file should never see access from different threads, only from the same thread. Now, just to debug, I limited myself to 1 thread.
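For illustration, the one-file-per-thread assignment described above could be sketched as follows. This is not the program's actual code: the class name `RrdUpdateRouter` and its methods are hypothetical, and hashing the file path is just one assumed way to pin a file to a thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical helper: pins every rrd file to exactly one single-threaded worker. */
final class RrdUpdateRouter {
    private final List<ExecutorService> workers = new ArrayList<>();

    RrdUpdateRouter(int workerCount) {
        for (int i = 0; i < workerCount; i++) {
            // One thread per executor, so tasks for a given file run in submission order.
            workers.add(Executors.newSingleThreadExecutor());
        }
    }

    /** The same path always maps to the same worker index. */
    static int workerIndexFor(String rrdPath, int workerCount) {
        // floorMod keeps the index non-negative even when hashCode() is negative.
        return Math.floorMod(rrdPath.hashCode(), workerCount);
    }

    /** Queues an update on the worker that owns this file. */
    void submit(String rrdPath, Runnable update) {
        workers.get(workerIndexFor(rrdPath, workers.size())).execute(update);
    }

    /** Stops accepting new work; already queued updates still run. */
    void shutdown() {
        for (ExecutorService worker : workers) {
            worker.shutdown();
        }
    }
}
```

With a layout like this, two updates for the same path can never run concurrently inside the JVM, so any overlapping access would have to come from somewhere else.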
Here's a dump of `lsof -a -N -u rhalstead -r 1`:

COMMAND  PID   USER       FD   TYPE  DEVICE         SIZE/OFF  NODE                  NAME
java     7873  rhalstead  12u  VREG  255,117440514  375576    18446744073490300471  /mnt/nms2/test/hlnmt001acm1/hlnmt001acm1_errors_in_3820.rrd
java     7873  rhalstead  12u  VREG  255,117440514  375576    18446744073490300471  /mnt/nms2/test/hlnmt001acm1/hlnmt001acm1_errors_in_3820.rrd
java     7873  rhalstead  12u  VREG  255,117440514  375576    18446744073490300471  /mnt/nms2/test/hlnmt001acm1/hlnmt001acm1_errors_in_3820.rrd

COMMAND  PID   USER       FD   TYPE  DEVICE         SIZE/OFF  NODE        NAME
java     8261  rhalstead  txt  VREG  255,117440514  188744    1497859076  /mnt/nms2/test/codwy001hb6_traffic_in_33680.rrd
java     8261  rhalstead  4u   VREG  255,117440514  188744    1497859076  /mnt/nms2/test/codwy001hb6_traffic_in_33680.rrd
java     8261  rhalstead  txt  VREG  255,117440514  188744    1497859076  /mnt/nms2/test/codwy001hb6_traffic_in_33680.rrd
java     8261  rhalstead  4u   VREG  255,117440514  188744    1497859076  /mnt/nms2/test/codwy001hb6_traffic_in_33680.rrd
java     8261  rhalstead  txt  VREG  255,117440514  188744    1497859076  /mnt/nms2/test/codwy001hb6_traffic_in_33680.rrd
java     8261  rhalstead  4u   VREG  255,117440514  188744    1497859076  /mnt/nms2/test/codwy001hb6_traffic_in_33680.rrd

Now, on the Java side of things, I'm not opening the file at all. I do a File.exists() check before anything else to see if I need to call rrd_create_r(), but that's it. During the whole process of writing an rrd file, I first see if I need to call rrd_create_r(), then I call rrd_last_r(), and finally rrd_update_r().

My question: why am I seeing multiple file handles to the same file? I'm not entirely sure how lsof works, whether it takes a snapshot over a window of time or not (maybe you know more about that), but shouldn't rrdtool only have one file handle open at a time?

I coded the Java program to hang if it can't close any of its own threads, and this is exactly what happens.
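One possible way to get more information out of a hang like this, sketched under assumptions: instead of blocking forever at shutdown, wait with a timeout and capture a thread dump through the standard `ThreadMXBean` API. The class and method names (`ShutdownDiagnostics`, `shutdownOrDump`) are hypothetical, and the sketch assumes the worker threads are managed by an `ExecutorService`, which the original program may not do.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: bounded shutdown that reports what the stuck threads are doing. */
final class ShutdownDiagnostics {
    /**
     * Waits up to timeoutSeconds for the pool to finish. Returns an empty
     * string on a clean exit, otherwise a dump of all live threads.
     */
    static String shutdownOrDump(ExecutorService pool, long timeoutSeconds) {
        pool.shutdown(); // stop accepting new work, let queued tasks finish
        try {
            if (pool.awaitTermination(timeoutSeconds, TimeUnit.SECONDS)) {
                return "";
            }
        } catch (InterruptedException e) {
            // Preserve the interrupt flag and fall through to the dump.
            Thread.currentThread().interrupt();
        }
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        // Only request lock details that this JVM actually supports.
        ThreadInfo[] infos = mx.dumpAllThreads(
                mx.isObjectMonitorUsageSupported(),
                mx.isSynchronizerUsageSupported());
        for (ThreadInfo info : infos) {
            // ThreadInfo.toString() prints name, state, lock info and the top stack frames.
            out.append(info);
        }
        return out.toString();
    }
}
```

A thread blocked on a Java monitor shows up as BLOCKED together with the lock it is waiting for, while a thread stuck inside a native call is typically reported as RUNNABLE with a native method as its top frame, which would at least help tell the two cases apart.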
When I encounter this, I run lsof and still see multiple listings for the same file, but I can't really tell whether the thread is waiting for another file lock or is locked on something else. I'm going to run my program using 1 thread and see if it hangs again. The hanging is entirely random, which makes it hard to debug the cause.

Could you give me any insight into why I'm seeing multiple listings in lsof? Or, if rrdtool is opening multiple file handles to the same file, why? I am also wondering about the file locks and how they are handled.

Thanks for your help Peter, this issue has been kicking my @ss all week!

--
"A fool acts, regardless; knowing well that he is wrong. The ignoramus acts on only what he knows, but all that he knows. The ignoramus may be saved, but the fool knows that he is doomed."
Robert Halstead
