Hi Brian,

please find my comments inline below:

> I've seen a log rotation where the input file did not get re-opened, and am
> working on troubleshooting that.
>
> For the SEC process that failed, sending a SIGUSR2 failed, but sending a
> SIGABRT worked.
> (both sent as the same user as the process owner)

To clarify the purpose of SIGUSR2 a bit: this signal has indeed been designed
for handling file rotations, but it only works for SEC outputs (for example,
files created with the 'write' action) and the SEC log file (specified with
the --log command line option). Input file rotations are handled
automatically, and there is no need to send a specific signal to SEC. For each
input file, its inode number is monitored, and whenever it changes, SEC
concludes that the input file has been rotated and reopens it. As for the
SIGABRT signal, it forces SEC to reopen all input files if the --nokeepopen
command line option has been provided, but by default SEC only attempts to
open those input files which are currently in the closed state.

> The input file for that process is an NFS mounted read-only backed file
> system, to which I have no real access for experimentation.
>
> I created a baby SEC config file for testing, and specified a local
> filesystem, and was unable to recreate the failure.
> I tried using "mv input_orig input_new; touch input_new", and "cp /dev/null
> >> input_orig".
> I used non-detached mode, as opposed to detached mode for the failing config,
> though I wouldn't expect that to make a difference.
>
> I'm currently thinking that it might have to do with the NFS mount options,
> perhaps specifically the locking methods, or maybe the soft vs. hard mount.

Since the handling of log rotations depends on file inode numbers, I suspect
that the inode number of the file occasionally stays the same after rotation
on an NFS mounted file system. To find out what actually happened, it is best
to look into the SEC log file.
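
To illustrate the inode-based check described above, here is a minimal Python
sketch. It is not SEC's actual code (SEC is written in Perl); the function
name check_rotation and its arguments are made up for this example, and the
truncation test (file size smaller than the last read offset) is an
assumption about how such a check is typically done.

```python
import os


def check_rotation(path, last_dev, last_ino, last_offset):
    """Compare the current state of an input file with what was seen before.

    Hypothetical helper for illustration only, not part of SEC.
    Returns "recreated" if the device/inode pair has changed, "truncated"
    if the file identity is the same but the file is now shorter than the
    last read offset, and "unchanged" otherwise.
    """
    st = os.stat(path)
    # A different device/inode pair means the path now refers to a new file.
    if (st.st_dev, st.st_ino) != (last_dev, last_ino):
        return "recreated"
    # Same file, but smaller than what has already been read (assumed test).
    if st.st_size < last_offset:
        return "truncated"
    return "unchanged"
```

If the NFS client keeps reporting the old inode number for a recreated file,
a check of this kind would return "unchanged" and the file would not be
reopened, which would be consistent with the behavior you observed.
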
Whenever a rotated input file has been detected, the message
"Input file <file> has been recreated" is written into the log file before the
input file is reopened (if the input file is truncated without an inode number
change, the message "Input file <file> has been truncated" is logged instead).
If you are currently not collecting SEC log messages, I'd recommend activating
logging with the --log command line option. In order to keep the log file
smaller, you can exclude debug-level messages with the --debug=5 command line
option. Once the issue surfaces again, the log messages will provide a clue
about what actually happened.

To find out what information SEC currently has about its input files, you can
let SEC generate a dump file with the SIGUSR1 signal and then look into the
"Input sources" section of the dump file. Here is an example fragment from
this section:

/var/log/sshd.log (status: Open, type: regular file, read offset: 256, file size: 256, device/inode: 2065/6449643528, received data: 8123 lines, context: _FILE_EVENT_SSHD)

From the above information, you can see the device and inode numbers of the
input file. With the /usr/bin/stat tool you can find out what device and inode
numbers are reported for that file by the NFS server, and check whether they
match the numbers in the SEC dump file (these numbers should always be the
same, apart from a very small time frame before SEC handles the rotated file).

> The mtab entry (redhat 7.9) for this includes the following options:
>
> foo.ucsd.edu:/remotefilesystem /localfilemountdir nfs
> ro,relatime,vers=3,rsize=32768,wsize=32768,namlen=255,soft,nolock,proto=udp,timeo=11,retrans=3,sec=sys,mountaddr=AAA.BBB.CCC.DDD,mountvers=3,mountport=4002,mountproto=udp,local_lock=all,addr=AAA.BBB.CCC.DDD
> 0 0
>
>
> The same SEC config running on a Solaris 11 box with the same NFS mounted
> filesystem, doesn't have this problem.
> The mnttab file there has these
> options:
>
> foo.ucsd.edu:/remotefilesystem /localfilenountdir nfs
> ro,nodevices,noquota,vers=3,proto=tcp,xattr,zone=ratbert2,sharezone=1,dev=9540001
> 1613782895
>
> Ideas anyone?

In several forum posts I found discussions of the 'actimeo' NFS mount option,
which can be used for setting the caching time for file attributes. If the
current value is too large, the NFS client might cache file attributes for too
long and not detect the inode number change in a timely fashion. But this is
just a guess, and investigating the issue more closely would require some
experiments with a test NFS server.

Hope this helps,
risto

>
> --
> Brian Parent
> Information Technology Services Department
> ITS Computing Infrastructure Operations Group
> its-ci-ops-h...@ucsd.edu (team email address for Service Now)
> UC San Diego
> (858) 534-6090
>
>
> _______________________________________________
> Simple-evcorr-users mailing list
> Simple-evcorr-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/simple-evcorr-users