Re: [Simple-evcorr-users] reopening inputfile inconsistently fails

Risto Vaarandi Wed, 26 May 2021 01:24:02 -0700

hi Brian,

let me provide my comments below in inline fashion:


>
> I've seen a log rotation where the input file did not get re-opened, and am 
> working on troubleshooting that.
>
> For the SEC process that failed, sending a SIGUSR2 failed, but sending a 
> SIGABRT worked.
> (both sent as the same user as the process owner)

To clarify the purpose of SIGUSR2 a bit -- this signal was indeed
designed for handling file rotations, but it only applies to SEC
outputs (for example, files created with the 'write' action) and the
SEC log file (specified with the --log command line option). Input
file rotations are handled automatically, and there is no need to
send a specific signal to SEC. For each input file, SEC monitors the
inode number, and whenever that number changes, SEC assumes the file
has been rotated and reopens it. As for the SIGABRT signal, it forces
SEC to reopen all input files if the --nokeepopen command line option
has been provided, but by default SEC only attempts to open those
input files which are currently in the closed state.
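
To illustrate the idea of inode-based rotation detection, here is a
minimal Python sketch. It is not SEC's actual code (SEC is written in
Perl); the function name check_input and its return values are made up
for this illustration only.

```python
import os

def check_input(path, known_id, known_offset):
    """Classify the state of an input file.

    known_id is the (device, inode) pair recorded when the file was
    opened; known_offset is the number of bytes read so far.
    Both names are hypothetical, not taken from SEC.
    """
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return "missing"      # rotated away, new file not created yet
    if (st.st_dev, st.st_ino) != known_id:
        return "recreated"    # a different file under the same name
    if st.st_size < known_offset:
        return "truncated"    # same file, but shorter than what was read
    return "unchanged"
```

If the NFS client keeps reporting the old inode number after a
rotation, a check of this kind returns "unchanged" and the file is
never reopened, which would match the symptom you describe.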

>
> The input file for that process is an NFS mounted read-only backed file 
> system, to which I have no real access for experimentation.
>
> I created a baby SEC config file for testing, and specified a local 
> filesystem, and was unable to recreate the failure.
> I tried using "mv input_orig input_new; touch input_new", and "cp /dev/null 
> >> input_orig".
> I used non-detached mode, as opposed to detached mode for the failing config, 
> though I wouldn't expect that to make a difference.
>
> I'm currently thinking that it might have to do with the NFS mount options, 
> perhaps specifically the locking methods, or maybe the soft vs. hard mount.

Since handling log rotations depends on file inode numbers, I suspect
the file inode number occasionally stays the same after rotation on an
NFS mounted file system. To find out what actually happened, it is
best to look into the SEC log file. Whenever a rotated input file is
detected, the message "Input file <file> has been recreated" is
written into the log file before the input file is reopened (if the
input file is truncated without an inode number change, the message
"Input file <file> has been truncated" is logged instead). If you are
currently not collecting SEC log messages, I'd recommend activating
logging with the --log command line option, and in order to keep the
log file smaller, you can exclude debug-level messages with the
--debug=5 command line option. Once the issue surfaces again, the log
messages will provide a clue about what actually happened.
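
For picking these messages out of a larger log, something along these
lines might help. It is a rough Python sketch that relies only on the
two message texts quoted above; the function name rotation_events is
hypothetical, and anything else on the log line (timestamps and so on)
is ignored rather than assumed.

```python
import re

# Matches the two message texts quoted above; the remainder of each
# log line is not interpreted.
ROTATION_RE = re.compile(
    r"Input file (?P<file>.+) has been (?P<event>recreated|truncated)"
)

def rotation_events(log_lines):
    """Return (file, event) pairs for every rotation message found."""
    events = []
    for line in log_lines:
        m = ROTATION_RE.search(line)
        if m:
            events.append((m.group("file"), m.group("event")))
    return events
```

You would pass it the open log file (whatever path you give to --log)
and check whether an event shows up around the time of each rotation.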

To find out what information SEC currently has about its input files,
you can have SEC generate a dump file by sending it the SIGUSR1 signal
and then look into the "Input sources" section of the dump file. Here
is an example fragment from this section:

/var/log/sshd.log (status: Open, type: regular file, read offset: 256,
file size: 256, device/inode: 2065/6449643528, received data: 8123
lines, context: _FILE_EVENT_SSHD)

From the above information, you can see the device and inode numbers
of the input file. With the /usr/bin/stat tool you can find out what
device and inode numbers the NFS server reports for that file, and
check whether they match the numbers shown in the SEC dump file
(these numbers should always be the same, apart from a very small time
frame before SEC handles the rotated file).
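
If you want to automate that comparison, here is a small Python sketch.
It parses the device/inode field in the format of the fragment shown
above and compares it with what os.stat() reports. The function names
are hypothetical, and I am assuming that the dump file and os.stat()
print the device number in the same decimal form.

```python
import os
import re

DEV_INO_RE = re.compile(r"device/inode:\s*(\d+)/(\d+)")

def dump_dev_inode(entry):
    """Extract (device, inode) from one 'Input sources' entry."""
    m = DEV_INO_RE.search(entry)
    if m is None:
        raise ValueError("no device/inode field in entry")
    return int(m.group(1)), int(m.group(2))

def matches_current_file(entry, path):
    """True if the file now at `path` is the one recorded in the dump."""
    st = os.stat(path)
    return dump_dev_inode(entry) == (st.st_dev, st.st_ino)
```

A False result that persists well after a rotation would mean SEC is
still holding the old file; a True result for a file that has clearly
been rotated would point at the inode number not changing.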

>
> The mtab entry (redhat 7.9) for this includes the following options:
>
> foo.ucsd.edu:/remotefilesystem /localfilemountdir nfs 
> ro,relatime,vers=3,rsize=32768,wsize=32768,namlen=255,soft,nolock,proto=udp,timeo=11,retrans=3,sec=sys,mountaddr=AAA.BBB.CCC.DDD,mountvers=3,mountport=4002,mountproto=udp,local_lock=all,addr=AAA.BBB.CCC.DDD
>  0 0
>
>
> The same SEC config running on a Solaris 11 box with the same NFS mounted 
> filesystem, doesn't have this problem.  The mnttab file there has these 
> options:
>
> foo.ucsd.edu:/remotefilesystem /localfilemountdir nfs 
> ro,nodevices,noquota,vers=3,proto=tcp,xattr,zone=ratbert2,sharezone=1,dev=9540001
>        1613782895
>
> Ideas anyone?

In several forum posts I found discussions of the 'actimeo' NFS mount
option, which sets the caching time for file attributes. If the
current value is too large, the NFS client might cache file attributes
for too long and not detect the inode number change in a timely
fashion. But this is just a guess, and investigating the issue more
closely would require some experiments with a test NFS server.
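
To check quickly which attribute caching options are set on a mount,
a sketch like the following could be used on a line from mtab. The
option names are the ones documented in nfs(5); the function name
attr_cache_options is hypothetical.

```python
# Attribute caching options documented in nfs(5).
ATTR_CACHE_OPTS = ("actimeo", "acregmin", "acregmax",
                   "acdirmin", "acdirmax", "noac")

def attr_cache_options(mount_line):
    """Return the attribute caching options found in one mtab-style line."""
    fields = mount_line.split()
    if len(fields) < 4:
        raise ValueError("expected: device mountpoint fstype options ...")
    found = {}
    for opt in fields[3].split(","):
        name, _, value = opt.partition("=")
        if name in ATTR_CACHE_OPTS:
            found[name] = value or True   # flag options such as noac
    return found
```

An empty result only means that none of these options appears
explicitly in that line.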

Hope this helps,
risto

>
> --
> Brian Parent
> Information Technology Services Department
> ITS Computing Infrastructure Operations Group
> its-ci-ops-h...@ucsd.edu (team email address for Service Now)
> UC San Diego
> (858) 534-6090
>
>
> _______________________________________________
> Simple-evcorr-users mailing list
> Simple-evcorr-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/simple-evcorr-users


