Dennis Peterson wrote:
>Don't scan every file every day - that makes no sense. Just scan files that 
>have changed since the previous scan (google tripwire and similar tools).

And I replied:
>I'll have to think about this, as it's becoming a lot more complicated than I 
>had expected.


After thinking about it, I still have misgivings about not scanning every file 
every day: a file may not change day-to-day, but new virus signatures are added 
all the time, and yesterday's file may contain today's newly recognized virus. 
But, given the time needed to do a full scan, I have had to adopt a policy of 
scanning only new or changed files.

I looked again at Tripwire and its ilk (e.g., AIDE): they are large, very 
complicated, and not really designed for this purpose. Thus, in true Open Source 
tradition, I have written a surprisingly small Bash script that determines which 
files need to be scanned, based on timestamp, size, inode, and hash. This script 
only needs the utilities that normally come with most Linux systems. On *BSD and 
commercial Unix systems it needs the GNU versions of find, stat and sed, since it 
uses 'find -printf', 'stat -c' and sed's '\+' and '\s'.


The operation of the script is as follows.

1. Traverse a directory tree (using 'find'), and for each regular file, compute 
its hash and tabulate the hash value along with the file's inode, sizes, 
timestamp and name. Each line of the output file looks as follows (where 
I=inode, T=timestamp, N=size in bytes, B=block count in 512-byte blocks, 
H=hash value):

I:720897 T:799391846 N:1076 B:8 H:6df9f5744e96466c27477819978f07c5dbae671e /samba/Samba/MSDEV/SAMPLES/SDK/WINNT/REGMPAD/MAKEFILE

2. Compare, using 'diff', the new tabulation file (which has been sorted) with 
the previous tabulation file for the same directory. This gives a file listing 
the new and changed files in the directory tree (but not the deleted files).

3. Build a temporary directory tree containing links to the set of files to be 
scanned. The links are symbolic if the installed version of clamscan supports 
them (see below), otherwise they are hard links (with their attendant 
limitations, e.g., hard links cannot cross filesystem boundaries). The temporary 
directory tree has exactly two levels (at most MAXF = 1000 links per 
subdirectory), merely to keep the individual temporary directories small.

4. Pass the entire temporary directory tree to clamscan to be scanned 
recursively. (When a lot of files are to be scanned, this is probably the most 
efficient approach, since there is a rather small limit on the length of a 
command line, and using clamd involves IPC.)

5. Take the output of clamscan and transform the names of any files that it 
flagged back to the original file names. This is trivial for symbolic links, 
but can be time-consuming for hard links (all the generated link-names have to 
be looked up in the diff file).

6. Clean up by removing the temporary link directories and the raw clamscan 
output; then rename the transformed clamscan output (and the diff file) by 
appending a timestamp to their names, so you can keep a history.
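Steps 1 and 2 can be tried on their own roughly as follows. This is a minimal sketch, assuming GNU find, sha1sum, sort, diff and sed; the function names tabulate and changed are hypothetical (in the script the same work is done by hash4scan and diff4scan), and the regex here also accepts '.' so that the fractional seconds that newer GNU find prints for %T@ do not break it.

```shell
#!/bin/bash
# Sketch of steps 1 and 2 (assumes GNU find, sha1sum, sort, diff and sed).

# tabulate DIR OUT: write one sorted line per regular file under DIR:
#   inode, mtime, size in bytes, 512-byte blocks, SHA-1 hash and path.
tabulate () {
  find "$1" -type f -printf 'I:%-10i  T:%-11T@  N:%-11s  B:%-8b  H:' \
       -exec sha1sum {} \; | sort > "$2"
}

# changed OLD NEW: print "inode  path" for each file that is new or changed in NEW.
#   The ed script from diff -e carries the text of added/changed lines only, so
#   deleted files never appear; -N treats a missing OLD (first run) as empty.
changed () {
  diff -BbN -e "$1" "$2" | grep '^I:' |
    sed -e 's/^I:\([0-9]\+\)\s\+\([TNBH]:[0-9a-f.]\+\s\+\)\+/\1  /'
}

# Usage: snapshot a tiny tree, modify one file, snapshot again.
work=$(mktemp -d)
mkdir "$work/tree"
echo alpha > "$work/tree/a.txt"
echo beta  > "$work/tree/b.txt"
tabulate "$work/tree" "$work/list.old"
echo gamma >> "$work/tree/b.txt"
tabulate "$work/tree" "$work/list.new"
changed "$work/list.old" "$work/list.new"   # one line: b.txt's inode and path
rm -rf "$work"
```

Sorting the tabulation first is what makes a plain line-by-line diff sufficient: an unchanged file produces an identical line in the same position in both lists.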


To go along with this script, I made a modified version of the clamscan program 
which, when given the command line option --follow-file-symlinks, will follow 
symbolic links to files (symlinks to directories are not needed by this script).
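The selection rule that the patch adds to treewalk() can be mimicked in the shell to check which links in a directory would be scanned. This is a hypothetical helper (would_scan is not part of the script or of clamav); it relies on bash's [ -L ] test and on [ -f ], which follows symlinks much as stat() does in checksymlink(). It only looks at entries directly inside one directory, whereas treewalk() also recurses into real subdirectories.

```shell
#!/bin/bash
# would_scan DIR: list the entries directly inside DIR that the patched clamscan,
# run with --follow-file-symlinks, would pass to scanfile(). Hypothetical helper.
would_scan () {
  local p
  for p in "$1"/*; do
    if [ -L "$p" ]; then
      # Symlink: scanned only if it resolves to a regular file
      # (checksymlink() == 2); links to directories and dangling links are skipped.
      if [ -f "$p" ]; then printf '%s\n' "$p"; fi
    elif [ -f "$p" ]; then
      # Ordinary regular file (S_ISREG): scanned with or without the patch.
      printf '%s\n' "$p"
    fi
  done
}

# Usage: one file, one directory, and three kinds of symlink.
d=$(mktemp -d)
echo data > "$d/file"
mkdir "$d/sub"
ln -s "$d/file"    "$d/to-file"
ln -s "$d/sub"     "$d/to-dir"
ln -s "$d/nowhere" "$d/dangling"
would_scan "$d"    # lists $d/file and $d/to-file only
rm -rf "$d"
```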

I have been using these for a while and am relatively satisfied.

Notes:

1. The script creates files and directories (e.g., clamscan results) whose 
names must be easy to read and parse and yet correspond to paths. It does this 
by unambiguously removing slashes and spaces: "/" -> "%.", " " -> "%_" and "%" 
-> "%%" (rather than by using the ugly URL transform). For example, the path 
"/Program Files/Killer%App/" would be converted to 
"%.Program%_Files%.Killer%%App%.".

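A minimal sketch of the transform and of its inverse, assuming bash. path2name uses the same sed/tr pair as the script, but with '|' as the sed delimiter, which sidesteps any question of how a given sed treats a '/' inside a bracket expression. name2path is a hypothetical decoder (the script never needs one); since every '%' starts a two-character escape, reading left to right recovers the path exactly, which is what makes the mapping unambiguous.

```shell
#!/bin/bash
# path2name PATH: '%' -> '%%', '/' -> '%.', ' ' -> '%_'.
# sed first puts a '%' in front of each of the three characters, then tr
# rewrites '/' as '.' and ' ' as '_' ('%' maps to itself).
path2name () {
  printf '%s\n' "$1" | sed 's|\([/ %]\)|%\1|g' | tr '/ %' '._%'
}

# name2path NAME: hypothetical inverse, not part of the original script.
name2path () {
  local s=$1 out='' c
  while [ -n "$s" ]; do
    case $s in
      %.*) out="$out/"; s=${s#??} ;;
      %_*) out="$out "; s=${s#??} ;;
      %%*) out="$out%"; s=${s#??} ;;
      %*)  return 1 ;;                 # '%' followed by anything else is malformed
      *)   c=${s%"${s#?}"}             # first character of s
           out="$out$c"; s=${s#?} ;;
    esac
  done
  printf '%s\n' "$out"
}

path2name '/Program Files/Killer%App/'       # %.Program%_Files%.Killer%%App%.
name2path '%.Program%_Files%.Killer%%App%.'  # /Program Files/Killer%App/
```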
2. Even if no files have changed, all the files must have their hashes 
recomputed, which takes a noticeable amount of time. I use the SHA1 hash, as it 
is cryptographically stronger than the faster MD5 (which means that even a 
clever virus would find it almost impossible to hide itself inside a previously 
legitimate file).
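To see why the hash matters, here is a small demonstration, assuming GNU stat and coreutils (all file names are illustrative). A file is rewritten in place with different bytes of the same length, and its old modification time is put back with 'touch -r', so the inode, size and mtime that the script records stay the same and only the hash differs. The change time (ctime) does change, but the script records only mtime, via %T@.

```shell
#!/bin/bash
# Demo (assumes GNU coreutils): an in-place edit that keeps inode, size and
# mtime, so that only the content hash reveals it.
dir=$(mktemp -d)
f="$dir/victim"
ref="$dir/ref"

printf 'legitimate\n' > "$f"            # 11 bytes
touch -r "$f" "$ref"                     # save f's timestamps on a reference file
meta_before=$(stat -c '%i %s %Y' "$f")   # inode, size, mtime (seconds)
hash_before=$(sha1sum < "$f")

printf 'malicious!\n' > "$f"             # also 11 bytes; truncates and rewrites the same inode
touch -r "$ref" "$f"                     # put the old mtime (and atime) back

meta_after=$(stat -c '%i %s %Y' "$f")
hash_after=$(sha1sum < "$f")

[ "$meta_before" = "$meta_after" ] && echo "inode, size, mtime: unchanged"
[ "$hash_before" != "$hash_after" ] && echo "SHA-1 hash: changed"
rm -rf "$dir"
```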


The script is included below: you should read it and adjust it to your 
situation. 
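Since the script hard-codes the location of every utility (LN, SED, FIND and so on), one quick way to adjust it is to ask the shell where those programs really are. A sketch, assuming bash: locate_tools is a hypothetical helper, and it uses the bash builtin 'type -P', which searches $PATH even for names such as echo that are also shell builtins.

```shell
#!/bin/bash
# locate_tools NAME...: print where each program is found on $PATH,
# or NOT FOUND. 'type -P' ignores builtins and functions and reports
# the executable file itself.
locate_tools () {
  local tool path
  for tool in "$@"; do
    path=$(type -P "$tool") || path='NOT FOUND'
    printf '%-9s %s\n' "$tool" "$path"
  done
}

# The programs the script refers to by full path, plus the hash program
# and clamscan; compare the output with the LN=, SED=, ... settings.
locate_tools ln ls mv rm tr cat sed diff echo expr find grep nice sort stat tail mkdir sha1sum clamscan
```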

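The patches to clamav (0.88.4) follow the script. Note that they were made with 
'diff -c patched original' (the '***' file in each header is the modified one), 
so the lines marked '-' are my additions and the patches must be applied in 
reverse, e.g. with 'patch -R'.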

===============================================================================

#!/bin/bash

# This script, when given one or more directories, constructs current lists
#   of timestamps, sizes, inodes and hashes of each file and compares them
#   with the previous lists to determine which files have changed and thus
#   need to be scanned. It then constructs directories of links to those files
#   and clamscans those directories. (Hardlinks are used if the available
#   version of clamscan doesn't follow symbolic links.)
#
# Usage is: $0  working-directory  directory-1 ...
#   (directory-1 ... must be absolute paths: the links are created from inside
#   the working directory, so relative paths would not resolve)


# Copyright (C) 2006 Paul Kosinski <pk[at]iment[dot]com>
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.



# File characteristics to be output by find -- prefixed to output from file hash
    PF='I:%-10i  T:%-11T@  N:%-11s  B:%-8b  H:'

# Regex for SED -- extract inode and filename from file-hash list
#   (the '.' lets T: match the fractional seconds newer GNU find prints for %T@)
    RE='^I:\([0-9]\+\)\s\+\([TNBH]:[0-9a-f.]\+\s\+\)\+'

# Max number of links in each subdir (keeps directories reasonable size)
  MAXF=1000

# Hash function to be used (MD5 is faster than SHA1, but cryptographically weaker)
  HASH='/usr/bin/sha1sum'

# Where clamscan is
  CLAM='/opt/clamav/bin/clamscan'

# Can clamscan follow symlinks (for files): 0 - no; 1 - yes
  FFSL=1

# Where each utility program is
    LN='/bin/ln'
    LS='/bin/ls'
    MV='/bin/mv'
    RM='/bin/rm'
    TR='/usr/bin/tr'
   CAT='/bin/cat'
   SED='/usr/bin/sed'
  DIFF='/usr/bin/diff'
  ECHO='/bin/echo'
  EXPR='/usr/bin/expr'
  FIND='/usr/bin/find'
  GREP='/usr/bin/grep'
  NICE='/usr/bin/nice'
  SORT='/usr/bin/sort'
  STAT='/usr/bin/stat'
  TAIL='/usr/bin/tail'
 MKDIR='/bin/mkdir'




# Subr to rename a file according to its timestamp: "foo.bar" -> "foo.bar.060101-1234"
#   (YYMMDD-HHMM of the file's modification time; the seconds are dropped)
  function rename_by_timestamp ()
  {

    if [ -f "$1" ] ; then

      TS=`$STAT -c '%y' "$1" | $SED "s/\..\+$//; s/^[0-9][0-9]//; s/[-:]//g; s/[0-9][0-9]$//" | $TR ' ' '-'`

      $MV "$1" "$1.$TS"

    fi

  }



# Subr to compute the new list of file characteristics: inode, timestamp, bytes, blocks and hash
  function hash4scan ()
  {

    echo "*** hash4scan $1"

#   Convert path to unambiguously remove slashes and spaces ('/' -> '%.', ' ' -> '%_' and '%' -> '%%')
    F=`$ECHO "$1" | $SED "s/\([/ %]\)/%\1/g" | $TR '/ %' '._%'`

#   Keep the previous list (if any) for diff4scan; on the first run there is none
    if [ -f "$WD/$F.new" ] ; then

      $MV  "$WD/$F.new"  "$WD/$F.old"

    fi

    $NICE $FIND "$1" -type f -printf "$PF"  -exec $HASH \{\} \;  |  $NICE $SORT  >  "$WD/$F.new"

  }


# Subr to compute list of files which have changed since last time and thus need to be scanned
  function diff4scan ()
  {

    echo "*** diff4scan $1"

#   Convert path to unambiguously remove slashes and spaces ('/' -> '%.', ' ' -> '%_' and '%' -> '%%')
    F=`$ECHO "$1" | $SED "s/\([/ %]\)/%\1/g" | $TR '/ %' '._%'`

    $NICE $DIFF -BbN -e  "$WD/$F.old"  "$WD/$F.new"   |   $NICE $GREP '^I:'   |   $NICE $SED -e "s/$RE/\1  /"   >   "$WD/$F.diff"

  }

# Create directory and first level subdirectories that contain links to files to be scanned
  function link4scan ()
  {

    echo "*** link4scan $1"

#   Convert path to unambiguously remove slashes and spaces ('/' -> '%.', ' ' -> '%_' and '%' -> '%%')
    F=`$ECHO "$1" | $SED "s/\([/ %]\)/%\1/g" | $TR '/ %' '._%'`

    $MKDIR "$WD/$F"

    K='0'
    J='0'

    cd "$WD/$F"


    $CAT "$WD/$F.diff" | \
    while read -r I G ; do

      if [ $J -eq 0 ] ; then

        K=`$EXPR $K + 1`

        $MKDIR "$WD/$F/$K"

        cd "$WD/$F/$K"

      fi

      J=`$EXPR $J + 1`


      if [ $FFSL -gt 0 ] ; then

        $LN -sf  "$G" "$J"

      else

        $LN -f  "$G" "$I"

      fi


      if [ $J -ge $MAXF ] ; then

        J=0

      fi

    done

  }


  function clamscan ()
  {

    echo "*** clam-scan $1"

#   Convert path to unambiguously remove slashes and spaces ('/' -> '%.', ' ' -> '%_' and '%' -> '%%')
    F=`$ECHO "$1" | $SED "s/\([/ %]\)/%\1/g" | $TR '/ %' '._%'`

    if [ $FFSL -gt 0 ] ; then

      $NICE $CLAM -ri --follow-file-symlinks "$WD/$F"   >   "$WD/$F.clam"

    else

      $NICE $CLAM -ri "$WD/$F"                          >   "$WD/$F.clam"

    fi

  }


  function listvirs ()
  {

    echo "*** list-virs $1"

#   Convert path to unambiguously remove slashes and spaces ('/' -> '%.', ' ' -> '%_' and '%' -> '%%')
    F=`$ECHO "$1" | $SED "s/\([/ %]\)/%\1/g" | $TR '/ %' '._%'`


    if [ $FFSL -gt 0 ] ; then

      $CAT "$WD/$F.clam"  |  $GREP 'FOUND$'  | \
      while read VF T ; do

        VF=`echo "$VF"  |  $SED -e 's/:$//'`

        VF=`$LS -l "$VF" | $SED -e "s/^.* -> //"`

        echo "$VF  $T"  >>  "$WD/$F.scan"

      done

    else

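#     Reduce each ".../K/INODE: VirusName FOUND" line to "INODE VirusName FOUND",
#       however deep the working directory is
      $CAT "$WD/$F.clam"  |  $GREP 'FOUND$'  |  $SED -e 's|^.*/\([0-9]\+\): |\1 |'  | \
      while read -r I T ; do

        $GREP "^$I " "$WD/$F.diff"  | \
        while read -r I VF ; do

          echo "$VF  $T"  >>  "$WD/$F.scan"

        done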

      done

    fi


    $TAIL -n 9 "$WD/$F.clam"  >>  "$WD/$F.scan"

    $RM "$WD/$F.clam"


    $RM -rf "$WD/$F"

    rename_by_timestamp "$WD/$F.diff"
    rename_by_timestamp "$WD/$F.scan"

  }



# main program


  if [ $# -lt 2 ] ; then

    echo "Usage is: $0  working-directory  directory-1 ..."

    exit 1

  fi


# Construct absolute-path working directory to contain file-hash lists,
#   diff output (i.e. files to be scanned), clamscan output etc.

  case "$1" in
    /*) WD="${1%/}" ;;
    *)  WD="$PWD/${1%/}" ;;
  esac

  shift


# Iterate over directories to be scanned

  for D in "$@" ; do

    hash4scan "$D"

    diff4scan "$D"

    link4scan "$D"

    clamscan  "$D"

    listvirs  "$D"

  done

===============================================================================

diff -c /src/clamav/clamav-0.88.4/libclamav/clamav.h /src/clamav/clamav-0.88.4/libclamav/clamav.h.orig
*** /src/clamav/clamav-0.88.4/libclamav/clamav.h        Thu Sep 28 17:46:34 2006
--- /src/clamav/clamav-0.88.4/libclamav/clamav.h.orig   Tue Dec 20 14:44:34 2005
***************
*** 76,89 ****
  #define CL_SCAN_MAILURL               256
  #define CL_SCAN_BLOCKMAX      512
  
- 
- /* PRK Thu 28 Sep 2006 begin */
- 
- #define CL_SCAN_FILESYMLINKS  0x10000000
- 
- /* PRK Thu 28 Sep 2006 end   */
- 
- 
  /* recommended options */
  #define CL_SCAN_STDOPT                (CL_SCAN_ARCHIVE | CL_SCAN_MAIL | CL_SCAN_OLE2 | CL_SCAN_HTML | CL_SCAN_PE)
  
--- 76,81 ----

===============================================================================

diff -c /src/clamav/clamav-0.88.4/clamscan/clamscan.c /src/clamav/clamav-0.88.4/clamscan/clamscan.c.orig
*** /src/clamav/clamav-0.88.4/clamscan/clamscan.c       Thu Sep 28 18:00:48 2006
--- /src/clamav/clamav-0.88.4/clamscan/clamscan.c.orig  Mon Jan  9 12:46:05 2006
***************
*** 230,240 ****
    mprintf("                                         all .cvd and .db[2] files from DIR\n");
    mprintf("    --log=FILE            -l FILE        Save scan report to FILE\n");
    mprintf("    --recursive           -r             Scan subdirectories recursively\n");
- 
- /* PRK Thu 28 Sep 2006 begin */
-     mprintf("    --follow-file-symlinks               Follow symlinks to files (only)\n");
- /* PRK Thu 28 Sep 2006 end   */
- 
    mprintf("    --remove                             Remove infected files. Be careful!\n");
    mprintf("    --move=DIRECTORY                     Move infected files into DIRECTORY\n");
  #ifdef HAVE_REGEX_H
--- 230,235 ----

===============================================================================

diff -c /src/clamav/clamav-0.88.4/clamscan/manager.c /src/clamav/clamav-0.88.4/clamscan/manager.c.orig
*** /src/clamav/clamav-0.88.4/clamscan/manager.c        Thu Sep 28 17:46:34 2006
--- /src/clamav/clamav-0.88.4/clamscan/manager.c.orig   Mon Jan  9 12:46:23 2006
***************
*** 161,179 ****
  
      /* set options */
  
- 
- 
- /* PRK Thu 28 Sep 2006 begin */
- 
-     if(optl(opt, "follow-file-symlinks"))
-       options |= CL_SCAN_FILESYMLINKS;
-     else
-       options &= ~CL_SCAN_FILESYMLINKS;
- 
- /* PRK Thu 28 Sep 2006 end   */
- 
- 
- 
      if(optl(opt, "disable-archive") || optl(opt, "no-archive"))
        options &= ~CL_SCAN_ARCHIVE;
      else
--- 161,166 ----

===============================================================================

diff -c /src/clamav/clamav-0.88.4/clamscan/options.c /src/clamav/clamav-0.88.4/clamscan/options.c.orig
*** /src/clamav/clamav-0.88.4/clamscan/options.c        Thu Sep 28 18:01:19 2006
--- /src/clamav/clamav-0.88.4/clamscan/options.c.orig   Thu Jun 23 16:03:09 2005
***************
*** 114,124 ****
            {"tar", 2, 0, 0},
            {"tgz", 2, 0, 0},
            {"deb", 2, 0, 0},
- 
- /* PRK Thu 28 Sep 2006 begin */
-           {"follow-file-symlinks", 0, 0, 0},
- /* PRK Thu 28 Sep 2006 end   */
- 
            {0, 0, 0, 0}
        };
  
--- 114,119 ----

===============================================================================

diff -c /src/clamav/clamav-0.88.4/clamscan/treewalk.c /src/clamav/clamav-0.88.4/clamscan/treewalk.c.orig
*** /src/clamav/clamav-0.88.4/clamscan/treewalk.c       Thu Sep 28 18:26:06 2006
--- /src/clamav/clamav-0.88.4/clamscan/treewalk.c.orig  Thu Dec 22 20:16:56 2005
***************
*** 40,63 ****
  #include "memory.h"
  #include "output.h"
  
- 
- int checksymlink(const char *path)
- {
-       struct stat statbuf;
- 
-     if(stat(path, &statbuf) == -1)
-       return -1;
- 
-     if(S_ISDIR(statbuf.st_mode))
-       return 1;
- 
-     if(S_ISREG(statbuf.st_mode))
-       return 2;
- 
-     return 0;
- }
- 
- 
  int treewalk(const char *dirname, struct cl_node *root, const struct passwd *user, const struct optstruct *opt, const struct cl_limits *limits, int options, unsigned int depth)
  {
        DIR *dd;
--- 40,45 ----
***************
*** 128,138 ****
                            if(treewalk(fname, root, user, opt, limits, options, depth) == 1)
                                scanret++;
                        } else {
! 
! /* PRK Thu 28 Sep 2006 begin */
!                           if(S_ISREG(statbuf.st_mode) || ((options & CL_SCAN_FILESYMLINKS) && S_ISLNK(statbuf.st_mode) && (checksymlink(fname) == 2)))
! /* PRK Thu 28 Sep 2006 end   */
! 
                                scanret += scanfile(fname, root, user, opt, limits, options);
                        }
                    }
--- 110,116 ----
                            if(treewalk(fname, root, user, opt, limits, options, depth) == 1)
                                scanret++;
                        } else {
!                           if(S_ISREG(statbuf.st_mode))
                                scanret += scanfile(fname, root, user, opt, limits, options);
                        }
                    }

_______________________________________________
http://lurker.clamav.net/list/clamav-users.html
