Hi there,
On Sun, 26 Aug 2012, Jonathan Ryshpan wrote:
I am a complete clamav newbie trying to scan a large filesystem. I'm
running Fedora-17 Linux. The current invocation (after several
modifications) has this form:
clamscan -r -i --exclude-dir=^/media/ \
--exclude-dir=^/proc/ \
--exclude-dir=^/sys/ \
--exclude-dir=^/dev/ \
/ 2>&1 | tee clamscan.log
Why make it so hard for yourself? You could give a list of the
directories that you want to scan, instead of a list of those that you
don't. I'd probably do something like this
clamscan -r -v -f file.list -l scan.log
or
clamscan -r -v -l ~/scan_jojos_80G.log \
/mnt/sde1/ \
> ~/scan_jojos_80G.stdout \
2> ~/scan_jojos_80G.stderr &
Then I'd probably look through the output using 'grep' quite a lot.
Actually I'm running that last scan as I write, and so far it's found
four phishing scams in the users' mailboxes, but nothing that might
explain why the machine keeps rebooting itself. To your questions...
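For the 'grep' step, here is a minimal sketch. It assumes the usual clamscan report format, in which each detection is logged on one line as "<path>: <signature> FOUND". The function names list_found and count_found are made up for illustration.

```shell
# Print only the detection lines from a clamscan log file.
# Assumption: detections are logged as "<path>: <signature> FOUND".
list_found() {
    grep ' FOUND$' "$1"
}

# Print the number of detection lines (prints 0 when there are none).
count_found() {
    grep -c ' FOUND$' "$1"
}
```

Something like 'list_found ~/scan_jojos_80G.log' would then show just the hits, without the per-file noise that -v produces.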
Read the man page for more information, but be prepared to experiment
as the man page does leave a little wiggle room, particularly regarding
the anchoring of the pattern and how it behaves if it's a partial match.
Rather than clamscan I might use clamdscan and possibly 'find' (which
has a very well defined behaviour) but you don't really need to worry
about that right now.
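If you do go the 'find' route, the idea could be sketched roughly as follows. This is only a sketch: the function name list_scannable_files is hypothetical, the four excluded directories are simply the ones from your command line, and it assumes no file name contains a newline (the list is one path per line).

```shell
# Print every regular file under a root, skipping the top-level
# proc, sys, dev and media directories of that root.
# Directories with the same names deeper down (e.g. .../proj/dev)
# are NOT skipped, because -path compares the whole path.
list_scannable_files() {
    root=${1:-/}
    case $root in
        /) prefix= ;;              # avoid building paths like "//proc"
        *) prefix=${root%/} ;;     # drop one trailing slash, if any
    esac
    find "${prefix:-/}" \
        \( -path "$prefix/proc" -o -path "$prefix/sys" \
           -o -path "$prefix/dev" -o -path "$prefix/media" \) -prune \
        -o -type f -print
}
```

The output can be saved with 'list_scannable_files / > file.list' and handed to clamscan with '-f file.list', as in the first command above.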
1. Is this a correct invocation to scan the filesystem, excluding
the system filesystems /proc, /sys, and /dev, also
excluding /media?
No. Your patterns will probably match nothing. The caret ('^') is
not required unless you have a directory called '^' which would be
willful to say the least. The pattern is *not* a regular expression
(or 'regex'), it is only a pattern which will be matched against paths
found by scanning the directory structure. I wouldn't bother with
'tee', and I wouldn't (normally) want my stderr and stdout all mixed
up together.
2. Is the "^" following the "=" in the "--exclude-dir" option
required or optional or forbidden? I would think that since the
argument is a REGEX a "^" would be required to get the desired
result, which is to exclude everything under these top level
directories but not other directories at lower levels (say
something like /home/phred/proj/dev/...). The examples mostly
don't have a "^", though some do.
The man page is clear that PATT is a pattern. It is not terribly
clear on what sort of a pattern, so experiment. It definitely does
not say that it's a regular expression. You will need to be careful
if you use shell metacharacters on the command line, because whatever
shell you happen to be using will process the characters that you hand
to it before the program that you're invoking ever sees them. For
example if you're using the 'bash' shell and you do something like
clamscan -r -v *
then the * is not a 'pattern' as described in the clamscan man page.
It's a shell metacharacter which will, *before* the command line
arguments are handed to the program (clamscan in this case) being
invoked, be expanded by the shell into something else (which depends
on the current working directory). Clamscan will never see the '*'
character in this example.
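You can see this for yourself without running clamscan at all. The small helper below (show_args is a made-up name) just prints each argument that the shell hands over, one per line, in brackets.

```shell
# Print each argument exactly as the invoked program would receive it.
show_args() {
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

# Try both of these in a directory that contains some files:
#   show_args *      the shell expands * first, so the file names are printed
#   show_args '*'    the quotes stop the expansion, so a literal * is printed
```

The same applies to anything you pass to --exclude-dir: quote it if it contains characters that your shell might treat specially.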
IMO the things you read on forums are generally suspect unless you
know that the author knows what he's talking about. If you're new to
this then the odds are you won't know that. Any examples found in the
ClamAV Website documentation should be reliable. If not, do point out
any problems that you find.
3. Similarly is the "/" following the directory name required?
Some postings imply that it is, but if the argument is a REGEX,
it ought not to be.
First you say the argument's a regex, then you say 'if' it's a regex... :)
It's a pattern. There's a big difference. The trailing slash is optional
and attempting to scan the directories Temp1, Temp2, Temp3 using for example
clamscan -r Temp
will fail:
laptop:~$ >>> ls -l Temp*
Temp1:
total 0
Temp2:
total 0
Temp3:
total 0
laptop:~$ >>> clamscan Temp
...
WARNING: Can't access file Temp
Temp: No such file or directory
...
Scanned directories: 0
Scanned files: 0
4. The filesystems /proc, /sys, and /dev are traps for the unwary.
Yep. :) Don't mess around in there if you don't know what you're doing.
If not specifically excluded they are scanned, which is
pointless, takes a long time, and produces lots of errors.
And possibly worse.
If a warning to exclude them isn't in a prominent place in
the documentation, it should be.
In the world in which you now find yourself, you're expected to know
what you're doing. If you don't know what you're doing then don't do
it, and especially don't do it if you're logged in as root. :)
The documentation does not assume that you even have those directories
in your system and it also does not assume that you're running the binary
on your SmartPhone, where presumably there are other traps for the unwary.
It tells you how to scan what you want to scan. Whether it's a good idea
to do that or not is your decision. In all fairness, it is not possible
within the confines of a 'man' page to cater for people who are still a
few years of study away from being a Unix system administrator and if you
are scanning *any* system for malicious software you'd better have a fair
idea what you're going to do if you find any before you start, or you're
possibly going to do more damage to a Unix system than anything that the
malicious software might do.
(I haven't read the documentation carefully enough to be sure
that it isn't in it somewhere.)
Oooooohhh, fancy admitting that in your first post to a mailing list!
In the world in which you now find yourself, you're expected to read
everything carefully before pestering people who have better things to
do than hold your hand because you can't be bothered to put in the time
to read the documentation carefully. I'll forgive you now, largely
because of the failings in the documentation, you understand. :)
One might ask why you're scanning this filesystem. ClamAV is more
than anything for scanning data that will be used on Windows boxes.
There are normally only three occasions when I would use clamscan or
clamdscan on one of my Linux boxes. Listed by decreasing probability:
1. when I've mounted on it an NTFS partition from a disc which
has recently been removed from a compromised Windows box for scanning,
2. when I'm looking at the SAMBA shares on a file server which is used
by Windows machines, and
3. when I'm bulk scanning mailboxes on a mail server which may be
accessed by Windows machines.
Apart from mail, I don't allow files to be uploaded from unknown
sources to any of my machines. That's a can of worms, and if it's
your situation you have my sympathy.
Look at --max-dir-recursion if you're scanning deep in a filesystem,
and watch out for the risk of consuming excessive resources, as is
explained in the man page.
--
73,
Ged.