On Thu, 2005-12-29 at 16:30 -0800, Bob Miller wrote:
> Whose turn is it to advocate Python today?
>
I'll give it a shot (warning: Python hobbyist and non-expert at large)
---- get_hosts.py - cut here ----
#!/usr/bin/python
import re
import os
import sys
import socket

# both arguments are required
if len(sys.argv) < 3:
    print("USAGE: ./get_hosts.py incoming_path results_file")
    print("incoming_path is the directory containing your web logs.")
    print("results_file can go anywhere, just don't use the same "
          "directory as the log files or you'll screw up your "
          "counts.")
    sys.exit(2)

incoming_path = sys.argv[1]
results_file = sys.argv[2]

# regex to match something resembling a dotted numerical IP address
searchexp = re.compile(r'(\d+\.){3}\d+')

ips = {}

# look at each file in the directory
# this could be modified to use os.walk() if we wanted to
# traverse a directory tree
for name in os.listdir(incoming_path):
    print("reading " + name)
    try:
        f = open(os.path.join(incoming_path, name), "r")
        for line in f:
            # match() only looks at the start of the line, so the
            # address has to be the first thing on it
            ipaddy = searchexp.match(line)
            if ipaddy:
                # if we found an IP address, bump its count
                # (starting from zero the first time we see it)
                ip = ipaddy.group()
                ips[ip] = ips.get(ip, 0) + 1
        f.close()
    # directories raise an IOError - just pass them by
    except IOError as e:
        print("I/O error(%s): %s" % (e.errno, e.strerror))

# the results file can go anywhere, just don't use the same directory
# as the log files or you'll screw up your counts
print("writing: " + results_file)
print("(this may take time to resolve some addresses)")
f = open(results_file, "w")
for ip, count in ips.items():
    # I only care if they hit more than ten times - too many 1's
    if count > 10:
        try:
            hostname = socket.gethostbyaddr(ip)[0]
        # any lookup failure; unlike a bare except, Ctrl-C still works
        except Exception:
            hostname = "unknown host"
        f.write(str(count) + "\t" + ip)
        f.write("\t(" + hostname + ")\n")
        # progress dot, flushed so it shows up right away
        sys.stdout.write(". ")
        sys.stdout.flush()
f.close()
print("Done!")
---- end of get_hosts.py - cut here ----
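For anyone who wants to poke at the counting step without a directory
of logs handy, here is a minimal sketch of the same idea as a function.
It assumes Python 3, and count_ips, IP_AT_START and the sample lines
are made up for illustration; none of them are part of get_hosts.py.

```python
import re
from collections import Counter

# same idea as searchexp in the script: four dot-separated runs of digits
IP_AT_START = re.compile(r'(\d+\.){3}\d+')


def count_ips(lines):
    """Count hits per leading IP address (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        m = IP_AT_START.match(line)  # anchored at the start of the line
        if m:
            counts[m.group()] += 1
    return counts


# invented sample lines, shaped like common web server log entries
sample = [
    '10.0.0.1 - - [29/Dec/2005:16:30:00 -0800] "GET / HTTP/1.0" 200 512',
    '10.0.0.2 - - [29/Dec/2005:16:30:01 -0800] "GET /a HTTP/1.0" 200 99',
    '10.0.0.1 - - [29/Dec/2005:16:30:02 -0800] "GET /b HTTP/1.0" 404 0',
    'not a log line',
]
counts = count_ips(sample)
```

A Counter starts every key at zero, so there is no need to check
whether an address has been seen before incrementing it.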
It took several minutes for my box to reverse-resolve each IP address
to a hostname; that's why I added the dots to show progress (so I could
see whether anything was happening at all).
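
If the wait gets annoying, one way to shorten it might be to run
several lookups at once. A rough sketch, assuming Python 3 and its
concurrent.futures module; resolve_one, resolve_all, the lookup
parameter and the default of 20 workers are my own inventions for
illustration, not something get_hosts.py does.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def resolve_one(ip, lookup=socket.gethostbyaddr):
    """Return the hostname for ip, or "unknown host" if the lookup fails."""
    try:
        return lookup(ip)[0]
    except Exception:
        return "unknown host"


def resolve_all(ips, lookup=socket.gethostbyaddr, workers=20):
    """Resolve many addresses concurrently; returns {ip: hostname}.

    lookup is injectable (an assumption of this sketch) so the function
    can be tried out without touching DNS.
    """
    ips = list(ips)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() keeps results in the same order as the input addresses
        names = pool.map(lambda ip: resolve_one(ip, lookup), ips)
        return dict(zip(ips, names))
```

The threads spend nearly all of their time waiting on the resolver, so
on most systems this should cut the total wait considerably, though how
much depends on the platform and the name servers involved.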
- Jason
--
Jason LaPier
Network Manager
TACS/WRRC
University of Oregon