On Thu, 2005-12-29 at 16:30 -0800, Bob Miller wrote:

> Whose turn is it to advocate Python today?
> 

I'll give it a shot (warning: Python hobbyist and non-expert at large)

---- get_hosts.py - cut here ----
#!/usr/bin/python

import re
import os
import sys
import socket

# we need exactly two arguments; checking only for "no arguments"
# would crash with an IndexError when results_file is left off
if len(sys.argv) != 3:
        print "USAGE: ./get_hosts.py incoming_path results_file"
        print "incoming_path is the directory containing your web logs."
        print "results_file can go anywhere, just don't use the same",
        print "directory as the log files or you'll screw up your",
        print "counts."
        sys.exit(2)
else:
        incoming_path = sys.argv[1]
        results_file = sys.argv[2]
        
        
# regex to match something resembling a dotted numerical IP address
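# (raw string so the backslashes reach the regex engine untouched;
# exactly three "digits-dot" groups followed by a final number)
searchexp = re.compile(r'(\d+\.){3}\d+')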
ips = { }

# look at each file in the directory
# this could be modified to use os.walk() if we wanted to 
# traverse a directory tree
for filename in os.listdir(incoming_path):
        print "reading " + filename
        try: 
                f = open(os.path.join(incoming_path, filename), "r")
                for line in f:
                        ipaddy = searchexp.match(line)
                        if ipaddy:
                                # if we found an IP address, bump
                                # its count, starting from zero the
                                # first time we see it
                                ip = ipaddy.group()
                                ips[ip] = ips.get(ip, 0) + 1
                f.close()
        # directories raise an IOError - just pass them by
        except IOError, (errno, strerror):
                print "I/O error(%s): %s" % (errno, strerror)

# the results file can go anywhere, just don't use the same directory
# as the log files or you'll screw up your counts

print "writing: " + results_file
print "(this may take time to resolve some addresses)"

f = open(results_file, "w")

for ip, count in ips.iteritems():
        # I only care if they hit at least ten times - too many 1's
        if count >= 10:
                try:
                        hostname = socket.gethostbyaddr(ip)[0]
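                # failed lookups raise socket.herror / socket.gaierror,
                # both subclasses of socket.error
                except socket.error: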
                        hostname = "unknown host"
                f.write("%d\t%s\t(%s)\n" % (count, ip, hostname))
                print ".",
                # flush so each dot shows up as soon as it is printed
                sys.stdout.flush()
                
f.close()
print "Done!"
---- end of get_hosts.py - cut here ----
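
For anyone who wants to poke at the counting step without a directory of real logs, here is a minimal sketch of it as a standalone function. It is written in Python 3 syntax (the script above is Python 2), and the name count_ips and the sample lines are made up for illustration. Like the script, it only looks for a dotted number at the very start of each line and does not check that each part is in the 0-255 range.

```python
import re
from collections import Counter

# A dotted-quad-looking token: three "digits-dot" groups, then more digits.
# match() anchors at the start of the line, as in the script.
IP_AT_START = re.compile(r'(\d+\.){3}\d+')


def count_ips(lines):
    """Count how often each leading IP-like token appears in `lines`."""
    counts = Counter()
    for line in lines:
        found = IP_AT_START.match(line)
        if found:
            counts[found.group()] += 1
    return counts


# Hypothetical sample lines, loosely shaped like web server log entries.
sample = [
    '10.0.0.1 - - "GET / HTTP/1.0" 200 512',
    '10.0.0.2 - - "GET /a HTTP/1.0" 200 100',
    '10.0.0.1 - - "GET /b HTTP/1.0" 404 0',
    'not an address at all',
]

for ip, count in count_ips(sample).items():
    print(count, ip)
```

Counter saves the "have I seen this key before" check, since missing keys start at zero.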


It took several minutes for my box to reverse-resolve each IP address
to a hostname, which is why I added the dots to show progress (so I
could see whether anything was happening at all).
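
The lookup-with-dots loop can also be pulled out on its own. This is only a rough sketch in Python 3 syntax, not a drop-in replacement for the script; resolve_all and its resolver and progress parameters are names I made up so that the lookup can be swapped for a fake one when testing without a network.

```python
import socket
import sys


def resolve_all(ips, resolver=socket.gethostbyaddr, progress=sys.stdout):
    """Map each IP string to a hostname, or "unknown host" on failure."""
    names = {}
    for ip in ips:
        try:
            # gethostbyaddr returns (hostname, aliaslist, ipaddrlist)
            names[ip] = resolver(ip)[0]
        except OSError:
            # in Python 3, socket.herror and socket.gaierror are
            # subclasses of OSError
            names[ip] = "unknown host"
        # one dot per address, flushed so it appears immediately
        progress.write(".")
        progress.flush()
    return names
```

Calling resolve_all(["127.0.0.1"]) does a real reverse lookup, so how long it takes depends on your resolver; passing a different function as resolver avoids the network entirely.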

- Jason


-- 
Jason LaPier
Network Manager
TACS/WRRC
University of Oregon


_______________________________________________
EUGLUG mailing list
[email protected]
http://www.euglug.org/mailman/listinfo/euglug
