In my webmastering days we used AWStats to analyze our log files.
http://awstats.sourceforge.net/

It has been a while, but I remember it being very configurable and easy to use. It might be worth looking it over to see whether it would yield what you want for your analysis; it might save you some headaches.

Eric Lease Morgan wrote:
How would you go about doing some analysis of your website's referrer data?

I have committed to writing an article for the anniversary issue of First Monday (as if I don't already have enough to do). Here is the accepted/proposed title and abstract:

Ethical issues surrounding freely available information found on the Web

By reverse engineering Google queries and by tracing back the referrer values found in Apache log files, the use of content made available from infomotions.com is examined and ethical questions are asked. While all the content from the site is "freely" available under the GNU Public License, the content is not always used in the intended manner. This raises interesting questions regarding the time spent making the content available, the expense of the hardware and network connections, and whether or not the application of the content is put to good and moral purposes. This essay addresses these and other ethical questions in an attempt to come to an understanding regarding the place of information and knowledge in an "open" environment.

I find it interesting to watch the content of my access_log scroll by on my console. I am most interested in the referrer information. Most of my hits originate as searches against Google. It is fun to feed these queries back into Google and see what people searched for, watch what the searches return, and see what page number my item is located on. I see that a lot of the hits to my site come from MySpace.com, where teenaged and college-aged girls have incorporated some of my pictures into their pages. Another common use is on "bulletin board" systems where someone used one of my pictures as their avatar. In these second and third cases should I expect some sort of remuneration or at least a link back to infomotions.com?

Some hits come from really weird places. For example, the search for "lease" brings back many hits about equipment rental, but sometimes my name and/or the Alex Catalogue of Electronic Texts is linked from the equipment rental site. Sort of strange if you ask me. They are using my name, sort of. ("Is it 'my' name?")

In any event, I plan to take two months of access_log data and extract the pages being looked at and the referrer information, to more systematically examine how the content on Infomotions is being incorporated into other sites. How would you suggest I do this? Presently I plan to extract the necessary information from my logs and dump it into a flat database file where I will exploit various incarnations of SQL SELECT statements. Count this. Group that. Sort this way. Etc. Mind you, I am most interested in the one-off sort of hits, not just the overall usage.

How would you go about doing this sort of analysis? All I have to start with are my Apache "combined" access_log files.

--
Eric Lease Morgan
University Libraries of Notre Dame
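
For what it is worth, the "flat database plus SELECT" plan described above could be sketched roughly as follows. This is only a minimal illustration, not a tested tool: it assumes the standard Apache "combined" format, uses Python with SQLite as the flat database, and assumes Google-style referrers carry the search terms in a q= parameter. The table name (hits) and the function names are made up for the example, and the simple regular expression does not handle fields that contain escaped quotes.

```python
import re
import sqlite3
from urllib.parse import urlsplit, parse_qs

# Apache "combined" format:
#   %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"
# Limitation: fields containing escaped quotes (\") will not match.
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return (host, time, path, referrer), or None if the line does not match."""
    m = COMBINED.match(line)
    if m is None:
        return None
    parts = m.group("request").split()
    path = parts[1] if len(parts) >= 2 else ""
    referrer = m.group("referrer")
    if referrer == "-":  # Apache logs "-" when no referrer was sent
        referrer = ""
    return m.group("host"), m.group("time"), path, referrer


def referrer_parts(referrer):
    """Return (referring host, q= search terms); empty strings when absent."""
    try:
        url = urlsplit(referrer)
        ref_host = url.hostname or ""
    except ValueError:  # malformed URL in the log
        return "", ""
    terms = parse_qs(url.query).get("q", [])
    return ref_host, (terms[0] if terms else "")


def load(lines, db_path=":memory:"):
    """Parse log lines into a hypothetical 'hits' table; return the connection."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS hits "
        "(host TEXT, time TEXT, path TEXT, referrer TEXT, ref_host TEXT, query TEXT)"
    )
    rows = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # skip lines that are not in combined format
        host, time, path, referrer = parsed
        ref_host, query = referrer_parts(referrer)
        rows.append((host, time, path, referrer, ref_host, query))
    conn.executemany("INSERT INTO hits VALUES (?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def one_off_referrers(conn):
    """Referrers that appear exactly once, with the page they pointed at."""
    return conn.execute(
        "SELECT referrer, MIN(path) FROM hits WHERE referrer != '' "
        "GROUP BY referrer HAVING COUNT(*) = 1 ORDER BY referrer"
    ).fetchall()


def referrer_host_counts(conn):
    """Overall count of hits per referring host, busiest first."""
    return conn.execute(
        "SELECT ref_host, COUNT(*) FROM hits WHERE ref_host != '' "
        "GROUP BY ref_host ORDER BY COUNT(*) DESC, ref_host"
    ).fetchall()
```

Something like load(open("access_log")) followed by one_off_referrers(conn) would then list the one-off hits, and the query column could be fed back into Google by hand. Passing a file name as db_path instead of ":memory:" keeps the table on disk for further ad hoc SELECTs.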