On Wed, 9 May 2007, Tom Keays wrote:

> Still, it is worth asking: Has anyone made a stab at this -- ie,
> publically exposing server logs? Are there code examples (any
> real-world, generalizable examples would be welcome). Sorry for
> cross-posting this.

I've done it in the past -- typically using general analytics programs (eg, analog), or just parsing out relevant data w/ perl.

The problem is that, a few years ago, spammers started sending bogus requests to servers to try to get their URLs to show up in your stats pages. In ORA's case, they're only showing the top 20, and they presumably get lots of requests, so someone would have to hit them pretty hard to get something to show up.

If you're thinking about exposing your server logs, I'd recommend the following:

1. Don't give out IP addresses of the requestors (privacy reasons).

2. Don't put on a public page any data that's generated by the user agent (ie, the client software), including HTTP_USER_AGENT, HTTP_REFERER and QUERY_STRING. All have been used by spammers to insert URLs to try to get links back to their sites.

3. Filter out all entries with 'error' results (people trying to probe your system for vulnerabilities, etc.).

4. Filter out all 'intranet' pages or other pages that the general public shouldn't be going to.

5. Avoid giving information that provides signatures of the CMS you're using, or other signatures of potential vulnerabilities.

6. Use robots.txt to ask search engines not to index whatever pages you generate.

For the particular case of generating tag clouds from search results, the problem is that you typically need to use QUERY_STRING if it's a local search script, and HTTP_REFERER if it's a remote search engine that linked to you. Neither value can be trusted. In this particular case, I probably wouldn't try a fully automated approach -- I'd generate the page, but require someone to manually verify it before it got posted.

-----
Joe Hourcle
(insert some statement here about everything being my personal opinions, and that I don't speak for any company, organization, etc.)
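
As a rough illustration of recommendations 1 through 4 above, here is a minimal sketch in Python (not the analog or perl setup mentioned in the post). It assumes the server writes Apache-style Common or Combined Log Format; the names LOG_LINE, PRIVATE_PREFIXES and top_pages, and the "/intranet/" and "/admin/" prefixes, are hypothetical and would need adjusting for a real site.

```python
import re
from collections import Counter

# Assumed input: Common/Combined Log Format, ie
#   host ident user [time] "METHOD target PROTOCOL" status size ...
# Only the request target and the status code are captured; the client
# IP, referer and user-agent fields are never read.
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]*\] '
    r'"(?P<method>[A-Z]+) (?P<target>\S+) [^"]*" (?P<status>\d{3}) '
)

# Hypothetical prefixes for pages the general public shouldn't see.
PRIVATE_PREFIXES = ("/intranet/", "/admin/")


def top_pages(lines, limit=20):
    """Return the `limit` most requested public paths as (path, count) pairs."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # malformed line: skip rather than guess
        if int(match.group("status")) >= 400:
            continue  # drop 4xx/5xx results (probes, broken links)
        # Discard the query string; keep only the path.
        path = match.group("target").split("?", 1)[0]
        if path.startswith(PRIVATE_PREFIXES):
            continue  # drop non-public pages
        counts[path] += 1
    return counts.most_common(limit)
```

Something like `top_pages(open("access.log"))` would give the list to publish, with "access.log" standing in for wherever your logs live. Note that the path itself still comes from the client, so a server that answers 200 for arbitrary URLs could still be spammed this way; escaping the output for HTML and having someone look it over before posting remain a good idea.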