Jin,
Are you automating your process? If not, my experience may be helpful. Also, you might get ideas from some of the Analog helper applications at "http://www.analog.cx/helpers/".
I run Analog 5.32 on Solaris 8. Apache 2.0.50 runs on the same machine. My logformat is vhost_combined.
I use the Apache logfile splitter to create separate log files for each virtual host. I compress the original logfile and save it. I delete the separate log files after I run Analog. The process of automating the logfile splitter is easy and the files that are produced have predictable names.
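For readers who want to see what the splitting step amounts to, here is a minimal Python sketch. It is not the Apache splitter itself. It assumes that the first whitespace-separated field of every log line is the virtual host name (optionally followed by ":port"), that this field is dropped from the per-host output, and that each output file is named "<vhost>.log"; the function names `vhost_of` and `split_log` are made up for this example.

```python
import gzip
from pathlib import Path


def vhost_of(line):
    """Return (vhost, rest_of_line), assuming the first field is the vhost."""
    first, _, rest = line.partition(" ")
    vhost = first.split(":")[0].lower()  # drop an optional ":port" suffix
    return vhost, rest


def split_log(src, out_dir):
    """Append each line of src to out_dir/<vhost>.log; return the vhosts seen."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Read compressed logs directly so they need not be unpacked first.
    opener = gzip.open if str(src).endswith(".gz") else open
    handles = {}  # one open file per vhost; fine for tens of hosts
    try:
        with opener(src, "rt", encoding="utf-8", errors="replace") as log:
            for line in log:
                vhost, rest = vhost_of(line)
                # Skip malformed lines and names unsafe to use as file names.
                if not vhost or not rest or "/" in vhost or vhost.startswith("."):
                    continue
                if vhost not in handles:
                    path = out_dir / f"{vhost}.log"
                    handles[vhost] = open(path, "a", encoding="utf-8")
                handles[vhost].write(rest)
    finally:
        for handle in handles.values():
            handle.close()
    return sorted(handles)
```

With several hundred virtual hosts, the one-handle-per-host approach may run into the operating system's open-file limit; in that case the lines could be grouped in memory or the log read in several passes.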
To create Analog configuration files, I use a script that parses the Apache configuration file. It identifies each virtual host and extracts relevant information. Then the script creates an Analog configuration file for each virtual host with only basic information: LOGFILE, HOSTNAME, HOSTURL, OUTFILE, and FILEINCLUDE. All of the configuration files created by the script are placed in a directory that contains no other files. I also use a "master" configuration file that holds a large number of directives that are the same for all of the virtual hosts.
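A rough Python sketch of such a generator follows, under several assumptions that are not from the description above: virtual hosts are found by a simple regular expression over <VirtualHost> blocks, only ServerName is extracted, the split logs are named "<vhost>.log", and the reports and configuration files are named "<vhost>.html" and "<vhost>.cfg". All function names and paths are hypothetical. FILEINCLUDE is left out because the value used for it is not stated.

```python
import re
from pathlib import Path

# Naive patterns: they ignore Include files and commented-out <VirtualHost> blocks.
VHOST_BLOCK = re.compile(r"<VirtualHost\b[^>]*>(.*?)</VirtualHost>",
                         re.IGNORECASE | re.DOTALL)
SERVER_NAME = re.compile(r"^\s*ServerName\s+(\S+)",
                         re.IGNORECASE | re.MULTILINE)


def server_names(httpd_conf_text):
    """Return the ServerName of each <VirtualHost> block, without any port."""
    names = []
    for block in VHOST_BLOCK.findall(httpd_conf_text):
        match = SERVER_NAME.search(block)
        if match:
            names.append(match.group(1).split(":")[0].lower())
    return names


def analog_config(vhost, log_dir, report_dir):
    """Build the per-host directives; paths and file names are assumptions."""
    lines = [
        f"LOGFILE {log_dir}/{vhost}.log",
        f'HOSTNAME "{vhost}"',
        f"HOSTURL http://{vhost}/",
        f"OUTFILE {report_dir}/{vhost}.html",
    ]
    return "\n".join(lines) + "\n"


def write_configs(httpd_conf_text, cfg_dir, log_dir, report_dir):
    """Write one <vhost>.cfg per virtual host into cfg_dir; return the paths."""
    cfg_dir = Path(cfg_dir)
    cfg_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for vhost in server_names(httpd_conf_text):
        path = cfg_dir / f"{vhost}.cfg"
        path.write_text(analog_config(vhost, log_dir, report_dir))
        written.append(path)
    return written
```

Keeping the per-host files this small means that a change to the shared settings only ever touches the master configuration file.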
When I am ready to run Analog, I have a script that looks for the Analog configuration files, then calls Analog to run on each of the files while also reading in the master configuration file. So, the output is produced virtual host by virtual host. I repeat this process once a week for 80 hosts. I think that a similar process, even with a different operating system and server software, will work for you.
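The driver loop can be sketched in Python as below. This is an illustration, not the script described above. It assumes Analog's command-line options "-G" (skip the default configuration file) and "+g<file>" (read an additional configuration file), which should be checked against the documentation for your Analog version. Reading the master file before the per-host file is also an assumption, and the function names are invented for this example.

```python
import subprocess
from pathlib import Path


def analog_commands(cfg_dir, master_cfg, analog_bin="analog"):
    """Build one Analog command line per configuration file in cfg_dir."""
    commands = []
    # The directory is expected to hold nothing but per-host configuration files.
    for cfg in sorted(Path(cfg_dir).iterdir()):
        if cfg.is_file():
            commands.append([analog_bin, "-G", f"+g{master_cfg}", f"+g{cfg}"])
    return commands


def run_all(cfg_dir, master_cfg, analog_bin="analog"):
    """Run Analog once per host; return the configs whose run failed."""
    failures = []
    for command in analog_commands(cfg_dir, master_cfg, analog_bin):
        result = subprocess.run(command)
        if result.returncode != 0:
            failures.append(command[-1][2:])  # strip the leading "+g"
    return failures
```

Collecting the failures instead of stopping at the first one lets a weekly run finish for the remaining hosts, and the returned list shows which hosts need a second look.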
Before I wrote any scripts, I worked with Analog for a long time to learn all of the commands that are useful for my environment. That way, when I started scripting, it was easy to tell whether a problem was caused by poor scripting or by poor use of Analog's commands. A lot of work was needed to establish the process, but it is reliable 50 times out of 52. If I tweak my scripts, I might get 100% reliability.
I hope this helps,
-- Duke
Jin Zhao wrote:
This seems like a dumb question, but I do think it would be valuable if analog could do it.
In our site setup, we always have all virtual hosts logging to one big access_log. This log grows fast (2 million lines per day) and gets rotated and compressed nightly. The boss wants to know how many "visitors" visited his sites. Everybody knows this is something stupid, but I have to give him some numbers; say, distinct hosts might be good enough.
The problem is that, in order for analog to get distinct hosts for each virtual host, I have to split these huge rotated and compressed log files into hundreds of smaller vhost-based files. Worse, I have to create an analog configuration for each vhost and run analog & reportmagic against these hundreds of smaller log files, hundreds of times.
Did I miss something useful in the current analog features? Can somebody give a better solution?
Thanks,
Jin
begin:vcard
fn:Duke Hillard
n:Hillard;Duke
org:University of Louisiana at Lafayette;University Computing Support Services
adr:;;P.O. Box 42770;Lafayette;LA;70504-2770;USA
email;internet:[EMAIL PROTECTED]
title:University Webmaster
tel;work:337.482.5763
url:http://www.louisiana.edu/
version:2.1
end:vcard

