Duke,

You gave really valuable information! I just finished the basic setup of a similar process yesterday.

The idea is almost the same. Since I have multiple web servers to deal with, I set up a separate server (Debian Sarge) with huge disks for web log backup and post-analysis. I wrote a remote backup script to copy logs from the web servers to this backup/analysis server. It then calls my own log-splitting script to split the logs into directories named after each vhost. The vhost information is extracted from the logs themselves rather than from the Apache configuration. The first run of the splitting took 4 hours for 60GB of data yesterday evening.
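For anyone who wants to try the splitting step, here is a minimal sketch in Python. It is not my actual script. It assumes the vhost is the first whitespace-separated field of each log line (optionally followed by ":port", as in Apache's vhost_combined format); the `access.log` file name and the `_unknown` bucket are made up for illustration.

```python
import re
from collections import defaultdict
from pathlib import Path

# Only plain host-like names may become directory names.
SAFE_NAME = re.compile(r"^[A-Za-z0-9._-]+$")

def vhost_of(line):
    """Return the lowercased vhost from the first field, or None if unusable."""
    fields = line.split(None, 1)
    if not fields:
        return None
    host = fields[0].split(":", 1)[0].lower()  # drop an optional ":port"
    if not SAFE_NAME.match(host) or host.startswith("."):
        return None
    return host

def split_by_vhost(lines, out_dir):
    """Append each line to <out_dir>/<vhost>/access.log; return line counts."""
    out_dir = Path(out_dir)
    counts = defaultdict(int)
    handles = {}
    try:
        for line in lines:
            # Lines without a usable vhost go to a hypothetical "_unknown" bucket.
            host = vhost_of(line) or "_unknown"
            if host not in handles:
                (out_dir / host).mkdir(parents=True, exist_ok=True)
                handles[host] = open(out_dir / host / "access.log", "a")
            handles[host].write(line if line.endswith("\n") else line + "\n")
            counts[host] += 1
    finally:
        for fh in handles.values():
            fh.close()
    return dict(counts)
```

Compressed logs can be fed in with `gzip.open(path, "rt")` as the `lines` argument. The sketch keeps one open file per vhost, so with many hundreds of vhosts the open-file limit may need raising.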

I have a third script that uses analog+rmagic to do the reporting. All vhosts share the same analog and rmagic configuration files. Vhost-specific information is appended as command-line arguments. Vhost names are taken from the names of the split vhost log directories. The first run of the reporting took half an hour for 100 vhosts' log data split from the same 10GB of log data. (One unexpected bit of fun was that analog can use 3GB of virtual/physical memory under my Linux kernel. On a 32-bit Solaris server, it always crashes when memory usage goes above 2GB.)
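The shared-config-plus-arguments idea can be sketched roughly as below. This is an illustration, not my real script: it only builds the analog command lines, one per vhost directory. The `-G`, `+g` and `+C` options are as I understand them from the Analog documentation (skip the default config, read an extra config file, pass one config command) and should be checked against your version. The paths, the `report.dat` name and the `_` prefix for skipped directories are assumptions, and the rmagic step is left out.

```python
from pathlib import Path

def analog_command(vhost_dir, shared_cfg, report_root, analog_bin="analog"):
    """Build one analog command line for a single vhost directory."""
    vhost_dir = Path(vhost_dir)
    name = vhost_dir.name  # the directory name doubles as the vhost name
    return [
        analog_bin,
        "-G",                # assumed: do not read the default config file
        f"+g{shared_cfg}",   # assumed: read the shared config file
        f"+CHOSTNAME {name}",
        f"+CHOSTURL http://{name}/",
        f"+CLOGFILE {vhost_dir / 'access.log'}",
        f"+COUTFILE {Path(report_root) / name / 'report.dat'}",
    ]

def report_commands(split_root, shared_cfg, report_root):
    """Return one command per vhost directory under split_root, sorted by name."""
    commands = []
    for entry in sorted(Path(split_root).iterdir()):
        # Skip stray files and helper directories such as "_unknown".
        if entry.is_dir() and not entry.name.startswith("_"):
            commands.append(analog_command(entry, shared_cfg, report_root))
    return commands
```

Each returned list can be handed to `subprocess.run(cmd, check=True)`; the output directory for each vhost has to exist before analog writes to it.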

So far this process is up and seems to have been working since last midnight. I plan to run the backup and splitting jobs nightly, and the reporting job weekly. I am still relatively new to analog+rmagic and need to learn how to tweak it to get the best output. I may also need to set up a good-looking CGI-based front page with links to the reports for all vhosts. Bosses always love eye candy.

When all of this is done, I may post a link to a sample report created by the above process. It's great to be able to exchange knowledge and learn from each other.

Thanks,

Jin

Duke Hillard wrote:

Jin,

   Are you automating your process?  If not, my experience
may be helpful.  Also, you might get ideas from some of the
Analog helper applications at "http://www.analog.cx/helpers/".

   I run Analog 5.32 on Solaris 8.  Apache 2.0.50 runs on
the same machine.  My logformat is vhost_combined.

   I use the Apache logfile splitter to create separate log files
for each virtual host.  I compress the original logfile and save it.
I delete the separate log files after I run Analog.  The process
of automating the logfile splitter is easy and the files that are
produced have predictable names.

   To create Analog configuration files, I use a script that parses
the Apache configuration file.  It identifies each virtual host and
extracts relevant information.  Then the script creates an Analog
configuration file for each virtual host with only basic information:
LOGFILE, HOSTNAME, HOSTURL, OUTFILE, FILEINCLUDE
All of the configuration files created by the script exist in a directory
that contains no other files.  I also use a "master" configuration file
that has a large number of directives that are consistent for all of
the virtual hosts.
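The config-generation step described above could be sketched along these lines. This is an illustrative Python sketch, not Duke's script: it only looks for ServerName inside <VirtualHost> blocks, the directory layout and the "<vhost>.log" naming are assumptions, and FILEINCLUDE is left out because its value is site-specific.

```python
import re

VHOST_OPEN = re.compile(r"^\s*<VirtualHost\b", re.IGNORECASE)
VHOST_CLOSE = re.compile(r"^\s*</VirtualHost>", re.IGNORECASE)
SERVER_NAME = re.compile(r"^\s*ServerName\s+(\S+)", re.IGNORECASE)

def server_names(apache_conf_text):
    """Return the ServerName of every <VirtualHost> block, in file order."""
    names, inside = [], False
    for line in apache_conf_text.splitlines():
        if VHOST_OPEN.match(line):
            inside = True
        elif VHOST_CLOSE.match(line):
            inside = False
        elif inside:
            m = SERVER_NAME.match(line)  # commented-out lines do not match
            if m:
                # Drop an optional ":port" and normalise case.
                names.append(m.group(1).split(":", 1)[0].lower())
    return names

def analog_config(name, log_dir, report_dir):
    """Return the text of a minimal per-vhost Analog configuration file."""
    return "\n".join([
        f"LOGFILE {log_dir}/{name}.log",   # assumed split-log file name
        f'HOSTNAME "{name}"',
        f"HOSTURL http://{name}/",
        f"OUTFILE {report_dir}/{name}.html",
    ]) + "\n"
```

Writing `analog_config(name, ...)` to one file per name from `server_names(...)`, in a directory of its own, gives the set of per-vhost files that the run script can loop over alongside the master configuration file.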

   When I am ready to run Analog, I have a script that looks for
the Analog configuration files, then calls Analog to run on each of
the files while also reading in the master configuration file.  So, the
output is produced virtual host by virtual host.  I repeat this process
once a week for 80 hosts.  I think that a similar process, even with
a different operating system and server software, will work for you.

   Before I automated scripts, I worked with Analog for a long time
to learn all of the commands that are useful for my environment.  This
way, when I started scripting, it was easy to determine if a problem
was caused by poor scripting or by poor use of Analog's commands.
A lot of work was needed to establish the process, but it is reliable
50 times out of 52.  If I tweak my scripts, I might get 100% reliability.

I hope this helps,

-- Duke


Jin Zhao wrote:

This may seem like a dumb question, but I do think it would be valuable if analog can do it.

In our site setup, we always have all virtual hosts logging to one big access_log. This log grows fast (2 million lines per day) and gets rotated and compressed nightly. The boss wants to know how many "visitors" visited his sites. Everybody knows this is a stupid metric, but I have to give him some numbers; distinct hosts might be good enough.

The problem is that, in order for analog to get distinct hosts for each virtual host, I have to split these huge rotated and compressed log files into hundreds of smaller vhost-based files. Worse, I have to create an analog configuration for each vhost and run analog and Report Magic against these hundreds of smaller log files, hundreds of times.

Did I miss something useful in the current analog features? Can somebody suggest a better solution?

Thanks,


Jin

+------------------------------------------------------------------------
| TO UNSUBSCRIBE from this list:
| http://lists.meer.net/mailman/listinfo/analog-help
|
| Usenet version: news://news.gmane.org/gmane.comp.web.analog.general
| List archives: http://www.analog.cx/docs/mailing.html#listarchives
+------------------------------------------------------------------------


