[analog-help] Very large logfiles

2004-04-27 Thread analog-help
I apologize if this shows up on list twice. I posted to the
newsgroup yesterday, but it never posted. 

First of all I'd like to say that Analog is the fastest log
analysis software I have ever seen. On a fast Intel box it chews through our
~12M lines of apache logs in about 2 minutes. Great software :)

Now for my problem. We process ~3GB of logs daily for our main
domain. The management likes to see cumulative numbers from day to day.
So, what I'm doing is processing the log files and making a computer
output file for Report Magic, as well as a cachefile. Then the next day,
when my scripts run, they move the CACHEOUTFILE from the day before to
the CACHEFILE filename, and process the logfiles and the CACHEFILE
together to create a cumulative report. 
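
A minimal sketch of the daily rotation described above, for reference. The paths and the helper names `rotate_cache` and `daily_config` are hypothetical, not Sam's actual scripts. The generated lines use only the LOGFILE, CACHEFILE, CACHEOUTFILE, OUTPUT and OUTFILE commands; how the config is then handed to analog is left out.

```python
import shutil
from pathlib import Path


def rotate_cache(cache_out: Path, cache_in: Path) -> bool:
    """Move yesterday's CACHEOUTFILE to the CACHEFILE name.

    Returns True if there was a cache to carry forward, False on the
    very first run, when no cache has been written yet.
    """
    if not cache_out.exists():
        return False
    shutil.move(str(cache_out), str(cache_in))
    return True


def daily_config(logfile: Path, cache_in: Path, cache_out: Path,
                 outfile: Path, have_cache: bool) -> str:
    """Build the Analog config lines for one cumulative daily run."""
    lines = [f"LOGFILE {logfile}"]
    if have_cache:
        # Read everything accumulated up to yesterday.
        lines.append(f"CACHEFILE {cache_in}")
    lines += [
        f"CACHEOUTFILE {cache_out}",  # new cumulative cache for tomorrow
        "OUTPUT COMPUTER",            # computer-readable output for Report Magic
        f"OUTFILE {outfile}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical locations; adjust to the real layout.
    work = Path("/var/analog")
    have = rotate_cache(work / "cache.out", work / "cache.in")
    print(daily_config(Path("/var/log/apache/access.log"),
                       work / "cache.in", work / "cache.out",
                       work / "report.dat", have), end="")
```

Because the old cache is moved before the run, the CACHEOUTFILE name is always free when analog writes the new one.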

The box I'm doing this on has 4GB of RAM, but it's still using
it all and blowing out with a "Ran out of memory" error. I read the
docs on cachefiles and low memory usage, but even with HOSTLOWMEM 3
and a FILEALIAS for our commonly accessed filenames, I'm still running
out of memory after about 2 days' worth of data.

Am I doing something wrong? I see people that have over a year's
worth of data in their reports, but at this rate, I'll not be able to
get a week. Can anyone offer some advice?

Sam
+
|  TO UNSUBSCRIBE from this list:
|http://lists.isite.net/listgate/analog-help/unsubscribe.html
|
|  Digest version: http://lists.isite.net/listgate/analog-help-digest/
|  Usenet version: news://news.gmane.org/gmane.comp.web.analog.general
|  List archives:  http://www.analog.cx/docs/mailing.html#listarchives
+


Re: [analog-help] Very large logfiles

2004-04-27 Thread analog-help
Hi-
This is just a stab in the dark, but do you have the referrer report
turned on? That will use up a lot of memory.
--Quentin



Re: [analog-help] Very large logfiles

2004-04-27 Thread analog-help
Greetings, Sam.
One solution to dealing with large log files is to break down the  
report into multiple reports, using aggressive ALIAS and LOGFORMAT  
techniques.

On a custom system I set up, a suite of reports runs with the aid of a  
Perl helper program once per hour. On the raw traffic report, all page  
requests are aliased to the same filename, somepage.someformat, using  
FILEALIAS. A subsequent FILEINCLUDE command passes only that one  
filename and filters everything else. For that report, almost all the  
fields in the log file are defined as junk, using %j -- the only ones  
kept are the requested file %r and the time/date fields %d %M %Y %h %n.

The resulting cachefiles are quite small, and so are the demands upon  
the box doing the number crunching. 16 reports update in under a  
minute.  The first time I run a report, it takes a lot longer to crunch  
through all the logs, write all the cachefiles and complete the report,  
but that only has to happen once.  Each report contains far less than  
you get in a default analog config, but since the default config chokes  
when we feed it our giganto-logfiles, divide and conquer seems to be  
the best bet.

I recommend that you peep a cachefile and see what's taking up a lot of  
space.  If it's data you can't live without, break that bit out into a  
separate report, with its own set of cachefiles.
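
The divide-and-conquer layout above might be sketched roughly like this. It is not the Perl helper itself: the report names, directory layout and the helpers `report_config` and `write_suite` are hypothetical, and the report-specific lines (the FILEALIAS, FILEINCLUDE and LOGFORMAT commands) are supplied by the caller rather than guessed at here. The only point illustrated is that each narrow report gets its own small cachefile.

```python
from __future__ import annotations

from pathlib import Path


def report_config(name: str, logfile: Path, cache_dir: Path,
                  extra_lines: list[str]) -> str:
    """Config text for one narrow report with its own cachefile pair."""
    cache_in = cache_dir / f"{name}.cache"
    cache_out = cache_dir / f"{name}.cache.new"
    lines = [f"LOGFILE {logfile}"]
    if cache_in.exists():
        # Only read a cache once an earlier run has produced one.
        lines.append(f"CACHEFILE {cache_in}")
    lines.append(f"CACHEOUTFILE {cache_out}")
    # Report-specific commands come from the caller unchanged.
    lines.extend(extra_lines)
    return "\n".join(lines) + "\n"


def write_suite(reports: dict[str, list[str]], logfile: Path,
                cache_dir: Path, config_dir: Path) -> list[Path]:
    """Write one config file per report and return their paths."""
    config_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, extra in sorted(reports.items()):
        path = config_dir / f"{name}.cfg"
        path.write_text(report_config(name, logfile, cache_dir, extra))
        written.append(path)
    return written
```

After each successful run the helper would still need to move `<name>.cache.new` over `<name>.cache`, much like the CACHEOUTFILE-to-CACHEFILE move in Sam's setup.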

Cheers,
-- Marvin Humphrey
On Apr 27, 2004, at 8:51 AM, Samuel Kesterson wrote:
I apologize if this shows up on list twice. I posted to the
newsgroup yesterday, but it never posted.

First of all I'd like to say that Analog is the fastest log
analysis software I have ever seen. On a fast Intel box it chews through our
~12M lines of apache logs in about 2 minutes. Great software :)

Now for my problem. We process ~3GB of logs daily for our main
domain. The management likes to see cumulative numbers from day to day.
So, what I'm doing is processing the log files and making a computer
output file for Report Magic, as well as a cachefile. Then the next day,
when my scripts run, they move the CACHEOUTFILE from the day before to
the CACHEFILE filename, and process the logfiles and the CACHEFILE
together to create a cumulative report.

The box I'm doing this on has 4GB of RAM, but it's still using
it all and blowing out with a "Ran out of memory" error. I read the
docs on cachefiles and low memory usage, but even with HOSTLOWMEM 3
and a FILEALIAS for our commonly accessed filenames, I'm still running
out of memory after about 2 days' worth of data.

Am I doing something wrong? I see people that have over a year's
worth of data in their reports, but at this rate, I'll not be able to
get a week. Can anyone offer some advice?

Sam


[analog-help] Analog not reading beyond certain point in log files.

2004-04-27 Thread analog-help

Last week I turned on referrer and browser logging in my Apache httpd.conf.
Analog read the log files up until that point, but refuses to read anything
after.

I have tried setting the LOGFORMAT variable to COMBINED but have had no luck
in getting it to work.

Because of this, it will not output browser, platform or OS reporting.

Has anyone else experienced this?  I would be very grateful for any
pointers.

Colin



Re: [analog-help] Analog not reading beyond certain point in log files.

2004-04-27 Thread analog-help
This issue has been discussed on this list in the past, so I looked
in the list archives and found that specifying multiple logformats is
acceptable and that Analog will use the first matching format.  To
do this, specify the logformats first, then specify the logfile, like this:

APACHELOGFORMAT (%h %l %u %t %v \"%r\" %s %b)
APACHELOGFORMAT (%h %l %u %t \"%r\" %s %b \"%{Referer}i\" \"%{User-Agent}i\")
LOGFILE /var/log/apache/access.log

The example above might not have the exact fields that you need.
See http://www.analog.cx/docs/logfmt.html for a list of fields.
By the way, when using Apache, the APACHELOGFORMAT
command often produces better results than the LOGFORMAT command.
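
To confirm where a logfile switches format, and so that the formats really do differ on either side, something like the sketch below could help. It assumes the stock Apache common and combined layouts, which the thread does not confirm for this server, and it does not handle an extra `%v` field. The names `classify` and `format_changes` are made up for illustration.

```python
import re

# Apache "common" format: host ident user [time] "request" status bytes
_QUOTED = r'"(?:[^"\\]|\\.)*"'
_COMMON = r'\S+ \S+ \S+ \[[^\]]+\] ' + _QUOTED + r' \d{3} \S+'
COMMON_RE = re.compile('^' + _COMMON + '$')
# "combined" appends the quoted referrer and user-agent fields.
COMBINED_RE = re.compile('^' + _COMMON + ' ' + _QUOTED + ' ' + _QUOTED + '$')


def classify(line: str) -> str:
    """Return 'combined', 'common' or 'unknown' for one log line."""
    line = line.rstrip("\r\n")
    if COMBINED_RE.match(line):
        return "combined"
    if COMMON_RE.match(line):
        return "common"
    return "unknown"


def format_changes(lines):
    """Yield (line_number, old_format, new_format) at each switch."""
    previous = None
    for number, line in enumerate(lines, start=1):
        current = classify(line)
        if previous is not None and current != previous:
            yield number, previous, current
        previous = current


if __name__ == "__main__":
    with open("/var/log/apache/access.log", errors="replace") as log:
        for number, old, new in format_changes(log):
            print(f"line {number}: {old} -> {new}")
```

If the only change reported is common to combined at the point where the httpd.conf edit took effect, two format lines in that order should cover the whole file.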
I viewed the following archived messages:
http://www.mail-archive.com/[EMAIL PROTECTED]/msg03282.html
http://www.mail-archive.com/[EMAIL PROTECTED]/msg11625.html
http://www.mail-archive.com/[EMAIL PROTECTED]/msg13476.html
HTH,
-- Duke
kn0wledge wrote:
Last week I turned on referrer and browser logging in my Apache httpd.conf.
Analog read the log files up until that point, but refuses to read anything
after.
I have tried setting the LOGFORMAT variable to COMBINED but have had no luck
in getting it to work.
Because of this, it will not output browser, platform or OS reporting.
Has anyone else experienced this?  I would be very grateful for any
pointers.
Colin
 



Re: [analog-help] PDF

2004-04-27 Thread analog-help
See http://analog.cx/docs/faq.html#faq143.
HTH,
-- Duke
Luis Mercado wrote:
How can I know how many people saw a PDF file, and how many downloaded it?
I read that a single PDF can score many hits. Is there a way to control
it?
Thanks, Luis.