Hmmm.....

The reason you're using so much processor time is the nested loops. Once
you reach EOF on your log file, the inner while() gets false, and your
script immediately goes round the outer loop and tries to read from the
file again. That busy-loop is what's eating your processor! Try inserting
a sleep(1) after the inner while loop - that should make things run a bit
better.
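Something like this should do it - an untested sketch, using the seek()
trick from perlfaq5 to clear the EOF flag so the next read picks up
newly-appended lines:

    for (;;) {
        while (my $line = <LOGFILE>) {
            ## ... split the line and do the INSERT here ...
        }
        sleep(1);            ## idle instead of spinning when there's no new data
        seek(LOGFILE, 0, 1); ## zero-byte seek clears the EOF flag on the handle
    }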


You could try piping in your data from tail, using something like
open(LOGFILE, "tail -f $log_file |") - the read will then block when
there's no data left, and only return EOF (false) when the pipe is
closed. Alternatively, try getting squid to write its log files to a
socket, which you can then read from. YMMV.
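For example (untested, assuming the same $log_file as in your script):

    open(LOGFILE, "tail -f $log_file |") or die "can't start tail: $!\n";
    while (my $line = <LOGFILE>) {
        ## this read blocks until tail produces more output, so
        ## there's no busy-loop and no need for sleep()
        ## ... do the INSERT for $line here ...
    }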


The line where you do $dbh ||= DBI->connect() looks wrong - if MySQL
times you out, your $dbh object will still exist, and will still point
to a set of variables in perl's memory, so that test will never return
false after the initial connection - you'll just carry on using a dead
handle. connect_cached keeps connections in a hash, and does a "ping"
on the database handle before returning it to you - if the ping fails,
it reconnects. I use Apache::DBI to do this for me automatically from
mod_perl scripts, so I've never tried connect_cached myself in
production.
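If you do want to try it, the reconnect line should just become
something like this (untested - same $dsn, $user and $passwd as in your
example):

    ## returns the cached handle if it still responds to a ping,
    ## otherwise transparently makes a fresh connection
    my $dbh = DBI->connect_cached($dsn, $user, $passwd);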

Anyway, I hope this helps :) Let us know how the system goes....

Dan



On Wed, 2003-08-27 at 11:40, Dominic Pain wrote:
> Dan,
> 
> Thanks for that.
> 
> The $sth is already prepared with placeholders before the loop, and then
> executed time and again. I didn't make that clear in my example.  But thanks
> for the tip.
> 
> The trouble isn't the speed, it's what happens if the database goes away.
> Hence the query about connect_cached in a production environment.
> 
> What I gain is the ability to use SQL to get really nice granular reports...
> what sites did the terminal with IP address xxx.xxx.xxx.xxx access between
> 3pm and 4:31pm on 5th July 2002, for instance.
> 
> I could do it at the end of the day, but I want it to be more smooth than
> that.  I've managed to get a similar thing working on our Intranet using
> mod_perl and Apache::DBI, but it's proving more challenging than that with
> Squid.
> 
> So, does anyone have any info on connect_cached?
> 
> Ta
> Dom
> 
> 
> 
> -----Original Message-----
> From: Dan Rowles [mailto:[EMAIL PROTECTED]
> Sent: 27 August 2003 11:17
> To: Dominic Pain
> Subject: Re: cached_connect
> 
> 
> Try:-
> 
> my $sth = $dbh->prepare("INSERT INTO blah (a, b, c, d) VALUES (?, ?, ?, ?)");
> 
> while(<LOGFILE>) {
>       my ($some, $thing, $to, $insert) = split /blah/;
> 
>       $sth->execute($some, $thing, $to, $insert);
> }
> 
> To avoid the overhead of parsing the statement each time.
> 
> However, if you're talking about inserting thousands of requests per
> second..... I'd probably give up and do the insert at the end of the day
> :) In fact, I'd also have to ask what you're looking to gain from having
> this data in a database, that you couldn't get with logrotate, grep, and
> a perl regex. Have a look at the "logchecker" tool that's bundled with
> modern RedHats as an example of what can be done - that does some nice
> looking analysis of system logs by parsing them in perl, and avoids the
> database entirely.
> 
> Anyway, best of luck! 
> 
> Dan
> 
> On Wed, 2003-08-27 at 10:06, Dominic Pain wrote:
> > I'm writing a daemon to monitor a squid logfile, and put its contents into
> > a MySQL database.  It looks something like this:
> > 
> > my $dbh = DBI->connect($dsn, $user, $passwd);
> > 
> > open(LOGFILE, $log) or die "yikes, no logfile? $!\n";
> > 
> > for(;;) {
> >     $dbh ||= DBI->connect($dsn, $user, $passwd); ## only re-connect if handle has gone away?
> > 
> >     while(<LOGFILE>) {
> >         $dbh->do(qq{INSERT blah, blah, blah into blah});
> >     }
> > }
> > ## end of example
> > 
> > But when I run the script, it takes up loads of processor time.  I'm
> > assuming that it's the reconnect line which is causing the problem.  I can't
> > remember where I got that syntax from, and I'm not 100% sure it's sane.
> > 
> > I'm running DBI v1.34, and the manpage says I should discuss connect_cached
> > here.  So, here I am.  What's the status of connect_cached?  Is it useful or
> > lethal?
> > 
> > The squid servers are going to be *very* busy, as they serve all the
> > libraries in our County, as well as all the employees of the County Council.
> > At peak times and in bursts, there could be thousands of requests per second
> > to each cache.  I want to log this activity in real time, because the volume
> > of information at the *end* of a day is enormous and unwieldy to deal with.
> > 
> > All info and suggestions appreciated.
> > 
> > Regards
> > Dominic Pain
