As I understand it, to process one record you make multiple round trips 
between the Perl process and the DB: get data from the DB, check whether the data 
matches, and send a DML command to the DB.

I am not familiar with MySQL anymore, but if I had to handle all of this DML 
processing quickly on Oracle, I would use just one DML statement ("insert all" 
or "merge" on the Oracle side), so I only do one execute and the DB server 
does all the work.

I just checked online: MySQL has "INSERT ... ON DUPLICATE KEY UPDATE", which is 
similar to Oracle's MERGE, but I could not find a MySQL statement similar to 
Oracle's "INSERT ALL". You can write a (stored) procedure to do all the DML, 
can't you?
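
For example, a rough sketch of the MySQL upsert through DBI (untested, and the 
table/column names here are only placeholders, not Brandon's actual schema):

    # $dbh is an already-connected DBI handle; assumes a UNIQUE key on conn_key.
    # One execute per log line: insert on first sight, stamp close_dt on second sight.
    my $upsert = $dbh->prepare(q{
        INSERT INTO connections (conn_key, open_dt, close_dt)
        VALUES (?, NOW(), NULL)
        ON DUPLICATE KEY UPDATE close_dt = NOW()
    });
    $upsert->execute($conn_key);

A stored procedure could bundle the SELECT/UPDATE/INSERT branching the same way, 
so the script still does only one execute per record.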


________________________________
From: Brandon Phelps <bphe...@gls.com>
To: tiger peng <tigerpeng2...@yahoo.com>
Cc: Tim Bunce <tim.bu...@pobox.com>; John Scoles <byter...@hotmail.com>; 
"dbi-users@perl.org" <dbi-users@perl.org>
Sent: Wednesday, September 21, 2011 9:49 AM
Subject: Re: Tail Module + DBI Module, can't keep up!

What do you mean about the condition check in a script?  Not sure I follow.

On 09/21/2011 10:41 AM, tiger peng wrote:
> The first bottleneck is the connection open/close for each record, which is a 
> time-consuming operation. Why don't you just use one connection?
> The second one is doing the condition check in the script instead of doing it 
> all within the database server.
>
> Try to fix these two issues. If it is still too slow, we can do more tuning.
>
>
 
> ________________________________
> *From:* Brandon Phelps <bphe...@gls.com>
> *To:* tiger peng <tigerpeng2...@yahoo.com>
> *Cc:* Tim Bunce <tim.bu...@pobox.com>; John Scoles <byter...@hotmail.com>; 
> "dbi-users@perl.org" <dbi-users@perl.org>
> *Sent:* Wednesday, September 21, 2011 9:04 AM
> *Subject:* Re: Tail Module + DBI Module, can't keep up!
>
> Thanks for the responses.
>
> I would love to be able to output the data to a flat file or something and do 
> bulk inserts daily; however, this isn't really feasible for my situation 
> because the database operations depend on other records.
>
> Basically the system I am creating works like this currently:
>
> 1. Syslog-ng accepts syslog messages from a sonicwall firewall on the 
> network, and logs them to a file in /var/log
> 2. A Perl script runs as a daemon, extracting open_dt, close_dt, protocol, 
> source_ip, destination_ip, source_port, destination_port, bytes_sent, and 
> bytes_received
> 3. A database table called sonicwall_connections is partitioned by the open_dt 
> column, 1 partition per day
> 4. The Perl script from (2) constantly reads in the /var/log files using the Tail 
> module. It does the following (sketched in code after this list):
> a. When a connection is opened, it INSERTs into the sonicwall_connections 
> table
> b. When a connection is closed, it SELECTs from the sonicwall_connections 
> table the last record id that matches the protocol, source_ip, source_port, 
> and destination_port
> c. If a record exists matching these criteria, it UPDATEs that record's 
> close_dt column with the time the connection was closed
> d. If a record does not exist, then in our case this means that the 
> connection was denied due to firewall rules, and we instead INSERT into a 
> different table, sonicwall_denied
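>
> For a close event that translates to roughly this (a simplified sketch, not my 
> exact code; the statement handle and variable names here are made up, and the 
> column list is trimmed down):
>
>     # close event: look up the most recent matching open record
>     $sel_last->execute($protocol, $source_ip, $source_port, $destination_port);
>     my ($id) = $sel_last->fetchrow_array;
>     if (defined $id) {
>         # (c) matching record found: stamp its close time
>         $upd_close->execute($close_dt, $id);
>     } else {
>         # (d) no match, so the firewall denied it: log to sonicwall_denied instead
>         $ins_denied->execute($close_dt, $protocol, $source_ip, $source_port,
>                              $destination_port);
>     }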
>
>
> Previously I was doing the entire SELECT prepare and execute in my never 
> ending while loop that parsed the log. I have since changed this per the 
> recommendations here. Now, I open my connection to the database using 
> Perl::DBI, then I create my prepared statement (outside the while loop), and 
> I simply call execute() within the while loop, so the prepared statement only 
> has to be created once. This seems to have helped a bit, in that the 
> script does not fall behind quite as fast now; however, it does still fall 
> behind. Currently, with these modifications, it is 10AM EST and the script is 
> still working on log entries from yesterday night. That is around 12 hours behind, 
> compared to the 22-23 hours behind it was previously, so this change 
> obviously did help... still not perfect yet though.
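>
> Stripped down, the structure is now something like this (a rough sketch, not the 
> real script: the DSN, credentials, and parsing regex are placeholders, and I am 
> assuming File::Tail for the Tail module):
>
>     use strict;
>     use warnings;
>     use DBI;
>     use File::Tail;
>
>     # connect once, prepare once
>     my $dbh = DBI->connect('dbi:mysql:database=firewall;host=localhost',
>                            'user', 'password', { RaiseError => 1 });
>     my $sel_last = $dbh->prepare(q{
>         SELECT id FROM sonicwall_connections
>         WHERE protocol = ? AND source_ip = ? AND source_port = ?
>           AND destination_port = ?
>         ORDER BY id DESC LIMIT 1
>     });
>
>     my $tail = File::Tail->new(name => '/var/log/firewall.log');
>     while (defined(my $line = $tail->read)) {
>         # parse the syslog line (this regex is only a stand-in for the real format)
>         my ($proto, $src_ip, $src_port, $dst_port) =
>             $line =~ /proto=(\w+).*src=([\d.]+):(\d+).*dst=[\d.]+:(\d+)/;
>         next unless defined $dst_port;
>         # only execute() runs per line -- the statement was prepared once, above
>         $sel_last->execute($proto, $src_ip, $src_port, $dst_port);
>         my ($id) = $sel_last->fetchrow_array;
>         # ... branch on $id as described earlier ...
>     }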
>
> As you can see from the procedure above I cannot really work with a flat file 
> or non-indexed hot table because I constantly need to read back in records 
> that were previously inserted. Sometimes the record I need was inserted just 
> seconds before, while other times I need to retrieve records that have been 
> there for days. For example, when a user loads a website in their browser, I 
> get my initial open connection log entry and I do the initial INSERT, but 
> only a second later (depending on the size of the site obviously) I am ready 
> to UPDATE the record with the close_dt time (when the web request connection 
> was closed). On the other hand if a user downloads a 10GB file via FTP, or 
> starts a remote desktop session, the connection could potentially be opened 
> for hours if not days, so I also need to be able to read records from that 
> long ago, in order to get the record ID to UPDATE when the connection is 
> closed.
>
> Another reason I cannot store records in a temporary table or flat file is 
> that the actual web GUI users most often need the data right away. For 
> example if a manager sees a huge spike in bandwidth on our fiber connection, 
> they need to be able to instantly open the web page and get the exact details 
> of that connection, such as the source ip address... this way they can 
> determine what employee has initiated the download or whatever, and to what 
> destination they are connected. This way they can say, "Oh this employee has 
> a VNC connection opened to their house, that doesn't really seem work 
> related", etc.
>
> On 09/16/2011 12:42 PM, tiger peng wrote:
>  > I just finished tuning an application that is almost the exact reverse of 
>Brandon's: not parsing a text file and then loading it, but extracting and 
>parsing/transforming the data, and then generating an XML file.
>  >
>  > The original application fetched and parsed records one by one, and the 
>code for parsing/transforming was not implemented efficiently. It took around 40 
>minutes to pull data from a highly active transaction database, and the DBAs 
>complained about it. I split out the extraction, letting it quickly dump the data 
>to a flat file (~8 minutes), and then parsed/transformed the data and generated 
>the XML file (~3 minutes). It is possible to speed it up even more by 
>pulling the data into memory in parallel (it is not very big, ~2GB) or by dumping 
>the flat data file on the DB server side.
>  >
>  >
 
>  > ________________________________
>  > *From:* tiger peng <tigerpeng2...@yahoo.com 
><mailto:tigerpeng2...@yahoo.com>>
>  > *To:* Tim Bunce <tim.bu...@pobox.com <mailto:tim.bu...@pobox.com>>; John 
>Scoles <byter...@hotmail.com <mailto:byter...@hotmail.com>>
>  > *Cc:* "bphe...@gls.com <mailto:bphe...@gls.com>" <bphe...@gls.com 
><mailto:bphe...@gls.com>>; "dbi-users@perl.org <mailto:dbi-users@perl.org>" 
><dbi-users@perl.org <mailto:dbi-users@perl.org>>
>  > *Sent:* Friday, September 16, 2011 10:13 AM
>  > *Subject:* Re: Tail Module + DBI Module, can't keep up!
>  >
>  > Tim,
>  >
>  > I bet at that time your MySQL did not support partitioning.
>  >
>  > Now, with partition management available, the job should be easy.
>  > Use a small partition, called a hot partition in the DW/ETL field, to receive 
>the new data; since it only holds a small chunk of data, the insert/update DML 
>should execute fast (keep the indexes local). When it reaches a threshold, split 
>the hot partition, use the fresh hot partition to receive newer data, and 
>merge the older data into the main partition.
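>  >
>  > A minimal sketch of the split in MySQL terms (untested; it assumes the table is
>  > RANGE partitioned on TO_DAYS(open_dt) with p_hot as the MAXVALUE catch-all, and
>  > the partition names/dates are made up):
>  >
>  >     # $dbh is an open DBI handle.  Freeze the rows collected so far into a
>  >     # dated partition and keep a fresh, small p_hot for incoming data.
>  >     $dbh->do(q{
>  >         ALTER TABLE sonicwall_connections
>  >         REORGANIZE PARTITION p_hot INTO (
>  >             PARTITION p20110920 VALUES LESS THAN (TO_DAYS('2011-09-21')),
>  >             PARTITION p_hot     VALUES LESS THAN MAXVALUE
>  >         )
>  >     });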
>  >
>  > Using a text file has its advantages: whenever the DB is down, the 
>parsing of the log files can still run by itself, and when the DB is back, a 
>bulk-load tool can be used to catch up on the load quickly.
>  >
>  > Tiger
>  >
>  >
 
>  > ________________________________
>  > *From:* Tim Bunce <tim.bu...@pobox.com <mailto:tim.bu...@pobox.com>>
>  > *To:* John Scoles <byter...@hotmail.com <mailto:byter...@hotmail.com>>
>  > *Cc:* tigerpeng2...@yahoo.com <mailto:tigerpeng2...@yahoo.com>; 
>bphe...@gls.com <mailto:bphe...@gls.com>; dbi-users@perl.org 
><mailto:dbi-users@perl.org>
>  > *Sent:* Friday, September 16, 2011 4:20 AM
>  > *Subject:* Re: Tail Module + DBI Module, can't keep up!
>  >
>  > This is all fine advice for performance tuning a DBI app, and worth doing.
>  > But there's always a limit to what can be achieved on the client.
>  > So it's worth exploring what can be done on the server side, beyond
>  > standard tuning practices.
>  >
>  > I wrote a high volume log insertion app using mysql many years ago.
>  > I can't remember the performance figures, but it was fast.
>  >
>  > A major cost of insertion is maintaining indices. So for maximum
>  > insertion speed you want to have no indices. But to query the log
>  > records you'll probably want indices.
>  >
>  > The way I approached this was to have per-hour tables: log_YYYYMMDDHH.
>  > The loader streams the records into the table as fast as it can.
>  > (From memory I used a multi-row INSERT statement, no placeholders.)
>  > Every time it switches to a new hour it triggers the building of
>  > indices on the previous hour's table. It also triggers recreating
>  > per-day views (log_YYYYMMDD) that abstract the hourly tables.
>  >
>  > The same technique could be applied at whatever time granularity meets
>  > your needs, from table-per-day to table-per-minute.
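>  >
>  > A minimal sketch of that hourly rotation through DBI (the original code is
>  > long gone, so the table, index, and view names here are invented):
>  >
>  >     # at the top of each hour: start a fresh table, index the hour just
>  >     # finished, and re-publish the day view over the hours loaded so far
>  >     $dbh->do("CREATE TABLE log_2011092111 LIKE log_template");
>  >     $dbh->do("ALTER TABLE log_2011092110 ADD INDEX idx_src (source_ip)");
>  >     $dbh->do(q{
>  >         CREATE OR REPLACE VIEW log_20110921 AS
>  >             SELECT * FROM log_2011092110
>  >             UNION ALL
>  >             SELECT * FROM log_2011092111
>  >     });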
>  >
>  > If you can delay the loading of the log data by whatever period you're
>  > using (eg hourly) then you have the option of preparing the log data as
>  > hourly text files and then, when each is complete, using a proprietary
>  > bulk-loading tool to load it.
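>  >
>  > With MySQL that bulk loader could simply be LOAD DATA INFILE run against each
>  > completed hourly file, along these lines (paths and columns invented; LOCAL
>  > needs mysql_local_infile enabled on the connection):
>  >
>  >     $dbh->do(q{
>  >         LOAD DATA LOCAL INFILE '/var/log/staging/log_2011092110.tsv'
>  >         INTO TABLE log_2011092110
>  >         FIELDS TERMINATED BY '\t'
>  >         (open_dt, close_dt, protocol, source_ip, source_port,
>  >          destination_ip, destination_port, bytes_sent, bytes_received)
>  >     });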
>  >
>  > Tim.
>  >
>  >
>  >
>  >
>
>
