Hello Ron and everybody on this list.

To find the fastest way to complete the task, you first need to analyse your
system and how it is going to react to your algorithm.  Basically your task
splits into 2 parts: an IO-intensive part (reading the files) and a
CPU-intensive part (processing the data).  If you are only parsing certain
information out of the logs, then the time it takes to process the data is much
shorter than the time it takes to read it.  However, if you need to do more
complicated things (finding cross-references between files, translating IPs to
hostnames, ...), then that time will dominate the IO time.
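
If you want a quick way to see which side dominates on your data, something
along these lines should do the trick.  It is only a sketch: it assumes
Time::HiRes is installed, and the log directory and the ERROR pattern are
placeholders you would swap for your own.

    use strict;
    use warnings;
    use Time::HiRes qw(gettimeofday tv_interval);

    my @files   = glob("/var/log/myapp/*.log");   # hypothetical location
    my $pattern = qr/ERROR/;                      # hypothetical pattern

    my ($read_time, $scan_time, $matches) = (0, 0, 0);

    for my $file (@files) {
        my $t0 = [gettimeofday];
        open my $fh, '<', $file or die "Can't open $file: $!";
        my $data = do { local $/; <$fh> };        # slurp the whole file
        close $fh;
        $read_time += tv_interval($t0);

        $t0 = [gettimeofday];
        $matches += () = $data =~ /$pattern/g;    # count matches in memory
        $scan_time += tv_interval($t0);
    }

    printf "reading: %.2fs  scanning: %.2fs  (%d matches)\n",
           $read_time, $scan_time, $matches;

If reading is the big number, faster IO (or smarter scheduling of the reads) is
where your effort should go; if scanning dominates, extra processors will pay
off.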

Now, say that IO takes longer than processing.  You suggested splitting the
list of all 1500 files so that each of several processes would handle a chunk
of the data.  Would that speed up your program?  My computer has an IDE disk,
and I bet that reading all 1500 files sequentially would be far faster than
reading them in parallel, because the disk has to do extra seeks to jump from
position to position, while reading the data in sequence eliminates those extra
seeks.  This is an oversimplification of what actually happens, but you get the
idea, and it may not be the case on your computer.  So you need to test the
fastest way you can read the files.  (If my files were distributed across 2
hard disks, I would fire up 2 processes, one to manage each disk.)
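
To actually measure it, a rough test could look something like this.  Again
just a sketch: the path is a placeholder and the number of children is an
arbitrary guess to experiment with.

    use strict;
    use warnings;
    use Time::HiRes qw(gettimeofday tv_interval);

    my @files = glob("/var/log/myapp/*.log");    # hypothetical location

    # Pass 1: one process reads every file in order.
    my $t0 = [gettimeofday];
    read_all(@files);
    printf "sequential: %.2fs\n", tv_interval($t0);

    # Pass 2: split the list across a few children reading concurrently.
    my $kids = 4;                                # arbitrary; experiment
    $t0 = [gettimeofday];
    for my $i (0 .. $kids - 1) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {                         # child reads its share
            read_all(@files[grep { $_ % $kids == $i } 0 .. $#files]);
            exit 0;
        }
    }
    wait() for 1 .. $kids;                       # reap all the children
    printf "parallel:   %.2fs\n", tv_interval($t0);

    sub read_all {
        for my $file (@_) {
            open my $fh, '<', $file or die "Can't open $file: $!";
            my $junk = do { local $/; <$fh> };   # slurp and discard
            close $fh;
        }
    }

One caveat: the parallel pass benefits from the files already sitting in the
OS cache after the sequential pass, so for honest numbers run each pass on a
cold cache (or against separate copies of the data).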

Does your machine have 2 or more processors?  If so, and your computations are
intensive (read: it takes longer to extract the information from the data in
memory than it takes to read the data into memory), then you will surely
benefit from multiprocessing or multithreading.
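
That is essentially what you proposed with your 3 - 5 lists: split the file
list into chunks and fork one child per chunk.  A bare-bones sketch, with
placeholder path and pattern and an arbitrary chunk count:

    use strict;
    use warnings;
    use POSIX qw(ceil);

    my @files  = glob("/var/log/myapp/*.log");   # hypothetical location
    my $chunks = 4;                              # say, one child per CPU
    my $per    = ceil(@files / $chunks);

    while (my @chunk = splice(@files, 0, $per)) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        next if $pid;                            # parent launches next child

        # Child: scan its own slice of the logs and report a count.
        my $errors = 0;
        for my $file (@chunk) {
            open my $fh, '<', $file or die "Can't open $file: $!";
            while (my $line = <$fh>) {
                $errors++ if $line =~ /ERROR/;   # hypothetical pattern
            }
            close $fh;
        }
        print "child $$ saw $errors errors in ", scalar(@chunk), " files\n";
        exit 0;
    }
    1 while wait() != -1;                        # reap all the children

Each child only prints its own total here; to combine the counts in the parent
you would need a pipe, a temporary file, or some other IPC, which is one place
where the threaded approach below is more convenient.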

If you have (or can obtain) a threaded perl, then you should have no problem
managing your task with the Thread module.  You add all the file names to a
Thread::Queue, then fire off n threads, each popping an entry from the queue
and reading the corresponding file.  The data read can be pushed onto another
queue so that another pool of threads can process it.  Bear in mind that your
program will run slower if you have more threads doing computation than there
are processors available, and your IO will be slower if you have more
concurrent reads than physical devices.
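
In case it helps, here is a rough sketch of that layout.  It assumes a perl
built with thread support and uses the threads / Thread::Queue interface
(older threaded perls expose a similar Thread module with a slightly different
calling convention); the log path, the ERROR pattern, and the reader/worker
counts are all placeholders.

    use strict;
    use warnings;
    use threads;
    use threads::shared;
    use Thread::Queue;

    my $file_q = Thread::Queue->new();   # file names waiting to be read
    my $line_q = Thread::Queue->new();   # lines waiting to be processed
    my $total :shared = 0;               # combined result

    my $readers = 2;                     # roughly one per physical disk
    my $workers = 2;                     # roughly one per processor

    # Reader pool: pop a file name, push its lines onto the second queue.
    my @reader_threads = map {
        threads->create(sub {
            while (defined(my $file = $file_q->dequeue())) {
                open my $fh, '<', $file or next;
                $line_q->enqueue($_) while <$fh>;
                close $fh;
            }
        });
    } 1 .. $readers;

    # Worker pool: count a pattern in whatever the readers hand over.
    my @worker_threads = map {
        threads->create(sub {
            while (defined(my $line = $line_q->dequeue())) {
                next unless $line =~ /ERROR/;    # hypothetical pattern
                lock($total);
                $total++;
            }
        });
    } 1 .. $workers;

    # Feed in the file names, then one undef per reader as a stop marker.
    $file_q->enqueue(glob("/var/log/myapp/*.log"));   # hypothetical location
    $file_q->enqueue(undef) for 1 .. $readers;
    $_->join() for @reader_threads;

    # Once the readers finish, tell the workers to stop the same way.
    $line_q->enqueue(undef) for 1 .. $workers;
    $_->join() for @worker_threads;
    print "total matches: $total\n";

Pushing every single line through a queue is fine for a sketch, but with
~100MB of logs you would probably have the readers push whole files (or only
the lines that look interesting) to keep the queue traffic down.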

Hope this helps.

Aziz,,,



On Sat, 16 Jun 2001 20:19:15 -0400, Ronald J. Yacketta said:
>  as I emailed earlier, I am working on a log parsing program. This program
>  needs to be able to handle ~1500 logs that total ~100MB in size. Currently,
>  we are only pulling out certain information from the logs, but this can
>  change at anytime. I am trying to figure out a clean way to opendir, read
>  all the files in and then split the files into 3 - 5 separate lists so I can
>  fork children to parse out the required errors. I am not certain as of yet,
>  how to accomplish the splitting of files and then forking.
>  
>  In the meantime I am thinking of ways to cleanly parse the logfiles counting
>  the individual errors. I was thinking of an if (/pattern/ig ) { $var ++; },
>  but this seems kludgy and like it would take a great deal of time to parse
>  the quantity of logs I have. Could you recommend some clean ways to parse
>  300 - 400 logfiles pulling out certain information with little system impact?
>  I do not think parsing each file in a foreach (@arrayOfFiles) is the right
>  choice, do you?

