On Thu, Dec 3, 2009 at 1:19 PM, Orchid Fairy (兰花仙子) <practicalp...@gmail.com> wrote:

> Thanks all.
> What about parsing files of huge size (about 1T per day)?
>
> The basic logic is:
>
> read each line of every file (many files, each gzipped)
> look for special info (like IP, request url, session_id, datetime etc).
> count for them and write the result into a database
> generate the daily report and monthly report
>
> I'm afraid Perl can't finish the daily job, so I want to know the speed
> difference between Perl and C for this case.
>
> // Xiao lan
>
A daily job that, by the sound of it, will not change much and will just keep
running pretty much until the end of time... C is your friend. Perl would
certainly get the job done, on time and without too much trouble, but if you
are worried, there isn't much that will outperform C/C++ when it comes to raw
speed.
If you are not planning to make any changes to the code in the foreseeable
future, the extra readability of the Perl code should not really matter,
since you do not expect to be changing it on a regular basis anyway.

My biggest worry would not be the 1T (of logs, I guess) the code needs to
parse now, but the likely doubled amount of data four or five years from now.
It sounds a lot like you are parsing logs from a web server, or rather quite
a few web servers, or something along those lines. If the world keeps
spinning the way it has for the past couple of million years, I am willing
to bet that the size of the logs you are parsing will keep growing, making
speed more and more important. (A good argument for C right there.)

Also, if you have a lot of files, think about running several
parsers/threads, however you divide the work. Every machine tasked with
processing that amount of data will have more than a single CPU, and will
therefore benefit from more than one process doing the work at a time,
regardless of the language it is written in.

Personally, I really don't see a role for Perl in something like this: as
the data to parse grows, the desire for speed will only get stronger. I
would simply do it in C and forgo the ease of Perl for parsing logs. You
might save some development time by choosing Perl, but you will likely have
to redo the work in C at some point as the input data just keeps getting
bigger.
