On Thu, 2004-01-22 at 18:21, Dan Anderson wrote:
> On Thu, 2004-01-22 at 17:59, James Edward Gray II wrote:
> > On Jan 22, 2004, at 4:12 PM, Tim Johnson wrote:
> > 
> > > Here's another argument against slurping:  When you slurp a file all at
> > > once, even if your program isn't using up much of the CPU, on many
> > > machines it will slow down performance considerably if you slurp a 
> > > large
> > > file (large, of course, is still sometimes relative).  If that is the
> > > only thing you are running at the time, it may not make much of a
> > > difference, but it is usually not a good idea to assume that.
> > 
> > The flip side of that argument.  A quote from the earlier posted 
> > article:
> > 
> > "Another major win for slurping over line by line is speed. Perl's IO 
> > system (like many others) is slow. Calling <> for each line requires a 
> > check for the end of line, checks for EOF, copying a line, munging the 
> > internal handle structure, etc. Plenty of work for each line read in. 
> > On the other hand, slurping, if done correctly, will usually involve 
> > only one I/O call and no extra data copying. The same is true for 
> > writing files to disk, and we will cover that as well."  --Uri Guttman
> 
> 
> Just to add my $0.02, while you are likely to see your machine slow to a
> halt if you slurp too big a file, there is no guarantee that the extra
> overhead required for going line by line will be noticed, especially if
> you're doing enough other things on every line.
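
For anybody following along, the two styles Uri is talking about look
roughly like this (the file name is just a placeholder):

  #!/usr/bin/perl
  use strict;
  use warnings;

  # Line by line: Perl checks for end-of-line and EOF on every read.
  open my $fh, '<', 'data.txt' or die "Can't open data.txt: $!";
  while ( my $line = <$fh> ) {
      # ... do something with $line ...
  }
  close $fh;

  # Slurping: localizing $/ lets one read grab the whole file.
  open my $slurp, '<', 'data.txt' or die "Can't open data.txt: $!";
  my $contents = do { local $/; <$slurp> };
  close $slurp;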

I just thought of a really good example to add.  Let's say you're
migrating from Database A to Database B.  And, because the SQL dump of
Database A does something that breaks standards or doesn't work in
Database B (e.g. MySQL's AUTO_INCREMENT), you decide to write a Perl
script to transform the SQL.

You'd have a large number of operations per line (relative to the cost
of reading in a file line by line), and if -- for instance -- you
passed the script around your department and somebody tried using it on
a dump that was several gigabytes (or possibly even terabytes if you
work at a data warehouse), slurping would be asking for trouble.
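
Just as a rough sketch (the file names and the exact substitution are
made up, but this is the shape of it), the line-by-line version keeps
memory flat no matter how big the dump gets:

  #!/usr/bin/perl
  use strict;
  use warnings;

  # Read the dump one line at a time and write out a cleaned-up copy.
  open my $in,  '<', 'database_a_dump.sql' or die "Can't read dump: $!";
  open my $out, '>', 'database_b_dump.sql' or die "Can't write dump: $!";

  while ( my $line = <$in> ) {
      # Strip MySQL's AUTO_INCREMENT, which Database B won't understand.
      $line =~ s/\s*AUTO_INCREMENT(=\d+)?//gi;
      print {$out} $line;
  }

  close $in;
  close $out;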

On the other hand, somebody mentioned slurping web pages, because very
few web pages are ever going to be anywhere near 100 GB.

Very true.  But you also need to look at what you're doing.  A spider
that indexes or collates pages across several sites might need to slurp
up a large number of pages -- which, even at a few kilobytes apiece,
would be costly in system resources if they all sat in memory at once.
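
Slurping any one page is cheap and easy -- something like this, with
LWP::Simple (the URL is made up, obviously):

  #!/usr/bin/perl
  use strict;
  use warnings;
  use LWP::Simple;

  # get() slurps the entire response body into a single scalar.
  my $html = get('http://www.example.com/')
      or die "Couldn't fetch the page";
  print length($html), " bytes fetched\n";

It's holding thousands of those scalars at once that gets you into
trouble.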

-Dan

