On Wed, 16 Feb 2005 08:13:02 +0000, Ken Gillett <[EMAIL PROTECTED]> wrote:

> As an extension to my question, what about when repeatedly adding to a
> data set that needs to be written to a file? Will it be quicker to
> write each line directly to the file, or repeatedly add to a variable
> then write that to the file in one hit?
> 
> My guess is that this will have a more definitive answer since the
> speed difference between writing to a variable and writing to a file
> will make it a more obvious outcome and indeed my experience indicates
> that writing to a file is measurably slower. But does anyone have any
> in depth knowledge of these processes.
> 

Ken,

Again, it will really depend.  How "big" is the hit going to be?  Big
enough that storing the data in memory will eat up your RAM and force
you to swap?  If you're dealing with thousands of lines, you may not
want to store them all in memory, especially if you're like me,
running a database server on a PII/133 with 16M of RAM.  On the other
hand, if you're on a brand new P4 with 2G of DDR2, who really cares? 
Of course if your data set is multiple terabytes, even 2G isn't going
to be enough.

What's your I/O look like?  Does your system support buffered disk
writes?  If so, passing the I/O off to the kernel's buffer won't cost
you much on any current desktop drive, and 15Krpm SCSI drives in high
end servers live for this.  On the other hand, are you performing
unbuffered I/O on an old 2400rpm disk?  Then you might want to think a
little.  Does it even matter at what point you write to a 56K dialup
ftp connection?
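To make that concrete, here's a rough Perl sketch of the two approaches from Ken's question: printing each line as you go (letting perl and the kernel buffer for you, or forcing it straight out with autoflush) versus collecting everything in a variable and writing it in one hit.  The filename and line count are just made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Handle;    # core module, gives us ->autoflush

my $file = 'out.txt';    # hypothetical output file

# Approach 1: write each line as it's generated.  Left alone, perl
# buffers these prints and flushes in big chunks anyway; turning on
# autoflush forces every print straight to the device, which is where
# a slow disk or network link really starts to hurt.
open my $fh, '>', $file or die "open: $!";
$fh->autoflush(1);    # comment this out to let perl buffer normally
print $fh "line $_\n" for 1 .. 1000;
close $fh or die "close: $!";

# Approach 2: accumulate in a variable, then write in one hit.
my $data = '';
$data .= "line $_\n" for 1 .. 1000;
open my $out, '>', $file or die "open: $!";
print $out $data;
close $out or die "close: $!";
```

Either way the same bytes end up in the file; the difference is only when they leave your process, and how much memory $data eats in the meantime.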

Next question:  what else is running on the machine?  Are other
processes eating up RAM and forcing you to swap?  Then write.  Is some
long-running find or A/V operation flooding the IDE or SCSI bus?  Are
you writing to a network drive that's responding slowly?  Write out a
chunk.

But the truth is that unless you're running a server dedicated to just
this one perl script, the environment is going to change from minute
to minute.  Tomorrow, the state may reverse itself.  If you know some
things about the quirks of your system, by all means, code for them.
But on the whole, your best bet is to code in a way that's readable
and makes sense to you and your maintenance programmers.  Your time is
far more valuable than the few milliseconds of cpu or i/o you'll save
in most cases.

If you find yourself staring at the screen thinking "why is this
taking so long?", then that's the time to look at your i/o and memory
usage and think about what you can do differently in that case.  Then
you can benchmark a couple of methods and see what's going on.  But
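The standard Benchmark module makes that comparison easy.  Something like the sketch below (the data size and iteration count are arbitrary, and your numbers will vary wildly with hardware and load):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use File::Temp qw(tempfile);    # both are core modules

my @data = map { "record $_\n" } 1 .. 5000;

# Run each style 100 times and print a comparison table.
cmpthese(100, {
    line_by_line => sub {
        # print each element as we walk the list
        my ($fh, $name) = tempfile(UNLINK => 1);
        print $fh $_ for @data;
        close $fh or die "close: $!";
    },
    one_big_write => sub {
        # join everything into one scalar, write it in one hit
        my ($fh, $name) = tempfile(UNLINK => 1);
        print $fh join '', @data;
        close $fh or die "close: $!";
    },
});
```

Whichever wins on your box today is the answer for your box today, which is really the whole point.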
there still isn't going to be a hard and fast rule that this is
always faster than that, or that some operations are always to be
avoided.  You know two things about any piece of data your program
is currently processing: it's in memory now, and it needs to get
written to a device (screen, disk, socket) or discarded eventually.
What steps make the most sense to get any particular bit to any
particular final destination will depend heavily on circumstance.

HTH,

--jay

-- 
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
<http://learn.perl.org/> <http://learn.perl.org/first-response>