On Sun, Apr 20, 2008 at 8:55 PM, Mr. Shawn H. Corey <[EMAIL PROTECTED]> wrote:
> On Sun, 2008-04-20 at 20:22 -0400, Chas. Owens wrote:
> > No, you obviously don't know how it is implemented.  It seeks to the
> > end of the file and reads it into a buffer where it searches for line
> > endings.  It does not read the entire file until you reach the first
> > line.
>
> That's not the point.
>
> Seeking the end of file means slowing down; because files are not linear
> in the disk space.  To find the end of file means going through all the
> segments allocated to the file.  And even in the best of OSes, it means
> reading a chain of sectors/clusters to find the end of file.  And then
> work their way backward from that.
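To make this concrete, here is a minimal sketch of the technique in question:
seek near the end of the file, read one buffer, and scan it for line endings.
This is only an illustration of the idea (the variable names are mine, and the
8k buffer just mirrors what the module appears to use by default), not the
actual File::ReadBackwards source:

#!/usr/bin/perl
# sketch only: read a buffer from near the end of a file and scan it for
# line endings (an illustration of the technique, not File::ReadBackwards)
use strict;
use warnings;

die "usage: $0 file\n" unless @ARGV == 1;
my $file = shift;

open my $fh, "<", $file or die "could not open $file: $!";
binmode $fh;

my $bufsize = 8 * 1024;                            # roughly the module's default
my $size    = -s $fh;
my $offset  = $size > $bufsize ? $size - $bufsize : 0;

seek $fh, $offset, 0 or die "could not seek: $!";  # jump straight to the tail
read $fh, my $buf, $bufsize;

my @lines = split /\n/, $buf;                      # search the buffer for line endings
print "last line: $lines[-1]\n" if @lines;

Nothing in that sketch ever reads the data at the start of the file; the only
question is how expensive the seek itself is, and that is where filesystem
layout comes in.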
In ext2, and most other file systems I am aware of, a file's blocks are
organized as a tree.  This means a seek to the end of the file is an O(log n)
operation.  Given 4k blocks, a seek to the end of a 10 gigabyte file would
result in the reading of 22 inodes*.  An inode is around 128 bytes on ext2, so
the amount of information that needs to be read is slightly more than 2k**.
Now, a better argument is that these inodes are scattered around the disk, so
there is significant read-head movement; but that is not as much of an issue
on modern multiuser/multitasking machines (linear reads tend to be interrupted
by other processes anyway).  Also, some filesystems now keep portions of the
inode tree in memory, in addition to the canonical disk copies, speeding up
the lookup even more.

snip
> That involves time.
>
> But since every line has variable line lengths, you cannot predict how
> much to back up. In other words, it's a waste of time.
snip

Again, read the code.  It does not read the file a line at a time; it reads a
large buffer (it looks like the default is 8k) from the file and then searches
for the line endings in that.

snip
> The fastest way to do this is to read every line into Perl and disregard
> everything not relevant.
snip

You may want to revisit file system design.  Try this Perl program on a
largish file*** and then tell me which is faster (if you only care about items
near the end of the file).

#!/usr/bin/perl

use strict;
use warnings;

use Benchmark qw<timeit timestr>;

die "usage: $0 file" unless @ARGV == 1;
my $file = shift;

open my $fh, "<", $file
    or die "could not open $file: $!";
binmode $fh;

$/ = \(4*1024); # give readline an advantage (read 4k at a time instead of a line)

print
    "seek to end of file:   ", timestr(timeit 1, sub { seek $fh, 0, 2 }), "\n",
    "seek to start of file: ", timestr(timeit 1, sub { seek $fh, 0, 0 }), "\n",
    "read entire file:      ", timestr(timeit 1, sub { 1 while <$fh> }), "\n";

snip
> This is what Perl was design for. There are no magic tricks. You
> cannot fart before you ate beans.
snip

The next thing you will tell me is that the best possible sorting algorithm is
O(n log n) in the worst case****.  To paraphrase the bard, there are more
things in Computer Science than are dreamt of in your naive implementations.
Yes, this is what Perl was designed for, and therefore it gives us the tools
(like seek and File::ReadBackwards) to deal with things in an efficient
manner.  Is it a good idea to read an entire file with File::ReadBackwards?
No.  Is it a good idea to use File::ReadBackwards to read the last 100m of a
100g file?  Yes.

*    ln((10 * 1,024 * 1,024) / 4) / ln(2) = 21.3219281
**   22 * 128 = 2,816
***  I used an Ubuntu iso, about 700m, and got 0 seconds for both seeks and
     about 32 wall clock seconds (0.60 usr + 1.92 sys = 2.52 CPU) for the,
     cheating, readline on HFS+ (journaled)
**** see radix sort for an O(n) sorting algorithm:
     http://en.wikipedia.org/wiki/Radix_sort

--
Chas. Owens
wonkden.net
The most important skill a programmer can have is the ability to read.

--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
http://learn.perl.org/