I still like Mark Anderson's way: you can use a pattern to match across
multiple lines,
e.g. /some crap\ncrap on the next line\nand more crap/s;
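
A minimal sketch of that idea, assuming the text to search is already held in
a single string (for a 600 MB file that means working on one piece at a time).
The sample text and the sub names spans_lines and dot_crosses_lines are made
up for illustration. Note that a literal \n in a pattern matches a newline
with or without /s; the /s modifier only changes whether "." may match one.

```perl
use strict;
use warnings;

# Hypothetical sample text; in practice this would be a piece read from the file.
my $text = "first line\nsome text\ntext on the next line\nlast line\n";

# A literal \n in the pattern matches a newline, so no modifier is needed here.
sub spans_lines {
    my ($string) = @_;
    return $string =~ /some text\ntext on the next line/ ? 1 : 0;
}

# With /s, "." is allowed to match a newline, so ".+" can cross line boundaries.
sub dot_crosses_lines {
    my ($string) = @_;
    return $string =~ /some.+next line/s ? 1 : 0;
}

print "literal newline match: ", spans_lines($text), "\n";
print "dot crossing lines with /s: ", dot_crosses_lines($text), "\n";
```

Without /s the second pattern could only match if "some" and "next line"
appeared on the same line.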


-----Original Message-----
From: Curtis Poe [mailto:[EMAIL PROTECTED]]
Sent: Thursday, February 28, 2002 1:09 PM
To: Hans Holtan; [EMAIL PROTECTED]
Subject: Re: searching a large file


--- Hans Holtan <[EMAIL PROTECTED]> wrote:
> I am having a problem searching a large (600 mb) text file. What I 
> need to do is find a match with a short bit of text and then look up 
> to 200 characters forwards and backwards for other matches to 
> different short bits of text. I tried reading the file to memory 
> first and then doing the search, but it's a serious hog, and I need 
> to leave a lot of memory open for other operations. Does anyone have 
> suggestions on how I can do this while limiting memory usage, speed 
> is a factor but not paramount.
> Thanks,
> Hans
> -- 

Hans,

This was such a fun little problem that I went ahead and wrote the program
for you.  You may have to modify this to fit your needs.  I've also "over
commented" it to give you some pointers, in case you're not too familiar
with Perl.

Here's the basic idea:

1.  Read from a file and search for target text
2.  If we're at the target text, find out where we are in the file
3.  From current location, set start and end positions in file to
    mark needed text.  Test to ensure we haven't gone beyond beginning
    or end of file.
4.  Grab text from start to end and push onto array.

The only caveat I can think of is this:  if you are grabbing too many chunks
of data, you may wish to process them individually rather than pushing them
onto an array (since you may have memory issues).
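
One way to process chunks individually is to hand each one to a callback as
soon as it is read, so only one chunk is in memory at a time.  This is only a
sketch under some assumptions: the sub name each_chunk, the sample file
contents, and the 'gamma' follow-up test are hypothetical; the target is
treated as a literal string; and the file is read as plain bytes with Unix
line endings.  Unlike a window measured from the start of the match, this
version counts the trailing context from the end of the match.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: scan $file line by line and pass each chunk of context
# around a literal $target to $callback instead of storing it in an array.
sub each_chunk {
    my ( $file, $target, $width, $callback ) = @_;
    my $fsize = -s $file;

    open my $fh, '<', $file or die "Cannot open $file for reading: $!";
    while ( my $line = <$fh> ) {
        my $line_end = tell $fh;    # file offset just past the current line
        while ( $line =~ /\Q$target\E/g ) {
            # file offset of the first character of this match
            my $match_start = $line_end - length($line) + $-[0];

            # clamp the window to the beginning and end of the file
            my $start = $match_start >= $width ? $match_start - $width : 0;
            my $end   = $match_start + length($target) + $width;
            $end = $fsize if $end > $fsize;

            seek $fh, $start, 0 or die "Cannot seek in $file: $!";
            defined read( $fh, my $chunk, $end - $start )
                or die "Problem reading $file: $!";

            # go back to where line-by-line reading left off
            seek $fh, $line_end, 0 or die "Cannot seek in $file: $!";

            $callback->($chunk);
        }
    }
    close $fh;
    return;
}

# Demo on a small temporary file (made-up contents).
my ( $tmp_fh, $tmp_name ) = tempfile( UNLINK => 1 );
print {$tmp_fh} "alpha beta search gamma delta\nno hit here\nsearch again epsilon\n";
close $tmp_fh;

my $count = 0;
each_chunk( $tmp_name, 'search', 10, sub {
    my ($chunk) = @_;
    $count++ if $chunk =~ /gamma/;    # look for a second term near the first
} );
print "chunks that also mention 'gamma': $count\n";
```

The callback is where the secondary searches would go; nothing is kept once
it returns, so memory use stays close to the size of one line plus one chunk.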

Enjoy!

#!/usr/bin/perl -w
use strict;
use Data::Dumper;

# this is how far forward or back you need to read
my $width  = 20;

# this is your target string.  You can make it a regex if you prefer
my $target = 'search';

# file to search
my $file   = 'test.txt';
my $fsize  = -s $file;

# when you're done, this should contain the data you're looking for
my @chunks;


open FILE, "< $file" or die "Cannot open $file for reading: $!";

while (<FILE>)
{
        # /g in a while loop finds every match on the line, not just the first
        while ( /$target/g )
        {
                my $file_position = tell FILE;

                # backwards from end of string
                my $word_position = $file_position - (length( $_ ) - pos( $_ ));
                # to beginning of word.  It's separate so you can
                # pull it out if necessary.  (length $target is only
                # right when $target is a literal string, not a regex.)
                $word_position -= length $target;
                push @chunks, get_chunk( \*FILE, $word_position,
                        $file_position, $width, $fsize );
        }
}

print Dumper \@chunks;

close FILE;

sub get_chunk
{
        my ( $fh, $word_position, $file_position, $width, $fsize ) = @_;

        # don't try to read before beginning of file
        my $start = $word_position >= $width
                ? $word_position - $width
                : 0;

        # don't try to read after end of file
        my $end   = $word_position + $width <= $fsize
                ? $word_position + $width
                : $fsize;

        # position to start of where we want to read
        seek $fh, $start, 0;
        my $chunk;

        # shouldn't fail unless I got my boundaries wrong
        read ( $fh, $chunk, $end-$start ) or die "Problem reading file: $!";

        # put us back to where we were
        seek $fh, $file_position, 0;
        return $chunk;
}
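
If the target is a regex rather than a literal string, length $target no
longer gives the length of the matched text.  A hedged sketch of one way
around that uses Perl's built-in @- and @+ arrays, which hold the start and
end offsets of the last successful match.  The sub name match_spans and the
sample line are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return [start, end] offsets (end exclusive) for every
# match of $regex in $line, taken from the built-in @- and @+ arrays.
sub match_spans {
    my ( $line, $regex ) = @_;
    my @spans;
    while ( $line =~ /$regex/g ) {
        push @spans, [ $-[0], $+[0] ];
    }
    return @spans;
}

# Made-up sample line with two matches of different lengths.
my $line = "id=42 and id=1234\n";
for my $span ( match_spans( $line, qr/id=\d+/ ) ) {
    my ( $start, $end ) = @$span;
    printf "match at %d..%d (length %d): %s\n",
        $start, $end, $end - $start, substr( $line, $start, $end - $start );
}
```

Inside a line-reading loop, the file offset of a match would then be the
position returned by tell, minus the length of the line, plus the start
offset of the span.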

Cheers,
Curtis "Ovid" Poe

=====
"Ovid" on http://www.perlmonks.org/
Someone asked me how to count to 10 in Perl:
push@A,$_ for reverse q.e...q.n.;for(@A){$_=unpack(q|c|,$_);@a=split//;
shift@a;shift@a if $a[$[]eq$[;$_=join q||,@a};print $_,$/for reverse @A

-- 
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

----------------------------------------------------------------------------
The views and opinions expressed in this email message are the sender's
own, and do not necessarily represent the views and opinions of Summit
Systems Inc.

