Re: Search Engines in Perl

John W. Krahn Sun, 07 Jul 2002 15:37:38 -0700

Vargas Media wrote:
> 
> Hi All,

Hello,


> I am new to Perl and I have been grabbing code here and there to make a
> search engine for a site. - Posted below.
> What I have works but is incredibly slow and doesn't report back when a
> match is not found.
> I have been working with "CGI Programming with Perl" from O'Rielly but,
> haven't had much luck with the creating custom modules for the more
> professional search engine they suggest. I will attempt that and perhaps
> post to this list again if I hit any snags.
> Any helpful hints or URL's would be appreciated.
> Thanks for being there.
> Steve Vargas
> PS The form for the code below is at:
> http://www.musicfan.com/search.html
> and the Perl Code is below.

Are you running this on Perl version 4?


> #######Search Engine Code - Mere Beginnings######
> #!/usr/bin/perl

You should always enable warnings and strictures when developing code. 
You should use taint mode for web programs.

#!/usr/bin/perl -Tw
use strict;


> require "get_form_data.pl";

You should be using the CGI.pm module instead of something you wrote
yourself.

perldoc CGI

> &get_form_data();

You shouldn't call subroutines with the ampersand at the front.

perldoc perlsub

> $search_term = $FORM{'search'};
> print "Content-type: text/html\n\n";
> &search("..");
> print <<EOF;
> <HTML>
> <HEAD>
> <TITLE>
> Search
> </TITLE>
> </HEAD>
> <BODY BGCOLOR="#FFFFFF">
> EOF
> foreach $file (@found_set)
> {
> $_ = $file;
> s/..\/html\///g;
    ^^
Periods are special in regular expressions.  You probably meant
s|\.\./html/||g;

> $file = $_;
> $newfile = $file;

Why all the copying back and forth between varables?

foreach my $file ( @found_set ) {
    ( my $newfile = $file ) =~ s|\.\./html/||g;

> if(/htm/) {
> print "<A HREF=\"/$newfile\">$newfile</A>\n";
> print "<BR>\n";
>  }
> }
> print <<EOF;
> </BODY>
> </HTML>
> EOF
> exit;
> sub search
> {

You should be using the File::Find module for file searches.

perldoc File::Find

>   local ($dir) = @_;

You should use my instead of local for proper scoping.

perldoc perlsub

>   if($dir eq ".")
> {
>   opendir(DIR, ".");

You should _always_ verify that system functions were successful.

  opendir DIR, '.' or die "Cannot open '.': $!";

>   $dir = "";
> }
> else
> {
>   opendir(DIR, $dir);

You should _always_ verify that system functions were successful.

  opendir DIR, $dir or die "Cannot open $dir: $!";

>   $dir .= "/";
> }
>   foreach $file (sort readdir(DIR))
                   ^^^^
Do you really need a sorted list?

> {
>   next if($file eq "." || $file eq "..");
>   $file = $dir . $file;
>   next if(($file !~ /.htm/) && (!(-d $file)));
                      ^^^^^
Skip this file if it does not have any character (.) followed by the
string 'htm' anywhere in the file name.

>   if(-d $file)

Your previous statement does not let directories get to this point.

>   {
>     &search($file);
>     next;
> }
> open(FILE, $file);

You should _always_ verify that system functions were successful.

open FILE, $file or die "Cannot open $file: $!";

> $found_match = 0;
> $title = "";
> while(<FILE>)
> {
>   if(/$search_term/i)

If $search_term contains any characters that are used in regular
expressions this won't work correctly.

perldoc perlre

>   {
>     $found_match = 1;
>   }
>   if(/<TITLE>/)
>  {
>   chop;

You should use chomp instead of chop unless you want to remove any
character from the end of the string.

>   $title = $_;
>   $title =~ s/<TITLE>//g;
>   $title =~ s/<\/TITLE>//g;
> 
>   }
> }
> if($found_match)
> {
>  push(@found_set, $file);
>  if($title eq "")
> {
>   $title{$file} = $file;
> }
>  else
> {
>   $title{$file} = $title;
>   }
> }
>  close(FILE);
>  print "<P>\n";
> }
>  closedir(DIR);
> }


As a style issue I would also suggest a little more whitespace and
indentation to make your code more readable.

perldoc perlstyle



John
-- 
use Perl;
program
fulfillment

-- 
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: Search Engines in Perl

Reply via email to