On 23-Aug-98 Gevaerts Frank wrote:
> Does anyone know if there is a faster way to search a directory tree for a
> certain word than "find /dir/ -name "*.txt"|xargs grep -l "theword" " ?
> I want to do this to make a search engine for my local (LAN) website.
> Since the site is over 50 megs (I mirror a lot of stuff for local use),
> and the server is a 486, the search takes more time than I would like.
> 
> Is it possible to achieve the same effect using some database, while
> allowing a search for _ALL_ words, not just a few predefined keywords? If
> so, how?

Yes, at least for the file-finding part: you can use the "locate" database.

It isn't a database in the sense of MySQL or Oracle; it is just a list of file
names that the locate utility can search quickly. When I tested it against
find, locate found the files many times faster (10 times faster on my system,
though you may not see the same improvement). But what it gains in speed, it
loses in features, and the database has to be rebuilt (with updatedb) whenever
you add or remove files.

Another option is to do the find step in advance. Run
"find /dir/ -name "*.txt" > searchresult" once, then run
"xargs grep -l "theword" < searchresult" whenever you need to search the files.
There probably wouldn't be much improvement over using locate, though.

Neither of these solutions speeds up the grep part, which still has to read
every file on each search. A real database might help, but I have never used
one. Another option is to implement something yourself: that would almost
certainly improve performance, but it is time-consuming and may be less
flexible. Let me know if you wish to try this; I'm always on the lookout for
tiny projects to play around with.
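One rough idea of what "implementing something yourself" could mean is a word
index built ahead of time: one "word path" line for each distinct word in each
file, so a query reads the index instead of every file. This is only a sketch
in plain sh; build_index and lookup_word are hypothetical names, a "word" is
assumed to be a run of letters and digits, and file names are assumed to
contain no newlines.

```shell
#!/bin/sh
# Sketch of a do-it-yourself word index: one "word path" line for each
# distinct word in each file. Searching this index replaces grepping
# every file. build_index and lookup_word are hypothetical names.

# Index every .txt file under directory $1 into index file $2.
# A "word" here is a run of letters/digits, folded to lower case.
# Assumes file names contain no newlines.
build_index() {
        find "$1" -type f -name '*.txt' | while IFS= read -r f; do
                # One word per line, lower-cased, duplicates removed,
                # then each word is tagged with the file's path.
                tr -cs '[:alnum:]' '\n' < "$f" |
                        tr '[:upper:]' '[:lower:]' |
                        sort -u |
                        FILE=$f awk 'NF { print $0 " " ENVIRON["FILE"] }'
        done | sort > "$2"
}

# Print the indexed files containing word $2 (index file $1).
# The word is lower-cased and compared exactly, as a string.
lookup_word() {
        word=$(printf '%s\n' "$2" | tr '[:upper:]' '[:lower:]')
        W=$word awk '($1 "") == (ENVIRON["W"] "") { sub(/^[^ ]* /, ""); print }' "$1"
}
```

Building the index still reads every file once, but that can be done in
advance (for example nightly, like updatedb). Unlike grep, it only matches
whole words, which is usually what a site search wants anyway.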

I'm not sure what you mean by allowing a search for all words, but I have
included a simple CGI script that should demonstrate something to that effect.
This is my first attempt at writing a CGI script, and I based it on the finger
CGI script that comes with Apache. Please forgive its lameness. Incidentally,
the database shouldn't really be in /tmp/, but I'm not sure where it should go
instead for use from a CGI script.

-- START DEMO CGI SCRIPT --
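#!/bin/sh
# Demo "locate gateway" CGI script, based on Apache's finger CGI example.

# Every CGI response starts with a header block ended by a blank line.
echo "Content-type: text/html"
echo

if [ $# -eq 0 ]; then
        # No query yet: show the search page. <ISINDEX> makes the browser
        # offer a search field; the query comes back as script arguments.
        cat << EOM
<TITLE>Locate Gateway</TITLE>
<H1>Locate Gateway</H1>

<ISINDEX>

This is a gateway to "locate". Type a search string in your browser's
search dialog.<P>
EOM
else
        echo '<PRE>'
        # List the .txt files in the locate database, then print the names
        # of those containing the query. -e stops a query starting with "-"
        # being read as an option; /dev/null keeps grep off stdin if nothing
        # is found. sed escapes characters that are special in HTML.
        locate -d /tmp/testlocatedb "*.txt" |
                xargs grep -l -e "$*" /dev/null |
                sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
        echo '</PRE>'
fi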
-- END DEMO CGI SCRIPT --

Cort
[EMAIL PROTECTED]