Sent: Saturday, March 03, 2001 6:49 AM
Subject: Re: [aseek-users] configuration question
This seems to be working, but it still takes a long time to index my
entire site. Can I run

index -i -u http://www.yoursite.com/changingdir/subdir/changeddocument.html

for each new or changed document as part of a script? (I already have a
script that goes out and finds new and changed documents, so I could add
this step to it.) If I can do this, do I need to do anything to load the
delta files (like an index -D), or will index -i -u ... do that after
each file is inserted into the database?

Then once per week I can set it up to rewalk the entire site.
Would this work, or would it cause problems in my database?
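
To make the question concrete, here is a minimal sketch of the per-document step such a script might contain. It only builds the command line in the form quoted above (index -i -u URL) and prints it; whether that invocation is correct, and whether a separate step such as index -D is needed afterwards, is exactly what is being asked and is not answered here. The function names and the dry_run switch are hypothetical.

```python
import shlex
import subprocess


def build_index_commands(changed_urls, index_binary="index"):
    """Return one argument list per URL, in the form quoted in the message.

    Only the -i and -u options mentioned in the message are used.
    """
    return [[index_binary, "-i", "-u", url] for url in changed_urls]


def run_commands(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(" ".join(shlex.quote(arg) for arg in argv))
        if not dry_run:
            # check=True stops the loop on the first failing invocation.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Hypothetical list; the real one would come from the conversion script.
    urls = [
        "http://www.yoursite.com/changingdir/subdir/changeddocument.html",
    ]
    run_commands(build_index_commands(urls))
```

Running it as-is only prints the commands, so the list can be checked before anything touches the database.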
Thanks
Dan
Kir Kolyshkin wrote:
To limit reindexing, use the -u option; its argument is a URL mask in
SQL form. In your case it can be
http://www.yoursite.com/rapidly_changing_dir1%
So you'll run index -u nightly, and index without options to reindex
everything every week.
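
As an illustration of what a "URL mask in SQL form" selects, the sketch below assumes the mask follows ordinary SQL LIKE rules (% matches any run of characters, _ matches exactly one character). How AspSeek evaluates the mask internally is not stated in this thread, so treat this as a way to reason about which URLs a mask would cover, not as the tool's implementation. The helper names are hypothetical.

```python
import re


def like_to_regex(mask):
    """Translate a SQL LIKE style mask into a compiled regular expression."""
    parts = []
    for ch in mask:
        if ch == "%":
            parts.append(".*")   # % matches any run of characters
        elif ch == "_":
            parts.append(".")    # _ matches exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def matches_mask(url, mask):
    """Return True if the whole URL matches the mask."""
    return like_to_regex(mask).fullmatch(url) is not None


if __name__ == "__main__":
    mask = "http://www.yoursite.com/rapidly_changing_dir1%"
    for url in (
        "http://www.yoursite.com/rapidly_changing_dir1/new.html",
        "http://www.yoursite.com/static_dir/old.html",
    ):
        print(url, matches_mask(url, mask))
```

Under LIKE rules the underscores in rapidly_changing_dir1 are themselves single-character wildcards; they still match the literal underscores, so the mask covers the intended directory.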
Daniell Freed wrote:
>
> I need some advice about setting up aspseek. I have a working installation of
> aspseek, but I am looking to optimize how it works for my particular needs.
>
> I have a single site that has 4 main directories that need to be indexed; all
> together there are about 200,000 documents. 2 of these directories contain
> documents that don't ever change, and they take up about 70% of the total number
> of documents. The other 2 directories change daily; there are generally
> anywhere from 50 to 300 new or changed documents every day. (These documents
> are WordPerfect and Word documents that are converted to HTML nightly as
> part of a cron job using some custom Perl scripts and a conversion tool called
> wp2html.) I need to update the changing directories nightly so I can search on
> these new and changed documents.
>
> When I initially ran index, the database was created just fine and I was able to
> search the documents that I needed. Then I started running nightly index jobs
> that took about 30 to 40 minutes to run, but I wasn't seeing any changes to the
> old documents, and it didn't really look like any new documents were being added
> either (all of the documents contain last modified dates that I was using to
> search on). After poking around in the aspseek.conf file I discovered the
> period command was set to 7d (7 days) and I figured that was my problem, so I
> lowered this to 6h (6 hours). Now my index is running but it is taking a really
> long time to run (6 hours so far). Looking at the logs.txt file, it looks like
> it is indexing everything from scratch (the queued docs count is up to over
> 100,000 documents).
>
> Is there a way that I can configure AspSeek to only look for updates in the 2
> directories that contain changes? Or can I configure searchd to search 2
> different databases at the same time when a search request is made?
>
> Or (and this is a more complicated question) can I call index to insert or
> update a single document at a time? If this works then I can just add this to
> my conversion script because it already goes through and finds new and changed
> documents as part of its process.
>
> My goal here is to be able to run these update scripts overnight so that any
> changes made the previous day are searchable.
>
> Thanks for the advice.
>
> --
> Daniell Freed
> Computer Services
> Dewitt, Ross, & Stevens S.C.
>
> He who fights with monsters might take care
> lest he thereby become a monster.
> And if you gaze for long into an abyss,
> the abyss gazes also into you.
>
> Beyond Good and Evil
> Friedrich Wilhelm Nietzsche
>
>
-- [EMAIL PROTECTED] http://kir.sever.net ICQ 7551596 --
Join CCAUWM - Citizens' Campaign for Abolition of the Use of the
Word Microsoft (or of Microsoft Word - you choose)