This seems to be working, but it is still taking a long time to index my entire site.  Can I run

index -i -u http://www.yoursite.com/changingdir/subdir/changeddocument.html

for each new or changed document as part of a script?  (I already have a script that finds new and changed documents, so I could add this step to it.)  If I can, do I need to do anything to load the delta files (like an index -D), or will index -i -u ... do that after each file is inserted into the database?
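A minimal sketch of the per-document approach, assuming the conversion script knows the local path of each new or changed HTML file and that files under a hypothetical DOC_ROOT are served under the URL prefix used above. DOC_ROOT, path_to_url, index_commands and run_all are illustrative names, not part of ASPseek. The sketch only uses the `index -i -u URL` invocation quoted above; whether the delta files then need a separate step is not answered in this thread, so check the ASPseek documentation before relying on it.

```python
import subprocess
from typing import Iterable, List

# Hypothetical: where the conversion script writes HTML, and the public
# URL prefix those files are served under.
DOC_ROOT = "/var/www/html/"
URL_PREFIX = "http://www.yoursite.com/"


def path_to_url(path: str) -> str:
    """Map a converted HTML file on disk to the URL the indexer sees."""
    if not path.startswith(DOC_ROOT):
        raise ValueError(f"{path} is outside {DOC_ROOT}")
    return URL_PREFIX + path[len(DOC_ROOT):]


def index_commands(paths: Iterable[str]) -> List[List[str]]:
    """Build one 'index -i -u URL' invocation per new or changed document."""
    return [["index", "-i", "-u", path_to_url(p)] for p in paths]


def run_all(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print the commands (dry run) or execute them one by one."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises and stops the loop if index exits non-zero.
            subprocess.run(cmd, check=True)
```

Running with dry_run=True first prints the commands, so the path-to-URL mapping can be checked before anything touches the database. With 50 to 300 changes a night this starts one index process per document; a single run with a URL mask is the alternative suggested in the reply quoted below.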

Then, once a week, I can set it up to re-walk the entire site.

Would this work, or would it cause problems in my database?

Thanks

Dan

Kir Kolyshkin wrote:

To limit reindexing, use the -u option.  Its argument is a URL mask in SQL LIKE
form (% matches any run of characters); in your case it can be
http://www.yoursite.com/rapidly_changing_dir1%

So you'll run index -u nightly, and index with no options every week to
reindex everything.
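To see which URLs a mask like the one above would select, here is a minimal sketch that translates standard SQL LIKE wildcards into a Python regular expression. It assumes ASPseek passes the mask to the database as an ordinary LIKE pattern; the function names are illustrative, ESCAPE clauses are not handled, and the match here is case-sensitive, whereas real LIKE case sensitivity depends on the database and column collation.

```python
import re


def like_to_regex(mask: str) -> "re.Pattern[str]":
    """Translate an SQL LIKE pattern into an anchored regular expression.

    '%' matches any run of characters (including none), '_' matches exactly
    one character, and everything else is matched literally.
    """
    parts = []
    for ch in mask:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def url_matches(mask: str, url: str) -> bool:
    """True if the whole URL matches the LIKE-style mask."""
    return like_to_regex(mask).fullmatch(url) is not None
```

Two side effects are worth knowing.  In LIKE, "_" is itself a single-character wildcard, so rapidly_changing_dir1% also matches rapidlyXchanging_dir1.  The mask also matches rapidly_changing_dir10, so ending it with "/%" keeps it to one directory.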

Daniell Freed wrote:
>
> I need some advice about setting up ASPseek.  I have a working installation of
> ASPseek, but I am looking to optimize how it works for my particular needs.
>
> I have a single site that has 4 main directories that need to be indexed; all
> together there are about 200,000 documents.  2 of these directories contain
> documents that don't ever change, and they take up about 70% of the total number
> of documents.  The other 2 directories change daily; there are generally
> anywhere from 50 to 300 new or changed documents every day.  (These documents
> are WordPerfect and Word documents that are converted to HTML nightly as
> part of a cron job using some custom Perl scripts and a conversion tool called
> wp2html.)  I need to update the changing directories nightly so I can search on
> these new and changed documents.
>
> When I initially ran index, the database was created just fine and I was able to
> search the documents that I needed.  Then I started running nightly index jobs
> that took about 30 to 40 minutes to run, but I wasn't seeing any changes to the
> old documents, and it didn't really look like any new documents were being added
> either (all of the documents contain last modified dates that I was using to
> search on).  After poking around in the aspseek.conf file I discovered the
> period command was set to 7d (7 days) and I figured that was my problem, so I
> lowered this to 6h (6 hours).  Now index is running, but it is taking a really
> long time (6 hours so far).  Looking at the logs.txt file, it looks like
> it is indexing everything from scratch (the queued docs count is up to over
> 100,000 documents).
>
> Is there a way that I can configure ASPseek to only look for updates in the 2
> directories that contain changes?  Or can I configure searchd to search 2
> different databases at the same time when a search request is made?
>
> Or (and this is a more complicated question) can I call index to insert or
> update a single document at a time?  If this works then I can just add this to
> my conversion script because it already goes through and finds new and changed
> documents as part of its process.
>
> My goal here is to be able to run these update scripts overnight so that any
> changes made the previous day are searchable.
>
> Thanks for the advice.
>
> --
> Daniell Freed
> Computer Services
> Dewitt, Ross, & Stevens S.C.
>
> He who fights with monsters might take care
> lest he thereby become a monster.
> And if you gaze for long into an abyss,
> the abyss gazes also into you.
>
> Beyond Good and Evil
> Friedrich Wilhelm Nietzsche
>
>
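The effect described in the quoted message (lowering the period from 7d to 6h and seeing over 100,000 documents queued) can be modelled with a small sketch.  It assumes a document becomes due for re-fetching once the time since it was last indexed reaches the period.  That rule is a simplification for illustration, not a statement of ASPseek's exact scheduling logic.  parse_period only handles the "d" and "h" units that appear in this thread.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def parse_period(spec: str) -> timedelta:
    """Parse a period such as '7d' or '6h' (only the units used here)."""
    value, unit = int(spec[:-1]), spec[-1]
    if unit == "d":
        return timedelta(days=value)
    if unit == "h":
        return timedelta(hours=value)
    raise ValueError(f"unsupported unit in {spec!r}")


def due_for_reindex(last_indexed: Dict[str, datetime],
                    period: timedelta, now: datetime) -> List[str]:
    """URLs whose last indexing is at least one period old (assumed rule)."""
    return sorted(url for url, t in last_indexed.items() if now - t >= period)
```

Under this model, documents indexed two days ago are not due with a 7d period, but with a 6h period every one of them is due, whether or not it changed, which is why a short global period makes each run close to a full reindex.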

--  [EMAIL PROTECTED]  http://kir.sever.net ICQ 7551596  --
Join CCAUWM - Citizens' Campaign for Abolition of the Use
of the Word Microsoft (or of Microsoft Word - you choose)
