Run indexer -a -s0
-------------------------
I've been doing a bit of poking around and have noticed a couple of odd
behaviours. I'm using the most recent version from CVS, but I've
experienced very similar behaviour with 3.1.9.
First, if I do this:
# indexer foo
(after a few URLs, ^C to break)
# indexer foo
... then indexer continues with the rest of the non-indexed URLs, as I
would expect.
But, if I do this:
# indexer foo -n 10
# indexer foo
Indexer[13686]: indexer from mnogosearch-3.1.12/MySQL started with 'foo'
Indexer[13686]: [1] Done (0 seconds)
# indexer foo -S
          Database statistics
    Status    Expired      Total
   -----------------------------
         0          0         48 Not indexed yet
       200          0         10 OK
   -----------------------------
     Total          0         58
... I would expect it to continue with the non-indexed URLs, but it
assumes they're all up to date. Is this expected behaviour or a bug?
Also, even though I have "Index no" in my config, I occasionally see
lines like these in my MySQL query log:
...
33 Query INSERT INTO url
(url,referrer,hops,crc32,last_index_time,next_index_time,status,tag,category)
VALUES
('http://barracuda.enhydra.org/media/header/enhydraFtr.gif',1,1,0,983780385,983780385,0,'','')
33 Query DELETE FROM dict WHERE url_id=1
33 Query UPDATE url SET
status=200,last_mod_time=983780385,next_index_time=984385185,tag='',txt='...........................
........................... ...........................
...........................
...........................
........................... Barracuda
Project About Barracuda Project Mail Lists',title='The home of
Barracuda at
Enhydra.org',content_type='text/html',docsize=26706,keywords='j2ee
enhydra lutris java
application server xml open source JDDI XMLC XML Compiler wireless chtml
xhtml wml
J2EE',description='',crc32=1453362912,lang='',category='' WHERE rec_id=1
... which looks like it's storing index info. Is indexer somehow losing
its configuration, or do I have something misconfigured?
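As a side note on the log above, the timestamps can be decoded with a few lines of Python. This is only a sketch for reading the values: the numbers are copied from the INSERT and UPDATE statements, and the variable names are mine. That the gap corresponds to a re-index period is an assumption on my part, not something the log itself states.

```python
from datetime import datetime, timezone

# Values copied from the query log above.
last_mod_time = 983780385    # also used as last_index_time in the INSERT
next_index_time = 984385185  # set by the UPDATE

# Gap between the two timestamps, in seconds and in days.
delta_seconds = next_index_time - last_mod_time
delta_days = delta_seconds / 86400

print("gap in seconds:", delta_seconds)
print("gap in days:", delta_days)  # 7.0

# Human-readable form of both timestamps (UTC).
print(datetime.fromtimestamp(last_mod_time, tz=timezone.utc))
print(datetime.fromtimestamp(next_index_time, tz=timezone.utc))
```

So the UPDATE schedules the next visit for rec_id=1 exactly one week after the time it recorded.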
cheers,
Damon
----- Original Message -----
From: Damon Tkoch
To: [EMAIL PROTECTED]
Sent: Monday, March 05, 2001 12:02 PM
Subject: link validation config help!
Hi,
I'm trying to use mnogosearch as a link validator for a large number
of sites, but I ran into a serious problem.
Here's my configuration, in its simplest form:
DBAddr ...
DeleteBad no
Index no
CheckOnly NoMatch Regex ^http://barracuda\.enhydra\.org/.*\.html$
Realm *
URL http://barracuda.enhydra.org/index.html
This works beautifully, checking the existence of links outside
barracuda.enhydra.org but not following them. Except when indexer
gets to this link, it follows it and starts indexing the other site:
<A href="http://www.sys-con.com/java/readerschoice2001/">
So now indexer is following through that page, all of its links, etc.,
and suddenly indexer is trying to check the whole world, ignoring the
CheckOnly parameter.
I've tried different versions of the CheckOnly line, with or without
regex, splitting it into multiple lines, etc... nothing seems to help.
And indexer doesn't ignore the CheckOnly for all sites, just a few.
Any ideas?
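One way to rule out the pattern itself is to test it outside of indexer. The sketch below is in Python and rests on two assumptions that are not confirmed here: that mnogosearch's regex matching behaves like Python's `re.search` for this particular pattern, and that `NoMatch` simply inverts the result (a URL is check-only when it does NOT match). The names `PATTERN` and `is_check_only` are hypothetical helpers, not part of mnogosearch.

```python
import re

# Pattern copied verbatim from the CheckOnly line in the config above.
PATTERN = re.compile(r"^http://barracuda\.enhydra\.org/.*\.html$")


def is_check_only(url: str) -> bool:
    """Assumed NoMatch semantics: check-only when the URL does NOT match."""
    return PATTERN.search(url) is None


urls = [
    "http://barracuda.enhydra.org/index.html",
    "http://barracuda.enhydra.org/media/header/enhydraFtr.gif",
    "http://www.sys-con.com/java/readerschoice2001/",
]

for url in urls:
    verdict = "check only" if is_check_only(url) else "follow"
    print(f"{verdict:10}  {url}")
```

Under those assumptions the sys-con.com URL does not match the pattern, so it would be expected to fall under CheckOnly. That points away from the regex and towards how indexer applies the rule, though this sketch cannot show where.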
(I first tried a Server-based method,
DBAddr ..
DeleteBad no
Index no
Follow site
Server http://barracuda.enhydra.org/index.html
but this does not validate links from this site to another.)
-Damon