Re: link validation config help!

Damon Tkoch Mon, 05 Mar 2001 00:29:09 -0800

I've been doing a bit of poking around and have noticed a couple of odd behaviours. I'm using the most recent version from CVS, but I saw very similar behaviour with 3.1.9.

First, if I do this:

# indexer foo

(after a few URLS, ^C to break)

# indexer foo

... then indexer continues with the rest of the non-indexed URLs, as I would expect.

But, if I do this:

# indexer foo -n 10

# indexer foo

Indexer[13686]: indexer from mnogosearch-3.1.12/MySQL started with 'foo'
Indexer[13686]: [1] Done (0 seconds)

# indexer foo -S

Database statistics

    Status    Expired      Total
   -----------------------------
         0          0         48 Not indexed yet
       200          0         10 OK
   -----------------------------
     Total          0         58

... I would expect it to continue with the non-indexed URLs, but it assumes they're all up to date. Is this expected behaviour or a bug?

Also, even though I have "Index no" in my config, I occasionally see these lines in my MySQL query log:

...

                     33 Query      INSERT INTO url (url,referrer,hops,crc32,last_index_time,next_index_time,status,tag,category) VALUES ('http://barracuda.enhydra.org/media/header/enhydraFtr.gif',1,1,0,983780385,983780385,0,'','')
                     33 Query      DELETE FROM dict WHERE url_id=1
                     33 Query      UPDATE url SET status=200,last_mod_time=983780385,next_index_time=984385185,tag='',txt='........................... ........................... ........................... ........................... ........................... ...........................                            Barracuda Project About Barracuda Project Mail Lists',title='The home of Barracuda at Enhydra.org',content_type='text/html',docsize=26706,keywords='j2ee enhydra lutris java application server xml open source JDDI XMLC XML Compiler wireless chtml xhtml wml J2EE',description='',crc32=1453362912,lang='',category='' WHERE rec_id=1

... it looks like it's storing index info. Is indexer somehow losing its configuration, or do I have something misconfigured?
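The timestamps in the log above are Unix times. Here is a small Python sketch that decodes the values from the UPDATE for rec_id=1. The timestamps are copied from the log. The `utc` helper is purely illustrative and is not part of mnoGoSearch. The sketch shows that this document's next_index_time is exactly 7 days (604800 seconds) after it was processed. In contrast, the INSERT gives the new URL a next_index_time equal to its insert time. A one-week gap would be consistent with a default re-index period of a week, but that is an assumption, because the config shown doesn't set one.

```python
from datetime import datetime, timedelta, timezone

# Values copied from the UPDATE ... WHERE rec_id=1 statement in the query log.
last_mod_time = 983780385
next_index_time = 984385185

def utc(ts: int) -> datetime:
    """Convert a Unix timestamp (seconds) to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(ts, tz=timezone.utc)

# Gap between when the document was processed and when it is due again.
interval = timedelta(seconds=next_index_time - last_mod_time)

print("processed at  ", utc(last_mod_time).isoformat())
print("next index at ", utc(next_index_time).isoformat())
print("re-index after", interval)
```

The first timestamp decodes to 2001-03-05 08:19:45 UTC, which is 00:19:45 at -0800. That is about ten minutes before this message was sent, so the log lines come from the run being described.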

cheers,

Damon

----- Original Message -----

From: Damon Tkoch

To: [EMAIL PROTECTED]

Sent: Monday, March 05, 2001 12:02 PM

Subject: link validation config help!

Hi,

I'm trying to use mnogosearch as a link validator for a large number of sites, but I ran into a serious problem.

Here's my configuration, in its simplest form:

DBAddr ...

DeleteBad no

Index no

CheckOnly NoMatch Regex ^http://barracuda\.enhydra\.org/.*\.html$

Realm *

URL http://barracuda.enhydra.org/index.html

This works beautifully: it checks that links pointing outside barracuda.enhydra.org exist, without following them. Except when indexer gets to this link, it follows it and starts indexing the other site:

<A href="http://www.sys-con.com/java/readerschoice2001/">

So now indexer is following that page, then all of its links, and so on, and suddenly it is trying to check the whole world, ignoring the CheckOnly directive.

I've tried different versions of the CheckOnly line, with and without Regex, split into multiple lines, etc., and nothing seems to help. And indexer doesn't ignore CheckOnly for every site, only a few.
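As a sanity check on the pattern itself, here is a minimal Python sketch showing how the `CheckOnly NoMatch Regex` line would classify a few URLs from this thread. It assumes two things. First, it assumes NoMatch means "check only the URLs that do NOT match the pattern". Second, it assumes Python's `re` handles this simple pattern the same way the indexer's regex library does. `is_check_only` is a hypothetical helper, not an mnoGoSearch function.

```python
import re

# Pattern copied verbatim from the CheckOnly line in the config above.
PATTERN = re.compile(r"^http://barracuda\.enhydra\.org/.*\.html$")

def is_check_only(url: str) -> bool:
    """Hypothetical model of 'CheckOnly NoMatch Regex ...'.

    Assumption: with NoMatch, URLs that do NOT match the pattern are only
    checked for existence; matching URLs are fetched and their links followed.
    """
    return PATTERN.search(url) is None

urls = [
    "http://barracuda.enhydra.org/index.html",
    "http://barracuda.enhydra.org/media/header/enhydraFtr.gif",
    "http://www.sys-con.com/java/readerschoice2001/",
]

for url in urls:
    action = "check only" if is_check_only(url) else "fetch and follow"
    print(f"{action:16} {url}")
```

Run as-is, only index.html is classified as "fetch and follow". The sys-con.com URL comes out as "check only", so by this pattern alone that page should not have been followed.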

Any ideas?

(I first tried a Server-based method,

DBAddr ..

DeleteBad no

Index no

Follow site

Server http://barracuda.enhydra.org/index.html

but this does not validate links from this site to another.)
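To see why the Server-based config never touches outside links, here is a simplified Python model of a "Follow site" restriction. It assumes a URL counts as on-site when its scheme and host equal those of the Server entry. That is an assumption, and mnoGoSearch's real matching rules may differ. `same_site` is a hypothetical helper. Under this model, the external link is skipped entirely rather than checked.

```python
from urllib.parse import urlsplit

# Server entry copied from the config above.
SERVER = "http://barracuda.enhydra.org/index.html"

def same_site(url: str, server: str = SERVER) -> bool:
    """Hypothetical 'Follow site' test: same scheme and host as the Server URL."""
    a, b = urlsplit(url), urlsplit(server)
    return (a.scheme, a.netloc.lower()) == (b.scheme, b.netloc.lower())

links = [
    "http://barracuda.enhydra.org/media/header/enhydraFtr.gif",
    "http://www.sys-con.com/java/readerschoice2001/",
]

for link in links:
    # Off-site links are never queued, so they are never validated either.
    print("follow" if same_site(link) else "skip  ", link)
```

This is why that approach doesn't work for link validation. The scope restriction drops the off-site URLs before any existence check happens, whereas CheckOnly is meant to keep them but only check them.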

-Damon
