Run indexer -a -s0



-------------------------
I've been doing a bit of poking around, and have noticed a couple of odd
behaviours.  I'm using the most recent version from CVS, but I've experienced
very similar behaviour with 3.1.9.
 
First, if I do this:
# indexer foo
(after a few URLs, ^C to break)
# indexer foo
... then indexer continues with the rest of the non-indexed URLs, as I
would expect.
 
But, if I do this:
# indexer foo -n 10
# indexer foo
Indexer[13686]: indexer from mnogosearch-3.1.12/MySQL started with 'foo'
Indexer[13686]: [1] Done (0 seconds)
# indexer foo -S
 
          Database statistics
 
    Status    Expired      Total
   -----------------------------
         0          0         48 Not indexed yet
       200          0         10 OK
   -----------------------------
     Total          0         58
 
... I would expect it to continue with the non-indexed URLs, but it assumes
they're all up to date.  Is this expected behaviour or a bug?
 
Also, even though I have "Index no" in my config, I occasionally see lines
like these in my MySQL query log:
 
...
                     33 Query      INSERT INTO url
(url,referrer,hops,crc32,last_index_time,next_index_time,status,tag,category)
VALUES
('http://barracuda.enhydra.org/media/header/enhydraFtr.gif',1,1,0,983780385,983780385,0,'','')
                     33 Query      DELETE FROM dict WHERE url_id=1
                     33 Query      UPDATE url SET
status=200,last_mod_time=983780385,next_index_time=984385185,tag='',txt='...........................
 
...........................  ........................... 
........................... 
........................... 
...........................                            Barracuda
Project  About Barracuda  Project Mail Lists',title='The home of
Barracuda at
Enhydra.org',content_type='text/html',docsize=26706,keywords='j2ee
enhydra lutris java
application server xml open source JDDI XMLC XML Compiler wireless chtml
xhtml wml
J2EE',description='',crc32=1453362912,lang='',category='' WHERE rec_id=1
 
... which looks like it's storing index info.  Is indexer somehow losing its
configuration, or do I have something misconfigured?
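
For anyone reading the logged UPDATE above, the two timestamps can be decoded
to see what schedule indexer wrote for rec_id=1.  The sketch below is only an
illustration: the two values are copied from the query log, while the helper
name as_utc and the reading of the values as Unix epoch seconds are
assumptions.  The gap works out to exactly one week; whether that comes from a
Period setting or a built-in default is not something the log shows.

```python
from datetime import datetime, timezone

# Values copied from the UPDATE statement in the query log above.
last_mod_time = 983780385
next_index_time = 984385185


def as_utc(ts: int) -> datetime:
    """Interpret ts as Unix epoch seconds (an assumption) and return UTC time."""
    return datetime.fromtimestamp(ts, tz=timezone.utc)


# Gap between "last modified" and "next index", in seconds and in days.
interval = next_index_time - last_mod_time

print("last_mod_time  :", as_utc(last_mod_time).isoformat())
print("next_index_time:", as_utc(next_index_time).isoformat())
print("interval       :", interval, "seconds =", interval / 86400, "days")
```

The same decoding can be applied to the next_index_time column of the url
table to see which rows indexer considers due.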
 
cheers,
Damon
 

     ----- Original Message ----- 
     From: Damon Tkoch 
     To: [EMAIL PROTECTED] 
     Sent: Monday, March 05, 2001 12:02 PM
     Subject: link validation config help!

     Hi, 
     I'm trying to use mnogosearch as a link validator for a large number of
     sites, but I ran into a serious problem.
      
     Here's my configuration, in its simplest form:
      
     DBAddr ...
     DeleteBad no
     Index no
     CheckOnly NoMatch Regex ^http://barracuda\.enhydra\.org/.*\.html$
     Realm *
     URL http://barracuda.enhydra.org/index.html
      
     This works beautifully, checking the existence of links outside
     barracuda.enhydra.org but not following them.  Except that when indexer
     gets to this link, it follows it and starts indexing the other site.
      
     <A href="http://www.sys-con.com/java/readerschoice2001/">
      
     So now indexer is following through that page, all of its links, etc.,
     and suddenly indexer is trying to check the whole world, ignoring the
     CheckOnly parameter.
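
A quick way to rule out the pattern itself is to test it outside indexer.  The
sketch below uses Python's re module, which is an assumption: indexer uses its
own regex library, although for a pattern this simple the two should agree.
It also assumes that "NoMatch" means the CheckOnly rule applies to URLs that
do not match the pattern.  The helper name check_only is hypothetical.

```python
import re

# Pattern copied from the CheckOnly line in the configuration above.
PATTERN = re.compile(r"^http://barracuda\.enhydra\.org/.*\.html$")


def check_only(url: str) -> bool:
    """Return True if the URL does NOT match the pattern, i.e. if a
    'CheckOnly NoMatch Regex ...' rule would be expected to cover it."""
    return PATTERN.search(url) is None


urls = [
    "http://barracuda.enhydra.org/index.html",
    "http://barracuda.enhydra.org/media/header/enhydraFtr.gif",
    "http://www.sys-con.com/java/readerschoice2001/",
]

for url in urls:
    verdict = "check only" if check_only(url) else "fetch and follow"
    print(f"{verdict:17} {url}")
```

Under these assumptions the sys-con.com URL falls on the "check only" side, so
the pattern alone would not explain why indexer follows it.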
      
     I've tried different versions of the CheckOnly line, with or without
     regex, splitting it into multiple lines, etc... nothing seems to help.
     And indexer doesn't ignore the CheckOnly for all sites, just a few.
      
     Any ideas?
      
     (I first tried a Server-based method,
      
     DBAddr ..
     DeleteBad no
     Index no
     Follow site
     Server http://barracuda.enhydra.org/index.html
      
     but this does not validate links from this site to another.)
      
     -Damon
