On Wed, 17 Oct 2001, Gilles Detillieux wrote:

> Date: Wed, 17 Oct 2001 15:35:53 -0500 (CDT)
> From: Gilles Detillieux <[EMAIL PROTECTED]>
> To: Joe R. Jah <[EMAIL PROTECTED]>
> Cc: [EMAIL PROTECTED]
> Subject: Re: [htdig-dev] Re: URL Rewrite patch for 3.1.6 snapshots
> 
> > I found 82 links from one document with META ROBOT: Noindex tag;)  I could
> > not find an efficient way of hunting down the other 138 links that were
> > unaccounted for in two 20 meg+ files; however, I must assume that they are
> > some sort of duplicates;-/
> 
> Hmm.  Too bad we couldn't get something more definitive.  I'm fairly
> confident that the changes to the HTML parser didn't break anything, but
> I'd feel much more comfortable if we could explain the missing files you
> discovered rather than just assuming it's OK.  If I recall, there were
> 88 URLs with doubled slashes that were eliminated in an earlier test,
> but that still leaves around 50 URLs unaccounted for.
> 
> If there's any way you can take a snapshot of your site, or a few major
> subdirectories, and duplicate them somewhere else where they won't get
> modified, it would be a big help in getting conclusive results.  If you
> index the exact same files with 3.1.5 and 3.1.6, you should be able to
> diff the output of htdig -vvv from both, and pinpoint exactly where the
> differences are happening.  I know this is asking a lot, but it would be
> a shame to release 3.1.6 after all the work that's gone into it, only to
> discover afterward that it introduced a serious bug.

Sorry it took such a long time to respond, but I have been very busy
lately.  It is not easy to prove a negative; however, I have tried a few
times to make 3.1.6 miss indexing files in stable snapshots of my site
without success;)

Here is a comparison of the latest 3.1.6 snapshot on a snapshot of my site
-- 163 HTML-only documents -- with 3.1.6-072901:

_______3.1.6-072901 + Armstrong patch + ssl.4_______
htdig:   Start digging: Sun Nov 11 18:15:43 PST 2001
htmerge: Start merging: Sun Nov 11 18:16:16 PST 2001  33 seconds
htmerge: Total word count: 13171
htmerge: Total documents: 163
htmerge: Total doc db size (in K): 1888
-------------------------8<-------------------------
__________3.1.6-111101 + ssl.5 + FAQ#5.14___________
htdig:   Start digging: Sun Nov 11 18:19:19 PST 2001
htmerge: Start merging: Sun Nov 11 18:20:58 PST 2001  99 seconds
htmerge: Total word count: 13171
htmerge: Total documents: 163
htmerge: Total doc db size (in K): 1888
-------------------------8<-------------------------
CPU:    350 MHz Pentium 
RAM:    384 Megs
OS:     BSDi-4.2

They both index exactly the same number of documents; this is as conclusive
a result as I can produce.  The only difference is the time they take.
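
For anyone who wants to repeat this check mechanically, here is a minimal
sketch in Python.  It assumes the summary lines are saved in exactly the
format shown above (minus the hand-written elapsed-time notes); the names
parse_summary, RUN_OLD and RUN_NEW are made up for illustration and are not
part of ht://Dig.  It also assumes an English locale for the month names.

```python
import re
from datetime import datetime

# Summary lines copied from the two runs above.
RUN_OLD = """\
htdig:   Start digging: Sun Nov 11 18:15:43 PST 2001
htmerge: Start merging: Sun Nov 11 18:16:16 PST 2001
htmerge: Total word count: 13171
htmerge: Total documents: 163
htmerge: Total doc db size (in K): 1888
"""
RUN_NEW = """\
htdig:   Start digging: Sun Nov 11 18:19:19 PST 2001
htmerge: Start merging: Sun Nov 11 18:20:58 PST 2001
htmerge: Total word count: 13171
htmerge: Total documents: 163
htmerge: Total doc db size (in K): 1888
"""

# "htmerge: Total <label>: <number>"
TOTAL = re.compile(r"^htmerge: (Total .+): (\d+)\s*$")
# "<tool>: Start <verb>: <weekday> <month day time> <zone> <year>"
# The zone name is skipped because both stamps in a run share it.
STAMP = re.compile(
    r"^(htdig|htmerge):\s+Start \w+: \w{3} (\w{3} +\d+ [\d:]+) \w+ (\d{4})"
)


def parse_summary(text):
    """Return (totals dict, seconds between htdig start and htmerge start)."""
    totals, stamps = {}, {}
    for line in text.splitlines():
        m = TOTAL.match(line)
        if m:
            totals[m.group(1)] = int(m.group(2))
            continue
        m = STAMP.match(line)
        if m:
            stamps[m.group(1)] = datetime.strptime(
                m.group(2) + " " + m.group(3), "%b %d %H:%M:%S %Y"
            )
    dig_seconds = int((stamps["htmerge"] - stamps["htdig"]).total_seconds())
    return totals, dig_seconds


old_totals, old_secs = parse_summary(RUN_OLD)
new_totals, new_secs = parse_summary(RUN_NEW)

# Report any total that differs between the two runs.
for label in sorted(set(old_totals) | set(new_totals)):
    if old_totals.get(label) != new_totals.get(label):
        print("DIFF", label, old_totals.get(label), new_totals.get(label))

print("dig time (s):", old_secs, "vs", new_secs)
```

Matching totals only show that the same number of documents and words were
indexed; comparing the lists of indexed URLs would be a stronger test.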

Incidentally, ssl.4 fails to apply to the latest snapshot because of the
recent changes to Connection.cc.  I have modified the patch to apply
cleanly to the latest snapshot of 3.1.6:

        ftp://ftp.ccsf.org/htdig-patches/3.1.6/ssl.5

Regards,

Joe
-- 
     _/   _/_/_/       _/              ____________    __o
     _/   _/   _/      _/         ______________     _-\<,_
 _/  _/   _/_/_/   _/  _/                     ......(_)/ (_)
  _/_/ oe _/   _/.  _/_/ ah        [EMAIL PROTECTED]


_______________________________________________
htdig-dev mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/htdig-dev
