According to Bodo Bauer:
> Gilles Detillieux ([EMAIL PROTECTED]) wrote:
> > According to Bodo Bauer:
> > > I'm trying to set up htdig on our website to index our mailing list
> > > archives. Unfortunately, it seems to ignore exactly these links.
> > > 
> > > The archives are stored in directories whose names contain a colon
> > > (e.g. 1999:Feb for February 1999). If I start within such a subdir, it works:
> > > 
> > > start_url:    http://www.suse.com/Mailinglists/suse-informix/1999:Feb/
> > > 
> > > but 
> > > 
> > > start_url:    http://www.suse.com/Mailinglists/suse-informix
> > > 
> > > doesn't see this subdir. The index file there, however, contains
> > > all the links...
> > > 
> > > Any idea?
> > 
> > It contains all the links, but the links are not complete.  They're all
> > missing their closing </a> tag.  htdig doesn't process <a href=...> tags
> > until it finds the closing </a> tag, so these are just getting ignored.
> 
> Thanks a lot for finding this bug. How embarrassing; I could have spotted this myself.
> I looked at the HTTP code about a hundred times yesterday, looking for some
> kind of error. I fixed the script that generates these pages, and now it works!
> 
> Sorry for bothering you...

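Not at all.  It was a hard one to spot, and htdig didn't give any
error messages to point the way.  Here's a patch to htdig/HTML.cc that
should make it handle this situation better in the future: when a new
<a href=...> tag starts while the previous one is still open, htdig now
terminates the previous one and follows its link as if the closing </a>
had been there, and at debug levels above 1 it reports the missing tag...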

--- htdig/HTML.cc.hrefunterm    Wed Mar 17 11:01:08 1999
+++ htdig/HTML.cc       Wed Mar 17 14:06:37 1999
@@ -465,6 +465,16 @@ HTML::do_tag(Retriever &retriever, Strin
                                q++;
                            *q = '\0';
                        }
+                       if (in_ref)
+                       {
+                           if (debug > 1)
+                               cout << "Terminating previous <a href=...> tag,"
+                                    << " which didn't have a closing </a> tag."
+                                    << endl;
+                           if (dofollow)
+                               retriever.got_href(*href, description);
+                           in_ref = 0;
+                       }
                        delete href;
                        href = new URL(position, *base);
                        in_ref = 1;


-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
------------------------------------
To unsubscribe from the htdig3-dev mailing list, send a message to
[EMAIL PROTECTED] containing the single word "unsubscribe" in
the SUBJECT of the message.
