I have a large site containing classified ads. The ads are the only pages I want to index, and their paths all start with a number.
The base site is http://www.citynews.com, which contains no ads; they are all on hosts named after cities, like "http://dallas.citynews.com".

I created an index of all the ads, put it in a file called "http://www.citynews.com/adlist/index.html", and am running htdig 3.2.0b4 against it. My config contains:

    start_url: http://www.citynews.com/adlist/index.html
    limit_urls_to: [citynews.com/[0-9]]

The problem I'm experiencing is that, in addition to correctly indexing the files listed as links in the adlist file, the adlist itself is also being indexed and appears in all search results. Is there a way I can index the pages this file links to, but not the file itself? I tried adding

    exclude_urls: /adlist/index.html

but the file is still showing up in the results.
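
For reference, here is a minimal Python sketch of how I understand the limit_urls_to pattern to behave. It assumes that htdig treats the bracketed expression as a regular expression that may match anywhere in the URL; I have not confirmed this against the htdig source. The function name and the ad URL are hypothetical, made up for illustration only.

```python
import re

# Regex taken from inside the brackets of the limit_urls_to line.
# Assumption: htdig applies it as an unanchored regular expression.
AD_PATTERN = re.compile(r"citynews.com/[0-9]")


def within_limit(url: str) -> bool:
    """Return True if the pattern matches somewhere in the URL."""
    return AD_PATTERN.search(url) is not None


if __name__ == "__main__":
    urls = [
        "http://dallas.citynews.com/12345.html",      # hypothetical ad page
        "http://www.citynews.com/adlist/index.html",  # the adlist file
        "http://www.citynews.com/",                   # base site
    ]
    for url in urls:
        print(url, within_limit(url))
```

Under that assumption, ad pages on the city hosts match the pattern, while the adlist page and the base site do not.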

