I have a large site containing classified ads.  The ads are the only pages I 
want to index, and their URL paths all start with a number.

The base site is http://www.citynews.com, which contains no ads; they are 
all on hosts named after cities, like "http://dallas.citynews.com".

I created an index of all the ads, put it in a file at 
"http://www.citynews.com/adlist/index.html", and am running htdig 3.2.0b4 
against it.


My config contains:

start_url:              http://www.citynews.com/adlist/index.html
limit_urls_to:    [citynews.com/[0-9]]
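
As a rough check of what that limit_urls_to pattern is meant to match, here is a minimal Python sketch. It assumes htdig treats the text inside the outer [ ] as a regular expression and searches for it anywhere in the URL; the ad URL used below is hypothetical, since the real ad file names are not shown here.

```python
import re

# Inner pattern copied from limit_urls_to, without the outer [ ]
# (assumption: the outer brackets only mark the value as a regex).
PATTERN = re.compile(r"citynews.com/[0-9]")


def within_limit(url: str) -> bool:
    """Return True if the pattern occurs anywhere in the URL (unanchored search)."""
    return PATTERN.search(url) is not None


# Hypothetical ad URL; only the host naming scheme comes from the post.
ad_url = "http://dallas.citynews.com/12345.html"
adlist_url = "http://www.citynews.com/adlist/index.html"

print(within_limit(ad_url))      # True
print(within_limit(adlist_url))  # False
```

Under that assumption, the adlist page itself does not match the pattern, while a numbered page on a city host does.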

The problem I'm experiencing is that in addition to correctly indexing the 
files listed as links in the adlist file, the adlist itself is also being 
indexed, and appears in all search results.

Is there a way I can have htdig follow the links in this file, but not index 
the file itself?

I tried adding
exclude_urls:   /adlist/index.html

but the file is still showing up in the results.


