According to M. Schulz:
> OK, I built an index with ht://Dig for, e.g., 2 sites:
> 
>       http://www.abc.org
> and
>       http://www.def.org
> 
> At the second site there's a URL, e.g.
> 
>       http://www.def.org/test/index.html
> 
> Question: Is it possible to remove only that
> URL from the index?

That depends.  If you want to remove a URL from the index without needing
to reindex the site, there isn't currently a way to do this easily.
In 3.1.x, a kludgy way is to look up the document ID (DocID) for this
URL, and then insert a record into db.wordlist telling htmerge to delete
that document.  E.g., if its DocID is 123, then add the record:

-123

to db.wordlist, and rerun htmerge.
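For illustration, here is a minimal Python sketch of that kludge.  It
assumes the 3.1.x db.wordlist is a plain-text file you can append to,
and that you've already looked up the DocID yourself; the function name
mark_doc_deleted is hypothetical, not part of ht://Dig.  Back up your
database files before trying anything like this.

```python
# Hypothetical sketch: append a "-<DocID>" deletion record to db.wordlist
# so that the next htmerge run drops that document.
# Assumes the 3.1.x db.wordlist is a plain-text file.
import os
from pathlib import Path


def _ends_with_newline(path: Path) -> bool:
    """Check only the last byte, since db.wordlist can be very large."""
    with path.open("rb") as f:
        f.seek(-1, os.SEEK_END)
        return f.read(1) == b"\n"


def mark_doc_deleted(wordlist_path: str, doc_id: int) -> str:
    """Append a '-<DocID>' line to the word list; return the record written."""
    record = f"-{doc_id}"
    path = Path(wordlist_path)
    # Make sure the record starts on its own line even if the file
    # doesn't currently end with a newline.
    needs_newline = (
        path.exists() and path.stat().st_size > 0 and not _ends_with_newline(path)
    )
    with path.open("a") as f:
        if needs_newline:
            f.write("\n")
        f.write(record + "\n")
    return record
```

After appending the record (e.g. mark_doc_deleted("db.wordlist", 123),
with the path adjusted to your database_dir), rerun htmerge as described
above.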

On the other hand, if you want to exclude this URL from future reindexing
runs, you should add it to exclude_urls.  However, be aware that the
default document name "index.html" is usually stripped from URLs (see
remove_default_doc), and I believe this is done before they're checked
against exclude_urls.  So you'd have to put the URL in exclude_urls
without the index.html part, and since exclude_urls rejects any URL
containing one of its patterns, that would tell htdig to exclude
everything under the test/ subdirectory, which may not be what you want.
You may have better luck with meta tags right in the document, e.g. a
robots meta tag with content="noindex".
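The interaction between default-document stripping and exclude_urls can
be modelled roughly as below.  This is a simplified Python sketch, not
htdig's actual code: it assumes remove_default_doc is set to index.html
and that exclude_urls rejects a URL when any pattern appears as a
substring of the (already stripped) URL.

```python
# Simplified model (not htdig's actual code) of the two steps above:
# 1. strip the default document name (cf. remove_default_doc);
# 2. reject the URL if it contains any exclude_urls pattern as a substring.
from typing import Sequence

DEFAULT_DOCS = ["index.html"]  # assumed value of remove_default_doc


def normalize(url: str) -> str:
    """Drop a trailing default document, e.g. '/test/index.html' -> '/test/'."""
    for doc in DEFAULT_DOCS:
        if url.endswith("/" + doc):
            return url[: -len(doc)]
    return url


def is_excluded(url: str, exclude_patterns: Sequence[str]) -> bool:
    """True if the normalized URL contains any of the exclude patterns."""
    u = normalize(url)
    return any(p in u for p in exclude_patterns)
```

With the pattern http://www.def.org/test/, both test/index.html and any
other page under it (say, a hypothetical test/other.html) are rejected,
while putting the full .../test/index.html in exclude_urls never matches,
since the stripped URL no longer contains index.html.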

See http://www.htdig.org/attrs.html#exclude_urls
    http://www.htdig.org/attrs.html#remove_default_doc
and http://www.htdig.org/FAQ.html#q4.15

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html