I've been thinking a little more about this problem, and since it seems to
consist of two parts, I wonder if it can be solved by splitting the dig into
two parts, and then merging the databases.

If you use:

limit_urls_to:  DO_TOPIC \
                DO_ROOT \
                DO_COMMUNITY

in one config, then my understanding of your problem is that the only 'GOOD'
URL that you will exclude is http://example.org/index.html
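
To illustrate why only that one 'GOOD' URL drops out, here is a minimal
sketch in Python. It assumes limit_urls_to keeps a URL when any one of the
listed patterns occurs in it as a plain substring. The helper name allowed()
is made up for this example and is not part of ht://Dig.

```python
# Hypothetical model of limit_urls_to: keep a URL if ANY pattern occurs
# in it as a plain substring. This is an illustration, not ht://Dig code.
def allowed(url, patterns):
    return any(p in url for p in patterns)

# Patterns from the first config.
CONFIG_1 = ["DO_TOPIC", "DO_ROOT", "DO_COMMUNITY"]

# The three example URLs from the original question.
urls = [
    "http://example.org/index.html",                # wanted
    "http://example.org/index.html?ID=4",           # not wanted
    "http://example.org/index.html?ID=4&DO_TOPIC",  # wanted
]

kept = [u for u in urls if allowed(u, CONFIG_1)]
for u in urls:
    print("keep" if u in kept else "skip", u)
```

Under that assumption the ?ID=4 URL is correctly skipped, but so is the plain
index.html, which is the document the second config has to pick up.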

If you then have:

limit_urls_to:  ${start_url}
Max_docs: 1  (or something similar)

in a second config, then you should be able to get the missing document into a
second database and merge it into the first.
The only problem I can see is that on many systems you may not get a good
index this way, since the obvious starting point is not accessible in the
main dig. This may be overcome by feeding a URL list generated by the
'short dig' (config 2) into the 'full dig' (config 1).
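
As a rough sketch of the two-dig idea, the Python below models each dig as a
filter over a fixed list of candidate URLs and the merge as a simple union.
It assumes substring matching for limit_urls_to and a one-document cap for
the short dig. The names allowed() and dig() are made up for illustration,
and the sketch does not model crawling, so it says nothing about whether the
full dig can actually reach its pages.

```python
# Hypothetical model only: none of these names come from ht://Dig.
def allowed(url, patterns):
    # Keep a URL if any pattern occurs in it as a plain substring.
    return any(p in url for p in patterns)

def dig(candidates, patterns, max_docs=None):
    # Return the URLs that pass the filter, in order, optionally capped.
    hits = [u for u in candidates if allowed(u, patterns)]
    return hits if max_docs is None else hits[:max_docs]

START_URL = "http://example.org/index.html"
candidates = [
    START_URL,
    START_URL + "?ID=4",
    START_URL + "?ID=4&DO_TOPIC",
]

# Config 1: the full dig, limited to the three keywords.
full = dig(candidates, ["DO_TOPIC", "DO_ROOT", "DO_COMMUNITY"])

# Config 2: the short dig, limited to the start URL and capped at one
# document. Without the cap, ?ID=4 would also match the start URL.
short = dig(candidates, [START_URL], max_docs=1)

# Merge: the short dig's document first, then anything new from the full dig.
merged = short + [u for u in full if u not in short]
print(merged)
```

The cap matters here: the start URL is a prefix of the unwanted ?ID=4 URL, so
a pattern match alone would let it back in.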

Mike


> On Mon, 10 Jan 2005, Dan Langille wrote:
> 
> > How can I use that on limit_urls_to?  I've been trying this:
> >
> > limit_urls_to:  ${start_url}*DO_TOPIC|DO_ROOT|DO_COMMUNITY*
> >
> > There are additional restrictions, but once I get a starting point, I
> > think it'll all fall into place.
> >
> > A few examples of what we want to do:
> >
> >  http://example.org/index.html                OK
> >  http://example.org/index.html?ID=4           BAD
> >  http://example.org/index.html?ID=4&DO_TOPIC  OK
> 


_______________________________________________
ht://Dig general mailing list: <htdig-general@lists.sourceforge.net>
ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-general
