The goal is not to EXCLUDE the entire document from being indexed,
but rather to strip just the query string (anything after the ?).  I will
take your advice and look into URL::URL() and URL::parse().
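
For reference, here is a rough sketch of the behaviour I am after, the
C++ counterpart of the Perl s/\?.*$// quoted below.  It is not htdig
code: strip_query_string is a hypothetical helper name, and std::string
is only a stand-in for whatever string type URL.cc actually uses.  The
config check that would switch it on is left out.

```cpp
#include <string>

// Hypothetical helper, not part of ht://Dig: returns a copy of the URL
// with everything from the first '?' onward removed.
// std::string is an assumption standing in for htdig's own string type.
std::string strip_query_string(const std::string &url)
{
    std::string::size_type pos = url.find('?');
    if (pos == std::string::npos)
        return url;             // no query string, leave the URL untouched
    return url.substr(0, pos);  // keep only the part before the '?'
}
```

With this, a URL such as http://www.example.com/show.cgi?id=42 would be
reduced to http://www.example.com/show.cgi, and a URL without a '?'
would pass through unchanged.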

At 12:30 PM 3/6/00 -0600, Gilles Detillieux wrote:
>According to Patrick:
>> Could someone give me some insight as to where I can begin
>> to write a patch that will allow the ability to "remove all
>> query string (anything after the '?') variables"?
>> 
>> My initial guess is within Retriever.cc, in the Retriever::Initial
>> function, immediately after:
>> 
>> url = u.get();
>> 
>> ..then, if a certain config setting is true, perform something 
>> similar to the Perl equivalent of:
>> 
>> url =~ s/\?.*$//;
>> 
>> Any help is appreciated.
>
>Retriever::Initial only handles the initial URLs, i.e. in start_url
>or URLs already in the database for an update htdig.  It won't handle
>newly followed href's.  To get them all, maybe URL.cc is the best place
>for this.  It already strips off the "#sectionname" portion of an URL,
>in URL::URL() and URL::parse().
>
>You may want to take a step back, though, and ask yourself why you want to
>do this.  If your goal is simply to avoid indexing any URL with a query
>string, you can just add a ? to the exclude_urls attribute definition
>in your htdig.conf.  Stripping off the query string is a pretty drastic
>step, as you'll still end up indexing all your CGI scripts (unless
>excluded by exclude_urls), but calling them all without a query string.
>It will also prevent you from being able to index any "virtual tree"
>of documents accessed by a query string, if you ever need to do this.
>
>-- 
>Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
>Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
>Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
>Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
>
>------------------------------------
>To unsubscribe from the htdig3-dev mailing list, send a message to
>[EMAIL PROTECTED] 
>You will receive a message to confirm this. 
>
>
>
