Hello,

I am running into a problem that I haven't been able to solve after a few
days of digging around.  My apologies if a solution has already been
posted in the past.  I searched the mailing list and found a problem that
appeared to be similar to the one I am experiencing, but the solution was
vague.  Anyway, here is the scenario:

I have a site that I am indexing, all of the pages of which are in PHP.
Htdig manages to dig the entire site, and I can do searches on what was
indexed, except that certain links to pages containing GET variables in
the URL are being indexed incorrectly.  All of the pages are in the root
directory of the site.

For example:

http://www.mysite.com/somepage.php?somevar=12

gets indexed as:

http://www.mysite.com/?somevar=12


The somepage.php part of the URL is completely taken out.  Then, when I do
a search that should have returned
www.mysite.com/somepage.php?somevar=12, the search result returned is
www.mysite.com/?somevar=12 instead.
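
To make the mismatch concrete, here is a minimal Python sketch (it is not
htdig code, and the helper names resolve and dropped_path are made up for
illustration).  It assumes the affected links are written in the HTML as
query-only relative references such as href="?somevar=12"; that is only an
assumption, and the links may well be written differently.  Under RFC 3986
resolution, such a link keeps the path of the page it appears on:

```python
from urllib.parse import urljoin, urlsplit


def resolve(base, href):
    """Resolve href against base the way urllib.parse does (RFC 3986 rules)."""
    return urljoin(base, href)


def dropped_path(expected, indexed):
    """True when host and query match but the indexed URL lost its path."""
    e, i = urlsplit(expected), urlsplit(indexed)
    return (
        e.netloc == i.netloc
        and e.query == i.query
        and e.path != i.path
        and i.path in ("", "/")
    )


# Hypothetical page and link, modelled on the example URLs in this message.
base = "http://www.mysite.com/somepage.php"
expected = resolve(base, "?somevar=12")

# The URL that actually shows up in the search results.
indexed = "http://www.mysite.com/?somevar=12"

print(expected)
print(dropped_path(expected, indexed))
```

If the links are written as full or path-relative URLs instead, this sketch
does not apply and the cause would have to be something else.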

The only page for which GET variables are being indexed correctly is the
main page (e.g.  www.mysite.com?news=34).  But it's interesting because
htdig puts a trailing '/' on the URL, before the '?' character, so it
indexes the page as www.mysite.com/?news=34.


My config file (BTW I have changed site name, but you get the idea):
----
database_dir:           /www/mysite.com/search/db
start_url:              http://mysite.com/
limit_urls_to:          ${start_url}
exclude_urls:           /includes/ /search/
bad_extensions:         .wav .gz .z .sit .au .zip .tar .hqx .exe .com .gif .jpg .jpeg .aiff .class .map .ram .tgz .bin .rpm .mpg .mov .avi .css .png
maintainer:             [EMAIL PROTECTED]
max_head_length:        100000
max_doc_size:           150000
no_excerpt_show_top:    true
search_algorithm:       exact:2 endings:0.1 prefix:0.1 substring:0.1
template_map:   mysite mysite /www/mysite.com/search/results-template.html
search_results_header:  /www/mysite.com/search/results-header.html
nothing_found_file:     /www/mysite.com/search/results-nomatch.html
syntax_error_file:      /www/mysite.com/search/results-syntaxerror.html
valid_punctuation:      .-_/!#$%^&*'��"
----

Any ideas as to how I can fix this would be greatly appreciated.

Regards,
Sam Razi


