RE: [htdig] Help in url_part_aliases ..

Shivaji_Apte Thu, 17 Jan 2002 06:32:45 -0800

Ok. I didn't mean that the software SHOULD WORK the way I think.
I was just elaborating my thoughts. I am sorry if it appeared otherwise.


Now coming to my problem:

After reading the description given at
http://www.htdig.org/attrs.html#url_part_aliases
I have the following doubts:

1. To rewrite URLs, you need different 'from' strings in
url_part_aliases for htdig and htsearch, but the same 'to' strings. This much
is fine. Now, what should this 'to' string be?
   The documentation says: "Strings that are normally incorrect in URLs
or very seldom used, should be used as to-strings .."

   Does this mean I can, for example, use
        http://my.server.com/arg1/*1/arg2/*2.htm as my 'from' string and
        http://zoo/arg1/*1/arg2/*2.htm as my 'to' string for htdig, and

        http://my.server.com/cgi-bin/handler?arg1=*1&arg2=*2 as my 'from' string and
        http://zoo/arg1/*1/arg2/*2.htm as my 'to' string for htsearch?

   What is the best practice regarding the 'to' string if rewriting URLs is the
aim?

2. In the example provided in the documentation:

                ##(htdig)
           url_part_aliases:  
                        http://search.example.com/~htdig *site \
                        http://www.htdig.org/this/ *1 \
                        .html *2 
                ## (htsearch)
           url_part_aliases: 
                        http://www.htdig.org/ *site \
                        http://www.htdig.org/that/ *1 \
                        .htm *2 

        how do the arguments *1 and *2 work?
        

Thanks in advance,
Shivaji
        
> -----Original Message-----
> From: Gilles Detillieux [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, January 16, 2002 11:59 PM
> To: Shivaji_Apte
> Cc: htdig
> Subject: Re: [htdig] Help in url_part_aliases ..
> 
> >>> lines snipped >>>

In the first stage, htdig encodes the URLs as they go into the database,
by using the pairs in url_part_aliases going from left to right.  In the
second stage, htsearch decodes the encoded URLs taken from the database,
by using the pairs in url_part_aliases going from right to left.  If you
have the same value for url_part_aliases in htdig and htsearch, you end
up with the same URLs in the end.  If you modify the first string (the
from string) in the pairs listed in url_part_aliases for htsearch, then
when htsearch decodes the URLs it ends up rewriting part of them.
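
The two stages can be sketched in Python. This is only a simplified
model, not ht://Dig's actual code: it assumes plain substring
replacement, and the helper names parse_aliases, encode and decode are
made up for illustration. The alias values are the ones from the
documentation example quoted in this thread. Under this model, *1 and
*2 are just literal placeholder tokens stored in the database, not
wildcards or arguments.

```python
# Simplified model of the two-stage url_part_aliases translation.
# Assumption: each pair is applied as a plain substring replacement.

def parse_aliases(value):
    """Split a whitespace-separated attribute value into (from, to) pairs."""
    tokens = value.split()
    return list(zip(tokens[0::2], tokens[1::2]))

def encode(url, pairs):
    """htdig stage: replace each from-string with its to-string."""
    for from_str, to_str in pairs:
        url = url.replace(from_str, to_str)
    return url

def decode(url, pairs):
    """htsearch stage: replace each to-string with its from-string."""
    for from_str, to_str in pairs:
        url = url.replace(to_str, from_str)
    return url

# Values taken from the documentation example (htdig and htsearch configs).
HTDIG_ALIASES = parse_aliases(
    "http://search.example.com/~htdig *site "
    "http://www.htdig.org/this/ *1 "
    ".html *2"
)
HTSEARCH_ALIASES = parse_aliases(
    "http://www.htdig.org/ *site "
    "http://www.htdig.org/that/ *1 "
    ".htm *2"
)

url = "http://www.htdig.org/this/page.html"
stored = encode(url, HTDIG_ALIASES)

print(stored)                            # *1page*2
print(decode(stored, HTDIG_ALIASES))     # http://www.htdig.org/this/page.html
print(decode(stored, HTSEARCH_ALIASES))  # http://www.htdig.org/that/page.htm
```

Decoding with the same pairs restores the original URL, while decoding
with the htsearch pairs (same to-strings, different from-strings)
yields the rewritten one.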

While you might think that if you don't use url_part_aliases in htdig,
then you can use it in htsearch to alter unencoded URLs, the reality is
that if you don't encode parts of URLs using url_part_aliases, they still
get encoded automatically by the common_url_parts attribute.  This helps
to reduce the size of your databases.  So, trying to use url_part_aliases
only in htsearch doesn't work because there are no unencoded URLs in the
database, so the right hand strings in the pairs you define won't match
anything.

While the documentation for url_part_aliases may be limited, it does
quite clearly state that you need separate definitions of this attribute
for htdig and htsearch.  So does FAQ 4.7.  You chose to ignore that and
instead try using it the way you thought it ought to work.  How many
examples would it have taken to convince you otherwise?

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

