Developers,

I found a bug in v3.1.6 that probably affects later versions too. Here
it is:

If you enter a "restrict" value in the URL for htsearch (not in the
config file), it will be compared UNENCODED to the ENCODED URLs in
htdig's database.

For example, the following query:

http://www.mvpix.com/cgi-bin/perl/search?words=%2A&restrict=/photos/021/Netherland%20Antilles/Bonaire/Places/Urban/&method=and&sort=date&format=short

Will never match:

http://www.mvpix.com/photos/021/Netherland%20Antilles/Bonaire/Places/Urban/Industry/20030511-062204.jpg.html

I've patched htsearch temporarily with the following code, but some
thought should probably be given to how to address this properly. I
suspect the solution is to compare both strings in their unencoded form.
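
To illustrate what comparing both strings in unencoded form could look
like, here is a minimal standalone sketch. It is not ht://Dig code:
decodePercent() and matchesRestrict() are hypothetical helpers, and the
sketch assumes the restrict test is a plain substring match against the
stored URL.

```cpp
#include <string>

// Hypothetical helper: value of one hex digit, or -1 if not a hex digit.
static int hexValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical helper: decode %XX escapes. Malformed escapes are
// copied through unchanged.
std::string decodePercent(const std::string &in)
{
    std::string out;
    for (std::string::size_type i = 0; i < in.size(); ++i) {
        if (in[i] == '%' && i + 2 < in.size()) {
            int hi = hexValue(in[i + 1]);
            int lo = hexValue(in[i + 2]);
            if (hi >= 0 && lo >= 0) {
                out += static_cast<char>(hi * 16 + lo);
                i += 2;
                continue;
            }
        }
        out += in[i];
    }
    return out;
}

// Assumption: a restrict pattern matches if it occurs anywhere in the
// URL. Both sides are decoded first, so "%20" and " " compare equal.
bool matchesRestrict(const std::string &url, const std::string &pattern)
{
    return decodePercent(url).find(decodePercent(pattern)) != std::string::npos;
}
```

With this, the pattern matches the stored URL whether the caller
supplies "Netherland%20Antilles" or "Netherland Antilles".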

My snippet:

[EMAIL PROTECTED]:/mnt/lan/src/htdig-3.1.6$ diff htsearch/htsearch.cc-orig htsearch/htsearch.cc
23a24
> #include "URL.h"
169,170c170,174
<     if (input.exists("restrict"))
<   config.Add("restrict", input["restrict"]);
---
>   if (input.exists("restrict")) {
>       String restrict_url = input["restrict"];
>       encodeURL(restrict_url, "-_./");
>       config.Add("restrict", restrict_url);
>   }
[EMAIL PROTECTED]:/mnt/lan/src/htdig-3.1.6$ 
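
For readers without the source at hand, the effect I am relying on from
encodeURL(restrict_url, "-_./") can be sketched roughly as below. This
is a standalone approximation, not the ht://Dig implementation:
percentEncode() is a hypothetical name, and the sketch assumes that
ASCII letters, digits and the listed "safe" characters pass through
while every other byte becomes %XX.

```cpp
#include <cctype>
#include <string>

// Hypothetical stand-in for the encoding step: keep ASCII alphanumerics
// and any character listed in `safe`; percent-encode everything else.
std::string percentEncode(const std::string &in, const std::string &safe)
{
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : in) {
        bool keep = (c < 128 && std::isalnum(c)) ||
                    safe.find(static_cast<char>(c)) != std::string::npos;
        if (keep) {
            out += static_cast<char>(c);
        } else {
            out += '%';
            out += digits[c >> 4];      // high nibble
            out += digits[c & 0x0f];    // low nibble
        }
    }
    return out;
}
```

Under these assumptions, "/photos/021/Netherland Antilles/Bonaire/"
becomes "/photos/021/Netherland%20Antilles/Bonaire/". Two caveats of
the sketch: input that is already encoded gets encoded again ("%"
becomes "%25"), and a "|" separator is encoded as well.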

Another side effect of passing the un-encoded value through
'config.Add("restrict", input["restrict"]);' is that any spaces are
later treated as ORs by this line:

urllist.Create(config["restrict"], "| \t\r\n\001");
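
The splitting behaviour can be reproduced outside htsearch with a small
sketch. splitOn() is a hypothetical helper, not the ht://Dig list class,
and it assumes that runs of separator characters are skipped and empty
tokens are dropped.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: split `in` on any character found in `delims`,
// dropping empty tokens.
std::vector<std::string> splitOn(const std::string &in, const std::string &delims)
{
    std::vector<std::string> tokens;
    std::string::size_type start = in.find_first_not_of(delims);
    while (start != std::string::npos) {
        std::string::size_type end = in.find_first_of(delims, start);
        tokens.push_back(in.substr(start, end - start));
        start = in.find_first_not_of(delims, end);
    }
    return tokens;
}
```

Splitting "/photos/021/Netherland Antilles/Bonaire/" on "| \t\r\n\001"
yields two tokens, "/photos/021/Netherland" and "Antilles/Bonaire/",
which is the unwanted OR. The encoded form with "%20" contains no
separator character and stays a single token.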

BTW, this same bug affects the "exclude" value too.

Thanks,
js.

On Sun, Nov 16, 2003 at 11:15:09PM -0500, Jean-Sebastien Morisset wrote:
> Guys,
> 
> Shouldn't the following change to v3.1.6 work?
> 
> ---START---
> 
> [EMAIL PROTECTED]:/mnt/lan/src/htdig-3.1.6$ diff htsearch/htsearch.cc-orig htsearch/htsearch.cc
> 220c220
> <         urllist.Create(config["restrict"], "| \t\r\n\001");
> ---
> >         urllist.Create(config["restrict"], "|\t\r\n\001");
> 
> ---END---
> 
> It seems to have fixed the OR problem, but now I'm not getting any
> matches. I've added "<!--RESTRICT:$(RESTRICT)-->" to the nomatch.html
> file, and here is what it gives me:
> 
> <!--RESTRICT:/photos/021/Netherland Antilles-->
> 
> So it appears the space made it in there, but I don't understand why
> htsearch isn't matching the URLs with it.
> 
> Any ideas? I've tried a whole bunch of things, but nothing has worked so
> far...
> 
> BTW, here's a snippet from rundig showing the URLs it should match:
> 
> 307:307:4:http://www.mvpix.com/photos/011/Netherland%20Antilles/Bonaire/Transportation/Flying/: **-*-*******-*****-********- size = 6914
> 308:308:4:http://www.mvpix.com/photos/011/Netherland%20Antilles/Bonaire/Transportation/Automobiles/: **-*-*******-*****-********- size = 6934
> 309:309:4:http://www.mvpix.com/photos/011/Netherland%20Antilles/Bonaire/Objects/Industrial/: **-*-*******-*****-********- size = 6895
> 310:310:4:http://www.mvpix.com/photos/011/Netherland%20Antilles/Bonaire/Objects/Still%20Life/: **-*-*******-*****-********- size = 6898
> 
> Thanks,
> js.
> 
> On Sun, Nov 16, 2003 at 05:10:19PM -0500, Jean-Sebastien Morisset wrote:
>> Hi,
>> 
>> I'm trying to use a restrict value with spaces - for example:
>> 
>> restrict=/photos/021/Netherland%20Antilles/Bonaire/
>> 
>> Unfortunately, htdig v3.1.6 reads this as "/photos/021/Netherland" OR
>> "Antilles/Bonaire/" when I would like it to read it as a single string.
>> Is there a way to have it treat spaces as part of the string?
> 
> -------------------------------------------------------
> This SF. Net email is sponsored by: GoToMyPC
> GoToMyPC is the fast, easy and secure way to access your computer from
> any Web browser or wireless device. Click here to Try it Free!
> https://www.gotomypc.com/tr/OSDN/AW/Q4_2003/t/g22lp?Target=mm/g22lp.tmpl
-- 
Jean-Sebastien Morisset, Sr. UNIX Administrator <[EMAIL PROTECTED]>
Personal Home Page <http://jsmoriss.mvlan.net/>
JS & Melanie's Homebrewery <http://brewery.mvlan.net/>
Underwater and Travel Photographs <http://www.mvpix.com/>


_______________________________________________
ht://Dig Developer mailing list:
[EMAIL PROTECTED]
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-dev
