Hello!
I have attached a patch that fixes this.
Thanks for reporting this!
prochor wrote:
>
> Author: prochor
> Email: [EMAIL PROTECTED]
> Message:
> I have to use 3.2 because I need to index utf-8 site.
> Try this URL
>http://newweb2.nextra.cz/cgi-bin/search.cgi?q=Nanynka&ps=10&o=0&m=all&wf=22210&ul=
> ... this leads to a search result page that I have two questions about:
>
> 1) Why does indexer index newweb2.nextra.cz/ and newweb2.nextra.cz/index.html as two
>different files when these are only different URLs for the same resource .. and ..
> 2) How is it possible that the first result line displays BAD Czech characters and
>the second line (the SAME file) displays these chars correctly ????
> ... the same file indexed twice and with different charset ??
>
> Thanks
>
> Reply: <http://www.mnogosearch.org/board/message.php?id=2741>
>
Index: indexer.c
===================================================================
RCS file: /usr/src/CVS/mnogosearch32/src/indexer.c,v
retrieving revision 1.29
diff -u -r1.29 indexer.c
--- indexer.c 2001/08/07 18:59:03 1.29
+++ indexer.c 2001/08/08 13:33:55
@@ -1170,12 +1170,18 @@
char rurl[UDM_URLSIZE];
char * surl=Doc->url;
time_t lm=Doc->last_mod_time;
+ char * scharset;
sprintf(rurl,"%s://%s/robots.txt",CurURL.schema,CurURL.hostinfo);
Doc->url=rurl;
Doc->last_mod_time=0;
+ scharset=Doc->charset;
+ Doc->charset=NULL;
+
result=UdmIndexURL(Indexer,Doc,index_flags);
+ UDM_FREE(Doc->charset);
+ Doc->charset=scharset;
Doc->url=surl;
Doc->last_mod_time=lm;
}
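
For anyone reading the diff without the source at hand: the change appears to save Doc->charset before robots.txt is fetched through the same Doc, and to put it back afterwards, so that whatever charset the robots.txt fetch sets is discarded. Below is a standalone sketch of that save-and-restore idea. The names Document, copy_string, index_url and index_robots are hypothetical stand-ins made up for illustration; they are not mnogosearch functions, and the real UdmIndexURL does far more than this.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the indexer's document record. */
typedef struct {
    const char *url;   /* not owned by the record */
    char *charset;     /* heap-allocated, owned by the record; may be NULL */
} Document;

/* Small portable string duplicate (avoids relying on strdup). */
static char *copy_string(const char *s) {
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p != NULL) memcpy(p, s, n);
    return p;
}

/* Hypothetical fetch step: stores the charset of whatever was fetched,
 * replacing (and freeing) any charset already held by the record. */
static int index_url(Document *doc, const char *detected_charset) {
    assert(doc != NULL);
    free(doc->charset);
    doc->charset = copy_string(detected_charset);
    return doc->charset != NULL ? 0 : -1;
}

/* Fetch robots.txt through the same record without disturbing its state. */
static int index_robots(Document *doc, const char *robots_url,
                        const char *robots_charset) {
    const char *saved_url = doc->url;
    char *saved_charset = doc->charset;
    int result;

    doc->url = robots_url;
    doc->charset = NULL;   /* keep index_url from freeing the saved value */

    result = index_url(doc, robots_charset);

    free(doc->charset);    /* drop whatever the robots.txt fetch set */
    doc->charset = saved_charset;
    doc->url = saved_url;
    return result;
}
```

Without the save and restore, the record in this sketch would come back from index_robots carrying the charset of robots.txt instead of its own, which would match the symptom of the same page being shown once with the wrong charset.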