Re: Webboard: ver 3.2 and utf-8 site problem

Alexander Barkov Wed, 08 Aug 2001 06:19:46 -0700

  Hello!

I'm attaching a patch that fixes this.

Thanks for reporting this!

prochor wrote:
> 
> Author: prochor
> Email: [EMAIL PROTECTED]
> Message:
> I have to use 3.2 because I need to index a utf-8 site.
> Try this URL:
> http://newweb2.nextra.cz/cgi-bin/search.cgi?q=Nanynka&ps=10&o=0&m=all&wf=22210&ul=
> ... this leads to a search result page that I have two questions about:
> 
> 1) Why does indexer index newweb2.nextra.cz/ and newweb2.nextra.cz/index.html as two
> different files when these are only different URLs for the same resource .. and ..
> 2) How is it possible that the first result line displays BAD Czech characters and
> the second line (the SAME file) displays these chars correctly ????
> ... the same file indexed twice and with different charsets ??
> 
> Thanks
> 
> Reply: <http://www.mnogosearch.org/board/message.php?id=2741>
> 
> ___________________________________________
> If you want to unsubscribe send "unsubscribe general"
> to [EMAIL PROTECTED]

Index: indexer.c
===================================================================
RCS file: /usr/src/CVS/mnogosearch32/src/indexer.c,v
retrieving revision 1.29
diff -u -r1.29 indexer.c
--- indexer.c   2001/08/07 18:59:03     1.29
+++ indexer.c   2001/08/08 13:33:55
@@ -1170,12 +1170,18 @@
                        char rurl[UDM_URLSIZE];
                        char * surl=Doc->url;
                        time_t lm=Doc->last_mod_time;
+                       char * scharset;
 
                        sprintf(rurl,"%s://%s/robots.txt",CurURL.schema,CurURL.hostinfo);
                        Doc->url=rurl;
                        Doc->last_mod_time=0;
+                       scharset=Doc->charset;
+                       Doc->charset=NULL;
+                       
                        result=UdmIndexURL(Indexer,Doc,index_flags);
                        
+                       UDM_FREE(Doc->charset);
+                       Doc->charset=scharset;
                        Doc->url=surl;
                        Doc->last_mod_time=lm;
                }
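
Judging from the diff alone, the patch saves Doc->charset before the
robots.txt fetch, clears it for the duration of the call, and restores it
afterwards, so that a charset picked up while fetching robots.txt does not
stay attached to the real document. Below is a minimal, self-contained
sketch of that save/clear/restore pattern. The Doc struct, dup_str(),
index_url() and index_aux() are hypothetical stand-ins written for
illustration only; they are not the mnoGoSearch API.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the indexer's document record. */
typedef struct {
    const char *url;
    char *charset;              /* heap-allocated, or NULL if unknown */
} Doc;

/* Hypothetical helper: heap copy of a string. */
static char *dup_str(const char *s)
{
    char *p = malloc(strlen(s) + 1);
    if (p)
        strcpy(p, s);
    return p;
}

/* Hypothetical stand-in for the indexing call. As a side effect it
 * overwrites doc->charset with the charset "detected" for the fetch. */
static int index_url(Doc *doc, const char *detected)
{
    free(doc->charset);
    doc->charset = dup_str(detected);
    return doc->charset ? 0 : -1;
}

/* Fetch an auxiliary URL through the same Doc without disturbing the
 * URL and charset that belong to the real document. */
static int index_aux(Doc *doc, const char *aux_url, const char *aux_charset)
{
    const char *saved_url = doc->url;
    char *saved_charset = doc->charset;
    int rc;

    doc->url = aux_url;
    doc->charset = NULL;        /* keep the callee away from the saved value */

    rc = index_url(doc, aux_charset);

    free(doc->charset);         /* drop whatever the auxiliary fetch set */
    doc->charset = saved_charset;
    doc->url = saved_url;
    return rc;
}
```

In this sketch, calling index_aux() on a Doc whose charset is "utf-8" with
an auxiliary charset of "iso-8859-1" leaves the Doc's charset as "utf-8"
once the call returns.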
