- - - - - - - - - - - - - - - - - - - - - - - - - - - -
Name: HonestQiao
Subject: Re: How to segment UTF-8 Chinese text when indexing a DB for full-text search?

I reinstalled dps, but it still cannot segment Chinese text. Here is exactly what I did:

www#wget -d http://www.dataparksearch.org/add-on/mandarin.freq.gz
www#gzip -d mandarin.freq.gz
www#wget -d http://www.dataparksearch.org/dpsearch-4.45-28012007.tar.gz

www#tar xzvf dpsearch-4.45-28012007.tar.gz
www#cd dpsearch-4.45-28012007
www#./configure --prefix=/usr/local/dpsearch --with-extra-charsets=chinese --with-mysql
www#make && make install
www#cp ../mandarin.freq /usr/local/dpsearch/etc/
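
One thing that may be worth ruling out is whether the copied frequency file really decodes in the charset named on the LoadChineseList line (GB2312 in the configuration shown in this post). The following is only a rough sketch: check_charset is a made-up helper name, not part of DataparkSearch, it assumes iconv is installed, and the path is simply the --prefix used above plus etc/.

```shell
# Hypothetical helper (not part of DataparkSearch): returns success only if
# FILE ($1) decodes cleanly as CHARSET ($2). Relies on the system iconv.
check_charset() {
    iconv -f "$2" -t UTF-8 "$1" >/dev/null 2>&1
}

# Assumed location, derived from --prefix=/usr/local/dpsearch above.
freq=/usr/local/dpsearch/etc/mandarin.freq

if check_charset "$freq" GB2312; then
    echo "$freq: decodes as GB2312"
else
    echo "$freq: does not decode as GB2312 (or the file is missing)"
fi
```

This only tells whether the bytes are valid in that charset. It says nothing about how the indexer itself reads or uses the file.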

www# diff indexer.conf indexer.conf-dist 
68,69c68,69
< #DBAddr               mysql://foo:[EMAIL PROTECTED]/search/?dbmode=cache
< DBAddr                mysql://search:[EMAIL PROTECTED]/search/?dbmode=single
---
> DBAddr                mysql://foo:[EMAIL PROTECTED]/search/?dbmode=cache
> 
164c164
< LocalCharset UTF-8
---
> #LocalCharset UTF-8
291d290
< LoadChineseList GB2312 mandarin.freq
706c705
< DefaultLang zh
---
> #DefaultLang en
837c836
< RemoteCharset UTF-8
---
> #RemoteCharset iso-8859-1
1027,1041d1025
< 
< HTDBAddr mysql://search:[EMAIL PROTECTED]/db_test_com/
< HTDBLimit 512 
< 
< Limit t:tag
< Tag works
< HTDBList "SELECT SQL_NO_CACHE id FROM article"
< HTDBDoc "SELECT SQL_NO_CACHE concat(\ 
< 'HTTP/1.0 200 OK\\r\\n',\ 
< 'Content-type: text/html\\r\\n',\ 
< 'Last-Modified: ',FROM_UNIXTIME(a.lasttime,'%a, %d %b %Y %H:%i:%s GMT'),'\\r\\n',\
< '\\r\\n',\
< '<html><head><title>',b.body,'</title></head><body>TAG:',a.tag,' UID:',a.uid,' WORD:',b.body,'</body></html>') \
< FROM article as a LEFT JOIN content as b USING(id) WHERE a.id='$2'"
< Server htdb:/works/
\ No newline at end of file
www# 

www# diff search.htm search.htm-dist 
17,20c17
< #DBAddr       mysql://foo:[EMAIL PROTECTED]/search/?dbmode=cache
< DBAddr        mysql://search:[EMAIL PROTECTED]/search/?dbmode=single
< 
< LoadChineseList GB2312 mandarin.freq
---
> DBAddr        mysql://foo:[EMAIL PROTECTED]/search/?dbmode=cache
32,33c29,30
< LocalCharset   UTF-8
< BrowserCharset UTF-8
---
> LocalCharset   iso-8859-1
> BrowserCharset iso-8859-1
www# 

www#cat langmap.conf
LangMapFile langmap/zh.utf8.lm


The indexer fetches the data without problems, but the words stored in the
dict table are not segmented. In search.cgi, unless I select
"Search for: Substring", the search returns nothing.

My MSN is [EMAIL PROTECTED]
Could you help me online?
Thanks.
- - - - - - - - - - - - - - - - - - - - - - - - - - - -

Read the full topic here:
http://www.dataparksearch.org/cgi-bin/simpleforum.cgi?fid=02;topic_id=1170316385
