Thanks Erick, I''ll take a carefull study of that 2009/12/16 Erick Erickson <erickerick...@gmail.com>
> If your queries are still slow, make sure you're not measuring > the *first* query on a newly opened searcher. There are > other tips here that might be useful. These are general searching > tips complimentary to Robert's suggestions.. > > http://wiki.apache.org/lucene-java/ImproveSearchingSpeed > > <http://wiki.apache.org/lucene-java/ImproveSearchingSpeed>HTH > Erick > > 2009/12/15 Weiwei Wang <ww.wang...@gmail.com> > > > Thanks Robert, a lot is learned from you:-) > > > > On Wed, Dec 16, 2009 at 11:53 AM, Robert Muir <rcm...@gmail.com> wrote: > > > > > Hi, just one more thought for you. > > > > > > I think even more important than anything I said before, you should > > ensure > > > you implement reusableTokenStream in your analyzer. > > > this becomes a necessity if you are using expensive objects like this. > > > > > > 2009/12/15 Weiwei Wang <ww.wang...@gmail.com> > > > > > > > Finally, i make it run, however, it works so slow > > > > > > > > 2009/12/15 Weiwei Wang <ww.wang...@gmail.com> > > > > > > > > > got it, thanks, Robert > > > > > > > > > > > > > > > On Tue, Dec 15, 2009 at 10:19 PM, Robert Muir <rcm...@gmail.com> > > > wrote: > > > > > > > > > >> if you have lucene 2.9 or 3.0 source code, just run patch -p0 < > > > > >> /path/to/LUCENE-XXYY.patch from the lucene source code root > > > directory... > > > > >> it > > > > >> should create the necessary directory and files. > > > > >> then run 'ant' , in this case it should create a lucene-icu jar > file > > > in > > > > >> the > > > > >> build directory. > > > > >> > > > > >> the patch doesnt include the icu dependency itself so you need to > > get > > > > that > > > > >> jar file from www.icu-project.org and have it in your classpath > > also > > > > >> > > > > >> sorry for the trouble, hope to integrate some of this soon for a > > > future > > > > >> release. > > > > >> > > > > >> On Tue, Dec 15, 2009 at 9:13 AM, Weiwei Wang < > ww.wang...@gmail.com> > > > > >> wrote: > > > > >> > > > > >> > Yes, i found the patch file LUCENE-1488.patch and there's no icu > > > > >> directory > > > > >> > in my dowloaded contrib directory. > > > > >> > > > > > >> > I'm a rookie guy using patch, i'm currently in the contrib dir, > > > could > > > > >> > anybody tell me how to execute this patch command to generate > the > > > > >> relevant > > > > >> > dir and souce files? > > > > >> > > > > > >> > On Tue, Dec 15, 2009 at 9:51 PM, Robert Muir <rcm...@gmail.com> > > > > wrote: > > > > >> > > > > > >> > > look at the latest patch file attached to the issue, it should > > > work > > > > >> with > > > > >> > > lucene 2.9 or greater (I think) > > > > >> > > > > > > >> > > 2009/12/15 Weiwei Wang <ww.wang...@gmail.com> > > > > >> > > > > > > >> > > > where can i find the source code? > > > > >> > > > > > > > >> > > > On Tue, Dec 15, 2009 at 9:40 PM, Robert Muir < > > rcm...@gmail.com> > > > > >> wrote: > > > > >> > > > > > > > >> > > > > there is an icu transform tokenfilter in the patch here: > > > > >> > > > > http://issues.apache.org/jira/browse/LUCENE-1488 > > > > >> > > > > > > > > >> > > > > Transliterator pinyin = > > > > >> Transliterator.getInstance("Han-Latin"); > > > > >> > > > > Tokenizer tokenizer = new KeywordTokenizer(new > > > > >> > StringReader("中国")); > > > > >> > > > > ICUTransformFilter filter = new > > > ICUTransformFilter(tokenizer, > > > > >> > > pinyin); > > > > >> > > > > assertTokenStreamContents(filter, new String[] { "zhōng > > > guó" > > > > } > > > > >> ); > > > > >> > > > > > > > > >> > > > > note it will add tone marks and insert space between > > syllables > > > > by > > > > >> > > default > > > > >> > > > > if you do not want this, you need to do some cleanup. > > > > >> > > > > > > > > >> > > > > Transliterator pinyin = > > > > Transliterator.getInstance("Han-Latin; > > > > >> > NFD; > > > > >> > > > > [[:NonspacingMark:][:Space:]] Remove"); > > > > >> > > > > Tokenizer tokenizer = new KeywordTokenizer(new > > > > >> > StringReader("中国")); > > > > >> > > > > ICUTransformFilter filter = new > > > ICUTransformFilter(tokenizer, > > > > >> > > pinyin); > > > > >> > > > > assertTokenStreamContents(filter, new String[] { > > "zhongguo" > > > } > > > > >> ); > > > > >> > > > > > > > > >> > > > > > > > > >> > > > > 2009/12/15 Weiwei Wang <ww.wang...@gmail.com> > > > > >> > > > > > > > > >> > > > > > Hi, guys, > > > > >> > > > > > I'm implementing a search engine based on Lucene for > > > > >> Chinese. > > > > >> > So > > > > >> > > I > > > > >> > > > > want > > > > >> > > > > > to support pinyin search as Google China do. > > > > >> > > > > > > > > > >> > > > > > e.g. > > > > >> > > > > > “中国” means Chinese in English > > > > >> > > > > > this word's pinyin input is "zhongguo" > > > > >> > > > > > The feature i want to implement is when user type > zhongguo > > > the > > > > >> > > results > > > > >> > > > > will > > > > >> > > > > > include documents containing "中国" or even Chinese > > > > >> > > > > > > > > > >> > > > > > Anybody here know how to achieve this? > > > > >> > > > > > > > > > >> > > > > > -- > > > > >> > > > > > Weiwei Wang > > > > >> > > > > > Alex Wang > > > > >> > > > > > 王巍巍 > > > > >> > > > > > Room 403, Mengmin Wei Building > > > > >> > > > > > Computer Science Department > > > > >> > > > > > Gulou Campus of Nanjing University > > > > >> > > > > > Nanjing, P.R.China, 210093 > > > > >> > > > > > > > > > >> > > > > > Homepage: http://cs.nju.edu.cn/rl/weiweiwang > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > > >> > > > > > > > > >> > > > > -- > > > > >> > > > > Robert Muir > > > > >> > > > > rcm...@gmail.com > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > > >> > > > > > > > >> > > > -- > > > > >> > > > Weiwei Wang > > > > >> > > > Alex Wang > > > > >> > > > 王巍巍 > > > > >> > > > Room 403, Mengmin Wei Building > > > > >> > > > Computer Science Department > > > > >> > > > Gulou Campus of Nanjing University > > > > >> > > > Nanjing, P.R.China, 210093 > > > > >> > > > > > > > >> > > > Homepage: http://cs.nju.edu.cn/rl/weiweiwang > > > > >> > > > > > > > >> > > > > > > >> > > > > > > >> > > > > > > >> > > -- > > > > >> > > Robert Muir > > > > >> > > rcm...@gmail.com > > > > >> > > > > > > >> > > > > > >> > > > > > >> > > > > > >> > -- > > > > >> > Weiwei Wang > > > > >> > Alex Wang > > > > >> > 王巍巍 > > > > >> > Room 403, Mengmin Wei Building > > > > >> > Computer Science Department > > > > >> > Gulou Campus of Nanjing University > > > > >> > Nanjing, P.R.China, 210093 > > > > >> > > > > > >> > Homepage: http://cs.nju.edu.cn/rl/weiweiwang > > > > >> > > > > > >> > > > > >> > > > > >> > > > > >> -- > > > > >> Robert Muir > > > > >> rcm...@gmail.com > > > > >> > > > > > > > > > > > > > > > > > > > > -- > > > > > Weiwei Wang > > > > > Alex Wang > > > > > 王巍巍 > > > > > Room 403, Mengmin Wei Building > > > > > Computer Science Department > > > > > Gulou Campus of Nanjing University > > > > > Nanjing, P.R.China, 210093 > > > > > > > > > > Homepage: http://cs.nju.edu.cn/rl/weiweiwang > > > > > > > > > > > > > > > > > > > > > -- > > > > Weiwei Wang > > > > Alex Wang > > > > 王巍巍 > > > > Room 403, Mengmin Wei Building > > > > Computer Science Department > > > > Gulou Campus of Nanjing University > > > > Nanjing, P.R.China, 210093 > > > > > > > > Homepage: http://cs.nju.edu.cn/rl/weiweiwang > > > > > > > > > > > > > > > > -- > > > Robert Muir > > > rcm...@gmail.com > > > > > > > > > > > -- > > Weiwei Wang > > Alex Wang > > 王巍巍 > > Room 403, Mengmin Wei Building > > Computer Science Department > > Gulou Campus of Nanjing University > > Nanjing, P.R.China, 210093 > > > > Homepage: http://cs.nju.edu.cn/rl/weiweiwang > > > -- Weiwei Wang Alex Wang 王巍巍 Room 403, Mengmin Wei Building Computer Science Department Gulou Campus of Nanjing University Nanjing, P.R.China, 210093 Homepage: http://cs.nju.edu.cn/rl/weiweiwang