Re: How to do alias(Pinyin) search in Lucene

Erick Erickson Tue, 15 Dec 2009 06:23:09 -0800

If you're using an IDE, there should be an "apply patch" somewhere. In
Eclipse, you right-click on the project>>team>>apply patch.


In IntelliJ, it's something like Version Control>>(subversion???)>>apply
patch....

Or do as Robert suggests from the command line...

HTH
Erick

On Tue, Dec 15, 2009 at 9:13 AM, Weiwei Wang <[email protected]> wrote:

> Yes, i found the patch file LUCENE-1488.patch and there's no icu directory
> in my dowloaded contrib directory.
>
> I'm a rookie guy using patch, i'm currently in the contrib dir, could
> anybody tell me how to execute this patch command to generate the relevant
> dir and souce files?
>
> On Tue, Dec 15, 2009 at 9:51 PM, Robert Muir <[email protected]> wrote:
>
> > look at the latest patch file attached to the issue, it should work with
> > lucene 2.9 or greater (I think)
> >
> > 2009/12/15 Weiwei Wang <[email protected]>
> >
> > > where can i find the source code?
> > >
> > > On Tue, Dec 15, 2009 at 9:40 PM, Robert Muir <[email protected]> wrote:
> > >
> > > > there is an icu transform tokenfilter in the patch here:
> > > > http://issues.apache.org/jira/browse/LUCENE-1488
> > > >
> > > >    Transliterator pinyin = Transliterator.getInstance("Han-Latin");
> > > >    Tokenizer tokenizer = new KeywordTokenizer(new
> StringReader("中国"));
> > > >    ICUTransformFilter filter = new ICUTransformFilter(tokenizer,
> > pinyin);
> > > >    assertTokenStreamContents(filter, new String[] { "zhōng guó" } );
> > > >
> > > > note it will add tone marks and insert space between syllables by
> > default
> > > > if you do not want this, you need to do some cleanup.
> > > >
> > > >    Transliterator pinyin = Transliterator.getInstance("Han-Latin;
> NFD;
> > > > [[:NonspacingMark:][:Space:]] Remove");
> > > >    Tokenizer tokenizer = new KeywordTokenizer(new
> StringReader("中国"));
> > > >    ICUTransformFilter filter = new ICUTransformFilter(tokenizer,
> > pinyin);
> > > >    assertTokenStreamContents(filter, new String[] { "zhongguo" } );
> > > >
> > > >
> > > > 2009/12/15 Weiwei Wang <[email protected]>
> > > >
> > > > > Hi, guys,
> > > > >     I'm implementing a search engine based on Lucene for Chinese.
> So
> > I
> > > > want
> > > > > to support pinyin search as Google China do.
> > > > >
> > > > > e.g.
> > > > >    “中国”  means Chinese in English
> > > > >    this word's pinyin input is "zhongguo"
> > > > > The feature i want to implement is when user type zhongguo the
> > results
> > > > will
> > > > > include documents containing "中国" or even Chinese
> > > > >
> > > > > Anybody here know how to achieve this?
> > > > >
> > > > > --
> > > > > Weiwei Wang
> > > > > Alex Wang
> > > > > 王巍巍
> > > > > Room 403, Mengmin Wei Building
> > > > > Computer Science Department
> > > > > Gulou Campus of Nanjing University
> > > > > Nanjing, P.R.China, 210093
> > > > >
> > > > > Homepage: http://cs.nju.edu.cn/rl/weiweiwang
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Robert Muir
> > > > [email protected]
> > > >
> > >
> > >
> > >
> > > --
> > > Weiwei Wang
> > > Alex Wang
> > > 王巍巍
> > > Room 403, Mengmin Wei Building
> > > Computer Science Department
> > > Gulou Campus of Nanjing University
> > > Nanjing, P.R.China, 210093
> > >
> > > Homepage: http://cs.nju.edu.cn/rl/weiweiwang
> > >
> >
> >
> >
> > --
> > Robert Muir
> > [email protected]
> >
>
>
>
> --
> Weiwei Wang
> Alex Wang
> 王巍巍
> Room 403, Mengmin Wei Building
> Computer Science Department
> Gulou Campus of Nanjing University
> Nanjing, P.R.China, 210093
>
> Homepage: http://cs.nju.edu.cn/rl/weiweiwang
>

Re: How to do alias(Pinyin) search in Lucene

Reply via email to