Hello Koji,
Thanks for your kind reply.
Yes, I used QueryParser. Normally I use the Query = QueryParser.parse() method.
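That is, roughly the old static form (writing this from memory, so the exact call may
differ; I believe this form is deprecated in 1.9):

    // phrase, FIELD_CONTENT and searcher as in your sample below; 1.4-style static parse
    Query query = QueryParser.parse( phrase, FIELD_CONTENT, new StandardAnalyzer() );
    Hits hits = searcher.search( query );
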
I put your sample code into the lia.analysis.i18n package of the Lucene in Action code
and ran JapaneseDemo against Lucene 1.4 and 1.9.
The results are:
[echo] Running lia.analysis.i18n.JapaneseDemo...
[java] query = content:ラ?メン屋
I don't get any hits.
For Korean:
[echo] Running lia.analysis.i18n.KoreanDemo...
[java] phrase = 경
[java] query =
I don't get any parsed query result.
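
Maybe it would help to dump the tokens the analyzer actually produces for the
Korean text. A rough, untested sketch against the 1.9 API (the Korean word here
is just an example, not my real test data):

import java.io.StringReader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;

public class DumpKoreanTokens {
    public static void main( String[] args ) throws Exception {
        Analyzer analyzer = new StandardAnalyzer();
        // run the analyzer over a sample Korean string and print every token it emits
        TokenStream ts = analyzer.tokenStream( "content", new StringReader( "경주" ) );
        for( Token t = ts.next(); t != null; t = ts.next() )
            System.out.println( "token = " + t.termText() );
        // if nothing is printed, the analyzer is dropping the Hangul characters,
        // which would explain the empty query above
    }
}
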
Thanks,
Youngho
----- Original Message -----
From: "Koji Sekiguchi" <[EMAIL PROTECTED]>
To: <[email protected]>; "Youngho Cho" <[EMAIL PROTECTED]>
Sent: Thursday, October 27, 2005 9:48 AM
Subject: RE: korean and lucene
> Hi Youngho,
>
> With regard to Japanese, using StandardAnalyzer
> I can search for a word/phrase.
>
> Did you use QueryParser? StandardAnalyzer tokenizes
> CJK characters into a stream of single characters.
> Use QueryParser to get a PhraseQuery and search with that query.
>
> Please see the following sample code. Replace the Japanese
> "contents" and the (search target) "phrase" with Korean in the program and run it.
>
> regards,
>
> Koji
>
> =============================================
> import java.io.IOException;
> import org.apache.lucene.analysis.Analyzer;
> import org.apache.lucene.analysis.standard.StandardAnalyzer;
> import org.apache.lucene.analysis.cjk.CJKAnalyzer;
> import org.apache.lucene.store.Directory;
> import org.apache.lucene.store.RAMDirectory;
> import org.apache.lucene.index.IndexWriter;
> import org.apache.lucene.document.Document;
> import org.apache.lucene.document.Field;
> import org.apache.lucene.search.IndexSearcher;
> import org.apache.lucene.search.Hits;
> import org.apache.lucene.search.Query;
> import org.apache.lucene.queryParser.QueryParser;
> import org.apache.lucene.queryParser.ParseException;
>
> public class JapaneseByStandardAnalyzer {
>
>     private static final String FIELD_CONTENT = "content";
>
>     // documents to index (replace with Korean text when testing Korean)
>     private static final String[] contents = {
>         "東京にはおいしいラーメン屋がたくさんあります。",
>         "北海道にもおいしいラーメン屋があります。"
>     };
>
>     // search target (replace with a Korean word/phrase when testing Korean)
>     private static final String phrase = "ラーメン屋";
>     //private static final String phrase = "屋";
>
>     private static Analyzer analyzer = null;
>
>     public static void main( String[] args ) throws IOException, ParseException {
>         Directory directory = makeIndex();
>         search( directory );
>         directory.close();
>     }
>
>     private static Analyzer getAnalyzer(){
>         if( analyzer == null ){
>             analyzer = new StandardAnalyzer();
>             //analyzer = new CJKAnalyzer();
>         }
>         return analyzer;
>     }
>
>     // build an in-memory index over the sample contents
>     private static Directory makeIndex() throws IOException {
>         Directory directory = new RAMDirectory();
>         IndexWriter writer = new IndexWriter( directory, getAnalyzer(), true );
>         for( int i = 0; i < contents.length; i++ ){
>             Document doc = new Document();
>             doc.add( new Field( FIELD_CONTENT, contents[i], Field.Store.YES,
>                                 Field.Index.TOKENIZED ) );
>             writer.addDocument( doc );
>         }
>         writer.close();
>         return directory;
>     }
>
>     // parse the phrase with the same analyzer, then print the query and the hits
>     private static void search( Directory directory ) throws IOException, ParseException {
>         IndexSearcher searcher = new IndexSearcher( directory );
>         QueryParser parser = new QueryParser( FIELD_CONTENT, getAnalyzer() );
>         Query query = parser.parse( phrase );
>         System.out.println( "query = " + query );
>         Hits hits = searcher.search( query );
>         for( int i = 0; i < hits.length(); i++ )
>             System.out.println( "doc = " + hits.doc( i ).get( FIELD_CONTENT ) );
>         searcher.close();
>     }
> }
>
>
> > -----Original Message-----
> > From: Youngho Cho [mailto:[EMAIL PROTECTED]
> > Sent: Thursday, October 27, 2005 8:18 AM
> > To: [email protected]; Cheolgoo Kang
> > Subject: Re: korean and lucene
> >
> >
> > Hello Cheolgoo,
> >
> > I have now updated my Lucene version to 1.9 in order to use StandardAnalyzer
> > for Korean, and tested your patch, which is already included in 1.9:
> >
> > http://issues.apache.org/jira/browse/LUCENE-444
> >
> > But I still don't get good results with Korean compared with CJKAnalyzer.
> >
> > A single character matches fine, but a word of two or more characters
> > doesn't match at all.
> >
> > Am I missing something, or is some more work still needed?
> >
> >
> > Thanks,
> >
> > Youngho.
> >
> >
> > ----- Original Message -----
> > From: "Cheolgoo Kang" <[EMAIL PROTECTED]>
> > To: <[email protected]>; "John Wang" <[EMAIL PROTECTED]>
> > Sent: Tuesday, October 04, 2005 10:11 AM
> > Subject: Re: korean and lucene
> >
> >
> > > StandardAnalyzer's JavaCC-based StandardTokenizer.jj cannot read
> > > the Korean part of the Unicode character blocks.
> > >
> > > You should either 1) use CJKAnalyzer, or 2) add the Korean character
> > > block (0xAC00~0xD7AF) to the CJK token definition in the
> > > StandardTokenizer.jj file.
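> > >
> > > For option 1, a rough, untested sketch like this should show the overlapping
> > > two-character tokens CJKAnalyzer produces for Hangul text (the Korean sample
> > > string is just an example):
> > >
> > > import java.io.StringReader;
> > > import org.apache.lucene.analysis.Token;
> > > import org.apache.lucene.analysis.TokenStream;
> > > import org.apache.lucene.analysis.cjk.CJKAnalyzer;
> > >
> > > public class KoreanBigramDemo {
> > >     public static void main( String[] args ) throws Exception {
> > >         // CJKAnalyzer indexes CJK text (including Hangul) as overlapping bigrams,
> > >         // so words of two or more characters become searchable as phrases
> > >         TokenStream ts = new CJKAnalyzer().tokenStream( "content",
> > >                 new StringReader( "경주에 갑니다" ) );
> > >         for( Token t = ts.next(); t != null; t = ts.next() )
> > >             System.out.println( t.termText() );
> > >     }
> > > }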
> > >
> > > Hope it helps.
> > >
> > >
> > > On 10/4/05, John Wang <[EMAIL PROTECTED]> wrote:
> > > > Hi:
> > > >
> > > > We are running into problems with searching on Korean documents. We are
> > > > using the StandardAnalyzer and everything works with Chinese and Japanese.
> > > > Are there known problems with Korean in Lucene?
> > > >
> > > > Thanks
> > > >
> > > > -John
> > > >
> > > >
> > >
> > >
> > > --
> > > Cheolgoo
> > >