A problem on performance

luan xl Fri, 25 Aug 2006 17:58:50 -0700

I have nearly 4 million Chinese documents, each ranging in size from 1 KB to 300 KB, so I use org.apache.lucene.analysis.cn.ChineseAnalyzer as the analyzer for the text. The index has four fields:

content - tokenized not stored
title - tokenized and stored
path - stored only
date - stored only

For some reason, I divided these documents into 12 sets and use an IndexSearcher over a MultiReader for search. For all English queries the speed is very fast, costing only about 10-100 ms. But when I query with Chinese words, the behavior is a bit confusing: if the word is only one character, so that the Query is actually a TermQuery, the speed is very fast. However, if the word is more than one character, the Query is actually a PhraseQuery with slop 0, and IndexSearcher usually takes 3000-5000 ms to return the Hits.
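
To make the two query shapes concrete, here is a rough stand-alone sketch in plain Java. It is not Lucene code: the class QueryShapeSketch and its methods are hypothetical, and the tokenization rule (one token per Chinese character, one token per run of other letters or digits, lowercased) is only my assumption about how the analyzer behaves, based on what I see above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of this is Lucene API.
class QueryShapeSketch {

    // Assumed rule: every CJK ideograph becomes its own token, while a run
    // of other letters or digits stays together as one lowercased token.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        StringBuilder run = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean cjk = Character.UnicodeBlock.of(c)
                    == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS;
            if (cjk) {
                flush(run, tokens);             // close any pending Latin run
                tokens.add(String.valueOf(c));  // one token per ideograph
            } else if (Character.isLetterOrDigit(c)) {
                run.append(Character.toLowerCase(c));
            } else {
                flush(run, tokens);             // whitespace or punctuation
            }
        }
        flush(run, tokens);
        return tokens;
    }

    // Moves the pending run (if any) into the token list.
    private static void flush(StringBuilder run, List<String> tokens) {
        if (run.length() > 0) {
            tokens.add(run.toString());
            run.setLength(0);
        }
    }

    // One token corresponds to a single-term query; several tokens
    // correspond to an exact phrase (slop 0) over those terms.
    static String queryShape(String word) {
        int n = tokenize(word).size();
        if (n == 0) {
            return "empty";
        }
        return n == 1 ? "TermQuery" : "PhraseQuery(slop=0)";
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source file encoding-independent.
        System.out.println(queryShape("lucene"));        // English word
        System.out.println(queryShape("\u4e2d"));        // one Chinese character
        System.out.println(queryShape("\u4e2d\u56fd"));  // two Chinese characters
    }
}
```

Under these assumptions an English word and a single Chinese character both come out as one term, while a two-character Chinese word comes out as a two-term exact phrase, which matches the fast and slow cases described above.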

I have also tested with the QueryParser and get the same results. My environment is a Dell PE2600, 2G*2 Xeon, 2 GB RAM, 10000R/s SCSI, Debian/sarge, Sun JDK 1.5 + Lucene 2.0.0.


Thanks.



