On Mon, 18 Aug 2008 23:07:19 +0800
"finy finy" <[EMAIL PROTECTED]> wrote:

> because I use Chinese characters, for example "ibm_______________",
> Solr will parse it into a term "ibm" and a phrase "_________ ______".
> Can I use Solr to query with a term "ibm" and a term "_________" and a term
> "______"?

Hi finy,
you should look into n-gram tokenizers. I'm not sure whether they are documented
on the wiki, but they have been discussed on the mailing list quite a few times.

In short, an n-gram tokenizer breaks your input into overlapping blocks of n
consecutive characters, and those blocks are what get indexed and compared at
query time. I think that for Chinese, bi-grams (n = 2) are the favoured approach.

good luck,
B
_________________________
{Beto|Norberto|Numard} Meijome
