Re: Problem with lucene.

Erick Erickson Tue, 30 Jan 2007 06:02:56 -0800

Yes, surely the analyzer used at indexing time governs what's in the index
and thus what can be searched. I'd surmise that your index doesn't contain
any tokens with < or > if StandardAnalyzer was used, so no matter what
analyzer you use at query time, you won't be able to find tokens that aren't
in your index.


There is one exception. If the fields were indexed UN_TOKENIZED then no
tokenization occurs at index time. But that means that the input isn't
broken into words, so if you indexed "this is a sentence", you wouldn't
match unless you searched on "this is a sentence", stop words and all.

So, you have to go back and re-index with different analyzers if you really
need to search on tokens like < and >. You might get some benefit from
PerFieldAnalyzerWrapper, and you might have to index the same data in more
than one field, say dataTokenized and dataUnTokenized to get the behavior
you need....

Best
Erick

On 1/30/07, poeta simbolista <[EMAIL PROTECTED]> wrote:




Thanks for your reply.
I am working with an index which is created separately. It is created with
a
StandardTokenizer.
I have read there should be used the same tokenizer which the index was
created with.
Anyway, I have tried other tokenizers while consulting, such as the
WhitespaceTokenizer, but the same results happen.
So, does it boil down to the tokenizer which the index was created with?

Thanks a lot.
Diego

Ps. Yes, I was using Luke which works great. But anyway, things like <*
dont
work in any analyser -- don give me any results, although searching in the
db gives me loads -- and also i can get those results (via searching other
fields) on luke and i can see the text is there, the < and > are there.


Erick Erickson wrote:
>
> Sure, your problem is probably that the query goes through an analyzer
and
> its associated tokenizer. Probably something like StandardAnalyzer which
> "massages" the input and strips out most non-alphabetic characters,....
> except some. It tries to be smart about URLs, e-mail addresses, etc.
>
> If you're new, I recommend you use WhitespaceAnalyzer until you get
> familiar
> with what analyzers do for you. Be aware that WhitespaceAnalyzer does
NOT
> automatically lower-case your input, so "Which" won't match "which".
It's
> easy enough to make your own analyzer by subclassing of the standard
ones.
> The book "Lucene in Action " is valuable for this, although you should
be
> aware that it was written to the 1.4 codebase, so there are a few
> differences.
>
> It is important that the analyzer you use at *index* time is compatible
> with
> the one you use at *query* time. Until you're more familiar with this, I
> simply recommend you use the *same* analyzer at index time that you use
at
> search time. That'll give you more intuitive results. You'll want to
> refine
> the use of analyzers later...
>
> I also recommend that you get a copy of luke (google lucene luke). It
will
> allow you to examine your index, parse queries through the GUI, examine
> the
> effects of different analyzers on input etc. It's a great tool and one
> that'll make your life much easier.
>
> Best
> Erick
>
>
> On 1/29/07, poeta simbolista <[EMAIL PROTECTED]> wrote:
>>
>>
>> Hi there, this is my very first post at this forum... please be
>> considerate
>> :)
>>
>> Well, i have a problem when sending a query such as:
>>
>> +description:<
>>
>> Once the query is parsed, it returns me the empty String, which means
the
>> String "<" that i want to search for on the field description is
ignored.
>> If i use normal words then it is taken. Do you know why this could be?
>> Thanks.
>> --
>> View this message in context:
>> http://www.nabble.com/Problem-with-lucene.-tf3137405.html#a8694565
>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>> For additional commands, e-mail: [EMAIL PROTECTED]
>>
>>
>
>

--
View this message in context:
http://www.nabble.com/Problems-with-some-characters-tf3137405.html#a8707691
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: Problem with lucene.

Reply via email to