Thanks for the quick answer!

I haven't specified an analyzer, so it should be the StandardAnalyzer. I forgot to mention that I'm using Lucene via Hibernate Search, where I can easily define the fields in the Hibernate POJO classes. As far as I know this shouldn't change things much, because I can still use core Lucene.

I've already used Luke, and the indexed special characters are represented as "¡" (¡) and "¿" (¿) in the index.

But the analyzer should have nothing to do with the problem right now, because the problem is that entities starting with "¿" don't get indexed at all, and of those starting with "¡" some get indexed and some don't. Currently 29 entities out of 8900 in total don't get indexed at all.

I don't need to be able to search for those special characters. I just need those entities to get indexed. The other information in those entities is more important, and it's the names (starting with those special characters) that seem to keep those entities from being indexed.

Could I fix this by using some analyzer during indexing? Actually, I tried a custom analyzer with "ISOLatin1AccentFilter()" but it didn't change anything. In Hibernate Search the analyzer is specified in a property file or in the POJO classes, but I couldn't get it to work. The text went into the index exactly the same way as before (when I look at it with Luke), and the same entities were still missing.

A good solution for me would be for those special characters to be deleted altogether from the index, so that they might not cause any trouble. For example, "¡Fantástico!- blaaba" would be perfectly okay looking like "Fantastico- blaaba".
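
To make concrete what I mean by deleting the characters, here is a rough sketch in plain Java (no Lucene involved) that cleans a value before it is handed to the indexer. The class name SpecialCharStripper and the method strip are made up for illustration and are not part of Lucene or Hibernate Search. I'm also assuming that dropping "!" and "?" together with "¡" and "¿" is acceptable. Whether this would actually cure the missing entities is untested.

```java
import java.text.Normalizer;

// Hypothetical helper, not part of Lucene or Hibernate Search.
class SpecialCharStripper {

    // Returns the input without accents and without the characters ¡ ¿ ! ?
    public static String strip(String input) {
        if (input == null) {
            return null;
        }
        // Decompose accented letters: "á" becomes "a" followed by U+0301.
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        // Remove the combining marks so that only the base letters remain.
        String withoutAccents = decomposed.replaceAll("\\p{M}", "");
        // Assumption: inverted and regular exclamation/question marks can all go.
        return withoutAccents.replaceAll("[\u00A1\u00BF!?]", "");
    }

    public static void main(String[] args) {
        // \u00A1 is the inverted exclamation mark, \u00E1 is "á".
        System.out.println(strip("\u00A1Fant\u00E1stico!- blaaba"));
    }
}
```

Note that this also turns "ñ" into "n", which is fine for me since I don't need to search on those characters.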

Thanks again in advance,
pn

On Tue, 18 Nov 2008, Erick Erickson wrote:

What analyzer are you using at index and search time? Typical problems
include:
- using an analyzer that doesn't understand accented chars (StandardAnalyzer
  for instance)
- using a different analyzer during search and index.

Search the user list for "accent" and you'll find this kind of problem
discussed, and if that doesn't help we need to know what analyzers you are
using and what behavior you really want. Typically, for instance, *requiring*
a user to type the upside-down exclamation point to get a match on this field
would be considered incorrect.

Also, you'd be helped a lot by getting a copy of Luke and examining your
index to see exactly what's been indexed; it'll reveal a lot.

Best
Erick

On Tue, Nov 18, 2008 at 10:05 AM, Pekka Nykyri <[EMAIL PROTECTED]> wrote:

Hi!

I'm having problems with entities including special characters (Spanish
language) not getting indexed.

I haven't been able to find the reason why some entities get indexed
while some don't.

I have 3 fields that (currently) hold the same value. The value of the
fields is, for example, "¡Fantástico!- blaaba". Then when I change ONE of the
three values to "¡Fantástico! - blaaba", the entity gets indexed. So
changing only one field makes it index.

But the bigger problem is that I have an almost identical entity (the other
fields are nearly the same and I don't think they cause the problem), with
exactly the same three "¡Fantástico!- blaaba" fields, and it gets indexed
normally, even though the "critical" fields are exactly the same.

Also, none of the entities where the three fields start with the
"upside-down ?" mark get indexed.

I'm really confused by the problem because I can't find any logic in why
some entities are not indexed even though they are similar to others that
are. And changing only one value of the three makes it index.

Sorry for a really messy message but I just can't explain it more clearly
now.

Thanks in advance,
pn

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

