Thanks for the quick answer!
I haven't specified the analyzer so it should be the StandardAnalyzer. I
forgot to mention that I'm using Lucene via Hibernate Search, where I can
easily define the fields in the Hibernate POJO classes. But as far as I
know this shouldn't change things much, because I can still use the core
Lucene.
And I've used Luke already and the indexed special characters are
represented as "¡"(¡) and "¿" (¿) in the index.
But the analyzer should have nothing to do with the current problem,
because the problem is that entities starting with "¿" don't get indexed
at all, while some of those starting with "¡" get indexed and some don't.
Currently 29 entities (out of 8900 in total) don't get indexed at all.
I don't need to be able to search for those special characters. I just need
those entities to get indexed. The other information in those entities is
more important, and it's the names (starting with those special characters)
that seem to keep those entities from being indexed.
Could I fix this by using some analyzer during indexing? Actually, I tried
using a custom analyzer with "ISOLatin1AccentFilter()" but it didn't change
anything. In Hibernate Search the analyzer is specified in a property file
or in the POJO classes, but I couldn't get it to work. The text went into
the index exactly the same way as before (when I look at it with Luke)
and the same entities were still missing.
A good solution for me would be for those special characters to be
deleted altogether from the index, so that maybe they wouldn't cause any
trouble. For example, "¡Fantástico!- blaaba" would be perfectly okay looking
like "Fantastico- blaaba".
Thanks again in advance,
pn
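To illustrate the clean-up described above, here is a minimal sketch in
plain Java that strips the accents and the "¡", "¿", "!" and "?" characters
from a name before it is handed to the indexer. It only uses
java.text.Normalizer from the JDK; the class NameCleaner and the method
clean() are hypothetical names made up for this example and are not part of
Lucene or Hibernate Search. It does not explain why the entities go missing,
it only shows the transformation asked for. Note that it also folds "ñ" to
"n", which may or may not be wanted.

```java
import java.text.Normalizer;

// Hypothetical helper for illustration only; not a Lucene or Hibernate Search class.
final class NameCleaner {

    // Returns the name without accents and without ¡ ¿ ! ? characters.
    static String clean(String name) {
        if (name == null) {
            return null;
        }
        // Decompose accented letters, e.g. "á" becomes "a" plus a combining accent.
        String decomposed = Normalizer.normalize(name, Normalizer.Form.NFD);
        // Remove the combining marks, leaving only the base letters.
        String unaccented = decomposed.replaceAll("\\p{M}+", "");
        // Remove inverted (\u00A1, \u00BF) and regular exclamation/question marks.
        String stripped = unaccented.replaceAll("[\u00A1\u00BF!?]", "");
        return stripped.trim();
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source independent of the file encoding.
        System.out.println(clean("\u00A1Fant\u00E1stico!- blaaba"));
    }
}
```

Under these assumptions the cleaned value could be stored in the indexed
field (or in a separate field used only for indexing), while the original
name stays untouched in the database.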
On Tue, 18 Nov 2008, Erick Erickson wrote:
What analyzer are you using at index and search time? Typical problems
include:
using an analyzer that doesn't understand accented chars (StandardAnalyzer
for instance)
using a different analyzer during search and index.
Search the user list for "accent" and you'll find this kind of problem
discussed, and if that doesn't help we need to know what analyzers you are
using and what behavior you really want. Typically, for instance,
*requiring* a user to type the upside-down exclamation point to get a match
on this field would be considered incorrect.
Also, you'd be helped a lot by getting a copy of Luke and examining your
index to see exactly what's been indexed; it'll reveal a lot.
Best
Erick
On Tue, Nov 18, 2008 at 10:05 AM, Pekka Nykyri <[EMAIL PROTECTED]> wrote:
Hi!
I'm having problems with entities including special characters (Spanish
language) not getting indexed.
I haven't been able to find the reason why some entities get indexed
while some don't.
I have 3 fields that (currently) hold the same value. The value for the
fields is, for example, "¡Fantástico!- blaaba". Then when I change ONE of the
three values to "¡Fantástico! - blaaba", the entity gets indexed. So
changing only one field makes it index.
But the bigger problem with this is that I have an almost identical entity
(the other fields are nearly the same and I don't think they cause the
problem), with exactly the same three "¡Fantástico!- blaaba" fields, and it
gets indexed normally, even though the "critical" fields are exactly the same.
Also, none of the entities where the three fields start with the
"upside down ?" mark get indexed.
I'm really confused by the problem because I can't find any logic in why
some entities are not indexed even though they are similar to others that
are. And changing only one value of the three makes it index.
Sorry for a really messy message but I just can't explain it more clearly
now.
Thanks in advance,
pn
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]