Hi Grant and Otis,
Thanks for the feedback; I appreciate it. You've given me some good ideas.
> Sounds like a really interesting system! I am curious, are your users
> fluent in multiple languages or are you using some type of translation
> component?
The former. We're talking about construction projects, where English is
(generally) something of a lingua franca: a really big construction
project these days might use Australian architects, British managers and
UAE-based engineers on a project in Shanghai. So we might have an
architect forwarding a message to an engineer in English, who forwards
it to the ground team in Shanghai in English, who then discuss it
amongst themselves in Chinese... all in the space of one forwarded
email.
> How are you querying? Are users entering mixed language queries too?
<snip>
Good question(s). Automatically detecting the language at index time
doesn't NECESSARILY help us at search time, as a query gives us a lot
less text to work with than a document does. On the plus side, we can
always ASK which language the user is searching in, with a drop-down or
something; we can't really ask what language their correspondence is in,
as it may be mixed.
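
To make the query-side trade-off concrete, here is a minimal sketch (not
our actual code) of falling back to the drop-down when a query is too
short to guess from. The class name QueryScriptGuess, the method choose
and the minLetters threshold are all hypothetical; it uses only the
JDK's Character.UnicodeScript, and it guesses the writing script rather
than the language, so it cannot tell English from French.

```java
import java.lang.Character.UnicodeScript;
import java.util.EnumMap;
import java.util.Map;

public class QueryScriptGuess {
    /**
     * Returns the dominant script among the letters of the query, or the
     * user's drop-down choice when the query contains fewer than
     * minLetters letters (hypothetical threshold) to go on.
     */
    public static UnicodeScript choose(String query, UnicodeScript userChoice, int minLetters) {
        Map<UnicodeScript, Integer> counts = new EnumMap<>(UnicodeScript.class);
        int letters = 0;
        for (int i = 0; i < query.length(); ) {
            int cp = query.codePointAt(i);
            if (Character.isLetter(cp)) {
                // Count each letter under the script it belongs to.
                counts.merge(UnicodeScript.of(cp), 1, Integer::sum);
                letters++;
            }
            i += Character.charCount(cp);
        }
        if (letters < minLetters) {
            return userChoice; // too little text: trust the drop-down
        }
        UnicodeScript best = userChoice;
        int bestCount = 0;
        for (Map.Entry<UnicodeScript, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }
}
```

With a threshold of 10, a one-word query such as "rebar" would keep
whatever the user picked, while a longer phrase would be classified from
its own letters.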
Multiple indexes are an option, but we're very concerned about
performance and size -- we're talking many, many millions of things to
index, so having English/Chinese/Arabic/who-knows-what-else indexes
could be a nightmare.
> Also, is the text so finely delineated as your example? We sometimes
> run across the case where foreign languages will use other languages
> (mostly English) mid-sentence and it makes things quite ugly. Approach
> 4 should handle this, though.
Yeah, that's one of our worries. People often can't find the right word
for what they want to say, etc., so they slip back into another language.
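
As an illustration of that mid-sentence switching, here is a rough
sketch of splitting text into runs by Unicode script, so that each run
could in principle go to a different analyzer. This is an assumption on
my part and not a design: ScriptRuns and Run are hypothetical names, it
relies only on the JDK's Character.UnicodeScript, and because script is
not the same as language it separates Chinese from English but not,
say, English from German.

```java
import java.lang.Character.UnicodeScript;
import java.util.ArrayList;
import java.util.List;

public class ScriptRuns {
    /** A maximal stretch of text whose letters share one Unicode script. */
    public static final class Run {
        public final UnicodeScript script;
        public final String text;

        Run(UnicodeScript script, String text) {
            this.script = script;
            this.text = text;
        }
    }

    /** Splits input into runs, starting a new run whenever the script changes. */
    public static List<Run> split(String input) {
        List<Run> runs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        UnicodeScript currentScript = null;
        for (int i = 0; i < input.length(); ) {
            int cp = input.codePointAt(i);
            UnicodeScript s = UnicodeScript.of(cp);
            // Spaces, digits and most punctuation are COMMON or INHERITED;
            // they stay attached to whatever run they appear in.
            boolean neutral = s == UnicodeScript.COMMON || s == UnicodeScript.INHERITED;
            if (!neutral && currentScript != null && s != currentScript) {
                runs.add(new Run(currentScript, current.toString()));
                current.setLength(0);
                currentScript = s;
            } else if (!neutral && currentScript == null) {
                currentScript = s;
            }
            current.appendCodePoint(cp);
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            // Text with no letters at all is reported as COMMON.
            runs.add(new Run(currentScript == null ? UnicodeScript.COMMON : currentScript,
                             current.toString()));
        }
        return runs;
    }
}
```

Calling ScriptRuns.split on an English sentence with a Chinese word in
the middle yields a LATIN run, a HAN run and another LATIN run, with the
spaces attached to the run they follow.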
Anyway, thanks for that and the rest of the ideas. We think that
StandardAnalyzer will do us for now (Chinese only); when we hit more
complicated languages I'll come up with a plan/design for the "Super
Analyzer" and post it to this list for discussion and/or flamewar.
Cheers,
Paul