Hi Grant and Otis,

Thanks for the feedback; I appreciate it. You've given me some good ideas.

Sounds like a really interesting system! I am curious, are your users fluent in multiple languages or are you using some type of translation component?

The former. We're talking about construction projects, where English is (generally) something of a lingua franca: a really big construction project these days might use Australian architects, British managers and UAE-based engineers on a project in Shanghai. So we might have an architect forwarding a message to an engineer in English, who forwards it to the ground team in Shanghai in English, but they then discuss it amongst themselves in Chinese... all in the space of one forwarded email.

How are you querying? Are users entering mixed language queries too?
<snip>

Good question(s). Automatically detecting the language at indexing time doesn't NECESSARILY help us with searching, as a query gives us a lot less text to work with. On the plus side, we can always ASK which language the user is searching in, with a drop-down or something; we can't really ask what language their correspondence is in, as it may be mixed.

Multiple indexes are an option, but we're very concerned about performance and size: we're talking many, many millions of things to index, and having English/Chinese/Arabic/who-knows-what-else indexes could be a nightmare.

Also, is the text so finely delineated as your example? We sometimes run across the case where foreign languages will use other languages (mostly English) mid-sentence and it makes things quite ugly. Approach 4 should handle this, though.

Yeah, that's one of our worries. People often can't find the right word for what they want to say, etc., so they slip back into another language.
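
To make the mid-sentence switching concrete, here is a rough sketch of splitting a message into runs of the same script, so that each run could later be handed to a suitable analyzer. It is only an illustration in Python, not Lucene code: the names script_of and script_runs are made up for this example, and guessing the script from the Unicode character name is an assumption that only covers Latin, CJK and Arabic text.

```python
import unicodedata


def script_of(ch):
    """Rough script label taken from the Unicode character name.

    Returns None for "neutral" characters (spaces, digits, punctuation),
    which simply join whatever run surrounds them.
    """
    if not ch.isalpha():
        return None
    name = unicodedata.name(ch, "")
    if name.startswith("CJK"):
        return "cjk"
    if name.startswith("ARABIC"):
        return "arabic"
    if name.startswith("LATIN"):
        return "latin"
    return "other"


def script_runs(text):
    """Split text into (script, substring) runs, in order of appearance."""
    runs = []  # each entry is [script_or_None, list_of_chars]
    for ch in text:
        script = script_of(ch)
        if runs and (script is None
                     or runs[-1][0] is None
                     or script == runs[-1][0]):
            # Neutral characters extend the current run; a run that has
            # only seen neutral characters adopts the first real script.
            if runs[-1][0] is None:
                runs[-1][0] = script
            runs[-1][1].append(ch)
        else:
            runs.append([script, [ch]])
    return [(script or "neutral", "".join(chars)) for script, chars in runs]


if __name__ == "__main__":
    # English sentence with a Chinese word in the middle.
    for script, chunk in script_runs("Please check the \u56fe\u7eb8 today"):
        print(script, repr(chunk))
```

Script alone doesn't identify the language, of course (Latin script could be English or French), so this would only be a first cut before anything smarter.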

Anyway, thanks for that and the rest of the ideas. We think that StandardAnalyzer will do us for now (Chinese only); when we hit more complicated languages I'll come up with a plan/design for the "Super Analyzer" and post it to this list for discussion and/or flamewar.

Cheers,

Paul
