Just to play devil's advocate, I have one other question: would it be cheaper and safer to simply run tests for certain languages using multiple character sets?
Cheaper: is it really cheaper to convert? Safer: what if you guess wrong? What if the character set is hard to determine correctly (intentionally mixed up, binary data inserted, half-and-half, jumbled character sets, etc.)?

Daniel

-- 
Daniel Quinlan
http://www.pathname.com/~quinlan/
