First off - I didn't post specifics because I wasn't sure they would be of interest to the OS X Perl community as a whole. I had hoped to get the interested parties emailing me privately, but then again, the total scarcity of docs on this topic on the net (that I could find in English, at least) means it probably merits public posting.
On Tuesday, October 1, 2002, at 06:47 PM, Dan Kogai wrote:

> just perldoc Encode && perldoc Encode::JP.

Dan, I know you are one of the driving forces behind the Unicode side of Perl 5.8.0, hats off to you man (sincerely). I got as far as perldoc Encode but haven't yet got to Encode::JP - there's a lot to read.

My basic problem is that I don't have any fast n' hard examples to go by which I can apply to the situation I find myself in now, which is:

* parse a collection of ASCII docs mixed in with docs in iso-2022-jp, shiftjis and possibly 7bit-jis (by which I mean each doc could be in 1 of three encodings, not 1 doc in a mixture of all three)
* parse for tokens (Kanji characters, i.e. neither Hiragana nor Katakana)
* do regex substitutions accordingly

On the Unicode site, however, Unicode apparently lumps Kanji in with Chinese, which is understandable but not helpful, as I need specific code points for specific Kanji characters, e.g. '月', which is featured in U3200.pdf but only as glyphs combined with a number, i.e. codes 32C1 - 32CB.

Then, as my own intuition was drawing blanks, I thought perhaps I should ask if anyone else has some pointers, which led to my original posting.

Pointers anyone ^_^?
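
To make the three steps above concrete, here is a minimal sketch of the kind of thing I'm after, not a tested recipe. The helper names (decode_doc, kanji_tokens, codepoint) are made up for illustration, and the encoding heuristic is purely an assumption: an ESC byte is taken to mean the ISO-2022-JP family, any 8-bit byte is taken to mean Shift_JIS, and everything else is treated as ASCII. It uses only decode/encode from Encode and the \p{Han} property, which matches ideographs (Chinese ones too) but not Hiragana or Katakana.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: pick a decoder for one document's raw bytes.
# Heuristic (an assumption, not something Encode prescribes):
#   ESC byte present  -> ISO-2022-JP family ('7bit-jis' could be tried instead)
#   any 8-bit byte    -> Shift_JIS
#   otherwise         -> plain ASCII
sub decode_doc {
    my ($bytes) = @_;
    my $enc = $bytes =~ /\e/           ? 'iso-2022-jp'
            : $bytes =~ /[^\x00-\x7F]/ ? 'shiftjis'
            :                            'ascii';
    return decode($enc, $bytes);
}

# Hypothetical helper: return every Kanji found in an already-decoded string.
# \p{Han} skips Hiragana and Katakana. Call it in list context.
sub kanji_tokens {
    my ($text) = @_;
    return $text =~ /(\p{Han})/g;
}

# Hypothetical helper: format a character's code point as U+XXXX.
sub codepoint {
    my ($char) = @_;
    return sprintf 'U+%04X', ord $char;
}

# Demo: build sample bytes in two encodings, then decode, tokenise, substitute.
binmode STDOUT, ':utf8';
my $sample = "\x{6708}\x{66DC}\x{65E5}\x{306E}\x{30C6}\x{30B9}\x{30C8}";
for my $enc (qw(shiftjis iso-2022-jp)) {
    my $text = decode_doc(encode($enc, $sample));
    print "$enc: ", join(' ', map { codepoint($_) } kanji_tokens($text)), "\n";

    # Regex substitution on one specific Kanji, written by code point.
    (my $edited = $text) =~ s/\x{6708}/[tsuki]/g;
    print "  after substitution: $edited\n";
}
```

Once a string is decoded, ord gives the code point directly, so '月' comes out as U+6708 in the CJK Unified Ideographs block rather than anything in the 32xx range. The byte-sniffing in decode_doc is deliberately naive and would misfire on other 8-bit encodings such as EUC-JP, so treat it as a starting point only.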