Re: Parsing JIS X 0208 & Shift JIS with 5.8.0

Robin Tue, 01 Oct 2002 07:25:32 -0700

First off - I didn't post specifics because I wasn't sure they would 
be of interest to the OSX perl community as a whole. I hoped to get the 
interested parties emailing me privately, but then again the total 
scarcity of docs (that I could find in English) on this topic on 
the net means it probably merits public posting.


On Tuesday, October 1, 2002, at 06:47 PM, Dan Kogai wrote:
> just perldoc Encode && perldoc Encode::JP.

Dan, I know you are one of the driving forces behind the Unicode side 
of perl 5.8.0, hats off to you man (sincerely). I got as far as perldoc 
Encode but haven't yet got to Encode::JP - there's a lot to read.

My basic problem is that I don't have any hard-and-fast examples to go 
by that I can apply to my current situation, which is:

*parse a collection of ASCII docs mixed in with docs in iso-2022-jp, 
shiftjis and possibly 7bit-jis (by which I mean each doc is in exactly 
one of the three encodings, not one doc in a mixture of all three).
*parse for tokens (Kanji characters - i.e. neither Hiragana nor Katakana)
*do regex substitutions accordingly
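
To make the three steps above concrete, here is a rough sketch of the sort of thing I have in mind, not a tested recipe. The helpers decode_japanese and kanji_tokens are names I made up, and the encoding check is my own crude heuristic (a JIS escape sequence means 7bit-jis, any other 8-bit byte means shiftjis, the rest is ASCII), not something Encode provides. It relies on decode/encode from Encode and on \p{Han} matching Han ideographs but not kana.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: pick a decoder from the bytes alone.
# Assumption: a JIS escape sequence means 7bit-jis (which also covers
# iso-2022-jp), any other 8-bit byte means shiftjis, the rest is ASCII.
sub decode_japanese {
    my ($bytes) = @_;
    my $enc = $bytes =~ /\e[\$(]/     ? '7bit-jis'
            : $bytes =~ /[\x80-\xFF]/ ? 'shiftjis'
            :                           'ascii';
    return decode($enc, $bytes);
}

# Hypothetical helper: return every kanji in already-decoded text.
# \p{Han} matches Han ideographs only, so Hiragana and Katakana are skipped.
sub kanji_tokens {
    my ($text) = @_;
    return $text =~ /(\p{Han})/g;
}

# Demo text: three kanji, one Hiragana, three Katakana, written with
# \x{} escapes so this file itself stays plain ASCII.
my $sample = "\x{6708}\x{66DC}\x{65E5}\x{306E}\x{30C6}\x{30B9}\x{30C8}";
my $text   = decode_japanese(encode('shiftjis', $sample));

# Regex substitution on characters, not bytes: bracket each run of kanji.
(my $marked = $text) =~ s/(\p{Han}+)/[$1]/g;

# Encode again before output; UTF-8 here is just an example target.
print encode('UTF-8', $marked), "\n";
```

The point of decoding first is that the regexes then see characters rather than raw bytes, so the same pattern works whichever of the encodings the doc arrived in.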

The Unicode site, however, apparently lumps kanji in with Chinese, 
which is understandable but not helpful, as I need specific code 
points for specific Kanji characters, e.g. '月', which is featured in 
U3200.pdf but only as glyphs combined with a number, i.e. codes 32C1 - 32CB.
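
One way to find the code point of a specific kanji without hunting through the charts might be to let perl report it. This is only a sketch: code_points is a made-up helper name, and I am assuming the character is available as UTF-8 bytes (E6 9C 88 for '月'). Plain kanji live in the CJK Unified Ideographs block (U+4E00 - U+9FFF), not in the U3200 chart.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: format each character of a decoded string as U+XXXX.
sub code_points {
    my ($text) = @_;
    return map { sprintf('U+%04X', ord($_)) } split //, $text;
}

# The UTF-8 bytes E6 9C 88 are the kanji for "month".
my $tsuki = decode('UTF-8', "\xE6\x9C\x88");
print join(' ', code_points($tsuki)), "\n";    # prints "U+6708"

# Once the code point is known, it can be used directly in a regex.
my $found = $tsuki =~ /\x{6708}/ ? 1 : 0;
```

The same helper works on text decoded from shiftjis or iso-2022-jp, since after decoding the characters are the same whatever the source encoding was.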

Then, as my own intuition was drawing blanks, I thought perhaps I 
should ask if anyone else has some pointers, which led to my original 
posting.

Pointers anyone ^_^?
