My Dear Perl-Friends!

I tried to get help with this on IRC's #perl channel,
but none of the hints I was given helped - probably for
lack of explanation - so I am trying here. I have a corpus
of over 110 MB written mostly in three languages - Japanese,
Chinese and English.  I'd like to turn it into a
Japanese-only corpus.

Let's say that Japanese characters are represented by the
letters ABC and Chinese ones by XYZ. Part of my problem is
that neither of these languages uses spaces, so I decided
to concentrate on the Japanese period, exclamation mark and
question mark characters (let's say o, p & q). So a basic
corpus.txt looks like this:
...
XYZZYXYXYZYZYX This is a pen. XYZYZ XYZ
ZYXYZ This is a cat. ABCCBAoCABACBq XYZYX \n
XYZZYABCCBAoXYXYZYZYX This is a pen. XYZYZ XYZ
ZYXYZ This is a cat. ABCCBApCABACBo XYZYX \n
...

What I need is a japanesesentences.txt containing only the
Japanese sentences:
...
ABCCBAo
CABACBq
ABCCBAo
ABCCBAp
CABACBo
...

Telling Perl-san that A-C is Japanese and X-Z is Chinese
was much too hard for my beginner level, so I tried many
times to tell Mr Input_Record_Separator to be $/="." or
"!" or "?" in a foreach (<>), but something must be wrong
with my grammar, which basically looks like this:

open(FILEIN, "C:\\corpus.txt"); 
open(FILEOUT, "C:\\japanesesentences.txt");
$/="o" || "p" || "q";  # three of them written originally in Japanese -
                       # other scripts have no problems with it...
foreach (<>) {  print FILEOUT $_; } 
close(FILEIN); close(FILEOUT); exit;

Because it showed no sign of working properly, I got so
desperate that I decided just to "press Enter" after every
".", "?" and "!":

foreach (<FILEIN>) { 
$_=~ s/"o" || "p" || "q"/\n/;
print FILEOUT $_;
} 

...but it didn't work either...
Am I too silly for Powerfully Erotic Randal's Language?

Alaca

PS: Sorry to make this so long, but when I explain it in two
sentences people advise me to use grep :-)
PPS: Many times when I read "Programming Perl" I have
problems with its "dry explanations" (meaning ones without
many examples). Is there any website showing some little
scripts, or do I have to buy the thick "Perl Cookbook"?

