On Mon, 5 Jan 2004 22:19, Zhang Weiwu wrote:
> Hello. I've worked for an hour trying to figure out a series of sed
> commands to process some text (without any luck; you know I'm kind of a
> newbie). I would really appreciate your help.
>
> The original text file is in this form, for each line:
> one Chinese word, then one or two English words separated by a space.
>
> I wish to change it to:
> 1) target file: one English word, then a space, then the Chinese word
>    corresponding to that English word.
> 2) if in the original file one Chinese word has more than one English
>    word following it on the same line, repeat the Chinese word to
>    satisfy 1).
>
> Define: Chinese word = one or more contiguous bytes of data where each
> byte is greater than 128 in value. (This is true in the GB2312 Chinese
> charset, which this email is written in.)
> Define: English word = one or more contiguous bytes of [a-z].
>
> Say, for the original file:
> ===========
> 一a av
> 可歌可泣aaav
> 无可奉告aacm
> ===========
> The target file should be:
> ===========
> a 一
> av 一
> aaav 可歌可泣
> aacm 无可奉告
> ===========
>
> I tried to do things like s/\(.*\)\([a-z]*\)/\2 \1/ but the first \(.*\)
> is too greedy and swallows the trailing [a-z] part as well.
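
Well, the greedy part is easily fixed with:

  s/\([^a-z]*\)\([a-z]*\)/\2 \1/

But this will not work for lines with two English words. The following
should:

  % sed -n -e 'h' -e 's/^\([^a-z]*\)\([a-z]*\).*/\2 \1/p' \
        -e 'g' -e 's/^\([^a-z]*\)[a-z]* \([a-z]*\)$/\2 \1/p' \
        original > target

Each -e expression works on the pattern space as the previous one left
it, so h saves a copy of the input line and g restores it before the
second substitution. The first substitution prints the first English word
with the Chinese word; the second matches only lines that have a second
English word and prints that one with the Chinese word.

Malcolm Kay

_______________________________________________
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to "[EMAIL PROTECTED]"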
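
A quick way to check a script of this kind is to run it on the sample from the question. The sketch below is only one possible arrangement: it keeps a copy of each input line in the hold space (h) and restores it (g) before the second substitution, since every -e expression otherwise sees the pattern space as the previous one left it. The helper name swap_words is made up for illustration, and setting LC_ALL=C is an assumption so that sed compares bytes, in line with the "bytes greater than 128" definition.

```shell
#!/bin/sh
# Assumption: byte-oriented matching, so every byte outside [a-z]
# counts as part of the Chinese word whatever the file's encoding is.
LC_ALL=C
export LC_ALL

# swap_words is a hypothetical helper name, not something from the thread.
swap_words() {
    # h: save the input line; first s///p prints "first-word chinese".
    # g: restore the input line; second s///p fires only when a second
    #    English word follows a space, printing "second-word chinese".
    sed -n \
        -e 'h' \
        -e 's/^\([^a-z]*\)\([a-z]*\).*/\2 \1/p' \
        -e 'g' \
        -e 's/^\([^a-z]*\)[a-z]* \([a-z]*\)$/\2 \1/p'
}

# Sample input taken from the question.
result=$(printf '%s\n' '一a av' '可歌可泣aaav' '无可奉告aacm' | swap_words)
printf '%s\n' "$result"
```

On real data the same function can be used as: swap_words < original > target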
