Re: [9fans] Re: ctrans - Chinese language input for Plan9
> In <288YQ7Y33V3RF.38NPGPX4H2CHU@homearch.localdomain>
> "Silvan Jegen" wrote:

SJ> andp...@foxmail.com wrote:
>> On Friday, 22 July 2022, at 2:09 PM, Silvan Jegen wrote:
>> > Ah, I didn't know that! I also don't know anyone who does office work
>> > in a place where traditional Chinese characters are used though ...
>>
>> They would use RIME, https://rime.im a free software widely
>> recognized among Chinese users who are not satisfied with default
>> Pinyin. But unfortunately that thing is written in C++ so making a
>> port is unlikely.

SJ> Funnily enough I use Rime on my Linux machine to input Simplified
SJ> Chinese. I honestly just switched a Rime input setting to something that
SJ> looks like pinyin but the suggestions seem better to me than the old
SJ> IME that I used ... I should probably invest some time in understanding
SJ> how the thing actually is supposed to be used (documentation in English
SJ> seems sparse and my Chinese sucks).

RIME became popular because most other Pinyin-based IMEs on the market
suck for traditional Chinese input: their suggestion dictionaries were
usually converted directly from the simplified Chinese versions, but
mapping simplified Chinese to traditional Chinese is very context
sensitive. A byproduct of RIME is the OpenCC library
(https://github.com/BYVoid/OpenCC), which handles all the trivia of
this kind of conversion.

The SC (simplified Chinese) support for RIME was contributed by the
community, I think, and the author of RIME uses Cangjie. Cangjie was
not officially designed for simplified Chinese but was extended to
handle it. I heard rumors that the author refused to add a switch to
prioritize simplified Chinese characters for Cangjie in RIME, so an
external dictionary is used if users want that behavior.

---
LDB

--
9fans: 9fans
Permalink: https://9fans.topicbox.com/groups/9fans/Tba6835d445e07919-M7654c6f7091bf0a32c7e3bca
Delivery options: https://9fans.topicbox.com/groups/9fans/subscription
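To illustrate why a direct character-for-character substitution falls short, here is a minimal sketch of a phrase-first conversion. It is not OpenCC's algorithm or API; the two tiny dictionaries and the function names `naive` and `phrase_first` are made up for this example. The simplified character 发 corresponds to 發 in 发现 (to discover) but to 髮 in 头发 (hair), so a per-character table has to guess.

```python
# Toy dictionaries, invented for illustration; a real converter ships far
# larger ones. None of this is taken from OpenCC.
PHRASES = {
    "头发": "頭髮",  # hair
    "发现": "發現",  # to discover
}
CHARS = {
    "头": "頭",
    "发": "發",  # single-character fallback; the wrong choice inside 头发
    "现": "現",
}


def naive(text):
    """Convert one character at a time, ignoring context."""
    return "".join(CHARS.get(ch, ch) for ch in text)


def phrase_first(text):
    """Prefer the longest known phrase at each position, then fall back
    to the per-character table."""
    max_len = max(len(p) for p in PHRASES)
    out = []
    i = 0
    while i < len(text):
        # Try phrase lengths from longest down to 2 characters.
        for n in range(min(max_len, len(text) - i), 1, -1):
            chunk = text[i:i + n]
            if chunk in PHRASES:
                out.append(PHRASES[chunk])
                i += n
                break
        else:
            # No phrase matched here: convert a single character.
            out.append(CHARS.get(text[i], text[i]))
            i += 1
    return "".join(out)


print(naive("头发"))         # per-character guess picks 發
print(phrase_first("头发"))  # phrase table picks 髮
```

With these toy tables the per-character version turns 头发 into 頭發, while the phrase-first version produces 頭髮; both agree on 发现.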
Re: [9fans] Re: ctrans - Chinese language input for Plan9
andp...@foxmail.com wrote:
> On Friday, 22 July 2022, at 2:09 PM, Silvan Jegen wrote:
> > Ah, I didn't know that! I also don't know anyone who does office work
> > in a place where traditional Chinese characters are used though ...
>
> They would use RIME, https://rime.im a free software widely
> recognized among Chinese users who are not satisfied with default
> Pinyin. But unfortunately that thing is written in C++ so making a
> port is unlikely.

Funnily enough I use Rime on my Linux machine to input Simplified
Chinese. I honestly just switched a Rime input setting to something that
looks like pinyin but the suggestions seem better to me than the old
IME that I used ... I should probably invest some time in understanding
how the thing actually is supposed to be used (documentation in English
seems sparse and my Chinese sucks).

--
9fans: 9fans
Permalink: https://9fans.topicbox.com/groups/9fans/Tba6835d445e07919-M9e59e41273b1269646ab8584
Re: [9fans] Re: ctrans - Chinese language input for Plan9
On Friday, 22 July 2022, at 2:09 PM, Silvan Jegen wrote:
> Ah, I didn't know that! I also don't know anyone who does office work
> in a place where traditional Chinese characters are used though ...

They would use RIME (https://rime.im), free software widely recognized
among Chinese users who are not satisfied with the default Pinyin. But
unfortunately that thing is written in C++, so making a port is
unlikely.

ldb

--
9fans: 9fans
Permalink: https://9fans.topicbox.com/groups/9fans/Tba6835d445e07919-Mc5ba1baecec99ea1967578b2
Re: [9fans] Re: ctrans - Chinese language input for Plan9
On 7/22/22 12:06, Sebastian Higgins wrote:
> A few things:
>
> 1. Cangjie is still widely used in places that use traditional Chinese
> characters. You would still be required to be good at it if you apply for
> text-heavy office jobs in these places.
> 2. Radical-based/shape-based methods were extremely popular when the
> prediction technology wasn't as good (which means Pinyin was significantly
> slower). It wasn't until the late 2000s to early 2010s that this situation
> changed.
> 3. Pinyin without prediction is slow because of what we call the 重码 (lit.
> "overlap of encoding") problem. For Pinyin the encoding overlaps because many
> characters may have the same Pinyin; the purpose of all shape-based methods is
> to reduce the overlap problem and thus increase the input speed.
> 4. ctrans uses Cangjie because (1) implementing shape-based methods was
> much, much simpler than phonetic-based methods because most (if not all)
> of the job is table lookup; (2) if we were to use the same UI (or lack
> thereof) as ktrans the overlap-of-encoding problem of Pinyin would very
> probably drive you nuts when using it; (3) it is the input method the author
> uses, however I do admit using Cangjie for simplified Chinese input is kinda
> peculiar.
>
> Source: me, a native Chinese speaker who learned Wubi (a
> shape-based method for simplified Chinese) in primary school.

I made a naive attempt at getting ktrans to support a form of Chinese
input. Admittedly, my interest was mostly in stress testing my rewrite
of the hashmap used in ktrans; throwing a ~100k character dictionary at
it seemed like a fun way to test it. The dictionary I imported uses a
Wubi-based mapping for characters and was posted by jxy to the 9front
mailing list a week or so ago.

If anyone is curious, the dictionary itself can be found here:
https://raw.githubusercontent.com/fcitx/fcitx-table-data/master/wbx.txt

This has been super interesting to me from a learning perspective.
Thanks for the insight!

Jacob Moody

--
9fans: 9fans
Permalink: https://9fans.topicbox.com/groups/9fans/Tba6835d445e07919-Mcf3888dbfc4013192d8c471e
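A rough sketch of what loading such a table into a hash map could look like, in Python rather than the C that ktrans is written in. This is not the ktrans code. The file layout assumed here (comment lines starting with `;`, a header, then a `[Data]` marker followed by `code word` pairs) and the sample rows are assumptions for illustration, not copied from wbx.txt; check the real file before relying on it. `load_table` is a hypothetical name.

```python
from collections import defaultdict

# Assumed layout and made-up sample rows; verify against the real file.
SAMPLE = """\
;comment line
KeyCode=abcdefghijklmnopqrstuvwxy
Length=4
[Data]
a 工
aa 式
aaaa 工
"""


def load_table(text):
    """Map each key sequence to the list of words it can produce."""
    table = defaultdict(list)
    in_data = False
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):
            continue  # skip blank lines and comments
        if line == "[Data]":
            in_data = True  # everything after this marker is an entry
            continue
        if not in_data:
            continue  # header settings are ignored in this sketch
        code, _, word = line.partition(" ")
        if word:
            table[code].append(word.strip())
    return dict(table)


table = load_table(SAMPLE)
print(len(table), "codes loaded")
print(table["aa"])
```

A Python dict is itself a hash map, so the lookup per keystroke sequence stays a single table probe regardless of how many entries are loaded.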
Re: [9fans] Re: ctrans - Chinese language input for Plan9
On Wednesday, 20 July 2022, at 11:15 PM, cigar562hfsp952fans wrote:
> I've often wondered that. What input methods do Chinese speakers use?
> What do Chinese keyboards look like? How do they find/select the
> character they want? Are different sets of characters available on
> different computers, or are input methods standardized? I wonder.

Most Chinese speakers just use standard "British and American
keyboards". There are keycaps engraved with Wubi or Cangjie or Bopomofo
(or Zhuyin), but they are all compatible with QWERTY.

On Thursday, 21 July 2022, at 1:58 AM, sirjofri wrote:
> I was more referring to computers built without any american influence
> at all, so no ansi, no ascii, no LTR, probably different keycodes...

Cangjie was the first solution to Chinese processing with *personal
computers* (at the time of the Apple ][ it was sold as extension
boards). There used to be other encoding methods, such as using only the
numpad (Four-Corner Method) or special keyboards (Ming Kwai typewriter);
an input method for Chinese was even invented in the US
(https://patents.google.com/patent/US2412777A), but they have almost
disappeared.

There are a few other considerations with regard to adopting Cangjie
besides those in
https://9fans.topicbox.com/groups/9fans/Tba6835d445e07919-Mf1934dc65975e0ca3989d488/ctrans-chinese-language-input-for-plan9:

1. Cangjie is copyright free and related IMEs are distributed as free
   software, while (at least the newer version of) Wubi is patented.
2. Personally, I realized the order of strokes has changed during the
   last 10 years or so and, similarly, the pronunciation of certain
   characters has also changed over time.

Best wishes
---
ldb

--
9fans: 9fans
Permalink: https://9fans.topicbox.com/groups/9fans/Tba6835d445e07919-M63fcb9504cafbca55334
Re: [9fans] Re: ctrans - Chinese language input for Plan9
Heyhey!

Sebastian Higgins wrote:
> A few things:
>
> 1. Cangjie is still widely used in places that use traditional
> Chinese characters. You would still be required to be good at it if
> you apply for text-heavy office jobs in these places.

Ah, I didn't know that! I also don't know anyone who does office work
in a place where traditional Chinese characters are used though ...

> 2. Radical-based/shape-based methods were extremely popular when
> the prediction technology wasn't as good (which means Pinyin was
> significantly slower). It wasn't until the late 2000s to early 2010s
> that this situation changed.

At least in Japan I have never met anyone using a
radical-based/shape-based input method. I have not even met anyone
using direct Kana input, only through romaji. That said, maybe an
earlier generation used it more commonly ...

> 3. Pinyin without prediction is slow because of what we call the
> 重码 (lit. "overlap of encoding") problem. For Pinyin the encoding
> overlaps because many characters may have the same Pinyin; the purpose
> of all shape-based methods is to reduce the overlap problem and thus
> increase the input speed.

Yeah, it's due to the high homophone count. Only the tones differ and
these are not supported in pinyin input methods (as far as I know ...)

> 4. ctrans uses Cangjie because (1) implementing shape-based methods
> was much, much simpler than phonetic-based methods because most
> (if not all) of the job is table lookup; (2) if we were to use the
> same UI (or lack thereof) as ktrans the overlap-of-encoding problem
> of Pinyin would very probably drive you nuts when using it; (3) it is
> the input method the author uses, however I do admit using Cangjie for
> simplified Chinese input is kinda peculiar.
>
> Source: me, a native Chinese speaker who learned Wubi
> (a shape-based method for simplified Chinese) in primary school.

Thanks for the insights. I appreciate it!

Cheers,
Silvan

--
9fans: 9fans
Permalink: https://9fans.topicbox.com/groups/9fans/Tba6835d445e07919-M977f609261cd764b55ad5dbf
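The overlap-of-encoding (重码) problem from point 3 can be made concrete with two toy tables. This is only a sketch: the entries are a handful of hand-picked examples, the helper names are invented, and real tables are far larger. A real shape-based table is also not guaranteed to be collision-free; the point is only that toneless Pinyin tends to give many candidates per code.

```python
# Toy tables for illustration only.
PINYIN = {
    # Toneless pinyin: every character read "shi" or "ma" shares one code.
    "shi": ["是", "时", "事", "十", "市", "师"],
    "ma": ["吗", "妈", "马", "骂"],
}
CANGJIE = {
    # Cangjie codes describe the shape, so these codes stay distinct.
    "a": ["日"],
    "b": ["月"],
    "ab": ["明"],
    "d": ["木"],
    "dd": ["林"],
}


def candidates(table, code):
    """Pure table lookup: the characters a typed code may stand for."""
    return table.get(code, [])


def average_candidates(table):
    """Average number of characters the user must choose between."""
    return sum(len(chars) for chars in table.values()) / len(table)


print(candidates(PINYIN, "shi"))
print(candidates(CANGJIE, "ab"))
print(average_candidates(PINYIN), average_candidates(CANGJIE))
```

Every code with more than one candidate needs either a selection step or prediction, which is why a UI without a candidate list suits a shape-based table better.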
Re: [9fans] Re: ctrans - Chinese language input for Plan9
A few things:

1. Cangjie is still widely used in places that use traditional Chinese
   characters. You would still be required to be good at it if you apply
   for text-heavy office jobs in these places.
2. Radical-based/shape-based methods were extremely popular when the
   prediction technology wasn't as good (which means Pinyin was
   significantly slower). It wasn't until the late 2000s to early 2010s
   that this situation changed.
3. Pinyin without prediction is slow because of what we call the 重码
   (lit. "overlap of encoding") problem. For Pinyin the encoding overlaps
   because many characters may have the same Pinyin; the purpose of all
   shape-based methods is to reduce the overlap problem and thus increase
   the input speed.
4. ctrans uses Cangjie because (1) implementing shape-based methods was
   much, much simpler than phonetic-based methods because most (if not
   all) of the job is table lookup; (2) if we were to use the same UI (or
   lack thereof) as ktrans the overlap-of-encoding problem of Pinyin
   would very probably drive you nuts when using it; (3) it is the input
   method the author uses, however I do admit using Cangjie for
   simplified Chinese input is kinda peculiar.

Source: me, a native Chinese speaker who learned Wubi (a shape-based
method for simplified Chinese) in primary school.

From: Silvan Jegen
Sent: Friday, July 22, 2022 12:30
To: 9fans
Subject: Re: [9fans] Re: ctrans - Chinese language input for Plan9

a...@sdf.org wrote:
> > I stumbled onto an instructive video on youtube not that long ago. I'm
> > sure there are a few you'll be able to search for. If I understand
> > correctly, it's a combination of entering the phoneme by the nearest
> > Latin letter, then select from a diminishing range of suitable options
> > on the screen.
>
> There are other input methods based on the shape of the
> characters. Some are better with traditional Chinese characters,
> other with simplified characters, it's complicated... Let's see if some
> Chinese comrade shares with us his daily life experience. Japanese
> is input by writing kana directly with a Japanese keyboard or by romaji
> with roman characters on western keyboards (ka -> か, ) and then
> transformed to kanji when necessary. There are different IMEs, but the
> principle is the same. I suppose that ktrans is similar, I haven't
> tried yet.

ktrans seems to be quite different actually. According to the
documentation it uses the Cangjie input method [0] which is based on
the so-called "radicals". These are some more basic elements that the
Chinese characters are made of (note that the "radicals" chosen for
Cangjie are not identical to the 214 radicals that are commonly used to
classify Chinese characters. For the latter see [1]). Every one of
these 24 Cangjie radicals gets mapped to an ASCII character and their
combinations then uniquely identify a Chinese character (the wiki page
at [0] illustrates the approach very well).

This input method seems to be old and I have never seen a Chinese
person use it. From what I understand, most Chinese people nowadays
just write text in Pinyin (a Latin transliteration of the Chinese
pronunciation) and then the IME helps you choose the correct
combination of Chinese characters (potentially taking the context of
the text already written into account).

Cheers,
Silvan

[0] https://en.wikipedia.org/wiki/Cangjie_input_method
[1] https://en.wikipedia.org/wiki/Kangxi_radical

--
9fans: 9fans
Permalink: https://9fans.topicbox.com/groups/9fans/Tba6835d445e07919-Mf1934dc65975e0ca3989d488
Re: [9fans] Re: ctrans - Chinese language input for Plan9
Yep, Cangjie is one of those input methods based on shape I was talking
about, more appropriate for the traditional Chinese characters used in
Taiwan, Hong Kong, etc. South Korea still uses kanji similar to
traditional Chinese, but I don't know what input method they use. Note
that in mainland China people use Pinyin because they imposed the use
of simplified Chinese characters.

It surprises me to hear that ktrans uses Cangjie. Japanese keyboards
let you input kana directly, and the use of kana to write without kanji
is common, especially in books for kids, so it seems more natural to me
to make a kana->kanji conversion (or romaji->kana->kanji on Western
keyboards). But I'm not Japanese; maybe Cangjie is faster, I've never
tried it.

--
9fans: 9fans
Permalink: https://9fans.topicbox.com/groups/9fans/Tba6835d445e07919-M9f10d9140a5f0838d615958f
Re: [9fans] Re: ctrans - Chinese language input for Plan9
a...@sdf.org wrote:
> > I stumbled onto an instructive video on youtube not that long ago. I'm
> > sure there are a few you'll be able to search for. If I understand
> > correctly, it's a combination of entering the phoneme by the nearest
> > Latin letter, then select from a diminishing range of suitable options
> > on the screen.
>
> There are other input methods based on the shape of the
> characters. Some are better with traditional Chinese characters,
> other with simplified characters, it's complicated... Let's see if some
> Chinese comrade shares with us his daily life experience. Japanese
> is input by writing kana directly with a Japanese keyboard or by romaji
> with roman characters on western keyboards (ka -> か, ) and then
> transformed to kanji when necessary. There are different IMEs, but the
> principle is the same. I suppose that ktrans is similar, I haven't
> tried yet.

ktrans seems to be quite different actually. According to the
documentation it uses the Cangjie input method [0] which is based on
the so-called "radicals". These are some more basic elements that the
Chinese characters are made of (note that the "radicals" chosen for
Cangjie are not identical to the 214 radicals that are commonly used to
classify Chinese characters. For the latter see [1]). Every one of
these 24 Cangjie radicals gets mapped to an ASCII character and their
combinations then uniquely identify a Chinese character (the wiki page
at [0] illustrates the approach very well).

This input method seems to be old and I have never seen a Chinese
person use it. From what I understand, most Chinese people nowadays
just write text in Pinyin (a Latin transliteration of the Chinese
pronunciation) and then the IME helps you choose the correct
combination of Chinese characters (potentially taking the context of
the text already written into account).

Cheers,
Silvan

[0] https://en.wikipedia.org/wiki/Cangjie_input_method
[1] https://en.wikipedia.org/wiki/Kangxi_radical

--
9fans: 9fans
Permalink: https://9fans.topicbox.com/groups/9fans/Tba6835d445e07919-M9589c3997fe9cf5b52b599d5
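As a small illustration of the radical-to-ASCII mapping described above, the sketch below spells out which Cangjie radicals a typed key sequence stands for. The key assignment follows the standard Cangjie layout as shown on the wiki page at [0]; the names `RADICALS` and `spell` are invented for this example, and nothing here is taken from ktrans or ctrans.

```python
# The 24 basic Cangjie radicals and the letter keys they sit on.
RADICALS = dict(zip(
    "abcdefghijklmnopqrstuvwy",
    "日月金木水火土竹戈十大中一弓人心手口尸廿山女田卜",
))


def spell(code):
    """Return the radicals named by a sequence of letter keys."""
    return "".join(RADICALS[key] for key in code.lower())


# 明 is typed as "ab" (日 + 月); 好 is typed as "vnd" (女 + 弓 + 木).
print(spell("ab"))
print(spell("vnd"))
```

Going from the radical sequence to the final character is then the table lookup that the input method performs.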