Re: Errors in Unihan data : simplified/traditional variants
On 2010/10/30, at 8:42 PM, Koxinga wrote:

> My quickly written parsing program counted 1154 such pairs, where the head character was the same as on the line above. They always seem to be in the order kTraditionalVariant then kSimplifiedVariant, so they could perhaps be corrected automatically. This seems to be an obvious mistake, and the correction should be easy. I can help with that; I am just waiting to see whether this is the right place to report problems in Unihan. I also considered http://www.unicode.org/reporting.html; would that be better?

Yes, that would be better. That way it will be tracked, and it's less likely to slip through the cracks in my schedule. For general questions, you can email me directly.

Hoani H. Tinikini
John H. Jenkins
jenk...@apple.com
Errors in Unihan data : simplified/traditional variants
Hello,

I recently looked up the traditional-simplified relationships in the Unihan database (Unihan_Variants.txt). I knew it had mistakes and wanted to help correct some of them, but the first thing that stood out and surprised me was the large number of lines like:

    U+346F kSimplifiedVariant U+3454
    U+346F kTraditionalVariant U+3454

which should be (if I didn't mix them up...):

    U+3454 kTraditionalVariant U+346F
    U+346F kSimplifiedVariant U+3454

My quickly written parsing program counted 1154 such pairs, where the head character was the same as on the line above. They always seem to be in the order kTraditionalVariant then kSimplifiedVariant, so they could perhaps be corrected automatically. This seems to be an obvious mistake, and the correction should be easy. I can help with that; I am just waiting to see whether this is the right place to report problems in Unihan. I also considered http://www.unicode.org/reporting.html; would that be better?

I have a lot of other questions and comments on these simplified/traditional relationships, but I guess they can wait until this problem is resolved; including them here would make this email too long.

Regards,
Koxinga
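For illustration, here is a minimal Python sketch of the kind of check described above. It is not Koxinga's actual program. It assumes the tab-separated `U+XXXX<TAB>field<TAB>value` layout of Unihan_Variants.txt, and the function names `find_suspect_pairs` and `proposed_fix` are hypothetical. The direction of the fix follows the example in the message, which its author flagged as possibly mixed up, so treat the output as a list for human review rather than an automatic patch.

```python
from collections import defaultdict

VARIANT_FIELDS = ("kSimplifiedVariant", "kTraditionalVariant")


def find_suspect_pairs(lines):
    """Return (head, target) pairs where one head code point lists the same
    target as both its kSimplifiedVariant and its kTraditionalVariant."""
    targets = defaultdict(lambda: defaultdict(set))  # head -> field -> set of targets
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 2)
        if len(parts) < 3 or parts[1] not in VARIANT_FIELDS:
            continue  # ignore other fields and malformed lines
        head, field, value = parts
        targets[head][field].update(value.split())  # a value may list several code points
    suspects = []
    for head, fields in targets.items():
        shared = fields["kSimplifiedVariant"] & fields["kTraditionalVariant"]
        suspects.extend((head, target) for target in sorted(shared))
    return suspects


def proposed_fix(head, target):
    # Assumption (hypothetical rule): the kSimplifiedVariant line is right and
    # the kTraditionalVariant line has its head and target swapped.
    return [f"{target}\tkTraditionalVariant\t{head}",
            f"{head}\tkSimplifiedVariant\t{target}"]


sample = [
    "U+346F\tkSimplifiedVariant\tU+3454",
    "U+346F\tkTraditionalVariant\tU+3454",
]
for head, target in find_suspect_pairs(sample):
    print("\n".join(proposed_fix(head, target)))
```

Collecting targets into sets per head and field, rather than comparing adjacent lines, means the check does not depend on the file's sort order.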
Errors in Unihan?
Hello,

In the Unihan.txt database, the kMandarin field has entries with duplicate pronunciations. For example:

    U+4E21 kMandarin 1 LIANG3 2 LIANG3 3 LIANG4
    U+4E4E kMandarin 1 HU1 HU2 2 HU1
    U+4E86 kMandarin 1 LIAO3 2 LE LIAO3

Is there a reason for these duplicates? If there is, the format of this field should be better documented in the header. If these duplicates are errors, I can supply a list of them.

Also, what is the meaning of the isolated numbers?

Other entries certainly contain errors, for example:

    U+5594 kMandarin 1 WO1 2 01   <- this "0" is a zero.
    U+4EC0 kMandarin 1 SHI2 2 SHEN2 3 SHI2 SHIU2SHEN2 SHI2
    ?? -- shi2 shen2 ??

Regards,
Pierpaolo Bernardi
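A list like the one offered above could be produced with a small checker. This is a minimal Python sketch, assuming the old space-separated kMandarin format shown in the examples. It also assumes that sense numbers are bare positive integers without leading zeros and that a reading is uppercase letters (a colon is allowed for ü) with an optional tone digit 1 to 5. Both patterns and the name `check_kmandarin` are hypothetical, not part of any Unihan specification.

```python
import re
from collections import Counter

SENSE_NUMBER = re.compile(r"[1-9][0-9]*")  # assumed shape of an isolated sense number
READING = re.compile(r"[A-Z:]+[1-5]?")     # assumed shape of a reading: letters + optional tone


def check_kmandarin(value):
    """Return (duplicates, malformed) for an old-style kMandarin value."""
    readings, malformed = [], []
    for token in value.split():
        if SENSE_NUMBER.fullmatch(token):
            continue  # sense markers are not readings
        if READING.fullmatch(token):
            readings.append(token)
        else:
            malformed.append(token)  # e.g. "01", or two readings run together
    counts = Counter(readings)
    duplicates = [r for r in counts if counts[r] > 1]
    return duplicates, malformed


print(check_kmandarin("1 LIANG3 2 LIANG3 3 LIANG4"))
print(check_kmandarin("1 WO1 2 01"))
```

Because `01` has a leading zero, it matches neither pattern and is reported as malformed instead of being silently skipped as a sense number.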
Re: Errors in Unihan?
On Tuesday, November 14, 2000, at 08:24 AM, Pierpaolo Bernardi wrote:

> In the Unihan.txt database, the kMandarin field has entries with duplicate pronunciations. For example:
>
>     U+4E21 kMandarin 1 LIANG3 2 LIANG3 3 LIANG4
>     U+4E4E kMandarin 1 HU1 HU2 2 HU1
>     U+4E86 kMandarin 1 LIAO3 2 LE LIAO3
>
> Is there a reason for these duplicates? If there is, the format of this field should be better documented in the header. If these duplicates are errors, I can supply a list of them.

That would be very helpful, yes.

> Also, what is the meaning of the isolated numbers?

The value of the field was obtained from dictionaries. When a dictionary gives more than one meaning for a character, it is not uncommon for one pronunciation to be specific to one meaning and another pronunciation to another. This is where the numbers come from. Since the database doesn't maintain the link between specific definitions and pronunciations, the isolated numbers should also be removed.
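The cleanup suggested in the reply, dropping the isolated numbers along with the duplicate readings, can be sketched in a few lines of Python. This is a minimal sketch, not the procedure the Unicode editors actually applied. It assumes that sense numbers are bare positive integers without leading zeros; the name `normalize_kmandarin` is hypothetical.

```python
import re

SENSE_NUMBER = re.compile(r"[1-9][0-9]*")  # assumed shape of an isolated sense number


def normalize_kmandarin(value):
    """Drop sense numbers and repeated readings, keeping first-seen order."""
    readings = (t for t in value.split() if not SENSE_NUMBER.fullmatch(t))
    return " ".join(dict.fromkeys(readings))  # dict keys dedupe while preserving order


print(normalize_kmandarin("1 LIANG3 2 LIANG3 3 LIANG4"))  # LIANG3 LIANG4
```

Keeping first-seen order preserves whichever reading the source dictionary listed first. A token such as `01` is left in place by this sketch so that it still surfaces for manual correction.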