> Edmon <[EMAIL PROTECTED]> wrote:
>
> > Worst case scenario CJK could have 21 Han characters!
>
> That's assuming the encoded string can be up to 63 octets.  For IDNA
> the limit is 59 octets, in which case ACE37 can support up to 19 Han
> characters, same as several other ACEs (but better than DUDE's 15).

DUDE's figure of 15 characters is also based on 63 octets. It would be
14 with a 59-octet limit.

> But I'm pretty sure that ACE37 is less efficient than DUDE for all
> non-CJK scripts.

Yup, there is definitely a trade-off. However, the CJK community seems
to be more concerned about this issue, so we should take that into
consideration when striking a balance.

> > All the while, the algorithm is kept to be as simple as DUDE.
>
> It doesn't look as simple as DUDE to me.  To me it looks no simpler than
> AMC-ACE-W, which is as efficient as ACE37 for CJK

AMC-ACE-W will not always achieve 19 characters with the 4 extra
characters in front, whereas ACE37 will.

> Whereas AMC-ACE-W's complexity is in a state machine, ACE37's
> complexity is in encoding/recognizing lots of different patterns.  I

True to an extent, but ACE37 operates in the same mode throughout,
similar to the DUDE concept.

> By the way, I'm now in the process of tweaking AMC-ACE-Z, which is as
> simple as AMC-ACE-W (if not simpler) and also more efficient.  A draft
> will be coming soon, along with an evaluation of many ACE proposals.
> Edmon, if you'd like to provide me with a C implementation of ACE37, I
> can include it in the evaluation.  You are welcome (and even encouraged)
> to use my example implementation of DUDE as a template.

Will do so ASAP.

> AMC

Edmon
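
The character counts quoted in this thread (21 vs. 19 for ACE37, 15 vs. 14 for DUDE) are consistent with a simple floor division of the octet budget by a fixed worst-case cost per Han character. The sketch below only illustrates that arithmetic. The per-character costs (3 octets for ACE37, 4 for DUDE) are assumptions inferred from the figures above, not values taken from either draft, and `max_chars` is a hypothetical helper name.

```python
def max_chars(label_octets: int, prefix_octets: int, octets_per_char: int) -> int:
    """Worst-case number of characters that fit in one label.

    label_octets:    total octets allowed in a label (63 in this thread)
    prefix_octets:   octets reserved in front of the encoded string
                     (4 in this thread, leaving 59)
    octets_per_char: ASSUMED worst-case encoded size of one Han character
    """
    return (label_octets - prefix_octets) // octets_per_char


LABEL_LIMIT = 63  # octets per label
PREFIX = 4        # the "4 extra characters in front"

# Assumed worst-case costs, inferred from the numbers quoted in the thread.
ASSUMED_COST = {"ACE37": 3, "DUDE": 4}

for name, cost in ASSUMED_COST.items():
    without_prefix = max_chars(LABEL_LIMIT, 0, cost)
    with_prefix = max_chars(LABEL_LIMIT, PREFIX, cost)
    print(f"{name}: {without_prefix} chars in 63 octets, {with_prefix} in 59")
```

Under these assumed costs the helper reproduces the figures discussed above: 21 and 19 for ACE37, 15 and 14 for DUDE.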
