Since it appears there are still some open issues with multi-byte character support, mainly for Japanese, I will not hold up the 2.5.0 release waiting for a resolution. Reason: MHonArc still behaves as it has in the past with respect to the issue, so nothing gets worse. If patches do happen to come in before I release 2.5.0, I will try to include them. If not, they will be applied to a future release.
I would definitely like to have MHonArc updated to do things properly with respect to multi-byte characters at some time. To deal with the Japanese character data issue currently raised, I'd like patches to address the following:
. Existing, correct functionality is not broken.
. If variability exists, then configurable options are provided to the user. This is achievable for filters by adding argument options to control behavior. Note, if there is a way to "do the right thing" automatically, then it should be considered.
. If any patches require the use of non-standard Perl modules (i.e. modules that are not included with the standard Perl distribution), then the functionality MUST be optional, and no Perl aborts may occur because requiring a module failed. I will consider auto-including external modules with MHonArc if such inclusion can be managed easily.
. Since I do not know Japanese, I can do very little in verifying the correctness of any contributions related to Japanese text processing. I'm hoping those qualified to verify the contributions will do so.
I'm unsure how to deal with the string clipping issue with respect to resource variables: e.g. $SUBJECT:72$. I see this as a fundamental issue with Perl itself, since there is not yet a built-in string type that abstracts this problem (like strings in Java) in a simple and efficient manner.
An approach that would ignore the problem but make sure nothing bad happens is to change all default resource settings to not use the clipping support in resource variables. Any clipping must then be explicitly specified by the user, under the advisory that it may cause problems with multi-byte character encodings. I believe I will make this kind of change to the default resource settings for v2.5.
When I get time, I'll recheck the status of UTF8 support in Perl. A major issue is having conversion support to and from UTF8 character encodings.
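To make the optional-module requirement and the clipping problem concrete, here is a rough sketch, not actual MHonArc code. The names $HAVE_ENCODE and clip_text are hypothetical, and the Encode module is only a stand-in for whatever module a patch might depend on. The sketch assumes the text arrives as raw octets in a known charset. If the module cannot be loaded, the text is returned unclipped instead of aborting or cutting a character in half.

```perl
use strict;
use warnings;

# Load the optional module inside eval so a missing module never aborts.
# (Encode is a stand-in here; any optional module would be handled alike.)
my $HAVE_ENCODE = eval { require Encode; 1 } ? 1 : 0;

# Hypothetical helper: clip $bytes (octets in $charset) to $max characters.
# Falls back to returning the text unclipped when that cannot be done safely.
sub clip_text {
    my ($bytes, $max, $charset) = @_;
    return $bytes unless $HAVE_ENCODE;

    # Decode a copy; an unknown charset makes decode() die, which we trap.
    my $chars = eval { Encode::decode($charset, my $copy = $bytes) };
    return $bytes unless defined $chars;

    return $bytes if length($chars) <= $max;
    return Encode::encode($charset, substr($chars, 0, $max));
}

# Three Japanese characters, three octets each, as UTF-8 octets.
my $subject = "\xE6\x97\xA5\xE6\x9C\xAC\xE8\xAA\x9E";

# Byte-oriented clipping: four octets is one character plus a fragment.
my $naive = substr($subject, 0, 4);

# Character-aware clipping: two whole characters when Encode is available.
my $safe = clip_text($subject, 2, 'UTF-8');

printf "naive: %d octets, safe: %d octets\n", length($naive), length($safe);
```

The fallback of leaving the text unclipped mirrors the idea of removing clipping from the default resources: when the right thing cannot be done, do nothing rather than risk damaging the text.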
Note, any change to Unicode in MHonArc could ripple through the entire existing code base and may require a significant rewrite.
--ewh
