Hi,

thank you for your quick reply.

In <[EMAIL PROTECTED]>, [EMAIL PROTECTED] wrote:
> On March 23, 2006 at 01:36, Masao Takaku wrote:
>
> > MHonArc outputs links of URL-like strings automatically.
> > When a message includes a string "See http://www.example.com/foo/bar/",
> > MHonArc processes this as follows:
> >
> > See <a
> > href="http://www.example.com/foo/bar/">http://www.example.com/foo/bar/
> > </a>
> >
> > It works well, but when a URL-like string is followed by non-ASCII
> > text without a space, this feature is not useful;
> > e.g. "http://www.example.com/foo/bar/を見て.", which means
> > "See http://www.example.com/foo/bar/" in Japanese, comes out as follows:
> >
> > <a
> > href="http://www.example.com/foo/bar/&#12434;&#35211;&#12390">http://www.example.com/foo/bar/&#12434;&#35211;&#12390</a>;.
> >
> > In this example, the output should look like the following:
> >
> > <a
> > href="http://www.example.com/foo/bar/">http://www.example.com/foo/bar/</a>
> > を見て.
> >
> > My environment is Perl-5.8.0 and MHonArc-2.6.15 (default settings).
> >
> > Does anyone know how to do this, or any workarounds?
>
> First, you may want to check out <http://www.mhonarc.jp/> for
> Japanese-specific usage information about MHonArc. There should also
> be links to a Japanese-based mailing list which may be useful.

Thanks! <http://www.mhonarc.jp/2.6.x/iso2022jp.html#summary>, an rcfile
for ISO-2022-JP encoding, is a good resource and works fine.
With resource settings based on ISO-2022-JP, URL linking stops
before non-ASCII text. This seems to be a workaround for my problem.

> As for your specific problem, you may need to disable URL linking.
> This can be done by specifying -nourl on the command line or
> <NOURL> in your resource file. The '&' is a legal URL character,
> and MHonArc does not try to interpret what character entity reference
> values resolve to in order to determine if it should be included.

No... disabling URL linking is not what I want.
# URL linking mostly works, except for non-ASCII URLs.

BTW: It's true that '&' is a legal URL character, but U+3092 is an
invalid character in a URL, and the numeric entity "&#12434;" is
equivalent to U+3092 in HTML.
Also, how non-ASCII URLs are interpreted, at least in Japanese
encodings, depends heavily on browser/server settings.
Is this also true in other languages/encodings?
If so, I think that MHonArc, even with default settings, should treat
these characters as invalid URL characters in the URL linking code.

> The URL linking code is a single regex operation.
>
> I'm not sure at this time on what code changes could be done.
> If you go with ISO-2022-JP encoding for your archives, it may
> avoid this problem.

--
Masao Takaku // [EMAIL PROTECTED]
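
P.S. To make the suggestion concrete, here is a minimal sketch of the
idea. It is written in Python only for illustration and is not MHonArc's
actual code: the URL pattern is a simplified assumption, and the names
NAIVE_URL, trim_at_non_ascii_ref and link_urls are hypothetical. It assumes
the text has already been converted to HTML, so non-ASCII characters appear
as numeric entities such as "&#12434;". The matched URL is cut just before
the first numeric entity that resolves to a code point above U+007F.

```python
import re

# Hypothetical, simplified URL pattern; MHonArc's real regex differs.
# Note that '&', '#' and ';' are accepted, so numeric entities get swallowed.
URL_CHARS = r"[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]"
NAIVE_URL = re.compile(r"https?://" + URL_CHARS + r"+")

# Numeric character reference such as &#12434; or &#x3092;
NUM_REF = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));?")


def trim_at_non_ascii_ref(url):
    """Cut the URL just before the first numeric reference above U+007F."""
    for m in NUM_REF.finditer(url):
        code = int(m.group(1), 16) if m.group(1) else int(m.group(2))
        if code > 0x7F:
            return url[:m.start()]
    return url


def link_urls(text, trim=True):
    """Wrap URL-like strings in <a> tags; with trim=True, stop at non-ASCII entities."""
    def repl(m):
        matched = m.group(0)
        url = trim_at_non_ascii_ref(matched) if trim else matched
        rest = matched[len(url):]  # text that was matched but is left outside the link
        return '<a href="%s">%s</a>%s' % (url, url, rest)
    return NAIVE_URL.sub(repl, text)


if __name__ == "__main__":
    sample = "http://www.example.com/foo/bar/&#12434;&#35211;&#12390;."
    print(link_urls(sample, trim=False))  # entities end up inside the link
    print(link_urls(sample))              # link ends at "bar/"
```

With trim=True the link ends at "bar/" and the entities stay outside the
anchor, while ASCII-only references such as "&amp;" inside a query string
are left untouched. Whether something similar fits into the single regex
operation in MHonArc is a separate question.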
