On Jul 9, 2007, at 22:30, J.Pietschmann wrote:

a_l.delmelle wrote in a bugzilla entry:
Hyphenation is, in fact, only applicable to pure alphabetical characters.

Well, no. The pattern-based hyphenator can deal with any Unicode
characters (apart from digits, whitespace and the dot, which have
a special meaning in the pattern definitions). If the word parser
used the character classes from the active pattern file for
parsing words, basically anything could be used. This would only
need a proper interface for retrieving the character classes. The
class canonicalization could even be folded into the parsing process
for better performance.
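
To make the suggestion above concrete, here is a minimal sketch of a word parser driven by character classes, with canonicalization folded into the same pass. It is not FOP code. The function names are hypothetical, and it assumes the classes arrive as whitespace-separated tokens in which the first character of each token is the canonical form.

```python
def build_class_map(classes_text):
    """Map every character of each class to that class's first character.

    Assumption: classes are whitespace-separated tokens such as "aA",
    where the first character is the canonical representative.
    """
    class_map = {}
    for cls in classes_text.split():
        canonical = cls[0]
        for ch in cls:
            class_map[ch] = canonical
    return class_map


def parse_words(text, class_map):
    """Return (start, original, canonical) for each maximal run of
    characters that belong to some class.

    Characters outside every class (digits, whitespace, ...) act as
    word boundaries. Canonicalization happens in the same loop, so no
    second pass over the word is needed.
    """
    words = []
    start = None
    canon = []
    for i, ch in enumerate(text):
        mapped = class_map.get(ch)
        if mapped is not None:
            if start is None:
                start = i          # a new word begins here
            canon.append(mapped)
        elif start is not None:
            # Character outside all classes: close the current word.
            words.append((start, text[start:i], "".join(canon)))
            start, canon = None, []
    if start is not None:
        # The text ended in the middle of a word.
        words.append((start, text[start:], "".join(canon)))
    return words


# Hypothetical classes: three letters plus "-" as a class of its own.
class_map = build_class_map("aA bB cC -")
words = parse_words("Ab-c 12 Ba", class_map)
```

With these hypothetical classes, "Ab-c" is parsed as a single word because "-" is declared as a class, while "12" is skipped because digits are not. Whether a word found this way should actually be hyphenated is a separate question, which the pattern set itself would answer.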

OK, I see the possibilities. The fact that digits have this special meaning in the patterns does have its reasons, though. I have yet to encounter a text in which anything but words was hyphenated. Dates or timestamps? Digits? Serial numbers? E-mail addresses? URLs? Meaningless,ArtificiallyGlued-togetherPseudo*Words? I have never seen any of those hyphenated. Wrapped, sometimes, but never hyphenated.

I looked around a bit, and combining 'hyphenation' and 'numbers' only pointed me toward hyphenation /of/ numbers when they are spelled out completely, as words. So what I meant by that statement was: hyphenation makes sense only in the context of written text, that is, in relation to a dictionary.

Seems to me the reporter is wrong to expect that sequence of 80+ digits to be hyphenated under any circumstance, and the same goes for the comma case... It is easy enough to come up with such oddities, but when would you ever really need that? And more importantly: Is it really hyphenation you would need then?



