Hi, as I wrote some weeks ago, I took an extended look at hyphenation, with the possible goal of creating APL-licensed patterns from our own dictionaries. I also looked at the current hyphenator. I finally seem to have grasped how hyphenation, and the generation of hyphenation patterns from a marked-up dictionary (patgen.web), works. I implemented my own quick-and-dirty hyphenator, which led to the detection of a few well-hidden bugs in the current hyphenator. Interestingly, nobody seems to care all that much. Do people proofread their output?
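For readers who have not looked at pattern-based hyphenation before, here is a minimal sketch of how I understand the lookup side to work. It is written in Python for brevity and is not the code of the current hyphenator. The function names, the left_min/right_min parameters and the small pattern list are assumptions made for illustration; a real pattern file has thousands of entries. Digits in a pattern are weights for the gap at that position, the highest weight from any matching pattern wins, and an odd result permits a break.

```python
import re

def parse_pattern(pattern):
    """Split a pattern such as 'hy3ph' into its letters and per-gap weights."""
    letters = re.sub(r"\d", "", pattern)
    weights = [0] * (len(letters) + 1)
    pos = 0
    for ch in pattern:
        if ch.isdigit():
            weights[pos] = int(ch)   # weight for the gap before letter 'pos'
        else:
            pos += 1
    return letters, weights

def hyphenate(word, patterns, left_min=2, right_min=2):
    """Return the word split at every gap whose final weight is odd."""
    table = dict(parse_pattern(p) for p in patterns)
    padded = "." + word.lower() + "."          # dots mark the word boundaries
    scores = [0] * (len(padded) + 1)           # scores[i]: gap before padded[i]
    for start in range(len(padded)):
        for end in range(start + 1, len(padded) + 1):
            weights = table.get(padded[start:end])
            if weights is None:
                continue
            for offset, weight in enumerate(weights):
                gap = start + offset
                scores[gap] = max(scores[gap], weight)
    parts, last = [], 0
    # The gap before word[k] is scores[k + 1] because of the leading dot.
    for k in range(left_min, len(word) - right_min + 1):
        if scores[k + 1] % 2 == 1:
            parts.append(word[last:k])
            last = k
    parts.append(word[last:])
    return parts

# Illustrative patterns only, not a usable pattern set.
PATTERNS = ["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io", "o2n"]

print("-".join(hyphenate("hyphenation", PATTERNS)))  # hy-phen-ation
```

The even weights are what make the scheme work: they let a more specific pattern veto a break that a shorter pattern would otherwise allow.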
Other problems with the current hyphenator: it contains a lot of crufty code, has a few inefficiencies that are easy to cure, and it can't handle hyphenation with consonant shifts and other spelling changes, as in the old German spelling backen -> bak-ken and Schiffahrt -> Schiff-fahrt, although it seems to claim that it can. (The new German spelling no longer has these features, but I think there are still languages where this is an issue.) Furthermore, the interaction between the main code and the hyphenator is somewhat inefficient too, and the main code does not necessarily have the same understanding of what constitutes a word as the hyphenator.
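To make the spelling-change case concrete, here is a rough Python sketch of one way a break point could carry its own replacement text. The BreakPoint class, its field names and the break_at helper are hypothetical and are not taken from the current hyphenator; the two German words are the ones mentioned above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BreakPoint:
    position: int    # index in the unbroken word where the affected span starts
    no_break: str    # spelling of that span when the word is not broken here
    pre_break: str   # text that ends the first line, hyphen included
    post_break: str  # text that starts the following line

def break_at(word, bp):
    """Return (line_end, line_start) for breaking 'word' at 'bp'."""
    end = bp.position + len(bp.no_break)
    if word[bp.position:end] != bp.no_break:
        raise ValueError("break point does not match the word")
    return word[:bp.position] + bp.pre_break, bp.post_break + word[end:]

# Old German spelling: 'ck' becomes 'k-k', and the dropped third 'f' returns.
print(break_at("backen", BreakPoint(2, "ck", "k-", "k")))        # ('bak-', 'ken')
print(break_at("Schiffahrt", BreakPoint(4, "ff", "ff-", "f")))   # ('Schiff-', 'fahrt')

# An ordinary break is the special case with an empty replaced span.
print(break_at("Silbe", BreakPoint(3, "", "-", "")))             # ('Sil-', 'be')
```

With a representation like this the ordinary hyphen is just one case among several, so the caller does not need separate code paths for words whose spelling changes at the break.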
In parallel with working on the unit test concept and preparing for some refactoring in HEAD, I'm now working on a rewrite of patgen.web in Java which does not have some of the restrictions of the original, and also on a German lexicon which currently has roughly 12k stem forms of words with hyphenation and other information. Help with completing the lexicon or with other languages is welcome. Unfortunately, the tools for lexicon maintenance, word inflection and plausibility checks on hyphenated forms are still alpha, at best. I'll publish them on my Apache home page in a few days.
RT: I believe the lexicon(s), the pattern generator, the hyphenator and perhaps other tools could be useful for other projects. What would be a sensible approach to encourage reuse?