Hi, as I wrote some weeks ago, I took an extended look at hyphenation, with the possible goal of creating APL-licensed patterns from our own dictionaries. I also looked at the current hyphenator. I finally seem to have grasped how hyphenation works, and how hyphenation patterns are generated from a marked-up dictionary (patgen.web). I implemented my own quick-and-dirty hyphenator, which led to the detection of a few well-hidden bugs in the current hyphenator. Interestingly, nobody seems to care all that much. Do people proofread their output?

Other problems with the current hyphenator are that it
contains a lot of crufty code, has a few inefficiencies
that are easy to cure, and can't handle hyphenations with
consonant shifts and other spelling changes, as in the
old German spelling backen -> bak-ken and Schiffahrt ->
Schiff-fahrt, although it seems to claim that it can. (The
new German spelling no longer has these features, but I
think there are still languages where this is an issue.)
Furthermore, the interaction between the main code and
the hyphenator is somewhat inefficient too, and the main
code does not necessarily have the same understanding of
what constitutes a word as the hyphenator.
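
To illustrate why spelling changes are a problem, here is a minimal
sketch of pattern-based hyphenation in the style patgen patterns are
meant for. It is not the code of the current hyphenator; the class
name PatternHyphenator, the minimum-length limits and the pattern
"c1k" are assumptions made up for this example, not taken from any
real pattern file. Digits in a pattern are values for the positions
between letters, the highest value at each position wins, and an odd
value permits a break.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class, not part of any existing code base.
final class PatternHyphenator {
    // Pattern letters -> inter-letter values (one more value than letters).
    private final Map<String, int[]> patterns = new HashMap<>();
    // Assumed minimum number of letters before and after a break.
    private static final int LEFT_MIN = 2;
    private static final int RIGHT_MIN = 2;

    /** Adds a pattern in TeX notation, e.g. "c1k" or ".ab2". */
    void addPattern(String pattern) {
        StringBuilder letters = new StringBuilder();
        List<Integer> values = new ArrayList<>();
        values.add(0);
        for (char c : pattern.toCharArray()) {
            if (Character.isDigit(c)) {
                // A digit is the value of the position before the next letter.
                values.set(values.size() - 1, c - '0');
            } else {
                letters.append(c);
                values.add(0);
            }
        }
        int[] v = new int[values.size()];
        for (int i = 0; i < v.length; i++) {
            v[i] = values.get(i);
        }
        patterns.put(letters.toString(), v);
    }

    /** Returns the word with '-' inserted at every permitted break. */
    String hyphenate(String word) {
        // Dots mark the word boundaries, as in TeX patterns.
        String w = "." + word.toLowerCase() + ".";
        // levels[i] is the value of the position before w.charAt(i).
        int[] levels = new int[w.length() + 1];
        for (int start = 0; start < w.length(); start++) {
            for (int end = start + 1; end <= w.length(); end++) {
                int[] v = patterns.get(w.substring(start, end));
                if (v == null) {
                    continue;
                }
                for (int k = 0; k < v.length; k++) {
                    levels[start + k] = Math.max(levels[start + k], v[k]);
                }
            }
        }
        StringBuilder out = new StringBuilder();
        for (int j = 0; j < word.length(); j++) {
            boolean allowed = j >= LEFT_MIN && word.length() - j >= RIGHT_MIN;
            // Letter j of the word is letter j + 1 of the dotted string.
            if (allowed && levels[j + 1] % 2 == 1) {
                out.append('-');
            }
            out.append(word.charAt(j));
        }
        return out.toString();
    }
}
```

With the single made-up pattern "c1k", hyphenate("backen") yields
"bac-ken". The result of such a lookup is only a set of break
positions in the unchanged word, so the old-spelling form "bak-ken"
cannot be expressed without extra information about the replacement
text attached to the break.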

In parallel with working on the unit test concept and preparing
for some refactoring in HEAD, I'm now working on a rewrite of
patgen.web in Java which does not have some of the restrictions
of the original, and also on a German lexicon which currently
has roughly 12k stem forms of words with hyphenation and other
information. Help with completing the lexicon or with other
languages is welcome. Unfortunately, the tools for lexicon
maintenance, word inflection and plausibility checks on
hyphenated forms are still alpha, at best. I'll publish them
on my Apache home page in a few days.

RT: I believe the lexicon(s), the pattern generator, the hyphenator
and perhaps other tools could be useful for other projects.
What would be a sensible approach to encourage reuse?

J.Pietschmann

