Hi Daniel,

2008/6/28 Daniel Naber <[EMAIL PROTECTED]>:
> On Wednesday, 7 May 2008, Németh László wrote:
>
>> On the practical usage of the new extension: see README.compound in
>> the source distribution. More documentation and development tools for
>> the extended hyphenation patterns are planned.
>
> Hi Laci,
>
> what are your plans for writing those tools? I'm only asking because I
> wonder if it makes sense to make patgen work here or if those tools would
> replace patgen. patgen asks for values hyph_start, hyph_finish and several
> others which I don't know "correct" values for and I didn't find examples
> for German.
>
> Here's my understanding of what needs to be done to improve German
> hyphenation. Please correct me if this is wrong:
>
> 1. Build a list of compound words that have hyphenation points only between
> their compound parts.
>
> 2. Build a list of non-compounds with hyphenation points (as it currently
> needs to be done to generate the patterns)
>
> 3. Build the patterns for both of the lists and prepare them with
> substrings.pl.
>
> 4. Put the compound patterns, then a line with the string "NEXTLEVEL", then
> the non-compounds patterns all in one file.
>
> Is this correct? How big is the risk that the new compound patterns break
> some of the old non-compound hyphenations that already work correctly?

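Step 4 above (the file layout) can be sketched in a few lines of Python. This is only an illustration, not a tool from the Hyphen distribution: the function name, the example patterns and the leading encoding line are assumptions, and the inputs are assumed to be bare pattern lists without header lines of their own.

```python
def assemble_pattern_file(compound_patterns, word_patterns, encoding="UTF-8"):
    """Return the text of a two-level pattern file.

    Assumed layout: an encoding line, the compound-boundary patterns,
    a line containing only NEXTLEVEL, then the within-word patterns.
    """
    lines = [encoding]
    # First level: patterns trained on compound boundaries only.
    lines.extend(p.strip() for p in compound_patterns if p.strip())
    # Separator between the two pattern sets.
    lines.append("NEXTLEVEL")
    # Second level: ordinary (non-compound) hyphenation patterns.
    lines.extend(p.strip() for p in word_patterns if p.strip())
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder patterns, not real German data.
    print(assemble_pattern_file(["x1y", "y1z"], ["a1b", "c1d"]), end="")
```

Blank input lines are dropped, so the only structural line between the two sets is the separator itself.
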
The risk is eliminated when the first pattern set is generated by Patgen
from compound *and* non-compound data (at this generation level the
non-compound data doesn't contain hyphenation points). In practice, the
first level will then contain the logic for compound-word recognition and
decomposition.

The problem is that Patgen doesn't handle the recursive compound-word
decomposition of Hyphen (recursion is a good method for pattern
compression). Missing word boundaries on the first level are also a
potential problem: a few unrecognized hyphenation points are normal for
Patgen-generated patterns, but this inaccuracy causes bad hyphenation
points in the second-level hyphenation.

Regards,
Laci

>
> Regards
> Daniel
>
> --
> http://www.danielnaber.de
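
The first-level training data described in the reply (compounds marked only at their part boundaries, plus non-compounds with no hyphenation points at all) could be prepared roughly as follows. This is a minimal sketch under stated assumptions: the function name and the sample words are hypothetical, and "-" is assumed to be the hyphenation marker in both word lists.

```python
def first_level_training(compounds, non_compounds, mark="-"):
    """Build the word list for training the first (compound) pattern level.

    compounds:     words hyphenated only between compound parts
    non_compounds: ordinary words, possibly with hyphenation points
    """
    # On the first level, non-compounds must carry no hyphenation points,
    # so the patterns learn where a compound boundary is *not* present.
    unmarked = [word.replace(mark, "") for word in non_compounds]
    # Merge both lists, dropping duplicates; sorted for stable output.
    return sorted(set(compounds) | set(unmarked))


if __name__ == "__main__":
    # Hypothetical sample words.
    for word in first_level_training(["schul-buch"], ["va-ter", "mut-ter"]):
        print(word)
```

The second-level list would be the unmodified non-compound list with its hyphenation points kept.
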
