Hi Daniel,

2008/6/28 Daniel Naber <[EMAIL PROTECTED]>:
> On Wednesday, 7 May 2008, Németh László wrote:
>
>> On the practical usage of the new extension: see README.compound in
>> the source distribution. More documentation and development tools for
>> the extended hyphenation patterns are planned.
>
> Hi Laci,
>
> what are your plans for writing those tools? I'm only asking because I
> wonder if it makes sense to make patgen work here or if those tools would
> replace patgen. patgen asks for values hyph_start, hyph_finish and several
> others which I don't know "correct" values for and I didn't find examples
> for German.
>
> Here's my understanding of what needs to be done to improve German
> hyphenation. Please correct me if this is wrong:
>
> 1. Build a list of compound words that have hyphenation points only between
> their compound parts.
>
> 2. Build a list of non-compounds with hyphenation points (as it currently
> needs to be done to generate the patterns)
>
> 3. Build the patterns for both of the lists and prepare them with
> substrings.pl.
>
> 4. Put the compound patterns, then a line with the string "NEXTLEVEL", then
> the non-compounds patterns all in one file.
>
> Is this correct? How big is the risk that the new compound patterns break
> some of the old non-compound hyphenations that already work correctly?

The risk is eliminated if the first pattern set is generated by Patgen
from compound *and* non-compound data (at this generation level the
non-compound data contains no hyphenation points). In practice, the
first level will contain the logic for compound-word recognition and
decomposition. The problem is that Patgen doesn't handle the recursive
compound-word decomposition of Hyphen (recursion is a good method for
pattern compression). Missing word boundaries on the first level are
also a potential problem: a few unrecognized hyphenation points are
normal for Patgen-generated patterns, but this inaccuracy causes bad
hyphenation points in the second-level hyphenation.
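
As a rough illustration of the data preparation described above, here
is a minimal Python sketch. It is not part of Hyphen or Patgen; the
function names, the "-" hyphenation mark, the sample words, the
placeholder patterns and the encoding header line are all assumptions
made for the example. It builds the first-level word list (compounds
marked only at their part boundaries, non-compounds with their marks
removed) and joins two already generated pattern sets around a
NEXTLEVEL line.

```python
def first_level_words(compounds, non_compounds, mark="-"):
    """Build the (hypothetical) first-level Patgen word list.

    compounds:     words marked only between their compound parts
    non_compounds: hyphenated words; their marks are stripped so that
                   the first level learns where NOT to split them
    """
    words = list(compounds)
    words += [w.replace(mark, "") for w in non_compounds]
    return sorted(set(words))


def combine_levels(level1_patterns, level2_patterns, encoding="UTF-8"):
    """Join two pattern sets into one text with a NEXTLEVEL separator.

    The leading encoding line is an assumption about the file layout.
    """
    lines = [encoding, *level1_patterns, "NEXTLEVEL", *level2_patterns]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Sample words are illustrative only.
    print(first_level_words(["Haus-tür"], ["Va-ter", "Mut-ter"]))
    # "s1t" and "a1t" are placeholder patterns, not real German ones.
    print(combine_levels(["s1t"], ["a1t"]), end="")
```

The first-level list would then be fed to Patgen, and the resulting
patterns (after substrings.pl) passed to combine_levels together with
the ordinary non-compound patterns.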

Regards,
Laci


>
> Regards
>  Daniel
>
> --
> http://www.danielnaber.de
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>
