On 4/3/2021 5:06 PM, denis.ma...@ub.unibe.ch wrote:
Hi everyone

Now that Hans has implemented the new ligature suppression mechanism via language goodies – thanks again Hans! – we now need to come up with wordlists.

I’ve started working on a list of German words with ligatures that should be suppressed. The list is derived from the word list that comes with the LuaLaTeX selnolig package: https://github.com/micoloretan/selnolig/blob/master/selnolig-german-wordlist.tex

You can find the current list here: https://github.com/denismaier/context-nolig-wordlist

The list is currently organized as follows:

 1. L. 25-35: These lines specify words where automatic pattern matching
    is more difficult than usual because the words contain multiple
    ligatures, some of which must be suppressed while others must be
    preserved. In the case of «Auflagefläche» it’s even the same
    combination of letters. So here, we use the bar | to manually
    indicate points where no ligature must occur.
 2. L. 36ff.: The vast majority of words are currently in this block,
    which lists words where an ff, fl, fi, ffi, or ffl ligature has to
    be broken up after the first f.
 3. L. 1804ff.: These lines contain words where ffi, ffl, or fff
    ligatures have to be prevented after the second f, so that the
    first two fs still form a ligature.
 4. The remaining blocks, starting at l. 1900, l. 2073, l. 2157, l. 2225,
    and l. 2277, suppress ligatures for «ft» and «fft», «fb» and
    «ffb», «fh» and «ffh», «fj» and «ffj», and «fk» and «ffk»,
    respectively.
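
To make the bar notation from item 1 concrete, here is a minimal Python sketch. It is not part of ConTeXt or of the wordlist; the function name `classify` and the list `LIGATURES` are made up for illustration, and the entry is assumed to be written as `Auf|lagefläche`. It reports, for each candidate letter sequence in an entry, whether a bar falls inside it.

```python
# Letter sequences mentioned in the list above, longest first so that
# "ffl" is matched before "ff" or "fl". Illustrative only.
LIGATURES = ["ffi", "ffl", "fff", "fft", "ffb", "ffh", "ffj", "ffk",
             "ff", "fl", "fi", "ft", "fb", "fh", "fj", "fk"]

def classify(entry):
    """Return (sequence, position, suppressed) for each candidate sequence.

    `entry` is a word in which "|" marks points where no ligature may occur.
    Positions refer to the word with the bars removed.
    """
    word = entry.replace("|", "").lower()

    # Collect the positions (in the bar-free word) that a bar precedes.
    breaks = set()
    seen_bars = 0
    for index, char in enumerate(entry):
        if char == "|":
            breaks.add(index - seen_bars)
            seen_bars += 1

    results = []
    i = 0
    while i < len(word):
        for lig in LIGATURES:
            if word.startswith(lig, i):
                # Suppressed if a bar sits strictly inside the sequence.
                suppressed = any(i < b < i + len(lig) for b in breaks)
                results.append((lig, i, suppressed))
                i += len(lig)
                break
        else:
            i += 1
    return results

print(classify("Auf|lagefläche"))
# [('fl', 2, True), ('fl', 7, False)]
```

The first fl (across the boundary of Auf and lage) is marked as suppressed, the second one (in fläche) is left alone, which is the situation described in item 1.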

Obviously, that list is far from complete, and the question is whether it ever can be. Please have a look and feel free to propose more words to be included – either via mail or directly on GitHub.

More generally, there’s the question of how such a list should be extended. I was thinking about two options:

 1. The new language options feature includes a tracker that reports
    for which words in a given document ligature prevention happened,
    and which words haven’t been touched by the mechanism. It should
    be possible to analyze the log file and to create lists of words
    with ligatures. From there it should be a rather simple step to
    derive new words for the ligature-suppression wordlist.
 2. A bigger solution might be to use selnolig’s patterns in a script
    that can be run over a large corpus, such as the DWDS (Digitales
    Wörterbuch der deutschen Sprache). That should give us a more
    complete list of words where ligatures must be suppressed.
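
Both options boil down to finding words that contain a ligature candidate but are not yet covered by the wordlist. A rough Python sketch of that step follows. It works on plain text, since neither the tracker's log format nor selnolig's actual patterns are reproduced here; the regular expression `CANDIDATE` is only a crude stand-in, and `untreated_words` is a hypothetical name.

```python
import re

# Crude stand-in for real patterns: an f followed by one of the letters
# named in the wordlist blocks. NOT selnolig's actual rules.
CANDIDATE = re.compile(r"f[filtbhjk]", re.IGNORECASE)

def untreated_words(text, wordlist):
    """Return sorted words from `text` that contain a ligature candidate
    and do not appear in `wordlist` (entries may contain "|" markers)."""
    known = {entry.replace("|", "").lower() for entry in wordlist}
    # Runs of letters, including umlauts and ß; digits and "_" are excluded.
    words = set(re.findall(r"[^\W\d_]+", text))
    return sorted(
        word for word in words
        if CANDIDATE.search(word) and word.lower() not in known
    )

sample = "Die Auflagefläche und das Schilfinsel-Foto, dazu Kaffee."
print(untreated_words(sample, ["Auf|lagefläche"]))
# ['Kaffee', 'Schilfinsel']
```

The output is only a list of candidates for manual review: Schilfinsel would need a break, while Kaffee should keep its ligature.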

where is that DWDS ... i can write some code to deal with it (i'd rather start from the source than from some interpretation; who knows what more there is to uncover)

additional info: we're talking of a mechanism sort of integrated in the hyphenation loop, where we can also handle compound words (if needed with details about how to influence their hyphenation), so the above question involves:

- exceptions to exceptions
- replacements before hyphenation
- compound words (including lhmin/rhmin overloads)
- (left, right, two-sided) ligature and/or kern prevention

and whatever we like/need more (within reasonable bounds),

Hans

-----------------------------------------------------------------
                                          Hans Hagen | PRAGMA ADE
              Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
       tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-----------------------------------------------------------------