Hi everyone,

Now that Hans has implemented the new ligature suppression mechanism via
language goodies - thanks again, Hans! - we need to come up with word lists.

I've started working on a list of German words with ligatures that should be
suppressed. The list is derived from the word list that comes with the LuaLaTeX
selnolig package:
https://github.com/micoloretan/selnolig/blob/master/selnolig-german-wordlist.tex

You can find the current list here:
https://github.com/denismaier/context-nolig-wordlist

The list is currently organized as follows:


  1.  L. 25-35: These are words where automatic pattern matching is more
difficult than usual because they contain multiple ligatures, some of which
must be suppressed while others must be preserved. In the case of
«Auflagefläche» it's even the same combination of letters. So here we use the
bar | to manually mark the points where no ligature may occur.
  2.  L. 36ff.: The vast majority of words are currently in this block, which
lists words where an ff, fl, fi, ffi, or ffl ligature has to be broken up
after the first f.
  3.  L. 1804ff.: Words where an ffi, ffl, or fff ligature has to be prevented
after the second f, so that the first two f's still form a ligature.
  4.  The remaining blocks, starting at l. 1900, 2073, 2157, 2225, and 2277,
suppress ligatures for «ft» and «fft», «fb» and «ffb», «fh» and «ffh», «fj» and
«ffj», and «fk» and «ffk», respectively.
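
To make the bar notation from block 1 a bit more concrete, here is a small
Python sketch. It is not the ConTeXt goodies mechanism itself, just an
illustration of what a bar means. The function name classify_ligatures, the
candidate list, and the annotated forms "Auf|lagefläche" and "Schiff|fahrt"
are my own assumptions for the example; the entries in the actual list may be
written differently.

```python
import re

# Ligature candidates mentioned above, longest first so that "ffi" is tried
# before "ff". This list is an assumption made for the illustration.
CANDIDATES = ["ffi", "ffl", "fft", "ffb", "ffh", "ffj", "ffk", "fff",
              "ff", "fi", "fl", "ft", "fb", "fh", "fj", "fk"]
PATTERN = re.compile("|".join(CANDIDATES))


def classify_ligatures(annotated):
    """Return (kept, broken) ligature candidates for a word annotated with |.

    kept:   candidates that lie entirely inside one bar-delimited segment
    broken: candidates of the unannotated word that a bar cuts through
    """
    plain = annotated.replace("|", "")

    # Record each bar as an offset into the unannotated word.
    breaks = set()
    pos = 0
    for ch in annotated:
        if ch == "|":
            breaks.add(pos)
        else:
            pos += 1

    # Ligatures that survive: search each segment on its own.
    kept = [m.group()
            for segment in annotated.split("|")
            for m in PATTERN.finditer(segment)]

    # Ligatures that are suppressed: matches in the plain word split by a bar.
    broken = [m.group()
              for m in PATTERN.finditer(plain)
              if any(m.start() < b < m.end() for b in breaks)]
    return kept, broken


print(classify_ligatures("Auf|lagefläche"))  # (['fl'], ['fl'])
print(classify_ligatures("Schiff|fahrt"))    # (['ff'], ['fff'])
```

In the first word the same letter pair fl shows up once as broken (Auf|lage)
and once as kept (fläche), which is exactly the case a plain pattern cannot
decide without the manual bar.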

Obviously, the list is far from complete, and the question is whether it ever
can be. Please have a look and feel free to propose more words to include -
either by mail or directly on GitHub.

More generally, there's the question of how such a list should be extended. I
was thinking about two options:

  1.  The new language options features include a tracker that records, for a
given document, which words had ligatures suppressed and which words were not
touched by the mechanism. It should be possible to analyze the log file and
extract the words that contain ligatures. From there it should be a rather
simple step to derive new words for the ligature-suppression word list.
  2.  A bigger solution might be to use selnolig's patterns in a script that
can be run over a large corpus, such as the DWDS (Digitales Wörterbuch der
deutschen Sprache). That should give us a more complete list of words where
ligatures must be suppressed.
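
Both options boil down to the same first step: collect words that contain a
ligature candidate but are not yet on the list. A rough Python sketch of that
step could look like the following. Everything in it is an assumption for
illustration: the function name uncovered_words, the simple candidate pattern
(which is not one of selnolig's patterns), and the input being plain text
rather than the tracker's log, whose format I haven't looked at in detail.

```python
import re

# An f followed by one of the letters the blocks in the list deal with.
# Deliberately crude: it only finds candidates, it does not decide anything.
CANDIDATE = re.compile(r"f[filtbhjk]")
# Runs of letters, including umlauts and ß.
WORD = re.compile(r"[^\W\d_]+")


def uncovered_words(text, known):
    """Return sorted words from text that contain a ligature candidate
    but do not appear in the known list (bars in known entries are ignored)."""
    known_plain = {w.replace("|", "").lower() for w in known}
    found = set()
    for word in WORD.findall(text):
        lowered = word.lower()
        if CANDIDATE.search(lowered) and lowered not in known_plain:
            found.add(word)
    return sorted(found)


sample = "Die Auflage und der Kaufleute Stoff auf dem Schiff."
print(uncovered_words(sample, ["Auf|lage"]))
# ['Kaufleute', 'Schiff', 'Stoff']
```

The output is only a list of candidates for review. Words like Stoff or
Schiff keep their ligature, while Kaufleute needs a break, so deciding which
candidates actually go on the list still needs either a human or
morphological patterns such as selnolig's.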

What do you think?

Best,
Denis
___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the 
Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://context.aanhet.net
archive  : https://bitbucket.org/phg/context-mirror/commits/
wiki     : http://contextgarden.net
___________________________________________________________________________________
