> -----Original Message-----
> From: Hans Hagen <j.ha...@xs4all.nl>
> Sent: Saturday, 3 April 2021 17:58
> To: mailing list for ConTeXt users <ntg-context@ntg.nl>; Maier, Denis
> Christian (UB) <denis.ma...@ub.unibe.ch>
> Subject: Re: [NTG-context] Ligature suppression word list
>
> On 4/3/2021 5:06 PM, denis.ma...@ub.unibe.ch wrote:
> > Hi everyone
> >
> > Now that Hans has implemented the new ligature suppression mechanism
> > via language goodies - thanks again Hans! - we need to come up
> > with wordlists.
> >
> > I've started working on a list of German words with ligatures that
> > should be suppressed. The list is derived from the word list that
> > comes with the lualatex selnolig package:
> > https://github.com/micoloretan/selnolig/blob/master/selnolig-german-wordlist.tex
> >
> > You can find the current list here:
> > https://github.com/denismaier/context-nolig-wordlist
> >
> > The list is currently organized as follows:
> >
> > 1. L. 25-35: This block specifies words where automatic pattern
> >    matching is more difficult than usual because the words contain
> >    multiple ligatures, some of which must be suppressed while others
> >    must be preserved. In the case of «Auflagefläche» it is even the
> >    same combination of letters. So here we use the bar | to manually
> >    indicate points where no ligature must occur.
> > 2. L. 36ff.: The vast majority of words are currently in this block,
> >    which lists words where an ff, fl, fi, ffi, or ffl ligature has to
> >    be broken up after the first f.
> > 3. L. 1804ff.: This block contains words where ffi, ffl, or fff
> >    ligatures have to be prevented after the second f, so the first
> >    two fs form a ligature.
> > 4. The remaining blocks, starting at l. 1900, l. 2073, l. 2157,
> >    l. 2225, and l. 2277, suppress ligatures for «ft» and «fft»,
> >    «fb» and «ffb», «fh» and «ffh», «fj» and «ffj», and «fk» and
> >    «ffk».
> >
> > Obviously, that list is far from complete, and the question is
> > whether it ever can be. Please have a look and feel free to propose
> > more words to be included - either via mail or directly on GitHub.
> >
> > More generally, there is the question of how such a list should be
> > extended. I was thinking about two options:
> >
> > 1. The new language options feature includes a tracker that reports
> >    for which words in a given document ligature prevention happened,
> >    and which words have not been touched by the mechanism. It should
> >    be possible to analyze the log file and create lists of words
> >    with ligatures. Deriving new words for the ligature-suppression
> >    wordlist from that should be a rather simple step.
> > 2. A bigger solution might be to use selnolig's patterns in a script
> >    that can be run over a large corpus, such as the DWDS (Digitales
> >    Wörterbuch der deutschen Sprache). That should give us a more
> >    complete list of words where ligatures must be suppressed.
>
> where is that DWDS ... i can write some code to deal with it (i'd rather
> start from the source than from some interpretation; who knows what more
> there is to uncover)

The DWDS is here: https://www.dwds.de/

But I still need to check how we can extract the words from there...

Denis
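
For the corpus idea, a first pass could simply collect every word that contains one of the letter combinations mentioned above, so that the candidates can be reviewed by hand. The following is only a minimal sketch under assumptions: it does not use selnolig's actual patterns, the function name `ligature_candidates` is made up for illustration, and the input is assumed to be a plain iterable of words (how to get such a list out of the DWDS is still open).

```python
import re

# Letter combinations named in this thread. Longer combinations come first
# so that, for example, "ffi" is reported instead of just "ff".
COMBOS = ["ffi", "ffl", "fff", "fft", "ffb", "ffh", "ffj", "ffk",
          "ff", "fi", "fl", "ft", "fb", "fh", "fj", "fk"]
COMBO_RE = re.compile("|".join(COMBOS))


def ligature_candidates(words):
    """Yield (word, combos) for each word containing a listed combination.

    This only finds candidates for manual review. It does not decide
    whether a ligature should actually be suppressed in that word.
    """
    for word in words:
        found = COMBO_RE.findall(word.lower())
        if found:
            yield word, found


if __name__ == "__main__":
    # Illustrative input; a real run would read a word list from a file.
    for word, combos in ligature_candidates(["Auflagefläche", "Haus"]):
        print(word, combos)
```

Deciding which of the reported combinations sit on a morpheme boundary is the hard part and is not attempted here; that is where selnolig's patterns or a manual pass would come in.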