https://bugs.documentfoundation.org/show_bug.cgi?id=117408

            Bug ID: 117408
           Summary: Clean up dictionary file headers from licenses and
                    whitespace
           Product: LibreOffice
           Version: 6.1.0.0.alpha1+ Master
          Hardware: All
                OS: All
            Status: UNCONFIRMED
          Severity: normal
          Priority: medium
         Component: Linguistic
          Assignee: [email protected]
          Reporter: [email protected]

Please remove license information from the dictionary files. The .dic files
should be as clean as possible. License information belongs in the appropriate
README, LICENSE or COPYRIGHT files, where there is more room for it. This also
avoids maintaining the same license information in multiple places.

Additionally, and most importantly, encoding problems can arise from characters
with diacritics in the license information, especially in authors' names. On
top of that, this information is added inconsistently, set off by whitespace,
# or /.
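
As a rough illustration of the kind of header clutter described above, the following Python sketch scans the first lines of a .dic file for comment-style lines, empty lines, and non-ASCII bytes in comments. It works on raw bytes so that it does not depend on the file's encoding. The function name, the message strings, and the 40-line window are assumptions made for this example, not part of Hunspell or LibreOffice.

```python
def header_issues(raw: bytes, max_lines: int = 40):
    """Return (line_number, message) pairs for suspicious header lines.

    Hypothetical helper: it only looks at the first `max_lines` lines and
    treats '#' and '/' as the comment markers mentioned in this report.
    """
    issues = []
    for number, line in enumerate(raw.splitlines()[:max_lines], start=1):
        if number == 1:
            # The first line should hold nothing but the entry count.
            if not line.isdigit():
                issues.append((1, "first line is not a bare number"))
            continue
        if line.startswith((b"#", b"/")):
            if any(byte > 0x7F for byte in line):
                issues.append((number, "comment line with non-ASCII bytes"))
            else:
                issues.append((number, "comment line"))
        elif not line.strip():
            issues.append((number, "empty or whitespace-only line"))
    return issues


if __name__ == "__main__":
    sample = b"2 # (c) Example\n# Author: J\xf6rg\n\nalpha\nbeta\n"
    for number, message in header_issues(sample):
        print(number, message)
```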



1) For Danish, remove everything after the number on the first line, including
the whitespace

161315 # (c) Stavekontrolden.dk

See
- https://cgit.freedesktop.org/libreoffice/dictionaries/tree/da_DK/da_DK.dic



2) For German, remove lines 2 to 18, where line 18 is an empty line and the
rest start with #

See
- https://cgit.freedesktop.org/libreoffice/dictionaries/tree/de/de_AT_frami.dic
- https://cgit.freedesktop.org/libreoffice/dictionaries/tree/de/de_CH_frami.dic
- https://cgit.freedesktop.org/libreoffice/dictionaries/tree/de/de_DE_frami.dic

(Something similar has been found in the non-frami German dictionaries. If
possible, address those too.)



3) For Italian, remove lines 2 to 34, which start with #

See
- https://cgit.freedesktop.org/libreoffice/dictionaries/tree/it_IT/it_IT.dic



4) For Guarani, remove the whitespace and the word "wordlist" from the first
line, and remove the empty second line

See
- https://cgit.freedesktop.org/libreoffice/dictionaries/tree/gug/gug.dic



5) For Dutch, remove the last empty line

See
- https://cgit.freedesktop.org/libreoffice/dictionaries/tree/nl_NL/nl_NL.dic#n142520



6) For Arabic, remove the empty line 13553

See
- https://cgit.freedesktop.org/libreoffice/dictionaries/tree/ar/ar.dic#n13553
- https://bugs.documentfoundation.org/show_bug.cgi?id=117389



7) For Nepali, remove the empty line 38029. Note that this is easier to see in
the plain file (second URL).

See:
- https://cgit.freedesktop.org/libreoffice/dictionaries/tree/ne_NP/ne_NP.dic#n38029
- https://cgit.freedesktop.org/libreoffice/dictionaries/plain/ne_NP/ne_NP.dic



8) After cleaning up these files, please also check that the line count on the
first line is correct. That is, the count should exclude (if I'm not
mistaken):
- the first line
- any line starting with a comment character
- any line starting with a slash
- any empty lines
- any lines with only whitespace
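
The counting rule listed above can be sketched in Python as follows. This is only an illustration of the rule as stated, not one of the reporter's own QA scripts. The function names are hypothetical, "#" is assumed to be the comment character, and the caller is assumed to have already read the file into a list of lines using the correct encoding.

```python
def declared_count(lines):
    """Return the leading integer on the first line, or None if absent."""
    tokens = lines[0].split() if lines else []
    if tokens and tokens[0].isascii() and tokens[0].isdigit():
        return int(tokens[0])
    return None


def count_entries(lines):
    """Count entry lines, excluding the categories listed in the report."""
    count = 0
    for line in lines[1:]:              # skip the first line (declared count)
        if not line.strip():            # empty or whitespace-only line
            continue
        if line.startswith(("#", "/")): # assumed comment and slash markers
            continue
        count += 1
    return count


if __name__ == "__main__":
    sample = ["3 # (c) Example", "# comment", "", "   ",
              "/slash", "alpha/A", "beta", "gamma"]
    print("declared:", declared_count(sample))
    print("counted: ", count_entries(sample))
```

A mismatch between the two values would indicate that the first line needs updating after the cleanup.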

This could be a general QA check for the dictionary files. I noticed these
minor issues while developing for Hunspell/Nuspell and have scripts available
for QA or reporting on this. I'm willing to contribute them; however, I am
completely unfamiliar with the LibreOffice development environment.
