https://bugs.documentfoundation.org/show_bug.cgi?id=117389

            Bug ID: 117389
           Summary: Remove unneeded RTL and LTR marks in Arabic (ar)
                    dictionary file and fix header
           Product: LibreOffice
           Version: 6.1.0.0.alpha1+ Master
          Hardware: All
                OS: All
            Status: UNCONFIRMED
          Severity: normal
          Priority: medium
         Component: Linguistic
          Assignee: [email protected]
          Reporter: [email protected]

Please fix the following for the Arabic dictionary file
https://cgit.freedesktop.org/libreoffice/dictionaries/tree/ar/ar.dic


1) Remove the left-to-right (LTR) mark in line 13870:

   ﺐﻳﺭﻮﺗ<200e>/60

and in line 48332:

   ﻢﺗﺩﺎﻨﻳ<200e>/169

The lines pasted here are slightly mangled. Search for the character U+200E,
e.g. in vim with /\%u200e .
Please also check any (upstream) scripts used to generate this dic file for
these characters and fix them there as well.


2) Remove the right-to-left (RTL) mark in line 23883:

   ﺇ<200f>ﺘﺑﺎﻋ/65

and in line 52995:

   ﺃﻮﻨﺗﺍﺮﻳﻭ<200f>/228      11

and in line 53323:

   ﻮﻴﻟﺯ<200f>/228  11

and in line 53338:

   ﻱﻮﻨﺴﻛﻭ<200f>/228        18

The lines pasted here are slightly mangled. Search for the character U+200F,
e.g. in vim with /\%u200f .
Please also check any (upstream) scripts used to generate this dic file for
these characters and fix them there as well.
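
The search described in 1) and 2) can also be scripted. The following is only
a minimal sketch: the function names are hypothetical and not part of any
LibreOffice tooling, and it assumes the dic file is read as UTF-8 text.

```python
LRM = "\u200e"  # LEFT-TO-RIGHT MARK (U+200E)
RLM = "\u200f"  # RIGHT-TO-LEFT MARK (U+200F)


def find_direction_marks(lines):
    """Return (line_number, mark_name) pairs for lines containing LRM or RLM.

    Line numbers are 1-based, matching the numbering used in this report.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        if LRM in line:
            hits.append((number, "LRM"))
        if RLM in line:
            hits.append((number, "RLM"))
    return hits


def strip_direction_marks(line):
    """Return the line with every LRM and RLM character removed."""
    return line.replace(LRM, "").replace(RLM, "")


# Small in-memory demonstration with made-up entries (not real ar.dic content).
demo = ["abc\u200e/60\n", "plain/1\n", "x\u200fy/65\n"]
for number, mark in find_direction_marks(demo):
    print(number, mark)
```

To check the real file, the list of lines could come from something like
open("ar.dic", encoding="utf-8").readlines(), assuming the file is UTF-8.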


3) Around line number 54767, remove these lines:

   54767 ::::::::::::::
   54768 verb.huns.dic
   54769 ::::::::::::::

If needed, replace it with

   #################
   # verb.huns.dic #
   #################

(Note the # at the end as well, to be robust and safe for LTR processing.)

Please, also check any (upstream) scripts that might have injected this.


4) Around line number 52828, remove these lines:

   52828 ::::::::::::::
   52829 Condidate3.4.dic
   52830 ::::::::::::::

If needed, replace it with

   ####################
   # Condidate3.4.dic #
   ####################

(Note the # at the end as well, to be robust and safe for LTR processing.)

Please, also check any (upstream) scripts that might have injected this.


5) Around line number 13554, remove these lines:

   13553 <empty line>
   13554 ::::::::::::::
   13555 names.dic
   13556 ::::::::::::::
   13557 50000

If needed, replace it with

   #############
   # names.dic #
   #############

(Note the # at the end as well, to be robust and safe for LTR processing.)

Please, also check any (upstream) scripts that might have injected this.


6) Around line number 13011, remove these lines:

   13011 ::::::::::::::
   13012 tools.dic
   13013 ::::::::::::::
   13014 #####  2

If needed, replace it with

   #############
   # tools.dic #
   #############

(Note the # at the end as well, to be robust and safe for LTR processing.)

Please, also check any (upstream) scripts that might have injected this.


7) Around line number 1, remove these lines:


   1 465929     1
   2 ::::::::::::::
   3 stopwords.dic
   4 ::::::::::::::

If needed, replace it with

   #################
   # stopwords.dic #
   #################

(Note the # at the end as well, to be robust and safe for LTR processing.)

Please, also check any (upstream) scripts that might have injected this.
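
The injected headers in 3) to 7) all share one shape: a line of colons, a file
name, and another line of colons. A rough sketch for locating them and for
building the boxed comment suggested above follows. The helper names are
hypothetical, and treating any line made only of three or more colons as a
rule line is an assumption.

```python
def is_rule(text):
    """True if the text consists only of colons (assumed: at least three)."""
    return len(text) >= 3 and set(text) == {":"}


def find_concatenation_headers(lines):
    """Return (first_line_number, file_name) for each colon/name/colon triplet.

    Line numbers are 1-based and point at the first colon line.
    """
    stripped = [line.rstrip("\n") for line in lines]
    hits = []
    for i in range(len(stripped) - 2):
        name = stripped[i + 1]
        if is_rule(stripped[i]) and is_rule(stripped[i + 2]) \
                and name and not is_rule(name):
            hits.append((i + 1, name))
    return hits


def boxed_comment(name):
    """Build the three-line comment box with a # at both ends of each line."""
    body = "# " + name + " #"
    border = "#" * len(body)
    return [border, body, border]


# Demonstration using the layout quoted in 7).
demo = ["465929\t1\n", "::::::::::::::\n", "stopwords.dic\n",
        "::::::::::::::\n"]
for number, name in find_concatenation_headers(demo):
    print(number, name)
    print("\n".join(boxed_comment(name)))
```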


8) Any line with a # at only one end should also get a # at the other end.
Examples are these lines:

   13558 ###أسماء       3

   13614 #القارات

   13628 #البلدان

   13847 #العواصم

   52819 ##اﻷسماء       4

   52823 #تأليف 5

There are almost 30 lines with (balanced and unbalanced) comments. Perhaps
check upstream which comments can be resolved (if they are temporarily
disabling dictionary words) and which can be removed completely, such as
#####. Other balanced comments are welcome.
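
As a possible aid for 8), the sketch below classifies comment lines. The
function name is hypothetical, and "balanced" is taken here to mean simply
that the line, with surrounding whitespace ignored, both starts and ends with
a #. That is one reading of the request, not a rule of the dic format.

```python
def comment_status(line):
    """Classify a line as 'balanced', 'unbalanced', or None (no # at an end).

    Leading and trailing whitespace, including the newline, is ignored.
    """
    text = line.strip()
    starts = text.startswith("#")
    ends = text.endswith("#")
    if starts and ends:
        return "balanced"
    if starts or ends:
        return "unbalanced"
    return None


def unbalanced_lines(lines):
    """Return the 1-based numbers of all lines with a # at only one end."""
    return [number for number, line in enumerate(lines, start=1)
            if comment_status(line) == "unbalanced"]


# Demonstration with two lines taken from this report and one plain entry.
demo = ["#القارات\n", "# names.dic #\n", "word/60\n"]
print(unbalanced_lines(demo))
```

Under this reading, a line such as "###أسماء" followed by a numeric field
counts as unbalanced, because it ends with the number rather than with a #.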


9) After fixing 7), the first line, before any lines with Arabic words, should
contain the total number of lines of the file.

Lines starting with # and this first line may be omitted when calculating
this number, but for a file of almost 500,000 lines a count that is a few
lines too high is not a problem. A count that is a few lines too low costs a
little at initialization of the spell checker, as the number in the first
line is used to allocate at least enough memory. Whatever is lacking will be
allocated dynamically later, at some cost in processing and memory.
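
The number for the first line could be computed along these lines. This is a
sketch only: the function name is hypothetical, the first line is assumed to
be the existing count header, and skipping blank lines is an extra assumption
not stated above.

```python
def count_entries(lines):
    """Count dictionary entries for the header line.

    Skips the first line (assumed to be the count header), lines starting
    with # and, as an additional assumption, blank lines.
    """
    total = 0
    for line in lines[1:]:
        text = line.strip()
        if not text or text.startswith("#"):
            continue
        total += 1
    return total


# Demonstration with made-up entries (not real ar.dic content).
demo = ["3\n", "# names.dic #\n", "word1/60\n", "\n", "word2\n"]
print(count_entries(demo))
```

Since a slightly too high number is harmless, simply using the total line
count of the file would also work.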
