https://bugs.documentfoundation.org/show_bug.cgi?id=117389
Bug ID: 117389
Summary: Remove unneeded RTL and LTR marks in Arabic (ar)
dictionary file and fix header
Product: LibreOffice
Version: 6.1.0.0.alpha1+ Master
Hardware: All
OS: All
Status: UNCONFIRMED
Severity: normal
Priority: medium
Component: Linguistic
Assignee: [email protected]
Reporter: [email protected]
Please fix the following for the Arabic dictionary file
https://cgit.freedesktop.org/libreoffice/dictionaries/tree/ar/ar.dic
1) Remove the left-to-right (LTR) mark in line 13870:
ﺐﻳﺭﻮﺗ<200e>/60
and in line 48332:
ﻢﺗﺩﺎﻨﻳ<200e>/169
The copy-pastes here are a bit mangled. To find the character in vim, search
with /\%u200e (or type Ctrl+V u200e at the search prompt).
Please also trace any (upstream) scripts used to generate this dic file for
these characters and fix them there as well.
2) Remove the right-to-left (RTL) mark in line 23883:
ﺇ<200f>ﺘﺑﺎﻋ/65
and in line 52995
ﺃﻮﻨﺗﺍﺮﻳﻭ<200f>/228 11
and in line 53323
ﻮﻴﻟﺯ<200f>/228 11
and in line 53338
ﻱﻮﻨﺴﻛﻭ<200f>/228 18
The copy-pastes here are a bit mangled. To find the character in vim, search
with /\%u200f (or type Ctrl+V u200f at the search prompt).
Please also trace any (upstream) scripts used to generate this dic file for
these characters and fix them there as well.
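Since the marks are invisible and copy-paste mangles them, the lines from 1)
and 2) could also be located with a short script. This is only a sketch: the
names find_marks and strip_marks are made up for illustration, and it assumes
the dic file is read as UTF-8 text.

```python
from typing import Iterable, List, Tuple

# U+200E is the LEFT-TO-RIGHT MARK, U+200F the RIGHT-TO-LEFT MARK.
DIRECTIONAL_MARKS = {"\u200e": "LRM", "\u200f": "RLM"}


def find_marks(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (1-based line number, mark name) for every mark found."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for char in line:
            if char in DIRECTIONAL_MARKS:
                hits.append((number, DIRECTIONAL_MARKS[char]))
    return hits


def strip_marks(line: str) -> str:
    """Return the line with all LRM/RLM characters removed."""
    return "".join(ch for ch in line if ch not in DIRECTIONAL_MARKS)
```

It could be run over the file with something like
find_marks(open("ar.dic", encoding="utf-8")), and the reported line numbers
compared with the ones listed in 1) and 2).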
3) Around line number 54767, remove these lines:
54767 ::::::::::::::
54768 verb.huns.dic
54769 ::::::::::::::
If needed, replace it with
#################
# verb.huns.dic #
#################
(Note the # at the end as well, to be robust and safe for LTR processing.)
Please, also check any (upstream) scripts that might have injected this.
4) Around line number 52828, remove these lines:
52828 ::::::::::::::
52829 Condidate3.4.dic
52830 ::::::::::::::
If needed, replace it with
####################
# Condidate3.4.dic #
####################
(Note the # at the end as well, to be robust and safe for LTR processing.)
Please, also check any (upstream) scripts that might have injected this.
5) Around line number 13554, remove these lines:
13553 <empty line>
13554 ::::::::::::::
13555 names.dic
13556 ::::::::::::::
13557 50000
If needed, replace it with
#############
# names.dic #
#############
(Note the # at the end as well, to be robust and safe for LTR processing.)
Please, also check any (upstream) scripts that might have injected this.
6) Around line number 13011, remove these lines:
13011 ::::::::::::::
13012 tools.dic
13013 ::::::::::::::
13014 ##### 2
If needed, replace it with
#############
# tools.dic #
#############
(Note the # at the end as well, to be robust and safe for LTR processing.)
Please, also check any (upstream) scripts that might have injected this.
7) Around line number 1, remove these lines:
1 465929 1
2 ::::::::::::::
3 stopwords.dic
4 ::::::::::::::
If needed, replace it with
#################
# stopwords.dic #
#################
(Note the # at the end as well, to be robust and safe for LTR processing.)
Please, also check any (upstream) scripts that might have injected this.
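The replacement suggested in 3) to 7) follows one pattern, so it could be
done in one pass. The following is a sketch under assumptions: the names
boxed_comment and replace_banners are hypothetical, and a banner is taken to
be a line of five or more colons, a file name, and another line of colons.
The stray lines next to some banners (the empty line, 50000, ##### 2 and
465929 1) are not handled and would still need to be removed by hand.

```python
import re
from typing import List

# A banner rule is a line consisting only of colons (assumed: 5 or more).
COLON_RULE = re.compile(r"^:{5,}$")


def boxed_comment(name: str) -> List[str]:
    """Build a three-line comment box with # at both ends of every line."""
    rule = "#" * (len(name) + 4)
    return [rule, f"# {name} #", rule]


def replace_banners(lines: List[str]) -> List[str]:
    """Replace each colon/name/colon triple with a balanced comment box."""
    out = []
    i = 0
    while i < len(lines):
        if (i + 2 < len(lines)
                and COLON_RULE.match(lines[i])
                and COLON_RULE.match(lines[i + 2])):
            out.extend(boxed_comment(lines[i + 1].strip()))
            i += 3  # skip the whole banner
        else:
            out.append(lines[i])
            i += 1
    return out
```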
8) Any line with a # at only one end should also get a # at the other end.
Examples are these lines:
13558 ###أسماء 3
13614 #القارات
13628 #البلدان
13847 #العواصم
52819 ##اﻷسماء 4
52823 #تأليف 5
There are almost 30 lines with (balanced and unbalanced) comments. Perhaps
check upstream which comments can be resolved (if they are temporarily
disabling dictionary words) and which can be removed completely, such as
#####. Other comments are welcome as long as they are balanced.
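Balancing the comments from 8) could look roughly like this. It is a sketch
only: balance_comment is a hypothetical name, the spaces around the comment
text are a style choice taken from the boxes above, and lines consisting only
of # characters are left alone so they can be reviewed or removed separately.

```python
def balance_comment(line: str) -> str:
    """Give a comment line the same run of # at both ends.

    Lines that do not start with #, and lines made of # only,
    are returned unchanged.
    """
    leading = len(line) - len(line.lstrip("#"))
    if leading == 0:
        return line  # not a comment line
    body = line.strip().strip("#").strip()
    if not body:
        return line  # only # characters, nothing to balance
    hashes = "#" * leading
    return f"{hashes} {body} {hashes}"
```

Anything after the comment text on the same line, such as the trailing
numbers in the examples above, is kept as part of the comment body.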
9) After fixing 7), the first line, before any lines with Arabic words, should
contain the total number of lines of the file.
Lines starting with # and the first line itself may be omitted when calculating
this number, but overcounting by a few lines in a file of almost 500,000 lines
is not a problem. Undercounting by a few lines costs a little at initialization
of the spell checker, because the number in the first line is used to allocate
just enough memory up front. Whatever is lacking is allocated dynamically
later, at some cost in processing and memory.
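The count for the first line in 9) could be derived as follows. Again a
sketch: with_count_header is a hypothetical name, and it assumes the old first
line and the banners have already been removed. Empty lines are counted,
which at worst overcounts slightly.

```python
from typing import List


def with_count_header(entries: List[str]) -> List[str]:
    """Prepend the number of non-comment lines as the first line."""
    count = sum(1 for line in entries if not line.startswith("#"))
    return [str(count)] + entries
```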