Bug#1020387: dictionaries-common: Consensus regarding the packaging of the Qt WebEngine hunspell binary dictionaries

Agustin Martin Wed, 05 Oct 2022 05:12:13 -0700

On Thu, 22 Sept 2022 at 21:30, Soren Stoutner
(<so...@stoutner.com>) wrote:
>
> On Thursday, September 22, 2022 9:20:46 AM MST Agustin Martin wrote:
>
> > First of all, I am curious about the reasons behind this new format,
> > the problems it deals with and its advantages. I assume they are valid
> > enough, but they imply yet another spellchecking engine/format. We
> > currently have good old ispell, aspell and hunspell. vim has its own
> > spellchecker engine using its own format, with dicts that can be
> > created from old myspell2 dicts. We did not add vim format dicts (from
> > aspell dicts sources) since there seems to be some work to make vim
> > use hunspell directly. And now these bdic dicts.
>
> The .bdic format is specified by the upstream Chromium project, and is 
> required by anything that is based off of Chromium's code, like Qt WebEngine. 
>  I do not know why they went with a proprietary binary format, but I would 
> assume that if they went to so much trouble to not use the standard Hunspell 
> format there must have been something to make it worthwhile, like some 
> performance improvement.  Perhaps I am giving Google too much credit for 
> having logical reasons instead of making arbitrary decisions.

Hi, Soren

It's a pity not to have more info about the reasons for this new
format. Even if it is more efficient in terms of raw performance,
I do not think the difference is noticeable in something like chromium.

Another question is whether the bdic format is expected to change, or
whether that is very unlikely.

Thinking about this, I have done some tests with these bdic files
being generated at postinst, like emacs byte-compiled files (although
my tests were cruder), delegating everything to the qtwebengine
packages. bdic generation is not very slow, but IMHO it is not fast
enough to go this way (which would require moving
qwebengine_convert_dict to the Qt WebEngine runtime package and
controlling everything from it).
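
For anyone wanting to repeat this kind of timing test, here is a
minimal sketch, not the script I actually used. It assumes the
converter is called as "<converter> <input.dic> <output.bdic>" and
takes the converter command as a parameter; the function name
convert_all and the directory layout are only illustrative.

```python
import subprocess
import time
from pathlib import Path


def convert_all(dict_dir, out_dir, converter):
    """Run `converter <in.dic> <out.bdic>` for every .dic file in dict_dir.

    `converter` is a list holding the command to run (an assumption: the
    real tool is expected to take the input and output paths as its two
    arguments). Returns a list of (name, exit_code, seconds) tuples.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    results = []
    for dic in sorted(Path(dict_dir).glob("*.dic")):
        target = out_dir / (dic.stem + ".bdic")
        start = time.monotonic()
        proc = subprocess.run(
            [*converter, str(dic), str(target)],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        results.append((dic.stem, proc.returncode, time.monotonic() - start))
    return results


# Possible use (paths and command are assumptions, adjust to your system):
#   for name, code, secs in convert_all("/usr/share/hunspell", "/tmp/bdic",
#                                       ["qwebengine_convert_dict"]):
#       print(f"{name}: exit {code}, {secs:.2f}s")
```

Summing the per-dictionary times over everything installed gives a
rough idea of what a postinst run would cost.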

One noticeable thing is that bdic generation failed for some of the
hunspell dicts I have installed:

++ processing an_ES.aff
[1003/125813.760330:FATAL:aff_reader.cc(305)] Did not find a space in 'y i'.
Trace/breakpoint trap
++ processing ar.aff
[1003/125813.796753:FATAL:aff_reader.cc(123)] We don't support the IGNORE command yet. This would change how we would insert things in our lookup table.
++ processing gl_ES.aff
gl_ES.dic_delta not found.
Reading gl_ES.aff
Reading gl_ES.dic
Serializing...
Verifying...
Word does not match!
Index: 2126
Expected: Abū po:antropónimo is:ngrama_Abū_ʿAbdullāh_Muḥammad_ibn_Jābir_ibn_Sinān_ar_Raqqī_al_Ḥarrani_aṣ_Ṣabiʾ_al_Battānī
Actual: Abū po:antropónimo is:ngrama_Abū_ʿAbdullāh_Muḥammad_ibn_Jābir_ibn_Sinān_ar_Raqqī_al_Ḥarrani_aṣ_Ṣabiʾ_al_Battā
ERROR converting, the dictionary does not check out OK.
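
With many dictionaries installed, output like the above gets long. A
small sketch of how it could be condensed into one line per
dictionary, assuming the "++ processing <name>.aff" markers printed by
my test loop and treating any line containing "FATAL:" or starting
with "ERROR" as a failure (the function name is only illustrative):

```python
import re


def summarize_conversion_log(log_text):
    """Map each dictionary name to "ok" or to its first failure line."""
    results = {}
    current = None
    for line in log_text.splitlines():
        marker = re.match(r"\+\+ processing (\S+)\.aff", line)
        if marker:
            # A new dictionary starts; assume success until proven otherwise.
            current = marker.group(1)
            results[current] = "ok"
        elif current and results[current] == "ok" and (
            "FATAL:" in line or line.startswith("ERROR")
        ):
            results[current] = line.strip()
    return results


sample = """++ processing an_ES.aff
[1003/125813.760330:FATAL:aff_reader.cc(305)] Did not find a space in 'y i'.
Trace/breakpoint trap
++ processing gl_ES.aff
Reading gl_ES.aff
ERROR converting, the dictionary does not check out OK.
"""

for name, status in summarize_conversion_log(sample).items():
    print(f"{name}: {status}")
```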

Regards,

--
Agustin
