On Mon, May 27, 2019 at 01:56:29PM +0200, Tino Didriksen wrote:
> The PR https://github.com/apertium/apertium/pull/47 wants to add a direct
> dependency on ICU. I am in favour of this, but figured it should be brought
> up on the list.
> Reasoning:
> - HFST and CG-3 both require ICU, and ICU has been the official Unicode
> library for 3 years now.
> - lttoolbox requires libxml2, and libxml2 requires ICU - so Apertium
> already has a transitive dependency on ICU.
> - Language development requires libxml2-utils to get xmllint, which again
> transitively requires ICU.

I think at least HFST and libxml2 have configurable ICU support that can
be turned off with an acceptable loss of functionality.

> So we might as well embrace ICU entirely - also in other parts of lttoolbox
> and the wider Apertium tools.

I would agree. In the past one could have argued that new dependencies
make things harder to install and that ICU was not the easiest to work
with, but with current packaging that is not such a big concern. I think
ICU is probably still quite big and slow, but we could immediately make
use of it in a few places, such as the OOV tokenisation problems we've
seen in recent issues, and that benefit outweighs the cost.

Doktor Tommi A Pirinen, Computational Linguist,
<https://flammie.github.io/purplemonkeydishwasher/>, Universität
Hamburg, Hamburger Zentrum für Sprachkorpora <http://hzsk.de>. CLARIN-D
Entwickler.  President of ACL SIGUR SIG for Uralic languages
I tend to follow inline-posting style in desktop e-mail messages.

Apertium-stuff mailing list
