https://bugs.kde.org/show_bug.cgi?id=518826

--- Comment #7 from Noah Davis <[email protected]> ---
(In reply to makosol from comment #6)
> Yes it's what i did but my point is that nothing tells the user what to do
> to add languages.
> Best would be to have a button to do that.
> Or at least display a message telling the user he can add languages by
> installing tesseract packages.

We can add a line telling the user to install languages. A button would be
convenient, but I don't really want to add one. If we were going to add a
button to install languages, I see it working a few different ways.

# Distro package installation scripts
## Pros
- Space efficient.
- Updates are handled by the distro.
- It should always work with the system tesseract library.
- Least likely to cause user complaints *if* the scripts are always up to date
with the latest distros.
## Cons
- We'd have to support every distro or explicitly exclude all but the most
common distros. Even only supporting Arch, Debian, Fedora, openSUSE and Ubuntu
could be a massive hassle to maintain.
- If we ever wanted Spectacle to support Flatpak, I'm not sure if we could use
this.
- This isn't going to work on any system that doesn't have a system package
manager.
- This isn't going to work for users who don't have the authority to install
system packages.
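
To illustrate the maintenance burden, here is a minimal Python sketch of what
a per-distro lookup might involve. The package name patterns and install
commands are assumptions based on common packaging conventions and are not
verified against current releases; the function names are hypothetical.
Spectacle itself is C++, so this only shows the shape of the problem.

```python
import shlex

# Assumed package naming patterns, keyed by the ID field of /etc/os-release.
# Every entry would need to be kept in sync with what each distro ships.
INSTALL_COMMANDS = {
    "arch": ["pacman", "-S", "tesseract-data-{lang}"],
    "debian": ["apt", "install", "tesseract-ocr-{lang}"],
    "ubuntu": ["apt", "install", "tesseract-ocr-{lang}"],
    "fedora": ["dnf", "install", "tesseract-langpack-{lang}"],
}


def parse_os_release(text):
    """Parse os-release style KEY=value lines into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Values may be quoted; shlex handles the shell-style quoting.
        parts = shlex.split(value)
        fields[key] = parts[0] if parts else ""
    return fields


def install_command(os_release_text, lang):
    """Return the install command for a language, or None if unsupported."""
    distro = parse_os_release(os_release_text).get("ID", "")
    template = INSTALL_COMMANDS.get(distro)
    if template is None:
        return None
    return [part.format(lang=lang) for part in template]
```

Any distro missing from the table returns None, so the caller would still
need a fallback message. Privilege escalation is not covered here at all.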

# Downloading training data files
## Pros
- We can make this work without needing to pay much attention to what distros
are doing.
- It would probably work well for Flatpaks if Spectacle ever started supporting
them.
- It would work for systems without system package managers.
- It would work for users who don't have authority to install system packages.
## Cons
- We have to handle the updates. We could maybe download the data using the git
repo (https://github.com/tesseract-ocr/tessdata) and only check out selected
training data files on the latest stable branch. Then we also need to decide
when to update the data files. Would it be acceptable to check once a
day/week/month? What if the repo changes its structure? What if the tesseract
library updates and requires a new training data format that breaks existing
data? How do we handle the old data files? The user could have multiple
versions of tesseract installed. There are a ton of open questions with no
clear answers. We would have to put in a lot of effort to make this robust.
It's really not any less work than managing distro package installation
scripts.
- If Tesseract stops using that particular git repo or some other download link
that we rely on, the install button is broken the moment that happens.
- If we don't use a git repo (I don't see why not to use git, but maybe there
is a reason), we have to ensure that the packages we download are signed and
hashed so that we don't accidentally download malicious files. I am not a
security expert.
- Can be wasteful of space if there are multiple user accounts or system
packages overlapping with downloaded files.
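
Two of the concerns in this list, file integrity and update frequency, can be
sketched in a few lines of Python. This is only an illustration under
assumptions: the expected SHA-256 digest would have to come from some trusted
source that the sketch does not define, the weekly interval is an arbitrary
placeholder, and the function names are hypothetical.

```python
import hashlib
import hmac
from datetime import datetime, timedelta

# Arbitrary placeholder; the right interval is an open question.
UPDATE_INTERVAL = timedelta(days=7)


def sha256_matches(data, expected_hex):
    """Check downloaded bytes against a digest from a trusted source."""
    actual_hex = hashlib.sha256(data).hexdigest()
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(actual_hex, expected_hex.lower())


def update_due(last_check, now, interval=UPDATE_INTERVAL):
    """Return True if no check has happened yet or the interval has elapsed."""
    if last_check is None:
        return True
    return now - last_check >= interval
```

A hash check like this only detects corruption or tampering relative to the
expected digest. It does not replace signature verification, and it says
nothing about whether the file format matches the installed tesseract version.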

# Packaging training data files with Spectacle
## Pros
- Users never even have to ask to have them downloaded.
- Should always work with a given release of Spectacle.
- Eliminates a lot of security and platform support concerns.
## Cons
- The most wasteful of space. The files can't even be shared with other
programs using tesseract or the tesseract CLI. Even if you never use
tesseract, you now have all the training data on your system. I'm not sure how
much space it would take, but the latest FreeBSD package for all tesseract
training data has an installed size of 1014.79 MB. I picked FreeBSD because it
puts them all into one package while most distros split the training data into
packages by script. It could be a problem for small systems, and Spectacle is
expected to be installed alongside Plasma most of the time.
- Only updates as often as Spectacle updates. Not really that severe of a con.
