https://bugs.kde.org/show_bug.cgi?id=518826
--- Comment #7 from Noah Davis <[email protected]> ---
(In reply to makosol from comment #6)
> Yes it's what i did but my point is that nothing tells the user what to do
> to add languages.
> Best would be to have a button to do that.
> Or at least display a message telling the user he can add languages by
> installing tesseract packages.

We can add a line telling the user to install languages. It would be convenient, but I don't really want to add a button to install languages. If we were going to add a button to install languages, I see it working a few different ways.

# Distro package installation scripts

## Pros

- Space efficient.
- Updates are handled by the distro.
- It should always work with the system tesseract library.
- Least likely to have user complaints *if* the scripts are always up to date with the latest distros.

## Cons

- We'd have to support every distro or explicitly exclude all but the most common distros. Even only supporting Arch, Debian, Fedora, openSUSE and Ubuntu could be a massive hassle to maintain.
- If we ever wanted Spectacle to support Flatpak, I'm not sure if we could use this.
- This isn't going to work on any system that doesn't have a system package manager.
- This isn't going to work for users who don't have the authority to install system packages.

# Downloading training data files

## Pros

- We can make this work without needing to pay much attention to what distros are doing.
- It would probably work well for Flatpaks if Spectacle ever started supporting them.
- It would work for systems without system package managers.
- It would work for users who don't have authority to install system packages.

## Cons

- We have to handle the updates. We could maybe download the data using the git repo (https://github.com/tesseract-ocr/tessdata) and only checkout selected training data files on the latest stable branch. Then we also need to decide when to update the data files. Would it be acceptable to check once a day/week/month?
  What if the repo changes its structure? What if the tesseract library updates and requires a new training data format that breaks existing data? How do we handle the old data files? The user could have multiple versions of tesseract installed. There are a ton of unanswered questions with no clear answer. We would have to put in a lot of effort to make this robust. It's really not any less work than managing distro package installation scripts.
- If Tesseract stops using that particular git repo or some other download link that we rely on, the install button is broken the moment that happens.
- If we don't use a git repo (I don't see why not to use git, but maybe there is a reason), we have to ensure that the packages we download are signed and hashed so that we don't accidentally download malicious files. I am not a security expert.
- Can be wasteful of space if there are multiple user accounts or system packages overlapping with downloaded files.

# Packaging training data files with Spectacle

## Pros

- Users never even have to ask to have them downloaded
- Should always work with a given release of Spectacle
- Eliminates a lot of security and platform support concerns

## Cons

- The most wasteful of space. The files can't even be shared with other programs using tesseract or tesseract CLI. Doesn't matter if you will never use tesseract, you now have all the training data on your system. I'm not sure how much space it would take, but the latest FreeBSD package for all tesseract training data has an installed size of 1014.79 MB. I picked FreeBSD because it puts them all into one package while most distros split the training data into packages by script. It could be a problem for small systems and it's kind of expected for Plasma to be installed with Spectacle most of the time.
- Only updates as often as Spectacle updates. Not really that severe of a con.
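
To make the "signed and hashed" concern from the download option a bit more concrete, here is a rough sketch of the hash half only: checking a downloaded training data file against a known SHA-256 digest before using it. Nothing like this exists in Spectacle. The function names (`sha256_of`, `is_trusted`) are made up for illustration, and the sketch assumes we already have a trusted expected digest from somewhere, which is the hard part and is not solved here. It also does not cover signature verification.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks.

    Training data files can be tens of MB, so the file is not
    loaded into memory all at once.
    """
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_trusted(path: Path, expected_hex: str) -> bool:
    """True only if the file's digest matches the expected digest.

    expected_hex is assumed to come from a source we already trust
    (hypothetical: a list of digests shipped with the application).
    """
    return hmac.compare_digest(sha256_of(path), expected_hex.lower())
```

A caller would download the file to a temporary location, run `is_trusted()` on it, and only move it into the training data directory if the check passes. The check itself is small. Keeping the list of expected digests up to date is the same maintenance burden as the update questions above.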
