Re: [gentoo-user] extracting text, numbers from screencasts

2016-05-07 Thread Alan McKinnon
On 07/05/2016 16:31, hw wrote:
> Helmut Jarausch wrote:
>> On 04/08/2016 03:26:53 PM, hw wrote:
>>>
>>> Hi,
>>>
>>> what would be the best approach to extract data
>>> from a screencast?
>>>
>>> The task is to acquire some data from the display of
>>> a GUI program used interactively by a user.  There are
>>> a couple 'fields' (as in "designated areas of the display")
>>> in which the relevant data is being displayed while the
>>> program is being used.  The acquired data needs to be
>>> entered into a mysql database, preferably as soon as
>>> possible.  (The program needs windoze, and the sources
>>> are unavailable :( )
>>>
>>>
>>> The idea is to make a screen recording and postprocess
>>> the recording with some sort of OCR software.  This might
>>> require using ffmpeg (or the like) to create a single
>>> image from each frame of the recording; then treat each
>>> image with OCR software to get the interesting data
>>> which can then be entered into the database.
>>>
>>> Data to extract is mostly numbers.  The relevant fields
>>> can be expected to be either filled or empty.  The FPS rate
>>> of the recording can be kept reasonably low, like 1 FPS,
>>> or perhaps even less, depending on how frequently the relevant
>>> fields change.
>>>
>>> Using tesseract comes to mind, but after reading that
>>>
>>> "Tesseract's output will be very poor quality if the input
>>> images are not preprocessed to suit it: Images (especially
>>> screenshots) must be scaled up such that the text x-height
>>> is at least 20 pixels,[12] any rotation or skew must be
>>> corrected or no text will be recognized, low-frequency
>>> changes in brightness must be high-pass filtered, or
>>> Tesseract's binarization stage will destroy much of the
>>> page, and dark borders must be manually removed, or they
>>> will be misinterpreted as characters."[1]
>>>
>>> I'm even more doubtful that this would produce usable
>>> results with sufficient reliability.
>>>
>>> So what might be the best way to get text/numbers out of
>>> what a program displays?
>>>
>>>
>>> [1]: https://en.wikipedia.org/wiki/Tesseract_(software)
>>>
>>
>> I can't help with Gentoo.
>> Try to find an old (free) version of FineReader which runs under wine.
>> If you do it only occasionally, transfer the image to an Android phone
>> where there are good and cheap OCR apps, even FineReader.
> 
> It would be too much video to process.  Besides, phones are
> ok for making phone calls and entirely incompatible with
> computers, which makes them useless for anything else but
> making phone calls.


Huh? da fuck you talkin' 'bout?


My trusty collection of Android devices would be very surprised to hear
they now don't have real CPUs, wifi chips, RAM and storage. Or can't run
a web browser, do email, instant chat, play H.264 video with less CPU
load than my 8-core laptop, share with SMB on the network, do Bluetooth,
video calls or any of the other bazillion things computers have always
done with each other.

How odd. I really thought my Android phones could do all of that. I must
have imagined it, which means my delusions are worse than I thought
and maybe I need different and more pills from the nice lady who's my GP.



-- 
Alan McKinnon
alan.mckin...@gmail.com




Re: how to share a directory tree with files in it with multiple users (Re: [gentoo-user] local shared directory)

2016-05-07 Thread Alan McKinnon
On 07/05/2016 17:12, hw wrote:
> Michael Orlitzky wrote:
>> On 04/23/2016 10:42 AM, hw wrote:
>>>
>>> Has it become entirely impossible to share a directory tree and the
>>> files in it with multiple users when Linux is involved?  This should be
>>> a very simple thing to accomplish.
>>>
>>
>> It was never possible. It's ridiculous, but there it is. The UNIX
>> permissions model is too simple. ACLs were bolted on top, but most tools
>> retain legacy behavior with respect to group masks that breaks default
>> ACLs. You're seeing that same problem with your Samba share.
>>
>> Filesystem permissions are one thing that Windows got right. There's
>> ongoing work to bring that model to Linux,
>>
>>    https://en.wikipedia.org/wiki/Richacls
>>
>> but they're going to make the same mistake again[0] and allow the group
>> bits to act as a mask. That means mkdir, tar, cp, 7z -- anything that
>> tries to mess with group bits -- isn't going to work. They'll be DOA
>> just like POSIX ACLs were.
>>
>> I think you can manage this with incron and POSIX ACLs. Instead of
>> running "chmod g+w", use sys-apps/apply-default-acl to reset the
>> permissions to the defaults that you set.
>>
>> I wrote apply-default-acl to solve exactly this problem. You just need
>> to figure out a way to run it whenever things get screwed up. Which
>> means, whenever a file or directory is created.
>>
>>
>> [0] http://www.bestbits.at/richacl/man/richacl.7.txt
>>
>>   Changing the file mode permission bits:
>>
>>    When changing the file mode permission bits with chmod(1), the
>>    owner, group, and other file permission bits are set to the
>>    permission bits in the new mode... In addition, the masked and
>>    write_through ACL flags are set. This has the effect of limiting the
>>    permissions granted by the ACL to the file mode permission bits...
>>
>>
> 
> Hm, I'm confused.  Is it not possible to somehow force
> samba to set a user and a group as owners of a file or
> of a directory which is being created on a share?
> 
> If that was possible, couldn't I mount that share with
> the uid and gid of the owner and group samba enforces,
> which would then allow multiple local users to access
> the files and directories on that share as one?


Now you've added a whole new wrinkle that was never mentioned before -
samba. Yes, samba can enforce the permissions you want on file system
objects in shares it controls. To be accurate, it runs as root and
presents the perms you want to the user, but only when accessing the
files via samba. Look at these options in smb.conf

create mask = 664
force create mode = 664
security mask = 664
force security mode = 664
directory mask = 2775
force directory mode = 2775
directory security mask = 2775
force directory security mode = 2775

With this you can achieve what you want, but you have to ensure that
samba is the only way the users can access the files.

I'm assuming you completely and correctly understand umask.
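
The mask/force pairs above work, as smb.conf(5) describes them, by ANDing the mode Samba computes for a new file with the mask and then ORing in the force mode; files created locally, outside Samba, instead get the requested mode with the creating process's umask bits cleared. The POSIX sh sketch below only does that arithmetic. It models create mask/force create mode and directory mask/force directory mode and none of Samba's other permission options, and the function names are made up for illustration.

```shell
#!/bin/sh
# Arithmetic only; modes are octal digits without a leading 0, as in smb.conf above.

# samba_mode MODE MASK FORCE: (MODE AND MASK) OR FORCE, printed in octal.
samba_mode() {
    printf '%o\n' $(( (0$1 & 0$2) | 0$3 ))
}

# umask_mode MODE UMASK: MODE with the umask bits cleared, printed in octal.
umask_mode() {
    printf '%o\n' $(( 0$1 & ~0$2 ))
}

# A file Samba would otherwise create as 0600 still ends up group-writable:
samba_mode 600 664 664     # 664
# A new directory gets the setgid bit, so files in it inherit its group:
samba_mode 755 2775 2775   # 2775
# The same file created locally with the common umask 022 is not:
umask_mode 666 022         # 644
```

The last line is the reason Samba has to be the only way in: anything that writes to the directory directly only gets its own umask applied, not the smb.conf masks.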


-- 
Alan McKinnon
alan.mckin...@gmail.com




Re: how to share a directory tree with files in it with multiple users (Re: [gentoo-user] local shared directory)

2016-05-07 Thread hw

Michael Orlitzky wrote:
> On 04/23/2016 10:42 AM, hw wrote:
>>
>> Has it become entirely impossible to share a directory tree and the
>> files in it with multiple users when Linux is involved?  This should be
>> a very simple thing to accomplish.
>>
>
> It was never possible. It's ridiculous, but there it is. The UNIX
> permissions model is too simple. ACLs were bolted on top, but most tools
> retain legacy behavior with respect to group masks that breaks default
> ACLs. You're seeing that same problem with your Samba share.
>
> Filesystem permissions are one thing that Windows got right. There's
> ongoing work to bring that model to Linux,
>
>    https://en.wikipedia.org/wiki/Richacls
>
> but they're going to make the same mistake again[0] and allow the group
> bits to act as a mask. That means mkdir, tar, cp, 7z -- anything that
> tries to mess with group bits -- isn't going to work. They'll be DOA
> just like POSIX ACLs were.
>
> I think you can manage this with incron and POSIX ACLs. Instead of
> running "chmod g+w", use sys-apps/apply-default-acl to reset the
> permissions to the defaults that you set.
>
> I wrote apply-default-acl to solve exactly this problem. You just need
> to figure out a way to run it whenever things get screwed up. Which
> means, whenever a file or directory is created.
>
>
> [0] http://www.bestbits.at/richacl/man/richacl.7.txt
>
>   Changing the file mode permission bits:
>
>    When changing the file mode permission bits with chmod(1), the
>    owner, group, and other file permission bits are set to the
>    permission bits in the new mode... In addition, the masked and
>    write_through ACL flags are set. This has the effect of limiting the
>    permissions granted by the ACL to the file mode permission bits...
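
The "group bits act as a mask" problem can be made concrete. This is a minimal POSIX sh illustration, assuming the Linux POSIX ACL semantics described in acl(5): the effective rights of a named user, a named group or the owning group are the ACL entry ANDed with the mask entry, and a file created under a default ACL gets a mask no wider than the group bits of the mode the creating program asked for. It does not touch any files. The function names, the group "staff" and the path in the comments are hypothetical.

```shell
#!/bin/sh
# Illustration only: models the POSIX ACL mask rule without touching files.
# On a real system the ACLs would be set with e.g. (hypothetical group/path):
#   setfacl -m g:staff:rwx /srv/shared       # access ACL entry
#   setfacl -d -m g:staff:rwx /srv/shared    # default ACL for new files
#   getfacl /srv/shared/somefile             # shows "#effective:" rights

# effective_perms ENTRY MASK: AND two rwx strings position by position.
effective_perms() {
    entry=$1 mask=$2 out='' i=1
    while [ "$i" -le 3 ]; do
        e=$(printf '%s' "$entry" | cut -c "$i")
        m=$(printf '%s' "$mask" | cut -c "$i")
        if [ "$m" = '-' ]; then out="$out-"; else out="$out$e"; fi
        i=$((i + 1))
    done
    printf '%s\n' "$out"
}

# mask_from_mode MODE: the group bits of an octal mode as an rwx string.
# Under a default ACL, a new file's mask can contain at most these bits.
mask_from_mode() {
    g=$(( (0$1 >> 3) & 7 ))
    r=-; w=-; x=-
    if [ $((g & 4)) -ne 0 ]; then r=r; fi
    if [ $((g & 2)) -ne 0 ]; then w=w; fi
    if [ $((g & 1)) -ne 0 ]; then x=x; fi
    printf '%s%s%s\n' "$r" "$w" "$x"
}
```

For example, if the default ACL grants group staff rwx but cp or tar creates a file with mode 0644 (often just the source file's mode), `effective_perms rwx "$(mask_from_mode 644)"` prints `r--`: the write permission granted by the ACL is masked away. That is the state apply-default-acl is meant to repair after the fact.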




Hm, I'm confused.  Is it not possible to somehow force
samba to set a user and a group as owners of a file or
of a directory which is being created on a share?

If that was possible, couldn't I mount that share with
the uid and gid of the owner and group samba enforces,
which would then allow multiple local users to access
the files and directories on that share as one?




Re: [gentoo-user] extracting text, numbers from screencasts

2016-05-07 Thread hw

Urs Schütz wrote:
> On 04/08/16 11:30, Helmut Jarausch wrote:
>> On 04/08/2016 03:26:53 PM, hw wrote:
>>>
>>> Hi,
>>>
>>> what would be the best approach to extract data
>>> from a screencast?
>>>
>>> The task is to acquire some data from the display of
>>> a GUI program used interactively by a user.  There are
>>> a couple 'fields' (as in "designated areas of the display")
>>> in which the relevant data is being displayed while the
>>> program is being used.  The acquired data needs to be
>>> entered into a mysql database, preferably as soon as
>>> possible.  (The program needs windoze, and the sources
>>> are unavailable :( )
>>>
>>>
>>> The idea is to make a screen recording and postprocess
>>> the recording with some sort of OCR software.  This might
>>> require using ffmpeg (or the like) to create a single
>>> image from each frame of the recording; then treat each
>>> image with OCR software to get the interesting data
>>> which can then be entered into the database.
>>>
>>> Data to extract is mostly numbers.  The relevant fields
>>> can be expected to be either filled or empty.  The FPS rate
>>> of the recording can be kept reasonably low, like 1 FPS,
>>> or perhaps even less, depending on how frequently the relevant
>>> fields change.
>>>
>>> Using tesseract comes to mind, but after reading that
>>>
>>> "Tesseract's output will be very poor quality if the input
>>> images are not preprocessed to suit it: Images (especially
>>> screenshots) must be scaled up such that the text x-height
>>> is at least 20 pixels,[12] any rotation or skew must be
>>> corrected or no text will be recognized, low-frequency
>>> changes in brightness must be high-pass filtered, or
>>> Tesseract's binarization stage will destroy much of the
>>> page, and dark borders must be manually removed, or they
>>> will be misinterpreted as characters."[1]
>>>
>>> I'm even more doubtful that this would produce usable
>>> results with sufficient reliability.
>>>
>>> So what might be the best way to get text/numbers out of
>>> what a program displays?
>>>
>>>
>>> [1]: https://en.wikipedia.org/wiki/Tesseract_(software)
>>>
>>
>> I can't help with Gentoo.
>> Try to find an old (free) version of FineReader which runs under wine.
>> If you do it only occasionally, transfer the image to an Android phone
>> where there are good and cheap OCR apps, even FineReader.





> I had some surprisingly good experience with tesseract in digitizing
> photographed pages of an old book recently. So I gave it a try today with a
> cropped screenshot of Thunderbird.
>
> $ convert scrsht.png -type Grayscale -filter point -resize 300% -normalize upscaled.png
> $ tesseract -l eng upscaled.png out
> $ less out.txt
>
> convert is from media-gfx/imagemagick-6.9.0.3
> tesseract is app-text/tesseract-3.04.00-r2
>
> Here are my findings:
> Any graphical elements sized similarly to a character appear as strange letters.
> Recognition of serif fonts was better than of sans-serif fonts, even at smaller
> font size.
> Text which can be spell-checked was nearly perfectly recognized.
> Gentoo-specific words like "GLSA" and "NVMe" were not correctly recognized.
> Selected text (white on blue background) was poorly recognized.
> Dates were not recognized correctly.
> Times were correctly read.
> "convert" time for an initial screenshot size of 956 x 639 pixels was 0.4
> seconds.
> "tesseract" time was a little more than 6s on an Intel(R) Core(TM) i7-4710MQ
> CPU @ 2.50GHz, without OpenCL.
> The image conversion and tesseract OCR could easily be scripted.
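
The convert and tesseract commands can be scripted along the lines of the original plan (ffmpeg extracts one frame per second, then each frame is OCRed). Below is a minimal POSIX sh sketch, assuming ffmpeg, ImageMagick's convert and tesseract 3.03 or later (for the `stdout` output name) are installed. The field geometry, the function names and the number-extraction regex are hypothetical; cropping to the field first follows the advice to screenshot only the relevant area. It does nothing about the roughly 6 s of tesseract time per frame.

```shell
#!/bin/sh
# Sketch: OCR one numeric display field in a screen recording at 1 frame/s.
# FIELD_GEOMETRY (WIDTHxHEIGHT+X+Y) is a made-up example; measure the real
# position of the field in a screenshot of the program.
FIELD_GEOMETRY=${FIELD_GEOMETRY:-200x40+100+300}

# extract_number: print the first number (digits, optionally with dots)
# found in the OCR text on stdin; print nothing if the field was empty.
extract_number() {
    sed -n 's/[^0-9]*\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# ocr_recording VIDEO: print "frame-file<TAB>number" for each extracted frame.
ocr_recording() {
    workdir=$(mktemp -d) || return 1
    if ! ffmpeg -loglevel error -i "$1" -vf fps=1 "$workdir/frame_%05d.png"; then
        rm -rf "$workdir"
        return 1
    fi
    for frame in "$workdir"/frame_*.png; do
        # Crop to the field, then preprocess as in the convert command above.
        convert "$frame" -crop "$FIELD_GEOMETRY" +repage \
            -type Grayscale -filter point -resize 300% -normalize \
            "$workdir/field.png"
        value=$(tesseract "$workdir/field.png" stdout -l eng 2>/dev/null |
            extract_number)
        printf '%s\t%s\n' "$(basename "$frame")" "$value"
    done
    rm -rf "$workdir"
}

# Run the pipeline only when a video file is given on the command line.
if [ "$#" -gt 0 ]; then
    ocr_recording "$1"
fi
```

Each output line could then be turned into an INSERT for the MySQL database; that part is left out here.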


Considering the amount of video, 6 s per frame would be too long.
The application is time-critical: I have a window of about 10 s to
extract and process the data from at least 8 video streams.
Recording at only 1 FPS and taking 8 seconds per frame to extract and
process the data would require 640 CPU-seconds per 10 s window, and I
don't have some 64 CPUs available to do the work.  To make things worse,
it's an ongoing process, i.e. dividing it into 10 s windows is too
artificial to keep things running as smoothly as they should.
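
The estimate is streams × frames per second × window length × OCR seconds per frame. A tiny POSIX sh sketch of that arithmetic (function names are made up; the 8 streams, 10 s window and 8 s per frame come from the message) shows that the 640 CPU-second figure corresponds to one frame per second per stream, i.e. about 64 CPUs; at 10 FPS the same formula gives 6400 CPU-seconds per window, or about 640 CPUs.

```shell
#!/bin/sh
# Back-of-the-envelope OCR load; all arguments are integers.
# cpu_seconds STREAMS FPS WINDOW_SECONDS SECONDS_PER_FRAME
cpu_seconds() {
    echo $(( $1 * $2 * $3 * $4 ))
}

# cpus_needed STREAMS FPS WINDOW_SECONDS SECONDS_PER_FRAME:
# CPU-seconds per window divided by the window length, rounded up.
cpus_needed() {
    total=$(cpu_seconds "$1" "$2" "$3" "$4")
    echo $(( (total + $3 - 1) / $3 ))
}

cpus_needed 8 1 10 8    # 64
cpus_needed 8 10 10 8   # 640
```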


> In short I would say that the following steps would help with tesseract:
> Avoid GUIs with a lot of graphics.
> Try to screenshot just the relevant areas.
> Increase GUI font size.
> Configure GUI to use a well-known serif font, or train tesseract for the
> specific font used.
> Configure GUI to use high contrasts, avoid colors which get converted to gray.
> Tesseract time could be improved by enabling OpenCL.
>
> I would be interested to hear about your findings with numerical data, and
> which approach finally works for you.


Thank you very much for giving me a better idea of what I'm looking at!
Considering it, I have resorted to using AutoHotkey, which has the ability
to actually read data from GUI elements.  It can also make requests to
web servers.  With that, things become a hell of a lot simpler than
trying to process video streams, for I can simply read the data and send
it over to the web server which puts it into the database where it needs
to end up anyway.

Unfortunately, the application the data is being read from has a bad
habit of renaming the GUI elements I need to read.  This makes things
difficult again.

AutoHotkey is a really nice tool, 

[gentoo-user] Re: Will installing grub-2.02 break my grub-0.97 setup?

2016-05-07 Thread Grant Edwards
On 2016-05-06, James wrote:
> Grant Edwards <...@gmail.com> writes:
>
>> I'd like to to install winusb, and it appears to depend on grub-2:
>>   $ sudo emerge -av winusb
>
> Ok, so I've never used winusb, so excuse me for asking a few dumb
> questions here. Even after reading a bit and searching around, I
> have these dumb questions. I did not find sufficient reading
> materials to 'turn the light on' as to when and why and how this
> winusb is used.
>
> 1. So winusb can put a Windows (Vista through 8) image on a USB stick that will
>    boot most x86 or x86-64 hardware with the appropriate Windows binary?
>    The hardware can then be installed with the Windows image?

That's my understanding.  [I haven't actually done it yet.]

Many of the machines I use no longer have (working) optical
drives. When doing OS installs I almost always use USB flash drives.
I've been doing Linux installs that way for yonks. Most Linux OS
distro .iso images are already "hybrid" so they boot as-is from a
block storage device.  In my experience, those that aren't can be
fixed up with a simple "isohybrid" command.

Now I want to stop burning Windows DVDs.
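
For the Linux case described above, the workflow is roughly: run isohybrid (from syslinux) on an image that is not already hybrid, then copy the image to the raw USB device. Below is a minimal POSIX sh sketch, assuming GNU dd; the helper name is hypothetical. isohybrid only works on isolinux-based images and modifies the ISO file in place. Windows installation ISOs are not isolinux-based, so this does not replace a tool like winusb.

```shell
#!/bin/sh
# write_iso ISO DEVICE [--isohybrid]
# WARNING: overwrites DEVICE completely; check it with lsblk first.
write_iso() {
    iso=$1 dev=$2
    if [ ! -f "$iso" ]; then
        echo "no such ISO image: $iso" >&2
        return 1
    fi
    if [ ! -b "$dev" ]; then
        echo "not a block device: $dev" >&2
        return 1
    fi
    if [ "${3:-}" = "--isohybrid" ]; then
        # Adds an MBR to an isolinux-based image (modifies the file in place).
        isohybrid "$iso" || return 1
    fi
    # Raw copy to the whole device (e.g. /dev/sdX), not to a partition.
    dd if="$iso" of="$dev" bs=4M conv=fsync
}
```

Example, with a placeholder device name: `write_iso some-distro.iso /dev/sdX --isohybrid`.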

> 2. winusb can be used as a live_windows on a linux system where
>    changes are retained on the usb stick?

No, I don't think so.

> 3. winusb can be used to install windows in a VM?

Presumably -- if you can boot the VM from a USB storage device.

> 4. winusb can be used to install windows in a container?

I don't know enough about containers to posit an answer.

--
Grant






[gentoo-user] Re: Will installing grub-2.02 break my grub-0.97 setup?

2016-05-07 Thread Grant Edwards
On 2016-05-06, Neil Bothwick wrote:
> On Fri, 6 May 2016 16:21:28 +0000 (UTC), Grant Edwards wrote:
>
>> >> Thanks.  That's good to know -- I'll definitely set things up so I'm
>> >> not running winusb as root.  
>> >
>> > Well, you could always reinstall grub-0 before rebooting, to make
>> > sure.  
>> 
>> I just created a SystemRescueCd bootable USB flash drive that can be
>> used to re-install grub-0 in the MBR if something does go wrong.
>> But running winusb as a non-privileged user should prevent any
>> collateral damage to the MBR.
>
> It should also prevent winusb writing to the MBR of the USB stick,
> which sort of defeats the point.

Nope.  I have my system configured so that my USB flash drives are
writable for users in the group "usb" -- of which I am one.
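
Grant doesn't say how his system is configured; one common way to get that effect is a udev rule that puts USB-attached block devices into the usb group. The POSIX sh sketch below only prints such a rule. The function name, the rule file name in the comment and the choice to match partitions as well as whole disks are assumptions.

```shell
#!/bin/sh
# usb_storage_rule [GROUP]: print a udev rule giving GROUP (default "usb")
# read/write access to block devices (whole disks and partitions) that sit
# on the USB bus, so the stick's MBR can be written without root.
usb_storage_rule() {
    printf 'SUBSYSTEM=="block", SUBSYSTEMS=="usb", GROUP="%s", MODE="0660"\n' \
        "${1:-usb}"
}

# To install it (as root; the file name is arbitrary):
#   usb_storage_rule usb > /etc/udev/rules.d/99-usb-storage.rules
#   udevadm control --reload-rules && udevadm trigger
```

Note that this matches every USB-attached disk, including external hard drives, so membership in that group amounts to raw write access to all of them.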

--
Grant




Re: [gentoo-user] extracting text, numbers from screencasts

2016-05-07 Thread hw

Helmut Jarausch wrote:
> On 04/08/2016 03:26:53 PM, hw wrote:
>>
>> Hi,
>>
>> what would be the best approach to extract data
>> from a screencast?
>>
>> The task is to acquire some data from the display of
>> a GUI program used interactively by a user.  There are
>> a couple 'fields' (as in "designated areas of the display")
>> in which the relevant data is being displayed while the
>> program is being used.  The acquired data needs to be
>> entered into a mysql database, preferably as soon as
>> possible.  (The program needs windoze, and the sources
>> are unavailable :( )
>>
>>
>> The idea is to make a screen recording and postprocess
>> the recording with some sort of OCR software.  This might
>> require using ffmpeg (or the like) to create a single
>> image from each frame of the recording; then treat each
>> image with OCR software to get the interesting data
>> which can then be entered into the database.
>>
>> Data to extract is mostly numbers.  The relevant fields
>> can be expected to be either filled or empty.  The FPS rate
>> of the recording can be kept reasonably low, like 1 FPS,
>> or perhaps even less, depending on how frequently the relevant
>> fields change.
>>
>> Using tesseract comes to mind, but after reading that
>>
>> "Tesseract's output will be very poor quality if the input
>> images are not preprocessed to suit it: Images (especially
>> screenshots) must be scaled up such that the text x-height
>> is at least 20 pixels,[12] any rotation or skew must be
>> corrected or no text will be recognized, low-frequency
>> changes in brightness must be high-pass filtered, or
>> Tesseract's binarization stage will destroy much of the
>> page, and dark borders must be manually removed, or they
>> will be misinterpreted as characters."[1]
>>
>> I'm even more doubtful that this would produce usable
>> results with sufficient reliability.
>>
>> So what might be the best way to get text/numbers out of
>> what a program displays?
>>
>>
>> [1]: https://en.wikipedia.org/wiki/Tesseract_(software)
>>
>

> I can't help with Gentoo.
> Try to find an old (free) version of FineReader which runs under wine.
> If you do it only occasionally, transfer the image to an Android phone
> where there are good and cheap OCR apps, even FineReader.


It would be too much video to process.  Besides, phones are
ok for making phone calls and entirely incompatible with
computers, which makes them useless for anything else but
making phone calls.




Re: [gentoo-user] Calm

2016-05-07 Thread Matthew Marchese

On 4/18/2016 3:32 PM, Marc Joliet wrote:

> On Saturday 16 April 2016 14:48:51 Alan Mackenzie wrote:
>> Hello, Gentoo.
>>
>> I'm just saying hello to confirm I'm still here.
>>
>> For many months now, Gentoo has simply worked for me, without problems.
>> I sync my system several times a week, and emerge just works.
>>
>> The last bit of excitement I had was in early 2015 when I was trying to
>> sort out the mess in my xfce4 system after gnome-3 had been made stable.
>> In the end, I gave up and reinstalled Gentoo, which this time took me
>> only a week.
>>
>> Admittedly, there's very little which is cutting edge on my system - the
>> box is 6½ years old, it boots with lilo on an old-fashioned BIOS, my
>> filesystems are ext3 (or in one case, ext2) on spinning rust.  The only
>> remotely adventurous things I've got are RAID-1 (via the kernel) and
>> lvm2.
>>
>> So a big thanks to all the developers who've brought about this happy
>> state of affairs!
>
> I concur!


The first three entries in this thread, and the last few, are what keep
guys like me encouraged. Loved reading about all your success stories.
Thanks for sharing! :)
-maffblaster