On Friday 16 January 2009 17:30:26 John Carter wrote:
> On Fri, 16 Jan 2009, Craig Falconer wrote:
> > David Lowe wrote, On 16/01/09 15:46:
> > Then you can never share this, because it would be redistribution of
> > copyright material :-\
>
> Actually I suspect even an attack lawyer may have a hard time
> identifying what is copyrighted in a text file of (word,yyyy-mm-dd) pairs.
>
> A very brief attempt (2 seconds) with gocr didn't spit out anything
> readable. I suspect one actually needs to (Gasp! Schlock! Horror!)
> read the man page and tweak options.

Tried Tesseract?
http://sourceforge.net/projects/tesseract-ocr/
http://www.linux.com/articles/57222

My experience from some time ago was that it is very good indeed, provided the
text image is to its liking (see the references above).

IMHO (and I'm not a lawyer), you are making a specialised index to images
on the web, not copies of the images. Google do that every day by the tens of
millions. They have no legal problems, so why should you? After all, what is
good for the Corporate is also good for the Little Fellow, because we live
under a Common Law jurisdiction.

-- 
With Sincerity,
Christopher Sawtell
