2009/6/19 John O'Donovan <[email protected]>

>  We did something similar, but it's the low tech version... :o)
>
> http://news.bbc.co.uk/1/hi/uk_politics/8106044.stm
>
> with things people have found being published here...
>
> http://news.bbc.co.uk/1/hi/uk_politics/8106650.stm
>
> I like what the Guardian have done with this - been playing with it...
>

After an hour I start wondering about using OCR software...

Does anyone know of a command line OCR tool that I could use?  Something
that works with PHP perhaps?

It seems very easy to get at the images from the
http://mps-expenses.guardian.co.uk/page/X/ pages (for example
http://mps-expenses.guardian.co.uk/page/194884/), as there is only one image
in the whole document.  Running OCR will generate lots of crud, but it could
be matched against the human input to act as validation.
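
A rough sketch of the matching idea, in Python rather than PHP, and not
tied to any particular OCR tool: it assumes the OCR text for a page is
already available as a string, and that the human input is a short free-text
entry that may contain an amount.  The names validate, amounts_in and
best_word_match, and the 0.8 threshold, are all made up for illustration.

```python
import re
from difflib import SequenceMatcher

# Money-like figures such as 1,234.56 or 45.00 (an assumed format).
AMOUNT_RE = re.compile(r"\d[\d,]*\.\d{2}")


def normalise(text):
    """Lower-case and turn every run of non-alphanumerics into one space."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def amounts_in(text):
    """Return the set of money-like figures in text, with commas removed."""
    return {m.replace(",", "") for m in AMOUNT_RE.findall(text)}


def best_word_match(word, ocr_words):
    """Similarity (0.0 to 1.0) of the closest OCR word to the given word."""
    return max(
        (SequenceMatcher(None, word, w).ratio() for w in ocr_words),
        default=0.0,
    )


def validate(human_entry, ocr_text, threshold=0.8):
    """Compare one human entry with the (noisy) OCR text of the same page.

    Amounts must match exactly; words only need to be close, because OCR
    tends to confuse characters such as 0/o and l/i.
    """
    ocr_words = normalise(ocr_text).split()
    ocr_amounts = amounts_in(ocr_text)
    human_amounts = amounts_in(human_entry)

    # Only check alphabetic words longer than three letters.
    words = [
        w for w in normalise(human_entry).split()
        if w.isalpha() and len(w) > 3
    ]
    matched = [
        w for w in words if best_word_match(w, ocr_words) >= threshold
    ]

    return {
        "amounts_confirmed": sorted(human_amounts & ocr_amounts),
        "amounts_missing": sorted(human_amounts - ocr_amounts),
        "word_score": len(matched) / len(words) if words else 0.0,
    }
```

Calling validate("Mortgage interest 1,234.56", ocr_text) with the OCR output
for the same page would then flag entries whose amount never appears in the
scan, which is the sort of thing worth a second human look.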



> Cheers,
>
> John O'Donovan
> Chief Technical Architect
>
> BBC Future Media & Technology (Journalism)
> BC3 C1, Broadcast Centre, 201 Wood Lane, London
>
> http://news.bbc.co.uk/
> http://news.bbc.co.uk/sport/
> http://news.bbc.co.uk/weather/
>
>
>  ------------------------------
> From: [email protected] [mailto:[email protected]]
> On Behalf Of Brian Butterworth
> Sent: 19 June 2009 09:46
> To: [email protected]
> Subject: [backstage-developer] Nice bit of crowd-sourcing
>
> Nice bit of crowd-sourcing I think here:
> http://mps-expenses.guardian.co.uk/
>
> Shame the app's not AJAX, would have been easier to use that way, but
> generally a great way of checking 77252 pages of documents.
>
> Kind-of-wondering why Auntie didn't do it first, but...
>
> All the best
>
> Brian Butterworth
>



-- 

Brian Butterworth

follow me on twitter: http://twitter.com/briantist
web: http://www.ukfree.tv - independent digital television and switchover
advice, since 2002
