Alex,

It was worth asking, but the redactions are part of the bitmap, not the overlay.
2009/6/19 Alex Mace <[email protected]>

> I presume someone has tried the old trick of extracting the pictures from
> PDF to make sure they didn't just apply the redaction over the top of the
> pictures? Sounds unlikely, but it's been done before...
>
> On 19 Jun 2009, at 16:23, Brian Butterworth wrote:
>
> 2009/6/19 John O'Donovan <[email protected]>
>
>> We did something similar, but it's the low tech version... :o)
>>
>> http://news.bbc.co.uk/1/hi/uk_politics/8106044.stm
>>
>> with things people have found being published here...
>>
>> http://news.bbc.co.uk/1/hi/uk_politics/8106650.stm
>>
>> I like what the Guardian have done with this - been playing with it...
>
> After an hour I start wondering about using OCR software...
>
> Does anyone know of a command line OCR tool that I could use? Something
> that works with PHP perhaps?
>
> As it seems very easy to get at the images from the
> http://mps-expenses.guardian.co.uk/page/X<http://mps-expenses.guardian.co.uk/page/194884/>/
> pages as there is only one image in the whole document. Running OCR will
> generate lots of crud, but it could be matched against the human input to
> act as validation.
>
>> Cheers,
>>
>> John O'Donovan
>> Chief Technical Architect
>>
>> BBC Future Media & Technology (Journalism)
>> BC3 C1, Broadcast Centre, 201 Wood Lane, London
>>
>> http://news.bbc.co.uk/
>> http://news.bbc.co.uk/sport/
>> http://news.bbc.co.uk/weather/
>>
>> ------------------------------
>> From: [email protected] [mailto:[email protected]] On Behalf Of Brian Butterworth
>> Sent: 19 June 2009 09:46
>> To: [email protected]
>> Subject: [backstage-developer] Nice bit of crowd-sourcing
>>
>> Nice bit of crowd-sourcing I think here:
>> http://mps-expenses.guardian.co.uk/
>>
>> Shame the app's not AJAX, would been easier to use that way, but generally
>> a great way of checking 77252 pages of documents.
>>
>> Kind-of-wondering why Auntie didn't do it first, but...
>>
>> All the best
>>
>> Brian Butterworth
>
> --
>
> Brian Butterworth
>
> follow me on twitter: http://twitter.com/briantist
> web: http://www.ukfree.tv - independent digital television and switchover
> advice, since 2002

--
Brian Butterworth
follow me on twitter: http://twitter.com/briantist
web: http://www.ukfree.tv - independent digital television and switchover advice, since 2002

