RE: [backstage-developer] Nice bit of crowd-sourcing

John O'Donovan Fri, 19 Jun 2009 09:26:42 -0700

The Telegraph are (allegedly) producing a large supplement explaining
some of the redacted details...
 
I can neither confirm nor deny this information.
 
Cheers,



                                          
John O'Donovan
Chief Technical Architect 

BBC Future Media & Technology (Journalism)
BC3 C1, Broadcast Centre, 201 Wood Lane, London 

http://news.bbc.co.uk/ 
http://news.bbc.co.uk/sport/ 
http://news.bbc.co.uk/weather/ 

 

________________________________

From: [email protected] [mailto:[email protected]] On Behalf Of Alex Mace
Sent: 19 June 2009 16:30
To: [email protected]
Subject: Re: [backstage-developer] Nice bit of crowd-sourcing


I presume someone has tried the old trick of extracting the pictures from
the PDFs to make sure they didn't just apply the redaction over the top
of the pictures? Sounds unlikely, but it's been done before...
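
A rough sketch of that check in Python, offered as a heuristic rather than a real PDF parser: it counts image XObjects in the raw file and filled rectangles ("re f") in the decompressed content streams. Finding both on a page hints that black boxes may have been drawn over an intact scan. The helper names are hypothetical, it assumes the object dictionaries are not packed into PDF 1.5 object streams (those are missed), and for actually pulling the pictures out a dedicated tool such as poppler's `pdfimages` is the better bet.

```python
import re
import zlib

# Hypothetical helpers: a byte-level heuristic, not a PDF parser.
IMAGE_XOBJECT = re.compile(rb"/Subtype\s*/Image\b")
STREAM_BODY = re.compile(rb"stream\r?\n(.*?)\r?\nendstream", re.S)
# "x y w h re" builds a rectangle path; "f" or "f*" fills it (e.g. a black box).
FILLED_RECT = re.compile(rb"\bre\s+f\*?(?![A-Za-z])")


def stream_contents(pdf_bytes: bytes):
    """Yield each stream body, inflated if it is zlib/Flate-compressed."""
    for match in STREAM_BODY.finditer(pdf_bytes):
        body = match.group(1)
        try:
            yield zlib.decompress(body)
        except zlib.error:
            yield body  # not Flate data (e.g. a JPEG scan): keep it raw


def redaction_overlay_hints(pdf_bytes: bytes) -> dict:
    """Count embedded images and filled rectangles drawn in content streams.

    At least one image plus some filled rectangles suggests the black boxes
    may sit on top of an intact scan, so extracting the images is worth a
    try. Anything hidden inside compressed object streams is not seen.
    """
    rects = sum(len(FILLED_RECT.findall(s)) for s in stream_contents(pdf_bytes))
    return {
        "images": len(IMAGE_XOBJECT.findall(pdf_bytes)),
        "filled_rects": rects,
    }
```

Calling `redaction_overlay_hints` on the bytes of a downloaded page returns a dict with `images` and `filled_rects` counts; a non-zero value for both is only a prompt to go and extract the images, not proof of anything.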

On 19 Jun 2009, at 16:23, Brian Butterworth wrote:


        
        
        
        2009/6/19 John O'Donovan <[email protected]>
        

                We did something similar, but it's the low-tech version... :o)
                

                http://news.bbc.co.uk/1/hi/uk_politics/8106044.stm

                with things people have found being published here...

                

                http://news.bbc.co.uk/1/hi/uk_politics/8106650.stm

                I like what the Guardian have done with this - been playing with it...


        After an hour I started wondering about using OCR software...

        Does anyone know of a command-line OCR tool that I could use?
        Something that works with PHP, perhaps?

        It seems very easy to get at the images on the
        http://mps-expenses.guardian.co.uk/page/X pages (e.g.
        http://mps-expenses.guardian.co.uk/page/194884/), as there is only
        one image in the whole document. Running OCR will generate lots of
        crud, but the output could be matched against the human input to act
        as validation.
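
        One way the validation idea could look, sketched in Python under a few
        assumptions: that each page's HTML really does contain a single <img>,
        that the OCR text comes from an external command-line tool such as
        Tesseract (`tesseract page.png out` writes out.txt), and that a difflib
        similarity ratio is a good enough agreement measure. The function names
        and the 0.6 threshold are hypothetical, not anything the Guardian app
        exposes.

```python
import re
from difflib import SequenceMatcher
from html.parser import HTMLParser


class ImageSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag on a page."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.srcs.append(src)


def page_image_url(html: str):
    """Return the image URL if the page has exactly one <img>, else None."""
    collector = ImageSrcCollector()
    collector.feed(html)
    return collector.srcs[0] if len(collector.srcs) == 1 else None


def normalise(text: str) -> str:
    """Lower-case and collapse whitespace so OCR layout noise matters less."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def agreement(ocr_text: str, human_text: str) -> float:
    """Similarity ratio in [0, 1] between OCR output and a volunteer's entry."""
    return SequenceMatcher(None, normalise(ocr_text), normalise(human_text)).ratio()


def looks_valid(ocr_text: str, human_text: str, threshold: float = 0.6) -> bool:
    """Flag a human transcription as plausible if OCR broadly agrees with it."""
    return agreement(ocr_text, human_text) >= threshold
```

        Entries that fall below the threshold would be the ones worth a second
        pair of human eyes, rather than being rejected outright, since the OCR
        side is the noisier of the two.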

         

                
                
                Cheers,
                
                

                                                          
                John O'Donovan

________________________________

                From: [email protected] [mailto:[email protected]] On Behalf Of Brian Butterworth
                Sent: 19 June 2009 09:46
                To: [email protected]
                Subject: [backstage-developer] Nice bit of crowd-sourcing
                
                
                Nice bit of crowd-sourcing I think here: 
                
                
                http://mps-expenses.guardian.co.uk/
                
                
                Shame the app's not AJAX; it would have been easier to use that
                way, but it's generally a great way of checking 77252 pages of
                documents.
                
                
                Kind of wondering why Auntie didn't do it first, but...
                
                
                All the best
                
                
                Brian Butterworth




        -- 
        
        Brian Butterworth
        
        follow me on twitter: http://twitter.com/briantist
        web: http://www.ukfree.tv - independent digital television and switchover advice, since 2002
