I'll second exiftool. It is great for this sort of thing.
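
For anyone who wants to script it, here is a minimal sketch in Python, assuming exiftool is installed and on the PATH. The function names are made up for illustration; -j (JSON output) and -G (prefix each tag with its group name) are standard exiftool options. Whether ResCarta's full text shows up as a tag exiftool recognizes is something you would have to check against your own files.

```python
import json
import shutil
import subprocess


def parse_exiftool_json(raw):
    """exiftool -j prints a JSON array with one object per input file."""
    records = json.loads(raw)
    return records[0] if records else {}


def read_metadata(path):
    """Run exiftool on one file and return its tags as a dict."""
    if shutil.which("exiftool") is None:
        raise RuntimeError("exiftool not found on PATH")
    # -j: JSON output; -G: prefix each tag with its group (EXIF, XMP, IPTC, ...)
    result = subprocess.run(
        ["exiftool", "-j", "-G", path],
        capture_output=True,
        check=True,
    )
    # Decode leniently so one odd byte does not abort the whole run
    return parse_exiftool_json(result.stdout.decode("utf-8", errors="replace"))
```

Calling read_metadata("page.tif") (a hypothetical file name) and printing the keys of the result should show which metadata groups the file actually carries, which also helps with pinning down the format.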

Edward

On Mon, May 12, 2014 at 6:30 PM, Reser, Gregory <[email protected]> wrote:
> You might try http://www.sno.phy.queensu.ca/~phil/exiftool/ , a Perl library 
> to read and write embedded metadata.
>
> Greg Reser
> UC San Diego Library
> 9500 Gilman Drive, 0175K
> La Jolla, CA 92093-0175
>
> Phone: 858.246.0998
> Skype: gregreser
>
>
>
> -----Original Message-----
> From: Code for Libraries [mailto:[email protected]] On Behalf Of 
> Stuart Yeates
> Sent: Monday, May 12, 2014 3:26 PM
> To: [email protected]
> Subject: Re: [CODE4LIB] Extracting Text From .tiff Files
>
> Your first step is to pin down the format. TIFF is a container format (like 
> zip) and can contain pretty much anything. Likely candidates for your format 
> include https://en.wikipedia.org/wiki/IPTC_Information_Interchange_Model and 
> https://en.wikipedia.org/wiki/Extensible_Metadata_Platform
>
> Your second step is to find a library / tool for your platform that supports 
> your format.
>
> Cheers
> stuart
>
> -----Original Message-----
> From: Code for Libraries [mailto:[email protected]] On Behalf Of Gavin 
> Spomer
> Sent: Tuesday, 13 May 2014 10:01 a.m.
> To: [email protected]
> Subject: [CODE4LIB] Extracting Text From .tiff Files
>
> Hello folks,
>
> I'm in the process of migrating a student newspaper collection, currently 
> implemented with ResCarta, into our new bepress institutional repository. 
> ResCarta has each page of a newspaper stored as a tiff file. Not only does 
> the tiff file contain the graphics data, but it has some metadata in xml 
> format and the fulltext of the page. I know this because I opened up some of 
> the tiffs with a plain-text editor (Vim).
>
> Although I can see the text in the file, I've only been about 90% accurate in 
> extracting it with a script. Some of those "weird" characters seem to do some 
> wonky things when doing file IO for some reason. Is there a more reliable way 
> to extract text stored in a tiff file? I've Googled and Googled and have 
> pulled up almost nothing. But there's got to be a way, since ResCarta stores 
> it there and can extract it.
>
> Any ideas?
> Gavin Spomer
> Systems Programmer
> Brooks Library
> Central Washington University
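
On the "wonky" file IO described above: one likely cause is opening the TIFF in text mode, which tries to decode the binary image data as characters. A minimal sketch of the alternative, reading raw bytes and decoding only the XML portion, is below. It assumes the embedded XML is an XMP packet wrapped in an x:xmpmeta element and encoded as UTF-8; if ResCarta uses a different wrapper, the two marker strings would need to change. The function names are hypothetical.

```python
def extract_xmp_packet(data):
    """Return the first XMP packet found in raw bytes as text, or None."""
    start = data.find(b"<x:xmpmeta")
    if start == -1:
        return None
    end_tag = b"</x:xmpmeta>"
    end = data.find(end_tag, start)
    if end == -1:
        return None
    packet = data[start:end + len(end_tag)]
    # Assumed UTF-8; replace undecodable bytes rather than fail
    return packet.decode("utf-8", errors="replace")


def read_xmp_from_file(path):
    # "rb" matters: text mode would try to decode the binary image data
    with open(path, "rb") as handle:
        return extract_xmp_packet(handle.read())
```

The returned string can then be handed to an XML parser instead of being picked apart with string operations.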
