I'll second exiftool. It is great for this sort of thing.

Edward

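A minimal sketch of driving exiftool from Python, in case it is useful. It assumes exiftool is installed and on the PATH. The function names are made up for illustration. Whether ResCarta's full text appears under XMP or under some other tag is not established anywhere in this thread, so the "XMP" default is only a guess to start from. The flags used are -j (JSON output), -a (keep duplicate tags), -G1 (show group names) and -b (write the raw value).

```python
import json
import shutil
import subprocess


def exiftool_command(path, tag=None):
    """Build the exiftool argument list; no shell is involved."""
    if tag is None:
        # Dump every tag exiftool can find, as JSON, with group names.
        return ["exiftool", "-j", "-a", "-G1", path]
    # Write the raw value of one tag (for example "XMP") to stdout.
    return ["exiftool", "-b", "-" + tag, path]


def _run(cmd):
    """Run exiftool and return its stdout as bytes."""
    if shutil.which("exiftool") is None:
        raise RuntimeError("exiftool was not found on PATH")
    return subprocess.run(cmd, capture_output=True, check=True).stdout


def read_metadata(path):
    """Return all tags for one file as a dict."""
    # With -j, exiftool prints a JSON array holding one object per file.
    return json.loads(_run(exiftool_command(path)).decode("utf-8"))[0]


def read_raw_tag(path, tag="XMP"):
    """Return the bytes of a single tag; the tag name is an assumption."""
    return _run(exiftool_command(path, tag))
```

Running read_metadata on one page first shows which tag names actually exist in the file; read_raw_tag can then pull just that block as bytes, and decoding is left to the caller.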
On Mon, May 12, 2014 at 6:30 PM, Reser, Gregory <[email protected]> wrote:
> You might try http://www.sno.phy.queensu.ca/~phil/exiftool/ , a Perl library
> to read and write embedded metadata.
>
> Greg Reser
> UC San Diego Library
> 9500 Gilman Drive, 0175K
> La Jolla, CA 92093-0175
>
> Phone: 858.246.0998
> Skype: gregreser
>
> -----Original Message-----
> From: Code for Libraries [mailto:[email protected]] On Behalf Of Stuart Yeates
> Sent: Monday, May 12, 2014 3:26 PM
> To: [email protected]
> Subject: Re: [CODE4LIB] Extracting Text From .tiff Files
>
> Your first step is to pin down the format. TIFF is a container format (like
> zip) and can contain pretty much anything. Likely candidates for your format
> include https://en.wikipedia.org/wiki/IPTC_Information_Interchange_Model and
> https://en.wikipedia.org/wiki/Extensible_Metadata_Platform
>
> Your second step is to find a library / tool for your platform that supports
> your format.
>
> Cheers
> stuart
>
> -----Original Message-----
> From: Code for Libraries [mailto:[email protected]] On Behalf Of Gavin Spomer
> Sent: Tuesday, 13 May 2014 10:01 a.m.
> To: [email protected]
> Subject: [CODE4LIB] Extracting Text From .tiff Files
>
> Hello folks,
>
> I'm in the process of migrating a student newspaper collection, currently
> implemented with ResCarta, into our new bepress institutional repository.
> ResCarta has each page of a newspaper stored as a tiff file. Not only does
> the tiff file contain the graphics data, but it has some metadata in xml
> format and the fulltext of the page. I know this because I opened up some of
> the tiffs with a plain-text editor (Vim).
>
> Although I can see the text in the file, I've only been about 90% accurate in
> extracting it with a script. Some of those "weird" characters seem to do some
> wonky things when doing file IO for some reason. Is there a more reliable way
> to extract text stored in a tiff file? I've Googled and Googled and have
> pulled up almost nothing. But there's got to be a way, since ResCarta stores
> it there and can extract it.
>
> Any ideas?
>
> Gavin Spomer
> Systems Programmer
> Brooks Library
> Central Washington University
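
On the "wonky" file IO in the original question: one possible cause, which the thread does not confirm, is reading a binary file in text mode, so that image bytes get decoded or translated along with the embedded text. The sketch below avoids that by opening the file in binary mode and walking the TIFF directory (IFD) to pull out whole fields by tag number. It handles classic TIFF only, not BigTIFF. The function names are illustrative, and tags 270 (ImageDescription) and 700 (XMP) are only guesses at where ResCarta might keep its XML and full text.

```python
import struct

# Bytes per value for the twelve TIFF 6.0 field types.
TYPE_SIZES = {1: 1, 2: 1, 3: 2, 4: 4, 5: 8, 6: 1,
              7: 1, 8: 2, 9: 4, 10: 8, 11: 4, 12: 8}


def read_tiff_fields(data):
    """Yield (ifd_index, tag, raw_bytes) for each field in a classic TIFF."""
    order = {b"II": "<", b"MM": ">"}.get(data[:2])
    if order is None or len(data) < 8:
        raise ValueError("not a classic TIFF file")
    magic, offset = struct.unpack_from(order + "HI", data, 2)
    if magic != 42:
        raise ValueError("not a classic TIFF file")
    ifd_index, seen = 0, set()
    while offset and offset not in seen:  # guard against looping IFD chains
        seen.add(offset)
        (count,) = struct.unpack_from(order + "H", data, offset)
        for i in range(count):
            entry = offset + 2 + 12 * i
            tag, ftype, n = struct.unpack_from(order + "HHI", data, entry)
            size = TYPE_SIZES.get(ftype, 1) * n
            if size <= 4:
                start = entry + 8  # small values are stored inline
            else:
                (start,) = struct.unpack_from(order + "I", data, entry + 8)
            yield ifd_index, tag, data[start:start + size]
        (offset,) = struct.unpack_from(order + "I", data, offset + 2 + 12 * count)
        ifd_index += 1


def text_fields(data, tags=(270, 700)):
    """Decode the chosen tags as UTF-8; the tag choice is an assumption."""
    found = {}
    for _, tag, raw in read_tiff_fields(data):
        if tag in tags:
            found[tag] = raw.rstrip(b"\x00").decode("utf-8", errors="replace")
    return found


def text_fields_from_file(path, tags=(270, 700)):
    """Read the file as bytes ("rb"), never in text mode."""
    with open(path, "rb") as fh:
        return text_fields(fh.read(), tags)
```

Listing every (tag, length) pair from read_tiff_fields on one sample page is a quick way to find which tag actually carries the text before settling on the tags argument.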
