Re: [CODE4LIB] tiff2pdf, then back to pdf?
On Sat, Apr 27, 2013 at 9:37 PM, Andrew Hankinson andrew.hankin...@gmail.com wrote:

> As someone who works on document recognition, I have to disagree. You should always keep an uncompressed original around, since you can never recover it without (often expensive) re-imaging. JPEG, or any other type of lossy compression, introduces artifacts that don't look too bad to the human eye, but have a significant effect on the quality of OCR. You can never recover this after you have discarded your originals.
>
> Big files are clunky to work with, which is why you should have an automated way of producing surrogate, compressed copies for general use, but as any archivist will tell you, a photocopy is not a replacement for the original.

All true, but keeping just-in-case copies of uncompressed files around has significant disadvantages unless you have the resources to deal with them. Any archivist will tell you they need the uncompressed files. However, many of them don't have the disk space, bandwidth, staff resources, etc. to deal with these files and wind up doing things that are far more dangerous, like just having files sitting around on cheap external hard drives.

Every choice people make is about loss. Equipment, optics, lighting, you name it. But for some reason, the instant we're talking about bits of data on a disk, people plan as though capacity were unlimited when most archives are severely underresourced.

If you only have to deal with a few small projects, keeping uncompressed images is no big deal. But let's suppose you have a million pages or more -- this introduces a completely different cost structure that permanently affects what resources you'll have for other projects in the future.

Objectives and available resources need to drive decisions unless we believe that the best plan is to do what we'd do in an ideal world until resources run out.

kyle
Re: [CODE4LIB] tiff2pdf, then back to pdf?
On Sun, Apr 28, 2013 at 2:43 AM, Kyle Banerjee kyle.baner...@gmail.com wrote:

> Every choice people make is about loss. Equipment, optics, lighting, you name it. But for some reason, the instant we're talking about bits of data on a disk, people plan as though capacity were unlimited when most archives are severely underresourced.

Strictly speaking, it is not correct to say that every choice is about loss (or cost), and for once I'm saying this in a case where the difference is actually significant. [Someone help edsu to the fainting couch.]

If a particular set of choices is below the Production Possibility Frontier (http://en.wikipedia.org/wiki/Production%E2%80%93possibility_frontier), then those choices are strictly inferior to those that are on the Frontier.

Why is this relevant? Because, for the situation where lossless image storage has a very high value, TIFF is not the most space-efficient way of storing the data.

A month or so ago I did a few measurements, using a (not necessarily representative) color photograph (TIFF extracted from a Canon EOS-10D raw).

For lossless conversion, I used uncompressed TIFF, compressed TIFF, PNG, and JP2 (100% quality). Measurements using the ImageMagick compare utility confirmed zero signal loss:

    -rw-r--r--@ 1 ses staff   18M Mar 19 14:52 CRW_4237_tiff_8_uncompressed.tif
    -rw-r--r--@ 1 ses staff  9.4M Mar 19 14:53 CRW_4237_tiff_8_compressed.tif
    -rw-r--r--  1 ses staff  8.2M Mar 19 14:29 CRW_4237-0.png
    -rw-r--r--@ 1 ses staff  6.1M Mar 19 14:03 CRW_4237_quality_100-0.jp2

For lossy compression, using RMSE as the metric, we can see that JPEG at 90% quality shows measurable signal degradation, with a compression ratio of 4.7:1 relative to the lossless JP2 file (vs. 14:1 relative to uncompressed TIFF, and 7.2:1 relative to compressed TIFF).

    $compare ... CRW_4237_jpg_90.jpg =459.806 (0.00701619) [1.3M] (4.7:1)

JP2 at quality 75 showed slightly less signal loss by RMSE, with a compression ratio of 5.5:1.

    $compare ... CRW_4237_quality_75.jp2 =457.959 (0.006988) [1.1M] (5.5:1)

Note that the image type was a color photograph; other image types may get better lossless compression using PNG or TIFF. Also, some people have expressed concern over the use of JP2 for archival purposes due to the relatively small number of open-source libraries. On the other hand, JP2 has some potentially useful properties for distributed replicated preservation (layers with fine levels of detail could be split off and stored on fewer replicas).

Simon
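The ratios quoted above can be rechecked from the rounded file sizes in the listing. The following is only a sketch: the dictionary keys are made-up labels, the sizes are the rounded `ls -h` figures from the message, and reading the parenthesised RMSE value as the raw RMSE divided by 65535 (a 16-bit channel maximum) is an assumption that happens to match both reported pairs of numbers.

```python
# Sizes in MB as reported in the message (rounded `ls -h` values).
# The dictionary keys are illustrative labels, not file names.
SIZES_MB = {
    "tiff_uncompressed": 18.0,
    "tiff_compressed": 9.4,
    "png": 8.2,
    "jp2_lossless": 6.1,
    "jpeg_q90": 1.3,
    "jp2_q75": 1.1,
}


def ratio(reference: str, candidate: str) -> float:
    """Compression ratio of `candidate` relative to `reference`."""
    return SIZES_MB[reference] / SIZES_MB[candidate]


def normalized_rmse(rmse: float, quantum_max: int = 65535) -> float:
    """Assumption: the parenthesised figure is RMSE / 65535 (16-bit range)."""
    return rmse / quantum_max


if __name__ == "__main__":
    for ref in ("jp2_lossless", "tiff_uncompressed", "tiff_compressed"):
        print(f"JPEG q90 vs {ref}: {ratio(ref, 'jpeg_q90'):.1f}:1")
    print(f"JP2 q75 vs jp2_lossless: {ratio('jp2_lossless', 'jp2_q75'):.1f}:1")
    print(f"normalized RMSE, JPEG q90: {normalized_rmse(459.806):.8f}")
    print(f"normalized RMSE, JP2 q75:  {normalized_rmse(457.959):.8f}")
```

Because the inputs are rounded, the uncompressed-TIFF ratio comes out slightly under the 14:1 given in the message; the other ratios agree to one decimal place.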
Re: [CODE4LIB] tiff2pdf, then back to pdf?
Sure, but these are especially small photos -- are these born digital? Lossless scans of pretty small photos are frequently well over 100MB, and it takes hardly anything to get a 1GB scan.

It costs a fortune to treat a thesis (i.e. what's really being preserved is readable text rather than the artifact itself) as if it were a historical document where texture of the paper and the like is actually relevant.

The problem we encounter is that people want to scan a tiny photo and blow it up to poster or wall size. But the source material and the equipment are typically far more limiting factors than the format unless you're doing something that's totally nuts.

You want maximum flexibility for unanticipated future needs, but value judgments need to be made up front or you wind up painting yourself into a corner.

kyle

On Sun, Apr 28, 2013 at 12:40 PM, Simon Spero sesunc...@gmail.com wrote:

> Strictly speaking, it is not correct to say that every choice is about loss (or cost), and for once I'm saying this in a case where the difference is actually significant. [Someone help edsu to the fainting couch.]
>
> If a particular set of choices is below the Production Possibility Frontier (http://en.wikipedia.org/wiki/Production%E2%80%93possibility_frontier), then those choices are strictly inferior to those that are on the Frontier.
>
> Why is this relevant? Because, for the situation where lossless image storage has a very high value, TIFF is not the most space-efficient way of storing the data.
>
> A month or so ago I did a few measurements, using a (not necessarily representative) color photograph (TIFF extracted from a Canon EOS-10D raw).
>
> For lossless conversion, I used uncompressed TIFF, compressed TIFF, PNG, and JP2 (100% quality). Measurements using the ImageMagick compare utility confirmed zero signal loss:
>
>     -rw-r--r--@ 1 ses staff   18M Mar 19 14:52 CRW_4237_tiff_8_uncompressed.tif
>     -rw-r--r--@ 1 ses staff  9.4M Mar 19 14:53 CRW_4237_tiff_8_compressed.tif
>     -rw-r--r--  1 ses staff  8.2M Mar 19 14:29 CRW_4237-0.png
>     -rw-r--r--@ 1 ses staff  6.1M Mar 19 14:03 CRW_4237_quality_100-0.jp2
>
> For lossy compression, using RMSE as the metric, we can see that JPEG at 90% quality shows measurable signal degradation, with a compression ratio of 4.7:1 relative to the lossless JP2 file (vs. 14:1 relative to uncompressed TIFF, and 7.2:1 relative to compressed TIFF).
>
>     $compare ... CRW_4237_jpg_90.jpg =459.806 (0.00701619) [1.3M] (4.7:1)
>
> JP2 at quality 75 showed slightly less signal loss by RMSE, with a compression ratio of 5.5:1.
>
>     $compare ... CRW_4237_quality_75.jp2 =457.959 (0.006988) [1.1M] (5.5:1)
>
> Note that the image type was a color photograph; other image types may get better lossless compression using PNG or TIFF. Also, some people have expressed concern over the use of JP2 for archival purposes due to the relatively small number of open-source libraries. On the other hand, JP2 has some potentially useful properties for distributed replicated preservation (layers with fine levels of detail could be split off and stored on fewer replicas).
>
> Simon
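To put rough numbers on the scale argument, the sketch below combines two figures from Kyle's messages: "a million pages" and scans "well over 100MB". Both are order-of-magnitude inputs rather than measurements, and the `copies` parameter (for replicated storage) is a hypothetical addition for illustration.

```python
def total_storage_tb(pages: int, mb_per_page: float, copies: int = 1) -> float:
    """Total storage in decimal terabytes (1 TB = 1,000,000 MB).

    `copies` is a hypothetical knob for replicated storage; it is not
    something the thread specifies.
    """
    return pages * mb_per_page * copies / 1_000_000


if __name__ == "__main__":
    # One million pages at 100 MB each, single copy.
    print(total_storage_tb(1_000_000, 100), "TB")
    # The same collection kept in three replicas (assumed, for illustration).
    print(total_storage_tb(1_000_000, 100, copies=3), "TB")
```

Swapping in a different per-page size shows how strongly the format decision drives the total, which is the cost-structure point being made.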
Re: [CODE4LIB] tiff2pdf, then back to pdf?
Yes, exactly. You will lose some of the image quality. If you change to a compressed format, then back to TIFF, you can get the format back, but you can't get back to the original file.

Stop and think: What are your long-term goals?

Big files are clunky to work with. I'm guessing that's why you don't want TIFF. In my experience, files big enough to be clunky are discarded within a few years, regardless of the intentions when they were prepped. If you want to avoid big files, then your best bet is to assess and test the file you will actually keep and do the best job you can with it. So, if you want to rerun OCR in a few years when the recognition will be better, then make your PDFs in such a way that you can get decent OCR out of them today, and plan to rerun on those files, not the (discarded) originals. Don't think reformatting will get you any better image quality later.

-Wilhelmina Randtke

On Fri, Apr 26, 2013 at 3:19 PM, James Gilbert gilber...@whitehallpl.org wrote:

> I'm by no means an expert in the math behind image format conversions... but:
>
> When converting TIFF to JPG, TIFF is an uncompressed format and JPG is a compressed format. When converting back, wouldn't the original quality of the TIFF be lost, leaving only the quality of the last JPG (with further degradation each time this process occurs)?
>
> James Gilbert, BS, MLIS
> Systems Librarian
> Whitehall Township Public Library
> 3700 Mechanicsville Road
> Whitehall, PA 18052
> 610-432-4339 ext: 203
Re: [CODE4LIB] tiff2pdf, then back to pdf?
As someone who works on document recognition, I have to disagree. You should always keep an uncompressed original around, since you can never recover it without (often expensive) re-imaging. JPEG, or any other type of lossy compression, introduces artifacts that don't look too bad to the human eye, but have a significant effect on the quality of OCR. You can never recover this after you have discarded your originals.

Big files are clunky to work with, which is why you should have an automated way of producing surrogate, compressed copies for general use, but as any archivist will tell you, a photocopy is not a replacement for the original.

-Andrew

On 2013-04-27, at 7:17 PM, Wilhelmina Randtke rand...@gmail.com wrote:

> Yes, exactly. You will lose some of the image quality. If you change to a compressed format, then back to TIFF, you can get the format back, but you can't get back to the original file.
>
> Stop and think: What are your long-term goals?
>
> Big files are clunky to work with. I'm guessing that's why you don't want TIFF. In my experience, files big enough to be clunky are discarded within a few years, regardless of the intentions when they were prepped. If you want to avoid big files, then your best bet is to assess and test the file you will actually keep and do the best job you can with it. So, if you want to rerun OCR in a few years when the recognition will be better, then make your PDFs in such a way that you can get decent OCR out of them today, and plan to rerun on those files, not the (discarded) originals. Don't think reformatting will get you any better image quality later.
>
> -Wilhelmina Randtke
[CODE4LIB] tiff2pdf, then back to pdf?
Hi All, I have a need to batch convert many TIFF images to PDF. I'd then like to be able to discard the TIFF images, but I can only do that if I can create the original TIFF again from the PDF. Is this possible? If so, using what tools and how? tiff2pdf seems like a possible solution, but I can't find a corresponding pdf2tif program that reverses the process. Any ideas? Edward
Re: [CODE4LIB] tiff2pdf, then back to pdf?
If you can stand an extra step, Ed, there are tools to convert PDF to JPG images, and from there it shouldn't be too hard to get TIFF output. Do a search for "convert PDF to image" to get started. There are tools that are not online-only, which I'm pretty sure is what you're after.

Roy Zimmer
Western Michigan University

On 4/26/2013 4:08 PM, Edward M. Corrado wrote:

> Hi All,
>
> I have a need to batch convert many TIFF images to PDF. I'd then like to be able to discard the TIFF images, but I can only do that if I can create the original TIFF again from the PDF. Is this possible? If so, using what tools and how?
>
> tiff2pdf seems like a possible solution, but I can't find a corresponding pdf2tif program that reverses the process.
>
> Any ideas?
>
> Edward
Re: [CODE4LIB] tiff2pdf, then back to pdf?
ImageMagick can do it; you need Ghostscript installed, though. I've done this with multi-layer TIFFs and multi-page PDFs.

-mike
___
Michael Friscia
Manager, Digital Library Programming Services
Yale University Library
(203) 432-1856

On 4/26/13 4:08 PM, Edward M. Corrado ecorr...@ecorrado.us wrote:

> Hi All,
>
> I have a need to batch convert many TIFF images to PDF. I'd then like to be able to discard the TIFF images, but I can only do that if I can create the original TIFF again from the PDF. Is this possible? If so, using what tools and how?
>
> tiff2pdf seems like a possible solution, but I can't find a corresponding pdf2tif program that reverses the process.
>
> Any ideas?
>
> Edward
Re: [CODE4LIB] tiff2pdf, then back to pdf?
I'm by no means an expert in the math behind image format conversions... but:

When converting TIFF to JPG, TIFF is an uncompressed format and JPG is a compressed format. When converting back, wouldn't the original quality of the TIFF be lost, leaving only the quality of the last JPG (with further degradation each time this process occurs)?

James Gilbert, BS, MLIS
Systems Librarian
Whitehall Township Public Library
3700 Mechanicsville Road
Whitehall, PA 18052
610-432-4339 ext: 203

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Roy
Sent: Friday, April 26, 2013 4:15 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] tiff2pdf, then back to pdf?

If you can stand an extra step, Ed, there are tools to convert PDF to JPG images, and from there it shouldn't be too hard to get TIFF output. Do a search for "convert PDF to image" to get started. There are tools that are not online-only, which I'm pretty sure is what you're after.

Roy Zimmer
Western Michigan University
Re: [CODE4LIB] tiff2pdf, then back to pdf?
ImageMagick's convert will do it both ways.

    convert a.tiff b.pdf
    convert b.pdf a.tiff

If the PDF is more than one page, the TIFF will be a multipage TIFF.

Aaron

--
Aaron Addison
Unix Administrator
W. E. B. Du Bois Library
UMass Amherst
413 577 2104

On Fri, 2013-04-26 at 16:08 -0400, Edward M. Corrado wrote:

> Hi All,
>
> I have a need to batch convert many TIFF images to PDF. I'd then like to be able to discard the TIFF images, but I can only do that if I can create the original TIFF again from the PDF. Is this possible? If so, using what tools and how?
>
> tiff2pdf seems like a possible solution, but I can't find a corresponding pdf2tif program that reverses the process.
>
> Any ideas?
>
> Edward
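Since the original question was about batch conversion, here is a minimal sketch of looping the `convert a.tiff b.pdf` form from the message above over a directory. The function names, the `*.tif*` glob, and the `dry_run` default are assumptions made for illustration; the `tool` parameter is there because some ImageMagick installations expose the program under a different name (for example `magick`). It only covers the TIFF-to-PDF direction and makes no claim that the result can be turned back into the original TIFF.

```python
from pathlib import Path
import subprocess


def build_convert_command(tiff_path, tool="convert"):
    """Return the argv list for `convert X.tif X.pdf` (same base name)."""
    tiff_path = Path(tiff_path)
    return [tool, str(tiff_path), str(tiff_path.with_suffix(".pdf"))]


def batch_convert(directory, dry_run=True, tool="convert"):
    """Build (and optionally run) one convert command per TIFF in `directory`.

    With dry_run=True nothing is executed; the commands are only returned,
    so they can be inspected before touching any files.
    """
    commands = [
        build_convert_command(path, tool=tool)
        for path in sorted(Path(directory).glob("*.tif*"))
    ]
    if not dry_run:
        for argv in commands:
            # check=True raises CalledProcessError if the tool reports failure.
            subprocess.run(argv, check=True)
    return commands


if __name__ == "__main__":
    for argv in batch_convert(".", dry_run=True):
        print(" ".join(argv))
```

Running it with `dry_run=True` first and reading the printed commands is a cheap way to catch a wrong directory or unexpected file names.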
Re: [CODE4LIB] tiff2pdf, then back to pdf?
Yes, converting from JPG to TIFF would have the quality of a JPG and (I believe) the file size of a TIFF.

On 4/26/13 4:19 PM, James Gilbert wrote:

> I'm by no means an expert in the math behind image format conversions... but:
>
> When converting TIFF to JPG, TIFF is an uncompressed format and JPG is a compressed format. When converting back, wouldn't the original quality of the TIFF be lost, leaving only the quality of the last JPG (with further degradation each time this process occurs)?
>
> James Gilbert, BS, MLIS
> Systems Librarian
> Whitehall Township Public Library
> 3700 Mechanicsville Road
> Whitehall, PA 18052
> 610-432-4339 ext: 203

--
Steve Cherry
Electronic Services Librarian
The Catholic University of America
202-319-6433
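The point that a lossy round trip restores the container but not the data can be illustrated with a toy example. This is not JPEG: it is a deliberately simple quantizer (every name and the step size of 16 are made up for the illustration) standing in for any lossy encoder. The "restored" list has the same length as the original, much as a TIFF made from a JPG has TIFF-sized storage, but distinct input values have been merged and cannot be told apart again.

```python
import math


def lossy_encode(pixels, step=16):
    """Toy lossy step: keep only the nearest multiple of `step` for each value."""
    return [(p + step // 2) // step for p in pixels]


def decode(codes, step=16):
    """Expand codes back to 8-bit values; the discarded detail does not return."""
    return [min(255, c * step) for c in codes]


def rmse(a, b):
    """Root-mean-square error between two equally long value lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


if __name__ == "__main__":
    original = [12, 13, 200, 201, 255]
    restored = decode(lossy_encode(original))
    print("original:", original)
    print("restored:", restored)
    print("same length:", len(original) == len(restored))
    print("identical:", original == restored)
    print("RMSE:", rmse(original, restored))
```

In the toy data, 12 and 13 land on the same code, so no decoder can know which one the original held; that is the sense in which the original cannot be recovered.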
Re: [CODE4LIB] tiff2pdf, then back to pdf?
Hi, you'll notice from the language you use to describe your use case that you use the word "convert" to describe what you're doing to the original TIFF images. Once you're done producing a derivative from those TIFFs, the only way back to the original TIFFs is to go back to the actual originals. The TIFF images are not stored in the PDF. The only way to go back to the originals is to preserve them.

--
HARDY POTTINGER pottinge...@umsystem.edu
University of Missouri Library Systems
http://lso.umsystem.edu/~pottingerhj/
https://MOspace.umsystem.edu/
Do you love it? Do you hate it? There it is, the way you made it. --Frank Zappa

On 4/26/13 3:08 PM, Edward M. Corrado ecorr...@ecorrado.us wrote:

> Hi All,
>
> I have a need to batch convert many TIFF images to PDF. I'd then like to be able to discard the TIFF images, but I can only do that if I can create the original TIFF again from the PDF. Is this possible? If so, using what tools and how?
>
> tiff2pdf seems like a possible solution, but I can't find a corresponding pdf2tif program that reverses the process.
>
> Any ideas?
>
> Edward
Re: [CODE4LIB] tiff2pdf, then back to pdf?
This works sometimes. Well, it does give me a new TIFF file from the PDF all of the time, but it is not always anywhere near the same size as the original TIFF. My guess is that maybe there is a flag or something that would help. Here is what I get with one file:

    ecorrado@ecorrado:~/Desktop/test$ convert -compress none A001a.tif A001a.pdf
    ecorrado@ecorrado:~/Desktop/test$ convert -compress none A001a.pdf A001b.tif
    ecorrado@ecorrado:~/Desktop/test$ ls -al
    total 361056
    drwxrwxr-x 2 ecorrado ecorrado     4096 Apr 26 17:07 .
    drwxr-xr-x 7 ecorrado ecorrado    20480 Apr 26 16:54 ..
    -rw-rw-r-- 1 ecorrado ecorrado 38497046 Apr 26 17:07 A001a.pdf
    -rw-r--r-- 1 ecorrado ecorrado 38178650 Apr 26 17:07 A001a.tif
    -rw-rw-r-- 1 ecorrado ecorrado  5871196 Apr 26 17:07 A001b.tif

In this case, the two TIFF files should be the same size. They are not even close. Maybe there is a flag to convert (besides -compress) that I can use.

FWIW: I tried three files; two are like this. For the other one, the resulting TIFF is the same size as the original.

Edward

On Fri, Apr 26, 2013 at 4:25 PM, Aaron Addison addi...@library.umass.edu wrote:

> ImageMagick's convert will do it both ways.
>
>     convert a.tiff b.pdf
>     convert b.pdf a.tiff
>
> If the PDF is more than one page, the TIFF will be a multipage TIFF.
>
> Aaron
>
> --
> Aaron Addison
> Unix Administrator
> W. E. B. Du Bois Library
> UMass Amherst
> 413 577 2104
Re: [CODE4LIB] tiff2pdf, then back to pdf?
Actually, I'm mistaken. It never worked. :-( I do get a TIFF, but not the original. I looked at the wrong files.

On Fri, Apr 26, 2013 at 5:11 PM, Edward M. Corrado ecorr...@ecorrado.us wrote:

> This works sometimes. Well, it does give me a new TIFF file from the PDF all of the time, but it is not always anywhere near the same size as the original TIFF. My guess is that maybe there is a flag or something that would help. Here is what I get with one file:
>
>     ecorrado@ecorrado:~/Desktop/test$ convert -compress none A001a.tif A001a.pdf
>     ecorrado@ecorrado:~/Desktop/test$ convert -compress none A001a.pdf A001b.tif
>     ecorrado@ecorrado:~/Desktop/test$ ls -al
>     total 361056
>     drwxrwxr-x 2 ecorrado ecorrado     4096 Apr 26 17:07 .
>     drwxr-xr-x 7 ecorrado ecorrado    20480 Apr 26 16:54 ..
>     -rw-rw-r-- 1 ecorrado ecorrado 38497046 Apr 26 17:07 A001a.pdf
>     -rw-r--r-- 1 ecorrado ecorrado 38178650 Apr 26 17:07 A001a.tif
>     -rw-rw-r-- 1 ecorrado ecorrado  5871196 Apr 26 17:07 A001b.tif
>
> In this case, the two TIFF files should be the same size. They are not even close. Maybe there is a flag to convert (besides -compress) that I can use.
>
> FWIW: I tried three files; two are like this. For the other one, the resulting TIFF is the same size as the original.
>
> Edward
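Matching file sizes do not prove that a round trip reproduced the original, and mismatched sizes only show that something changed. A stricter check is to compare checksums of the two files. The sketch below uses only the Python standard library; the function names are made up, and `A001a.tif` / `A001b.tif` are the file names from the listing above. Note the hedge: byte-for-byte equality is a stronger condition than "same pixels", since two TIFFs holding identical image data can still differ in metadata or internal layout.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large scans need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_bit_identical(original, candidate):
    """True only if both files contain exactly the same bytes."""
    return sha256_of(original) == sha256_of(candidate)


if __name__ == "__main__":
    # Self-contained demo with stand-in files; for real use, point the
    # function at the files from the listing, e.g.
    # is_bit_identical("A001a.tif", "A001b.tif").
    with tempfile.TemporaryDirectory() as tmp:
        a = Path(tmp) / "A001a.tif"
        b = Path(tmp) / "A001b.tif"
        a.write_bytes(b"stand-in for the original scan")
        b.write_bytes(b"stand-in for the round-tripped scan")
        print("bit-identical:", is_bit_identical(a, b))
```

If the goal is "same image" rather than "same file", a pixel-level comparison tool is the better fit, as in the compare measurements mentioned elsewhere in the thread.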
Re: [CODE4LIB] tiff2pdf, then back to pdf?
What's your use case in this scenario? Do you want to provide access to the PDFs over the web or are you using them as your archival format? You probably don't want to use PDF to achieve both objectives.

Ethan

On Apr 26, 2013 5:11 PM, Edward M. Corrado ecorr...@ecorrado.us wrote:

> This works sometimes. Well, it does give me a new TIFF file from the PDF all of the time, but it is not always anywhere near the same size as the original TIFF. My guess is that maybe there is a flag or something that would help. Here is what I get with one file:
>
>     ecorrado@ecorrado:~/Desktop/test$ convert -compress none A001a.tif A001a.pdf
>     ecorrado@ecorrado:~/Desktop/test$ convert -compress none A001a.pdf A001b.tif
>     ecorrado@ecorrado:~/Desktop/test$ ls -al
>     total 361056
>     drwxrwxr-x 2 ecorrado ecorrado     4096 Apr 26 17:07 .
>     drwxr-xr-x 7 ecorrado ecorrado    20480 Apr 26 16:54 ..
>     -rw-rw-r-- 1 ecorrado ecorrado 38497046 Apr 26 17:07 A001a.pdf
>     -rw-r--r-- 1 ecorrado ecorrado 38178650 Apr 26 17:07 A001a.tif
>     -rw-rw-r-- 1 ecorrado ecorrado  5871196 Apr 26 17:07 A001b.tif
>
> In this case, the two TIFF files should be the same size. They are not even close. Maybe there is a flag to convert (besides -compress) that I can use.
>
> FWIW: I tried three files; two are like this. For the other one, the resulting TIFF is the same size as the original.
>
> Edward
Re: [CODE4LIB] tiff2pdf, then back to pdf?
Hardy,

You may very well be correct, but some programs claim to keep the original image data unaltered [1], so I was hoping that was the case (basically, it would put some sort of wrapper around the TIFF). tiff2pdf on my Ubuntu box seems to keep the file sizes very close when I use it, so I'm thinking it still might be possible. But then again, it might not be, and it might depend on the features of the TIFF file (and the PDF version) being used.

If I can't do it, I'll figure something else out, but it would make my life easier to have to deal with only one file for each representation. But I'll live regardless :-)

Edward

[1] http://www.davince.com/docs/tiff2pdf.html is one example of a program that says this, but it also points out that not all features of TIFF are supported in PDF. It is also old, and they don't offer a program that I can find that does the reversal.

On Fri, Apr 26, 2013 at 5:07 PM, Pottinger, Hardy J. pottinge...@missouri.edu wrote:

> Hi, you'll notice from the language you use to describe your use case that you use the word "convert" to describe what you're doing to the original TIFF images. Once you're done producing a derivative from those TIFFs, the only way back to the original TIFFs is to go back to the actual originals. The TIFF images are not stored in the PDF. The only way to go back to the originals is to preserve them.
>
> --
> HARDY POTTINGER pottinge...@umsystem.edu
> University of Missouri Library Systems
> http://lso.umsystem.edu/~pottingerhj/
> https://MOspace.umsystem.edu/
> Do you love it? Do you hate it? There it is, the way you made it. --Frank Zappa
Re: [CODE4LIB] tiff2pdf, then back to pdf?
On Fri, Apr 26, 2013 at 5:29 PM, Ethan Gruber ewg4x...@gmail.com wrote:

> What's your use case in this scenario? Do you want to provide access to the PDFs over the web or are you using them as your archival format? You probably don't want to use PDF to achieve both objectives.

The problem I have is that I have multipage TIFF files and I don't currently have a good way for users to view them. I also need to preserve these files. Ideally, my use case would be to use PDF files created from the TIFFs for both access and as the archival format. But, as I said, that depends on whether I can recreate the original TIFF. I have the option of creating a custom viewer that can deal with the display of the TIFF files, but I'm looking for other options.

So I have a few choices that I thought of implementing (and that I haven't ruled out):

1) This is what I asked about. Make a PDF from the TIFF files. If I could embed the TIFF into a PDF, and then at some point recreate the TIFF if needed for archival purposes, I have my solution.

2) Convert the multipage TIFF files to individual TIFF files. This would work for my end users, but would be more clunky than a PDF for them. The new TIFF files could be my archival copy.

3) Convert the multipage TIFF files to PDF (probably in a smaller, compressed? state), use the PDF for display/access, and save the TIFF for archival purposes.

4) Convert the multipage TIFFs to PDF (or PDF/A?), and don't worry about being able to recreate the original TIFF files.

I should add that the content is what is important in these documents, and they are mostly typewritten or handwritten text. Still, I'd like to keep them in as high-quality a format as possible.

I'm sure there are some other possible solutions as well. I really would like #1, but it may not be possible. If it isn't, I need to decide (with representatives of my user community) which of the others is better. My guess is it would be #3, but I am not positive.

Edward
Re: [CODE4LIB] tiff2pdf, then back to pdf?
Hi, Edward:

After reading through the string of messages and the options that you list below, I think that #3 is your best option. It seems to fall best in line with good archiving practices as I understand them (have one copy for public use and another for archival purposes). If you really want to convert the TIFF to PDF and ditch the TIFF file, I would suggest using PDF/A, the archival version of PDF, if you can.

Best of luck!

Sincerely,
Jason

__
Jason Curtis
Technical Services Librarian
Legal Research Center
University of San Diego
5998 Alcalá Park
San Diego, CA 92110
Ph: (619) 260-4600, ext. 2875
Fax: (619) 260-7495
cur...@sandiego.edu

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Edward M. Corrado
Sent: Friday, April 26, 2013 2:55 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] tiff2pdf, then back to pdf?

On Fri, Apr 26, 2013 at 5:29 PM, Ethan Gruber ewg4x...@gmail.com wrote:

> What's your use case in this scenario? Do you want to provide access to the PDFs over the web or are you using them as your archival format? You probably don't want to use PDF to achieve both objectives.

The problem I have is that I have multipage TIFF files and I don't currently have a good way for users to view them. I also need to preserve these files. Ideally my use case would be to use PDF files created from the TIFFs for both access and as an archival format. But, as I said, that depends on whether I can recreate the original TIFF. I have the option of creating a custom viewer that can deal with the display of the TIFF files, but I'm looking for other options.

So I have a few choices that I thought of implementing (and that I haven't ruled out):

1) This is what I asked about. Make a PDF from the TIFF files. If I could embed the TIFF into a PDF, and then at some point recreate the TIFF if needed for archival purposes, I have my solution.

2) Convert the multipage TIFF files to individual TIFF files. This would work for my end users, but would be more clunky than a PDF for them. The new TIFF files could be my archival copy.

3) Convert the multipage TIFF files to PDF (probably in a smaller, compressed? state), use the PDF for display/access, save the TIFF for archival purposes.

4) Convert the multipage TIFFs to PDF (or PDF/A?), and don't worry about being able to recreate the original TIFF files.

I should add, the content is what is important in these documents, and they are mostly typewritten or handwritten text. Still, I'd like to keep them in as high-quality a format as possible.

I'm sure there are some other possible solutions as well. I really would like #1, but it may not be possible. If it isn't, I need to decide (with representatives of my user community) which of the others is better. My guess is it would be #3, but I am not positive.

Edward
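
Option #3 (keep the TIFF masters, derive PDFs for access) can be sketched as a small batch job. This is only an illustration under stated assumptions: it assumes libtiff's `tiff2pdf` is on the PATH, the function name `make_access_copies` and the directory layout are hypothetical, and no compression flags are chosen here because the right settings depend on the material.

```sh
#!/bin/sh
# make_access_copies SRC_DIR DST_DIR
# Hypothetical helper (not from the thread). Writes one PDF into DST_DIR
# for each *.tif found in SRC_DIR. The TIFF masters are only read, never
# modified or deleted. PDFs that already exist are skipped, so the job
# can be re-run safely. Assumes libtiff's tiff2pdf is installed.
make_access_copies() {
    src=$1
    dst=$2
    mkdir -p "$dst" || return 1
    for tif in "$src"/*.tif; do
        [ -f "$tif" ] || continue      # no matches: the glob stays literal
        pdf="$dst/$(basename "$tif" .tif).pdf"
        [ -e "$pdf" ] && continue      # access copy already made
        tiff2pdf -o "$pdf" "$tif" || echo "failed: $tif" >&2
    done
}

# Example (directory names are placeholders):
#   make_access_copies /archive/tiff-masters /web/access-pdfs
```

Keeping the derivation one-directional, from master to access copy, means the PDFs can be regenerated with different settings later without touching the archival files.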
Re: [CODE4LIB] tiff2pdf, then back to pdf?
Although I do find the persistent myth of PDF/A as an archival format amusing. Under very specific circumstances it can be, but it's rare for those circumstances to be deliberately met. And for many languages it is impossible ever to use PDF for archival purposes. It is the nature of PDF.

On 27/04/2013 8:28 AM, Jason Curtis cur...@sandiego.edu wrote:

> Hi, Edward:
>
> After reading through the string of messages and the options that you list below, I think that #3 is your best option. It seems to fall best in line with good archiving practices as I understand them (have one copy for public use and another for archival purposes). If you really want to convert the TIFF to PDF and ditch the TIFF file, I would suggest using PDF/A, the archival version of PDF, if you can.
>
> Best of luck!
>
> Sincerely,
> Jason