Hi Michael,

I'll try to send you an Automator service workflow as an attachment off list, because ever since Gordon and Lynne moved servers I don't seem to be able to access the account they set up for me to upload items like AppleScripts for distribution. (I can't get into the publish anonymous ftp sections, either.)

You'll need to download and install Carsten Blüm's pdftotext package from his installers page:
http://www.bluem.net/en/mac/packages/

If you're running Mountain Lion, be sure to use the context menu to open and install this package, in order to get past the Gatekeeper software checks, since it predates the requirement for "signed" applications. After you receive a warning that the app is from an unidentified developer, you'll have to deliberately select "Open" in the dialog window.

This installs a program named "pdftotext", which is executed from the command line in Terminal. What I've done is use Automator to put a wrapper around it and create a service named "batch_pdf_to_text", so that you'll be able to select PDF files in Finder and not have to use Terminal. This may not solve your problems, but what the "pdftotext" program lets me do is set a switch option for "stream order", so that text extracted from table data will be read item by item across each row, instead of all the items from the first column, followed by all the items in the second column, etc., which is how Preview appears to read them by default.

So I basically opened Automator, selected "Service" as the type of document, then put the Terminal command for running the program into a "Run Shell Script" action that looks like this:

    # Use pdftotext to create a text version of the selected PDF file in stream order
    # Created 26 January 2013
    for f in "$@"
    do
        /usr/local/bin/pdftotext -layout -raw "$f"
    done

The two lines with hash marks are comment lines. The command to run "pdftotext" uses the switches "-raw" for stream order and "-layout" for the original layout format, and is applied to a quoted variable, "$f", which takes each of the selected file names in turn as the do loop runs.

I started the service definition by setting the pop up buttons for "Service receives selected" to "files or folders" in "Finder". And I set the "Pass input" pop up button in the heading of the "Run Shell Script" action to "as arguments" -- meaning that the script will read in the file names from your Finder selection. For each PDF file you selected, it will create an output text file with the same name but with a ".txt" extension, in the same directory.

I don't think it would be useful to tell you how to create the Automator service step by step, because in this case it would require explaining how shell scripting works, along with Terminal commands that most people don't use.

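For anyone who does want to try this by hand in Terminal, here is a minimal sketch of the same loop with a couple of safety checks added. This is not the script inside the service; the function names (is_pdf, txt_name, convert_all) and the PDFTOTEXT variable are made up for illustration, and the path /usr/local/bin/pdftotext is simply assumed from the action above. It skips anything in the selection that isn't a PDF, and names the output file explicitly instead of relying on the default.

```shell
#!/bin/bash
# Sketch only: a guarded variant of the loop in the "Run Shell Script" action.
# The path below is an assumption taken from that action; adjust it if
# pdftotext is installed somewhere else.
PDFTOTEXT="${PDFTOTEXT:-/usr/local/bin/pdftotext}"

# Succeed only for names ending in .pdf (any letter case).
is_pdf() {
    case "$1" in
        *.[Pp][Dd][Ff]) return 0 ;;
        *) return 1 ;;
    esac
}

# Print the name of the text file to write next to the PDF:
# same path and base name, with the extension replaced by .txt.
txt_name() {
    printf '%s\n' "${1%.*}.txt"
}

# Convert every PDF passed in; report and skip anything else.
convert_all() {
    for f in "$@"
    do
        if ! is_pdf "$f"; then
            echo "skipping (not a PDF): $f" >&2
            continue
        fi
        # Same switches as the original action: -layout and -raw.
        "$PDFTOTEXT" -layout -raw "$f" "$(txt_name "$f")"
    done
}

convert_all "$@"
```

Saved under a name of your choosing and run with bash, followed by one or more PDF file names, it should leave a ".txt" file beside each PDF, just as the service does. Quoting "$f" matters because file names selected in Finder often contain spaces.
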
In most cases, running Automator workflows involves selecting familiar actions that you use within applications, but you're basically asking for a complete rewrite of the way that text is normally extracted from PDF.

You basically have to do two things:

1) Download and install pdftotext from:
http://www.bluem.net/en/mac/packages/
2) Save the attached Automator service file that I'll send you off list to the Library/Services folder under your account. I think this happens automatically if you just try to open the attached file, which will open in Automator, and then use Command-S to save it.

I had to build this on a machine running Lion. On this machine, if you have an Automator service under the Library/Services folder of your account, it will show up in the context menu of files you select in Finder when you use VO-Shift-M, navigate to the "Services" menu option, then right arrow to the submenu to find the name of the service you want to apply (e.g., "batch_pdf_to_text").

Try this service on a single file that you had problems reading with Preview. Then look for a file of the same name, but with a ".txt" extension, in the same folder as the file you selected, and open it. Check whether this resolves the problems you had reading that PDF file in Preview.

HTH. Cheers,

Esther

On Jun 6, 2013, at 12:01 PM, Michael Marshall wrote:

> hello listers,
> today i was on the net and red about how i can extract text from pdf docs and
> save them in RTF using Automator.
> needless to say i do not know how to use Automator
> could anyone who has used this fantastic thing with VO plese give me step by
> step instructions on how to go about this?
>
> any help would be fantastic.
>
> Michael

<--- Mac Access At Mac Access Dot Net --->

To reply to this post, please address your message to [email protected]

You can find an archive of all messages posted to the Mac-Access forum at either the list's own dedicated web archive:
<http://mail.tft-bbs.co.uk/pipermail/mac-access/index.html>
or at the public Mail Archive:
<http://www.mail-archive.com/[email protected]/>.
Subscribe to the list's RSS feed from:
<http://www.mail-archive.com/[email protected]/maillist.xml>

As the Mac Access Dot Net administrators, we do our very best to ensure that the Mac-Access E-Mail list remains malware, spyware, Trojan, virus and worm-free. However, this should in no way replace your own security strategy. We assume neither liability nor responsibility should something unpredictable happen.

Please remember to update your membership preferences periodically by visiting the list website at:
<http://mail.tft-bbs.co.uk/mailman/listinfo/mac-access/options/>
