Hi Michael,

I'll try to send you an Automator service workflow as an attachment off list.  
Ever since Gordon and Lynne moved servers, I haven't been able to access the 
account they set up for me to upload items like AppleScripts for distribution.  
(I can't get into the publish anonymous ftp sections, either.)  
You'll need to download and install Carsten Blüm's pdftotext package from his 
installers page:
http://www.bluem.net/en/mac/packages/
If you're running Mountain Lion, be sure to use the context menu to open and 
install this application, in order to get past the Gatekeeper software checks, 
since this application predates the requirements of "signed" applications.  
You'll have to deliberately select "open" in the context menu dialog windows 
after receiving a warning that the app is from an unidentified developer.

This installs a program named "pdftotext" that is run from the command line in 
Terminal.  What I've done is use Automator to put a wrapper around it and 
create a service named "batch_pdf_to_text", so that you'll be able to select 
PDF files in Finder and not have to use Terminal.  This may not solve your 
problems, but what the "pdftotext" program lets me do is set a switch option 
for "stream order", so that text extracted from table data will be read item by 
item across each row, instead of all the items from the first column, followed 
by all the items in the second column, etc., which is how Preview appears to 
read them by default.
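
For reference, running the tool directly in Terminal might look like the 
sketch below.  It assumes the installer placed pdftotext either on your PATH or 
in /usr/local/bin.  The helper name find_pdftotext and the file name report.pdf 
are made up for illustration.

```shell
# Hypothetical example file; substitute the path of one of your own PDFs.
pdf="report.pdf"

# Print the location of pdftotext, checking the PATH first and then
# /usr/local/bin (an assumption about where the installer puts it).
find_pdftotext() {
  if command -v pdftotext >/dev/null 2>&1; then
    command -v pdftotext
  elif [ -x /usr/local/bin/pdftotext ]; then
    echo /usr/local/bin/pdftotext
  else
    return 1
  fi
}

if tool=$(find_pdftotext); then
  if [ -f "$pdf" ]; then
    # -raw keeps the text in content stream order
    "$tool" -raw "$pdf"
  else
    echo "no such file: $pdf" >&2
  fi
else
  echo "pdftotext not found; install the package first" >&2
fi
```

If the tool is found and the PDF exists, the text file should appear next to 
the PDF.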

So I basically opened Automator, selected "Service" as the type of document, 
then put the Terminal command for running "pdftotext" into a "Run Shell 
Script" action that looks like this:

# Use pdftotext to create a text version of the selected PDF file in stream order
#    Created 26 January 2013
for f in "$@"
do
/usr/local/bin/pdftotext -layout -raw "$f"
done

The two lines with hash marks are comment lines.  The command to run 
"pdftotext" uses the "-raw" switch for stream order and the "-layout" switch 
for original layout format.  It is applied to a quoted variable, "$f", which 
takes each selected file name in turn as the "for" loop runs.  I started the 
service definition by setting the pop up buttons for "Service receives 
selected" to "files or folders" in "Finder".  I also set the "Pass input" pop 
up button in the heading of the "Run Shell Script" action to "as arguments", 
meaning that the script will read in the file names from your Finder selection. 
For each PDF file you select, it will create an output text file with the same 
name but with a ".txt" extension in the same directory.
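
To make the naming concrete, here is a minimal sketch of how the output name 
relates to each selected file.  The function name txt_name_for is hypothetical 
and is not part of the service I built.  Skipping anything that isn't named 
like a PDF is my own assumption about what you'd want, since the service 
accepts "files or folders".  It only prints the names and doesn't run 
pdftotext.

```shell
# Work out the output name: same path, with ".pdf" swapped for ".txt".
txt_name_for() {
  case "$1" in
    *.pdf) printf '%s\n' "${1%.pdf}.txt" ;;
    *.PDF) printf '%s\n' "${1%.PDF}.txt" ;;
    *)     return 1 ;;   # not named like a PDF
  esac
}

# Same loop shape as the service: one pass per selected file.
for f in "$@"
do
  if out=$(txt_name_for "$f"); then
    echo "$f -> $out"
  else
    echo "skipping (not a PDF): $f" >&2
  fi
done
```

Quoting "$f" is what keeps file names with spaces in one piece.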

I don't think it would be useful to tell you how to create the Automator 
service step by step, because in this case it would require explaining shell 
scripting and Terminal commands that most people don't use.  In most cases, 
creating Automator workflows involves selecting familiar actions that you use 
within applications, but you're basically asking for a complete rewrite of the 
way that text is normally extracted from PDF.

You basically have to do two things:
1) download and install pdftotext from:
http://www.bluem.net/en/mac/packages/
2) save the attached Automator service file that I'll send you off list to the 
Library/Services folder under your account.

I think this happens automatically if you just try to open the attached file, 
which will open in Automator, and then use Command-S to save it. I had to build 
this on a machine running Lion.  On this machine, if you have an Automator 
service under the Library/Services folder of your account, it will show up in 
the context menu of files you select in Finder when you use VO-Shift-M, 
navigate to the "Services" menu option, then right arrow to the submenu to find 
the name of the service you want to apply (e.g., "batch_pdf_to_text").
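
If opening and saving the file in Automator doesn't put the service in place, 
copying it by hand from Terminal is another option.  This is only a sketch: 
the function name install_service and the Downloads location are assumptions, 
and it treats the saved workflow as a folder-style bundle.

```shell
# Copy a saved Automator service bundle into the per-user Services folder.
# The second argument (destination) is optional.
install_service() {
  src="$1"
  dest="${2:-$HOME/Library/Services}"
  if [ ! -d "$src" ]; then
    echo "no such workflow bundle: $src" >&2
    return 1
  fi
  mkdir -p "$dest" && cp -R "$src" "$dest/"
}

# Example call (hypothetical download location):
# install_service "$HOME/Downloads/batch_pdf_to_text.workflow"
```

Leave the trailing slash off the bundle path, so that the bundle itself is 
copied and not just its contents.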

Try this service on a single file that you had problems reading with Preview. 
Then look for a file of the same name, but with ".txt" extension in the same 
folder as the file you selected, and open it.  Check whether this resolves the 
problems you had reading that PDF file in Preview.
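
If you'd rather confirm from Terminal that the text file was created, a small 
check along these lines should do.  The function name check_txt is 
hypothetical, and it assumes the PDF name ends in ".pdf".

```shell
# Report whether the ".txt" file next to a given PDF exists and is non-empty.
check_txt() {
  txt="${1%.[Pp][Dd][Ff]}.txt"
  if [ -s "$txt" ]; then
    echo "ok: $txt ($(wc -l < "$txt" | tr -d ' ') lines)"
  else
    echo "missing or empty: $txt" >&2
    return 1
  fi
}

# Example call (hypothetical path):
# check_txt "$HOME/Documents/report.pdf"
```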

HTH. Cheers,

Esther



On Jun 6, 2013, at 12:01 PM, Michael Marshall wrote:

> hello listers,
> today i was on the net and read about how i can extract text from pdf docs and 
> save them in RTF using Automator.
> needless to say i do not know how to use Automator
> could anyone who has used this fantastic thing with VO please give me step by 
> step instructions on how to go about this?
> 
> any help would be fantastic.
> 
> Michael

<--- Mac Access At Mac Access Dot Net --->
