On 2010-01-12 at 11:31 AM, chriscorb...@gmail.com (Chris) wrote:
Just curious, are people also using the "shell worksheet" in BBEdit?
Short answer:
Yes.
Long answer:
Oh, yes.
Here's one of my uses:
I prepare voter data for election campaigns. I run large sets of
data through a series of processes to make a useful "voter file"
for campaign planning and voter contact.
The problem: how to manage a complex and time-consuming data
processing task, as follows:
Each county has its own format for voter registration data, and
a county's data structures and database field names may change
from one election to the next. Voter data churns constantly --
people move, die, and change name, party, gender, and so on.
Jurisdiction boundaries, precinct lines, and even zip code areas
change, too.
In this environment, each time I set up a voter file, I have to
start from the beginning, building from raw data about voters,
streets, districts, past elections, and other information, any
of which might have changed in content or structure since the
last time I processed it.
A typical county's voter roll requires 17 processes, which
cumulatively clean, standardize, and cross-tabulate the data
into final form. Each process ends with tests, whose results
must be checked ("bio-optically" ;-) before the next process may
begin. Sometimes it's necessary to back up one or more steps in
the processing when a problem is found.
Running the complete processing series with no interruptions
takes five to eight hours (I can do other work during most of
that time).
Is that enough of a problem statement? Add to it the obvious
need to keep written track of things both during the processing
and between processing occasions.
My solution:
For each election, each county gets a data processing directory
into which I copy a set of BBEdit shell worksheets, one for each
of the 17 processes, plus a few others.
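As a rough illustration of that setup step, here is a minimal shell sketch. The function name, the template directory, and the skip-if-present rule are all assumptions made for the example, not a description of the actual setup:

```shell
# Hypothetical helper: copy a set of master worksheets into a fresh
# per-county processing directory. All names here are illustrative.
new_county_dir() {
    template_dir=$1   # directory holding the master worksheet copies
    county_dir=$2     # e.g. /Volumes/Campaigns/2009/CO_01
    mkdir -p "$county_dir" || return 1
    for sheet in "$template_dir"/*; do
        [ -f "$sheet" ] || continue
        # Leave alone any worksheet that is already there, since it may
        # hold output and notes from an earlier run.
        [ -e "$county_dir/$(basename "$sheet")" ] && continue
        cp "$sheet" "$county_dir/"
    done
}
```

Running it a second time on the same directory adds only the worksheets that are missing, so annotated sheets are not overwritten.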
Each worksheet is named for its process; the content of the
worksheet is one or more lines of input arguments, followed by
a call to the script that does the processing. When the script is
executed, its output prints out on the worksheet.
For my processes, the output includes progress indicators as
files are read or written, counts of things found, samplings of
in-process data, and finally the test results from the process
and the paths to the data file(s) that the process yielded.
Here's an example of one of these worksheets, down to and
including the line with #-#-#:
A='Project=OCT2009'
B='base_dir=/Volumes/Campaigns/2009/CO_01'
C='source_file=voter_tabs.txt'
D='criteria=all' # 'criteria=age<50'
E='crosstab=gender pty_group age_cohort zip'
perl /Volumes/LIB/make_cross_tab_summaries "$A" "$B" "$C" "$D" "$E"
#-#-#
The above worksheet sample uses a format that works with the
standard bash shell under OS X Snow Leopard. My scripts parse
their arguments as name=value pairs.
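To make the name=value convention concrete, here is a minimal sketch of splitting such arguments at the first "=". It is written in shell purely for illustration; the real scripts are Perl, and the function name parse_pairs is hypothetical:

```shell
# Hypothetical sketch: split each "name=value" argument at the first "=".
parse_pairs() {
    for pair in "$@"; do
        case $pair in
            *=*)
                name=${pair%%=*}    # text before the first "="
                value=${pair#*=}    # text after the first "="
                printf '%s -> %s\n' "$name" "$value"
                ;;
            *)
                # Anything without an "=" is reported on stderr and skipped.
                printf 'skipping malformed argument: %s\n' "$pair" >&2
                ;;
        esac
    done
}

A='Project=OCT2009'
D='criteria=age<50'
parse_pairs "$A" "$D"
```

Because the split happens at the first "=" only, a value may itself contain "=" or spaces, as in the crosstab line of the worksheet.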
Select all lines from A= down to and including the line with
#-#-#. When you press Enter, the output will print below the
#-#-# line. I have an Applescript that clears the sheet below
the #-#-# line and then re-selects the top lines and #-#-# line,
ready for me to press Enter again to re-run the process.
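The clearing step is done with AppleScript inside BBEdit, and that script is not shown here. For anyone who wants the same effect on a worksheet file from outside the editor, a shell stand-in might look like this (the function name is hypothetical, and this is not the AppleScript itself):

```shell
# Hypothetical helper: print a worksheet up to and including the first
# "#-#-#" line, dropping everything after it (the previous run's output).
trim_below_marker() {
    awk '{ print } /^#-#-#[[:space:]]*$/ { exit }' "$1"
}
```

The result goes to standard output, so it would need to be redirected to a temporary file and moved into place to update the worksheet itself.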
With its own dedicated shell worksheet, each process and its
input parameters, progress reports, and outcomes may be
reviewed, re-run, checked, and annotated for future reference.
Multiple worksheets may be opened and their processes executed
simultaneously (assuming non-dependence).
There is only one copy, in a central library, of the actual
script for each processing step; it may be pointed to by
multiple shell worksheets each with its own parameters.
During script development, I start using the shell worksheet to
call the script from the very beginning. Reflecting this, the
first line output from the scripts I'm describing here simply
shows that the script initialized and loaded its needed modules:
Tue Jan 12 19:21:29 2010 Initializing... Process 5396 using
BVA::XDATA 3.90, BVA::XUI 2.9, BVA::XACT 1.11,
Spreadsheet::WriteExcel 2.25
If a script has a problem, warnings and error messages spill out
down the worksheet (yes, you can cancel a worksheet process),
becoming breadcrumbs for the "warnings are friends" path back to
functioning code.
Perl, the language I most enjoy working in, provides a strong
set of debugging, profiling, and testing tools. I can invoke
these with a few lines kept on the worksheet but normally
commented out. Again, the results from the profiler or test
suite print out on the worksheet for study.
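A sketch of what such normally-commented-out lines might look like on a worksheet. The particular choices are assumptions: -w and -c are standard perl switches, Devel::NYTProf is a CPAN profiler that would need to be installed, and the t/ test directory is hypothetical:

```shell
A='Project=OCT2009'
B='base_dir=/Volumes/Campaigns/2009/CO_01'
SCRIPT=/Volumes/LIB/make_cross_tab_summaries

# Syntax check only, with warnings enabled (nothing is executed):
# perl -wc "$SCRIPT"

# Profile a run (assumes Devel::NYTProf is installed from CPAN):
# perl -d:NYTProf "$SCRIPT" "$A" "$B"

# Run a test suite (assumes tests live in a hypothetical t/ directory):
# prove -v t/
#-#-#
```

Uncommenting one line, selecting down to the #-#-# line, and pressing Enter would send that tool's output to the worksheet like any other run.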
There's more to how I do all this, but I think you can see that
this satisfies the requirements of my problem statement quite well.
Most of what I describe could be done on the command line,
especially by someone adept at using all the tools of that
environment to pipe output to files, capture warnings, tweak
input variables, etc.
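For comparison, a minimal sketch of that command-line approach: a wrapper that sends a step's normal output and its warnings to separate files. The function name and the file naming are assumptions for the example:

```shell
# Hypothetical wrapper: run one processing step, saving normal output and
# warnings to separate files named after the step.
run_step() {
    step=$1
    shift
    "$@" > "$step.out" 2> "$step.err"
    status=$?
    printf '%s exited with status %s\n' "$step" "$status"
    return "$status"
}

# Example call, using the script and variables from the worksheet above:
# run_step crosstab perl /Volumes/LIB/make_cross_tab_summaries "$A" "$B" "$C" "$D" "$E"
```

This captures everything, but the parameters, output, and notes end up in separate places, which is the gap a worksheet closes.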
But I've come to enjoy how handy it is to have an executable
invocation, together with its most recent input parameters,
outputs, and system messages, plus comments and alternative
inputs, all encapsulated in a single file, which may be one of
several such files that together allow me to direct an entire
library of ruthless code at unsuspecting data.
All happening, of course, in an application highly likely to
already be running on any machine I'm working on, BBEdit.
Whoa, how did I just write so much? Consider that a sign of how
much I appreciate shell worksheets for enabling me to handle one
of my crucial workflows so well.
Best,
- Bruce
_bruce__van_allen__santa_cruz_ca_