On 2010-01-12 at 11:31 AM, chriscorb...@gmail.com (Chris) wrote:

Just curious, are people also using the "shell worksheet" in BBEdit?

Short answer:

Yes.

Long answer:

Oh, yes.

Here's one of my uses:

I prepare voter data for election campaigns. I run large sets of data through a series of processes to make a useful "voter file" for campaign planning and voter contact.

The problem: how to manage a complex and time-consuming data processing task, as follows:

Each county has its own format for voter registration data, and a county's data structures and database field names may change from one election to the next. Voter data churns constantly -- people move, die, and change name, party, gender, and so on. Jurisdiction boundaries, precinct lines, and even zip code areas change, too.

In this environment, each time I set up a voter file, I have to start from the beginning, building from raw data about voters, streets, districts, past elections, and other information, any of which might have changed in content or structure since the last time I processed it.

A typical county's voter roll requires 17 processes, which cumulatively clean, standardize, and cross-tabulate the data into final form. Each process ends with tests, whose results must be checked ("bio-optically" ;-) before the next process may begin. Sometimes it's necessary to back up one or more steps in the processing when a problem is found.

Running the complete processing series with no interruptions takes five to eight hours (I can do other work during most of that time).

Is that enough of a problem statement? Add to it the obvious need to keep written track of things both during the processing and between processing occasions.

My solution:

For each election, each county gets a data processing directory into which I copy a set of BBEdit shell worksheets, one for each of the 17 processes, plus a few others.
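A setup step like this could be sketched in bash roughly as follows. The function name new_county_dir, the template directory, and the .worksheet file suffix are all assumptions made for illustration, not details of the actual setup described here:

```shell
#!/bin/bash
# Hypothetical helper: copy every worksheet template into a per-county
# directory. The paths and the .worksheet suffix are assumptions.
new_county_dir() {
    local templates="$1"   # directory holding the master worksheets
    local target="$2"      # e.g. a directory such as .../2009/CO_01
    local f name
    mkdir -p "$target" || return 1
    for f in "$templates"/*.worksheet; do
        [ -e "$f" ] || continue          # no templates found: do nothing
        name=$(basename "$f")
        # Never overwrite a worksheet that may already hold results or notes.
        if [ -e "$target/$name" ]; then
            echo "kept existing: $name"
        else
            cp "$f" "$target/" && echo "copied: $name"
        fi
    done
}
```

Refusing to overwrite existing files matters here because each worksheet doubles as the written record of its last run.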

Each worksheet is named for its process; the content of the worksheet is one or more lines of input arguments, followed by a call to the script that does the processing. When the script is executed, its output prints out on the worksheet.

For my processes, the output includes progress indicators as files are read or written, counts of things found, samplings of in-process data, and finally the test results from the process and the paths to the data file(s) that the process yielded.

Here's an example of one of these worksheets, down to and including the line with #-#-#:

A='Project=OCT2009'
B='base_dir=/Volumes/Campaigns/2009/CO_01'
C='source_file=voter_tabs.txt'
D='criteria=all' # 'criteria=age<50'
E='crosstab=gender pty_group age_cohort zip'

perl /Volumes/LIB/make_cross_tab_summaries "$A" "$B" "$C" "$D" "$E"
#-#-#

The above worksheet sample uses a format that works with the standard bash shell under OS X Snow Leopard. My scripts parse their command-line arguments as name=value pairs.
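The name=value convention could be handled along these lines. This is only a sketch in bash, not the Perl that the real scripts use, and the function name parse_pairs is made up for illustration:

```shell
#!/bin/bash
# Illustrative only: split each argument at its FIRST "=" so that values
# may themselves contain "=", "<", or spaces (e.g. "criteria=age<50").
parse_pairs() {
    local arg
    for arg in "$@"; do
        case "$arg" in
            *=*) printf '%s => %s\n' "${arg%%=*}" "${arg#*=}" ;;
            *)   printf 'ignored (no "="): %s\n' "$arg" >&2 ;;
        esac
    done
}

parse_pairs 'Project=OCT2009' 'criteria=age<50' 'crosstab=gender pty_group age_cohort zip'
```

Quoting each variable on the worksheet ("$A", "$B", and so on) is what keeps a multi-word value such as the crosstab list together as a single argument.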

Select all lines from A= down to and including the line with #-#-#. When you press Enter, the output will print below the #-#-# line. I have an AppleScript that clears the sheet below the #-#-# line and then re-selects the top lines and #-#-# line, ready for me to press Enter again to re-run the process.
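The clearing half of that step can also be approximated outside BBEdit. The sketch below is a different technique from the AppleScript mentioned above: it uses sed on the saved file rather than scripting the open window, it does not re-select anything, and the function name reset_worksheet is hypothetical:

```shell
#!/bin/bash
# Hypothetical stand-in for the clearing step: keep everything through
# the first "#-#-#" line and drop the old output below it.
reset_worksheet() {
    local sheet="$1" tmp rc
    grep -q '^#-#-#$' "$sheet" || {
        echo "no #-#-# marker in $sheet" >&2
        return 1
    }
    tmp=$(mktemp) || return 1
    # sed prints each line and quits right after printing the marker line.
    sed '/^#-#-#$/q' "$sheet" > "$tmp" && cat "$tmp" > "$sheet"
    rc=$?
    rm -f "$tmp"
    return $rc
}
```

The function refuses to touch a file that has no marker line, so a worksheet missing its #-#-# line is left as it is.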

With its own dedicated shell worksheet, each process and its input parameters, progress reports, and outcomes may be reviewed, re-run, checked, and annotated for future reference. Multiple worksheets may be opened and their processes executed simultaneously (assuming non-dependence).

There is only one copy, in a central library, of the actual script for each processing step; it may be pointed to by multiple shell worksheets each with its own parameters.

During script development, I start using the shell worksheet to call the script from the very beginning. Reflecting this, the first line output from the scripts I'm describing here simply shows that the script initialized and loaded its needed modules:

Tue Jan 12 19:21:29 2010 Initializing... Process 5396 using BVA::XDATA 3.90, BVA::XUI 2.9, BVA::XACT 1.11, Spreadsheet::WriteExcel 2.25

If a script has a problem, warnings and error messages spill out down the worksheet (yes, you can cancel a worksheet process), becoming breadcrumbs for the "warnings are friends" path back to functioning code.

Perl, the language I most enjoy working in, provides a strong set of debugging, profiling, and testing tools. I can invoke these with a few lines kept on the worksheet but normally commented out. Again, the results from the profiler or test suite print out on the worksheet for study.

There's more to how I do all this, but I think you can see that this satisfies the requirements of my problem statement quite well.

Most of what I describe could be done on the command line, especially by someone adept at using all the tools of that environment to pipe output to files, capture warnings, tweak input variables, etc.
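For comparison, a plain command-line version might look something like this. It is a sketch under assumptions: the wrapper name run_step and the log file layout are invented here, and it only covers capturing output and warnings, not annotating them:

```shell
#!/bin/bash
# Hypothetical wrapper: run one processing step, keep its output and its
# warnings in separate files, and record the invocation that was used.
run_step() {
    local log_dir="$1"; shift
    local stamp rc
    stamp=$(date +%Y%m%d_%H%M%S)
    mkdir -p "$log_dir" || return 1
    printf '%s\n' "$*" > "$log_dir/$stamp.cmd"                 # what was run
    "$@" > "$log_dir/$stamp.out" 2> "$log_dir/$stamp.err"      # output, warnings
    rc=$?
    echo "exit status $rc; logs in $log_dir/$stamp.{cmd,out,err}"
    return $rc
}

# Example call (the script path is the one from the worksheet sample):
#   run_step logs perl /Volumes/LIB/make_cross_tab_summaries "$A" "$B" "$C" "$D" "$E"
```

Each run leaves three files behind, where a worksheet keeps the invocation, the output, and the messages together in one place.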

But I've come to enjoy how handy it is to have an executable invocation, together with its most recent input parameters, outputs, and system messages, plus comments and alternative inputs, all encapsulated in a single file, which may be one of several such files that together allow me to direct an entire library of ruthless code at unsuspecting data.

All happening, of course, in an application highly likely to already be running on any machine I'm working on, BBEdit.

Whoa, how did I just write so much? Consider that a sign of how much I appreciate shell worksheets for enabling me to handle one of my crucial workflows so well.

Best,




   - Bruce

_bruce__van_allen__santa_cruz_ca_
