A friend has sent me ten large text files, in the hope that I can munge 
them into shape for import into a database. Normally, I'd use a combination 
of BBEdit and cubic hours of hand-editing on files such as these, but the sheer 
volume of text and the overwhelming variations in the original documents 
have gotten me thinking I may be overlooking something.

The files are listings that look something like this:

N.Y. Times
Feb., 1933                                                            Page

5         NATIONALISM RISES UNDER HITLER RULE                         E-3

          ANTI-HITLER CARTOON                                         E-5

          HOOVER LOOKS BACK - AND AHEAD - ANNE O'HARE MC CORMICK      Mag. 1

          HITLER AT THE TOP OF HIS DIZZY PATH - EMIL LENGYEL          Mag. 3

          OUR FLEET PLAYS A FAR-FLUNG WAR GAME - HANSON BALDWIN       Mag. 7

          FARM MORTGAGES: A PRESSING NATIONAL ISSUE                   XX-l

6         NAZI TROOP MARCH WITH EMPIRE FLAGS AS VIOLENCE MOUNTS -
          HITLER HEADS PROCESSION                                     1

          ITALY IS EXPECTING NEW TIE WITH REICH - ARNALDO CORTESI,
          ROME                                                        4

*****     DENIES W. C. BULLITT TALKS FOR COL. (EDWARD MANDELL) HOUSE
          (BULLITT HAS BEEN IN FRANCE & VIENNA RECENTLY)              6

(Here's a link to a partially-edited version of one of the 
files: https://dl.dropboxusercontent.com/u/10003869/test.txt)

Each entry has a date field that may or may not be filled in, a headline 
field that can wrap onto a second line, and a page number field. Some of 
the date fields have asterisks where the original compiler wanted to call 
attention to the entry, and some of the page numbers have section prefixes.
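
One possible way to attack the layout described above is a small script rather than hand-editing. The following is only a rough Python sketch under several assumptions that the sample suggests but does not guarantee: the date column is the first 10 characters of each line, the page reference is whatever follows the last run of two or more spaces, and a headline with no page reference continues on the next line. The names DATE_WIDTH and parse_listing are made up for this illustration, and the masthead and month lines are simply skipped.

```python
import re

DATE_WIDTH = 10  # assumption: the date column occupies the first 10 characters

# Headline text, then a gap of 2+ spaces, then the page reference.
# Greedy matching makes the split happen at the LAST such gap on the line.
PAGE_SPLIT = re.compile(r"^(.*\S)\s{2,}(\S.*)$")

SAMPLE = """\
N.Y. Times
Feb., 1933                                                            Page

5         NATIONALISM RISES UNDER HITLER RULE                         E-3

          HOOVER LOOKS BACK - AND AHEAD - ANNE O'HARE MC CORMICK      Mag. 1

6         NAZI TROOP MARCH WITH EMPIRE FLAGS AS VIOLENCE MOUNTS -
          HITLER HEADS PROCESSION                                     1

*****     DENIES W. C. BULLITT TALKS FOR COL. (EDWARD MANDELL) HOUSE
          (BULLITT HAS BEEN IN FRANCE & VIENNA RECENTLY)              6
"""


def parse_listing(text):
    """Return a list of dicts with keys: day, flagged, headline, page."""
    records = []
    day = None        # most recent day number seen in the date column
    flagged = False   # True when the current entry began with asterisks
    pending = []      # headline fragments still waiting for a page reference

    for raw in text.splitlines():
        line = raw.rstrip()
        if not line.strip():
            continue  # blank separator line

        marker = line[:DATE_WIDTH].strip()
        body = line[DATE_WIDTH:].strip()

        if marker.isdigit():
            day = int(marker)          # a new day; it carries down to later entries
        elif marker and set(marker) == {"*"}:
            flagged = True             # compiler's "pay attention" mark
        elif marker:
            continue                   # masthead or month header: skipped here

        match = PAGE_SPLIT.match(body)
        if match is None:
            pending.append(body)       # wrapped headline, page comes later
            continue

        headline = " ".join(pending + [match.group(1)])
        records.append({
            "day": day,
            "flagged": flagged,
            "headline": headline,
            "page": match.group(2),    # kept as raw text, section prefix included
        })
        pending = []
        flagged = False

    return records


if __name__ == "__main__":
    import csv
    import sys

    # Tab-delimited output is one format most databases can import.
    writer = csv.writer(sys.stdout, delimiter="\t")
    for rec in parse_listing(SAMPLE):
        writer.writerow([rec["day"], rec["flagged"], rec["headline"], rec["page"]])
```

A wrapped headline that happens to contain a double space of its own would be split in the wrong place, and oddities such as "XX-l" pass through untouched, so the output would still need a review pass before import.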

I'm wide open to any brainstorms.
