Thanks to each of you who replied (some off-board), especially Chris Stone -- who contributed an AppleScript with some magical incantations that got me 80 percent of the way there. With Chris' help and several hours of weeding out exceptions to the rules, the text is now in the manageable category, even though there is still a lot of work remaining.
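For anyone facing similar listings, here is a minimal Python sketch of the kind of rule involved. It is not Chris's AppleScript, and it rests on assumptions of mine: each entry has already been joined onto a single line, the optional date is a one- or two-digit day, and the page is a number with an optional section prefix such as `E-` or `Mag.`. The names `ENTRY` and `parse_entry` are hypothetical. Lines that break these rules return `None`, so they can be set aside for hand-editing.

```python
import re

# Assumed layout of one joined entry:
#   [asterisks] [day] HEADLINE [section prefix]page
ENTRY = re.compile(
    r"""^\s*
    (?:(?P<flag>\*+)\s+)?                    # optional run of asterisks
    (?:(?P<day>\d{1,2})\s+)?                 # optional day of the month
    (?P<headline>.+?)\s+                     # headline, as short as possible
    (?P<page>(?:[A-Z]{1,2}-|Mag\.\s*)?\d+)   # page, with optional section prefix
    \s*$""",
    re.VERBOSE,
)


def parse_entry(line):
    """Split one joined listing line into fields, or return None if it does not fit."""
    m = ENTRY.match(line)
    if m is None:
        return None  # an exception to the rules; leave it for hand-editing
    return {
        "flagged": m.group("flag") is not None,
        "day": int(m.group("day")) if m.group("day") else None,
        "headline": m.group("headline"),
        "page": m.group("page"),
    }


if __name__ == "__main__":
    samples = [
        "5 NATIONALISM RISES UNDER HITLER RULE E-3",
        "ANTI-HITLER CARTOON E-5",
        "FARM MORTGAGES: A PRESSING NATIONAL ISSUE XX-l",
    ]
    for line in samples:
        print(parse_entry(line))
```

The third sample ends in a lowercase "l" rather than a digit, so the sketch rejects it instead of guessing. A headline that itself begins with a number would be misread as having a day, which is one of the many exceptions a pattern like this cannot settle on its own.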
On Tuesday, December 23, 2014 7:38:28 AM UTC-8, Greg Raven wrote:
>
> A friend has sent me ten large text files, in the hopes that I can munge
> them into shape for import into a database. Normally, I'd use a combination
> of BBEdit and cubic hours hand-editing files such as these, but the sheer
> volume of text and the overwhelming variations in the original documents
> has gotten me thinking I may be overlooking something.
>
> The files are listings that look something like this:
>
> N.Y. Times
> Feb., 1933 Page
>
> 5 NATIONALISM RISES UNDER HITLER RULE E-3
>
> ANTI-HITLER CARTOON E-5
>
> HOOVER LOOKS BACK - AND AHEAD - ANNE O'HARE MC CORMICK Mag.
> 1
>
> HITLER AT THE TOP OF HIS DIZZY PATH - EMIL LENGYEL Mag.
> 3
>
> OUR FLEET PLAYS A FAR-FLUNG WAR GAME - HANSON BALDWIN Mag.
> 7
>
> FARM MORTGAGES: A PRESSING NATIONAL ISSUE XX-l
>
> 6 NAZI TROOP MARCH WITH EMPIRE FLAGS AS VIOLENCE MOUNTS -
> HITLER HEADS PROCESSION 1
>
> ITALY IS EXPECTING NEW TIE WITH REICH - ARNALDO CORTESI,
> ROME 4
>
> ***** DENIES W. C. BULLITT TALKS FOR COL. (EDWARD MANDELL) HOUSE
> (BULLITT HAS BEEN IN FRANCE & VIENNA RECENTLY) 6
>
> (Here's a link to a partially-edited version of one of the files:
> https://dl.dropboxusercontent.com/u/10003869/test.txt)
>
> There is a date field that may or may not have a date entry, a headline
> field, and a page number field. Some of the date fields have asterisks
> where the original compiler wanted to call attention to the entry, and some
> of the page numbers have section prefixes.
>
> I'm wide open to any brainstorms.

--
This is the BBEdit Talk public discussion group.
