AppleScript is wonderful as a glue language. As a data processing language, not
so much.
BBEdit is happy running any sort of scripts, not just AppleScript.This looks
like it should be trivial in Perl.
Regards,
Neil Faiman
> On Jul 29, 2020, at 9:54 AM, Lewis Downey <[email protected]> wrote:
>
> Hello!
>
> I do not know where to start to put together a grep pattern that will parse a
> 375,000+ line csv file in a specific way for a personal project. The file is
> publicly available from the NY Times and contains data related to tracking
> Covid-19
> <https://github.com/nytimes/covid-19-data/blob/master/us-counties.csv> cases
> and deaths. Each line consists of six fields:
>
> * a date
> * a county name
> * a state name
> * FIPS number for the county
> * cumulative number of Covid-19 cases reported in that county as of that date
> * cumulative number of Covid-19 deaths reported in that county as of that date
>
> I am trying to extract the date of the first reported Covid-19 death for each
> county in the US. This would be the first time that the last field is greater
> than 0 for unique county+state combinations. About 1/3 of all counties still
> have no reported deaths. Those counties should not be found be found by the
> pattern. In round numbers there are about 3100 counties in the report.
> Roughly 2000 have reported at least one Covid-19 death. I am trying to find
> those specific 2000ish counties and including the date of the first reported
> death.
>
> The pattern needs to find the first instance of unique combinations of county
> & state where the last comma-delimited field is greater than zero. The
> pattern will be used by BBedit in an Applescript. The Applescript execution
> needs to be as brief as possible so as not to commandeer my laptop for a
> meaningful chunk of time. The script will be run at least once each week. By
> iterating through a list of counties I made Applescript, BBedit, & Numbers do
> this but as-built the execution took too long, is prone to running out of
> memory, and otherwise makes my laptop close to unusable while it runs. An
> efficient grep pattern would be incredibly helpful.
>
> Here is a handcrafted sample of the file:
> date,county,state,fips,cases,deaths
> 2020-01-21,Snohomish,Washington,53061,1,0
> 2020-01-22,Snohomish,Washington,53061,1,0
> 2020-01-23,Snohomish,Washington,53061,1,0
> 2020-01-24,Cook,Illinois,17031,1,0
> 2020-01-24,Snohomish,Washington,53061,1,0
> 2020-01-25,Orange,California,06059,1,0
> 2020-01-25,Cook,Illinois,17031,1,0
> 2020-01-25,Snohomish,Washington,53061,1,0
> 2020-01-26,Maricopa,Arizona,04013,1,0
> 2020-01-26,Los Angeles,California,06037,1,0
> 2020-01-26,Orange,California,06059,1,0
> 2020-01-26,Cook,Illinois,17031,1,0
> 2020-03-03,Wake,North Carolina,37183,1,0
> 2020-03-04,Wake,North Carolina,37183,1,0
> 2020-03-09,Union,New Jersey,34039,1,0
> 2020-03-21,Essex,Massachusetts,25009,41,0
> 2020-03-22,Essex,Massachusetts,25009,60,0
> 2020-03-23,Essex,Massachusetts,25009,73,1
> 2020-03-24,Essex,Massachusetts,25009,118,1
> 2020-03-24,Union,New Jersey,34039,246,2
> 2020-03-25,Essex,Massachusetts,25009,177,1
> 2020-04-09,Union,New Jersey,34039,5203,145
> 2020-04-12,Essex,Massachusetts,25009,3170,102
> 2020-04-13,Essex,Massachusetts,25009,3413,114
> 2020-04-15,Wake,North Carolina,37183,510,1
> 2020-05-06,Union,New Jersey,34039,13604,800
> 2020-05-07,Union,New Jersey,34039,13781,829
> 2020-05-08,Union,New Jersey,34039,13917,844
> 2020-06-30,Wake,North Carolina,37183,5178,47
> 2020-07-01,Wake,North Carolina,37183,5379,48
> 2020-07-02,Wake,North Carolina,37183,5590,49
>
> From that sample the grep pattern needs to find exactly these three lines:
> 2020-03-23,Essex,Massachusetts,25009,73,1 <- 1st occurrence of Essex,
> Massachusetts w/ last field great than zero 0
> 2020-03-24,Union,New Jersey,34039,246,2 <- 1st occurrence of Union,
> New Jersey w/ last field great than zero 0
> 2020-04-15,Wake,North Carolina,37183,510,1 <- 1st occurrence of Wake,
> North Carolina w/ last field great than zero 0
>
> And report them as
> 2020-03-23,Essex,Massachusetts
> 2020-03-24,Union,New Jersey
> 2020-04-15,Wake,North Carolina
>
> There are other counties in the sample but none of the others have a 6th
> comma-delimited field with a value greater than 0. Those counties should not
> be found.
> That said, an alternative report is perfectly workable if it is easier to
> generate (as shown below but without the descriptive comments). It is hard
> for me to see how this alternative report
> would be easier to generate, but grep has plenty of other qualities i would
> not imagine.
>
> 2020-01-25,Snohomish,Washington,0 <- most recent date reported for
> county where last comma-delimited field = 0
> 2020-01-26,Cook,Illinois,0 <- most recent date reported
> for county where last comma-delimited field = 0
> 2020-01-26,Los Angeles,California,0 <- most recent date reported for
> county where last comma-delimited field = 0
> 2020-01-26,Maricopa,Arizona,0 <- most recent date reported for
> county where last comma-delimited field = 0
> 2020-01-26,Orange,California,0 <- most recent date reported for
> county where last comma-delimited field = 0
> 2020-03-23,Essex,Massachusetts,1 <- number of deaths first reported now
> included in report (almost always 1)
> 2020-03-24,Union,New Jersey,2 <- number of deaths first reported now
> included in report (almost always 1, 2 shown here to cover the case of more
> than 1)
> 2020-04-15,Wake,North Carolina,1 <- number of deaths first reported now
> included in report (almost always 1)
>
> I can build grep patterns for really easy stuff but do not know how to
> approach this. Is it even possible?
>
> Any help you provide will be appreciated. Thank you for taking the time to
> think about this question.
>
> Lewis
>
>
> --
> This is the BBEdit Talk public discussion group. If you have a feature
> request or need technical support, please email "[email protected]"
> rather than posting here. Follow @bbedit on Twitter:
> <https://twitter.com/bbedit <https://twitter.com/bbedit>>
> ---
> You received this message because you are subscribed to the Google Groups
> "BBEdit Talk" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected]
> <mailto:[email protected]>.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/bbedit/455abe22-f5d7-411c-9615-bdc7968f3256o%40googlegroups.com
>
> <https://groups.google.com/d/msgid/bbedit/455abe22-f5d7-411c-9615-bdc7968f3256o%40googlegroups.com?utm_medium=email&utm_source=footer>.
--
This is the BBEdit Talk public discussion group. If you have a feature request
or need technical support, please email "[email protected]" rather than
posting here. Follow @bbedit on Twitter: <https://twitter.com/bbedit>
---
You received this message because you are subscribed to the Google Groups
"BBEdit Talk" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/bbedit/5815CCEC-E88B-4DD5-B749-8B84C4A0C78B%40faiman.org.