Hi Zoe,
Very interesting topic. The more you think about it, the more questions it
raises:
• what about capitalization, should "Dog" or "DOG" count as "dog"?
• what about plurals, should "dogs" count as "dog"?
• what about expressions containing sub-expressions, should the "dog" in
"dog owner", "hot dog", etc. count as "dog"?
Here is a bit of AppleScript trying (naively) to solve those questions.
1. It asks for a list.txt file.
2. It asks for a content.txt file.
3. It massages both files to produce the required statistics.
4. It presents the statistics in a new BBEdit window.
5. It copies the regular expression used to find the terms into BBEdit's
Find window, so you can see what was found if you search the content.txt file.
You can check the "Show matches" option of the Find window if you want
to have the terms highlighted.
Copy the script to Script Editor, or save it in BBEdit's Scripts folder if
you want to be able to use it from BBEdit's Scripts menu.
I have left out this last question:
• what about synonyms, should "chihuahua" count as "dog"?
It can be solved by using the Text > Canonize... function of BBEdit when
preparing the content.txt file.
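If you would rather handle synonyms outside BBEdit, here is a rough sketch of the same idea in JavaScript (Node). The synonym table and the `canonize` function are made up for illustration; this is not how BBEdit's Canonize is implemented, just one way to fold synonyms into a single term before counting.

```javascript
// Hypothetical synonym table: each key is rewritten to its canonical term.
const synonyms = new Map([
  ['chihuahua', 'dog'],
  ['poodle', 'dog'],
  ['tabby', 'cat'],
]);

// Replace every whole-word synonym (any capitalization) with its canonical term.
function canonize(text, table) {
  let result = text;
  for (const [variant, canonical] of table) {
    // Assumes variants are plain words with no regex metacharacters.
    result = result.replace(new RegExp('\\b' + variant + '\\b', 'gi'), canonical);
  }
  return result;
}

console.log(canonize('My Chihuahua chased a tabby and a poodle.', synonyms));
// prints: My dog chased a cat and a dog.
```

Running the content through something like this first means the counting step only ever sees "dog" and "cat".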
HTH
Jean Jourdain
--
```applescript
-- Ask for the terms file and build a cleaned, reverse-sorted list of terms.
set vListFile to choose file with prompt "Choose a Terms File (with list of terms to count):" of type {"Txt", "Text"}
set vListPath to POSIX path of (vListFile as string)
set vCommand to "cat" & space
set vCommand to vCommand & (the quoted form of vListPath) & space
set vCommand to vCommand & "| sed '/^$/d'" & space -- Remove blank lines.
-- Sort in reverse order so longer expressions are searched before shorter ones, e.g. "dog owner" before "dog".
set vCommand to vCommand & "| sort --reverse"
set vList to do shell script vCommand
set vTerms to (paragraphs of vList) as list

-- Join the terms with "|" to build the regular expression.
set {vPreviousDelimiters, AppleScript's text item delimiters} to {AppleScript's text item delimiters, "|"}
set vRegex to "(" & (vTerms as string) & ")s?" -- Naive use of s? to include plural forms.
log vRegex
set AppleScript's text item delimiters to vPreviousDelimiters

-- Ask for the contents file and count the matches with bbfind.
set vContentsFile to choose file with prompt "Choose a Contents File (with contents to analyze):" of type {"Txt", "Text"}
set vContentsPath to POSIX path of (vContentsFile as string)
set vCommand to "/usr/local/bin/bbfind" & space
set vCommand to vCommand & (the quoted form of vRegex) & space
set vCommand to vCommand & "--grep --match-words --extract" & space
set vCommand to vCommand & (the quoted form of vContentsPath) & space
-- Convert result to lowercase so we count together "dog", "Dog", "DOG", etc.
set vCommand to vCommand & "| tr '[:upper:]' '[:lower:]'" & space
-- Remove trailing 's' so we count together "dog" and "dogs".
set vCommand to vCommand & "| sed 's/s$//'" & space
set vCommand to vCommand & "| sort" & space -- Sort ascending.
set vCommand to vCommand & "| uniq -c" & space -- Count unique values.
set vCommand to vCommand & "| sort --numeric-sort --reverse" -- Sort results numerically descending.
set vResult to do shell script vCommand

-- Show the statistics in a new window and put the regex in the Find window.
tell application "BBEdit"
	make new text document with properties {contents:vResult}
	activate find window
	set vWindow to find window
	delay 0.2
	set text 1 of vWindow to vRegex
end tell
```
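For anyone who wants to try the same logic without BBEdit, here is a minimal sketch of the counting steps in JavaScript (Node). It is an approximation under a few assumptions, not a port of the script: `countTerms` is a made-up helper name, `\b` stands in for bbfind's `--match-words`, terms are sorted by length rather than with `sort --reverse`, and the plural handling is just as naive as above (a term that really ends in "s" will lose it).

```javascript
// Hypothetical helper: count list terms in a text, folding case and a trailing "s".
function countTerms(terms, content) {
  // Escape regex metacharacters so each term is matched literally.
  const escape = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // Longest terms first, so "dog owner" is tried before "dog".
  const sorted = terms
    .map((t) => t.trim())
    .filter((t) => t.length > 0)
    .sort((a, b) => b.length - a.length);
  // \b approximates whole-word matching; s? naively allows plural forms.
  const regex = new RegExp('\\b(?:' + sorted.map(escape).join('|') + ')s?\\b', 'gi');
  const counts = new Map();
  for (const match of content.matchAll(regex)) {
    // Lowercase and drop a trailing "s" so "Dog", "DOG" and "dogs" share one key.
    const key = match[0].toLowerCase().replace(/s$/, '');
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  // Most frequent first, like `sort --numeric-sort --reverse`.
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

const sample = 'The dog owner fed two dogs. A hot dog is not a DOG.';
console.log(countTerms(['dog', 'dog owner', 'hot dog'], sample));
// prints: [ [ 'dog', 2 ], [ 'dog owner', 1 ], [ 'hot dog', 1 ] ]
```

Note that the "dog" inside "dog owner" and "hot dog" is not counted as "dog" here, because the longer expressions win the match.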
On Wednesday, June 23, 2021 at 6:27:42 PM UTC+2 Harvey Pikelberger wrote:
> Great, LMK
>
> On Jun 23, 2021, at 00:08, Zoe Barnett <[email protected]> wrote:
>
> Hello Harvey... Wow. This is very kind of you. Thank you!
>
>
> I hope to take some time this weekend to see if I can make this work. I'll
> let you know how it goes.
>
> Zoe
>
> On Wednesday, 23 June 2021 at 05:22:31 UTC+3 Harvey Pikelberger wrote:
>
>>
>> On Jun 22, 2021, at 12:59 PM, Zoe Barnett <[email protected]> wrote:
>>
>> As neither bear nor mouse. I need to count distinct words or exact
>> phrases separated by spaces on either side or followed by punctuation like
>> , : ; . So
>> popular pet,
>> (followed by a comma) would get a hit, but
>> popular petshops
>> wouldn't.
>>
>>
>> There are a lot of ways to approach this.
>> The BBEdit folks are brilliant w/ AppleScript and integrating that with
>> BBEdit.
>> There are also ways to approach this using command line which is very
>> fast, a little cryptic.
>> Here's a very simple, probably overly simple NodeJS approach, which can
>> be modified to work in a web browser.
>> You would no doubt customize this to suit the nuances and particulars of
>> your workflow.
>>
>> const fs = require('fs'); // file handling library for NodeJS
>> // get the values in "list.txt", one term per line, skipping blank lines
>> const wordList = fs.readFileSync(__dirname + '/list.txt').toString().trim().split(/\r?\n/).filter(line => line !== '');
>> const content = fs.readFileSync(__dirname + '/content.txt').toString(); // gets the content
>> const output = []; // initialize output variable
>> wordList.forEach(thisSearch => { // iterate through your list of words
>>   // Set up the search / count. The lookarounds require a non-letter (or the
>>   // start/end of the text) on each side without consuming it, so
>>   // back-to-back hits such as "dog dog" are both counted.
>>   const thisRegex = new RegExp('(?<![a-zA-Z])' + thisSearch + '(?![a-zA-Z])', 'g');
>>   const thisRslt = content.match(thisRegex); // execute the count
>>   const thisCount = (thisRslt == null ? 0 : thisRslt.length); // turn "null" results into zero
>>   console.log(thisSearch, thisCount, thisRslt); // progress outputted to console
>>   output.push(thisSearch + "\t" + thisCount); // add to the output variable
>> });
>> // puts the final result in a file named "output.txt"
>> fs.writeFileSync(__dirname + '/output.txt', output.join('\n'));
>>
>>
>>
>>
>> --
> This is the BBEdit Talk public discussion group. If you have a feature
> request or need technical support, please email "[email protected]"
> rather than posting here. Follow @bbedit on Twitter: <
> https://twitter.com/bbedit>
> ---
> You received this message because you are subscribed to the Google Groups
> "BBEdit Talk" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
>
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/bbedit/cd0e36f3-6286-470d-a6dd-f06d3cd2aca3n%40googlegroups.com
>
>
--
To view this discussion on the web visit
https://groups.google.com/d/msgid/bbedit/d059633d-95d6-426e-b516-0293590c81e9n%40googlegroups.com.