Hi Lux,

Here are two Perl text filters.
Copy them to ~/Library/Application Support/BBEdit/Text Filters.
You can test them from the Text > Apply Text Filter menu.

This one (tags_and_categories_to_list_pl.pl) is for your initial case (CSV 
-> hyphenated item per line):

    #!/usr/bin/env perl
    use v5.14;
    use strict;
    use warnings;

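    while (my $line = <>) {
        # Match "tags: a, b, c" or "categories: a, b, c" (case-insensitive).
        # The lazy (.+?) keeps trailing whitespace out of the last item.
        if (my ($field, $items) = $line =~ /^(tags|categories):\s*(.+?)\s*$/i) {
            # Each comma (with optional spaces around it) starts a new "- item" line.
            my $yaml_items = $items =~ s/\s*,\s*/\n- /gr;
            print "${field}:\n- ${yaml_items}\n";
        } else {
            print $line;
        }
    }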

    =for test

    I got a huge number of posts from an old static blog. Inside the header there's a "tags" line, made this way:

    tags: Steve Jobs, Steve Throughton-Smith, T'Bone, Better to Be a Pirate Than Join the Navy, Ben & Jerry, NBA75

    The actual number of tags in the line can be any. Comma act as a delimiter and, after that, any character can be used inside a tag, spaces included.

    categories: Education

    =cut
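
If you want to check the first filter's logic outside BBEdit, here is a minimal sketch of the same substitution wrapped in a function. The name csv_line_to_list is hypothetical (it is not part of the filter above), and the sample lines are made up for illustration.

```perl
#!/usr/bin/env perl
use v5.14;
use strict;
use warnings;

# Hypothetical helper, not part of the original filter: converts one
# "tags: a, b" or "categories: a, b" line into a YAML block list.
sub csv_line_to_list {
    my ($line) = @_;
    if (my ($field, $items) = $line =~ /^(tags|categories):\s*(.+?)\s*$/i) {
        # Replace each comma separator with a newline plus "- ".
        my $yaml_items = $items =~ s/\s*,\s*/\n- /gr;
        return "${field}:\n- ${yaml_items}\n";
    }
    return $line;    # any other line passes through unchanged
}

# Illustrative input lines (assumed, not taken from a real post).
print csv_line_to_list("tags: Steve Jobs, T'Bone, Ben & Jerry, NBA75\n");
print csv_line_to_list("draft: false\n");
```

Run from a terminal, this prints "tags:" followed by one "- item" line per tag, and then the "draft: false" line untouched.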

This other one (tags_and_categories_to_array_pl.pl) is for your second case 
(hyphenated item per line -> array):

    #!/usr/bin/env perl
    use v5.14;
    use strict;
    use warnings;

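    # Slurp mode: read the whole document as a single string.
    $/ = undef;

    # Turn one matched block ("tags:\n- a\n- b\n") into "tags: [a, b]\n".
    sub replace {
        my $match = shift;
        my @items = split /[[:blank:]]*\n[[:blank:]]*-[[:blank:]]/, $match;
        my $field = shift @items;     # "tags:" or "categories:"
        s/^\s+|\s+$//g for @items;    # trim each item in place
        my $joined = join ", ", @items;
        return "${field} [${joined}]\n";
    }

    # Items must start with "- " (hyphen plus blank), so that the closing
    # "---" of the header is not swallowed as a list item.
    my $text = <> // '';
    print $text =~ s/^((?:tags|categories)[[:blank:]]*:[[:blank:]]*\n[[:blank:]]*(?:-[[:blank:]][^\n]+\n)+)/replace($1)/gimer;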

    =for test

    title: "Più uguali degli altri"
    date: 2022-02-24T01:43:23+01:00
    draft: false
    toc: false
    comments: false
    categories:
    - Education
    tags:
    - MacBook Pro
    - iPad mini
    - Apple Pencil
    - Bowdoin

    =cut
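
About your worry of catching dash lists in the body text: one possible approach is to apply the substitution only to the part between the two "---" lines. The following is a sketch under that assumption (the header is the very first thing in the file and is closed by a "---" line). The name collapse_header_lists and the sample text are hypothetical.

```perl
#!/usr/bin/env perl
use v5.14;
use strict;
use warnings;

# Hypothetical helper: collapse "tags:" / "categories:" block lists into
# inline arrays, but only inside the leading "---" ... "---" header.
sub collapse_header_lists {
    my ($text) = @_;

    # Split the text into header and body; without a header, change nothing.
    my ($header, $body) =
        $text =~ /\A(---[[:blank:]]*\n.*?\n---[[:blank:]]*(?:\n|\z))(.*)\z/s
        or return $text;

    # Items must start with "- " so the closing "---" is never taken as an item.
    $header =~ s{^((?:tags|categories))[[:blank:]]*:[[:blank:]]*\n((?:[[:blank:]]*-[[:blank:]]+[^\n]*\n)+)}{
        my ($field, $block) = ($1, $2);
        my @items = $block =~ /^[[:blank:]]*-[[:blank:]]+(.*?)[[:blank:]]*$/mg;
        "${field}: [" . join(", ", @items) . "]\n";
    }gime;

    return $header . $body;
}

# Made-up sample: the dash list in the body must stay as it is.
my $sample = <<'END';
---
title: "Example"
categories:
- Education
tags:
- MacBook Pro
- iPad mini
---
Body text:
tags:
- not touched
END

print collapse_header_lists($sample);
```

With this sample, the header comes out with "categories: [Education]" and "tags: [MacBook Pro, iPad mini]", while the list after "Body text:" is printed unchanged. An empty "tags:" or "categories:" field (zero items) is left as it is.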

HTH,

Jean Jourdain

On Friday, February 25, 2022 at 5:05:02 AM UTC+1 lux wrote:

> On Wednesday, February 23, 2022 at 6:30:31 PM UTC+1 [email protected] 
> wrote:
>
>> When you state header, this implies that there is more data than what you 
>> present. Could you provide a small selection of what the data file actually 
>> looks like? ;)
>>
>
> This is an example of a complete header:
>
> *((( header begins )))*
> ---
> title: "Più uguali degli altri"
> date: 2022-02-24T01:43:23+01:00
> draft: false
> toc: false
> comments: false
> categories:
> - Education
> tags:
> - MacBook Pro
> - iPad mini
> - Apple Pencil
> - Bowdoin
> ---
> *((( header ends )))*
>
> Right under the header, the body text begins.
>
> The count for categories and tags items can be zero, one, or more than one.
>
> My main issue is to refrain from mistakenly capturing sentences inside the 
> body text that share the same structure (i.e., lists beginning with dashes).
>
> Working on the problem, I could somewhat simplify the problem. I'd like 
> now to convert the former header in the following one:
>
> *((( header begins )))*
> ---
> title: "Più uguali degli altri"
> date: 2022-02-24T01:43:23+01:00
> draft: false
> toc: false
> comments: false
> categories: [Education]
> tags: [MacBook Pro, iPad mini, Apple Pencil, Bowdoin]
> ---
> *((( header ends )))*
>
> Again, thanks very much for the attention. :-)
>
> lux
>

