On 3/22/24 22:53, Albretch Mueller wrote:
out of a HAR file containing lots of obfuscating js cr@p and all kinds of
nonsense I was able to extract lines looking like:

var00='{\"index\":\"prod-h-006\",\"fields\":{\"identifier\":\"bub_gb_O2EAAAAAMAAJ\",\"title\":\"Die
Wissenschaft vom subjectiven Geist\",\"creator\":[\"Karl Rosenkranz\",
\"Mr. ABC123\"],\"collection\":[\"europeanlibraries\",
\"americana\"],\"year\":1843,\"language\":[\"German\"],\"item_size\":797368506},\"_score\":[50.629513]}'
echo "// __ \$var00: |$var00|"

The final result that I need would look like:
var02='bub_gb_O2EAAAAAMAAJ|Die Wissenschaft vom subjectiven Geist|["Karl
Rosenkranz", "Mr. ABC123"]|["europeanlibraries",
"americana"]|1843|["German"]|797368506|[50.629513]'
echo "// __ \$var02: |$var02|"

I have tried substring substitution, sed and tr to no avail.

lbrtchx


My daily driver:

2024-03-23 04:02:27 dpchrist@laalaa ~/sandbox/perl/debian-users/20240322-2253-albretch-mueller
$ cat /etc/debian_version; uname -a; perl -v | head -n 2 | grep .
11.9
Linux laalaa 5.10.0-28-amd64 #1 SMP Debian 5.10.209-2 (2024-01-31) x86_64 GNU/Linux
This is perl 5, version 32, subversion 1 (v5.32.1) built for x86_64-linux-gnu-thread-multi


Put the JSON into a data file, one record per line (my mailer is line-wrapping data.json -- it contains two lines):

2024-03-23 04:22:20 dpchrist@laalaa ~/sandbox/perl/debian-users/20240322-2253-albretch-mueller
$ cat data.json
{"index":"prod-h-006","fields":{"identifier":"bub_gb_O2EAAAAAMAAJ","title":"Die Wissenschaft vom subjectiven Geist","creator":["Karl Rosenkranz", "Mr. ABC123"],"collection":["europeanlibraries", "americana"],"year":1843,"language":["German"],"item_size":797368506},"_score":[50.629513]}
{"index":"prod-h-007","fields":{"identifier":"abc_de_12FGHIJKLMNO","title":"My Title","creator":["Some Body", "Somebody Else"],"collection":["europeanlibraries", "americana"],"year":2024,"language":["English"],"item_size":1234567890},"_score":[12.345678]}
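
The line in the original message has every double quote escaped as \", so it is not valid JSON as posted. Here is a minimal sketch of one way to undo that before writing data.json. It assumes backslash-quote is the only escape that needs removing. The variable names raw and json and the shortened sample record are made up for illustration:

```shell
# Hypothetical, shortened sample in the same style as the extracted HAR line.
raw='{\"index\":\"prod-h-006\",\"fields\":{\"year\":1843},\"_score\":[50.629513]}'

# Replace every backslash-quote pair with a plain double quote.
# Assumption: the data holds no other backslash escapes that must survive.
json=$(printf '%s\n' "$raw" | sed 's/\\"/"/g')

printf '%s\n' "$json"
```

If the real data also contains escaped backslashes or escaped quotes inside string values, this simple substitution would mangle them, so check a few records by eye first.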


A Perl script to read newline-delimited JSON records and print each as one pipe-delimited line (--debug also dumps the decoded structure):

2024-03-23 04:28:59 dpchrist@laalaa ~/sandbox/perl/debian-users/20240322-2253-albretch-mueller
$ cat munge-json
#!/usr/bin/perl
# $Id: munge-json,v 1.3 2024/03/23 11:28:58 dpchrist Exp $
# Refer to debian-user 3/22/24 22:53 Albretch Mueller
# "trying to parse lines from an awkwardly formatted HAR file"
# by David Paul Christensen dpchr...@holgerdanske.com
# Public Domain
use strict;
use warnings;
use Data::Dumper;
use JSON;
use Getopt::Long;
$Data::Dumper::Sortkeys = 1;
my $debug;
GetOptions('debug|d' => \$debug) or die;
while (<>) {
    my $rh = decode_json $_;
    print Data::Dumper->Dump([$rh], [qw(rh)]) if $debug;
    print
        join('|',
            $rh->{fields}{identifier},
            $rh->{fields}{title},
            '["' .  join('", "', @{$rh->{fields}{creator}}) . '"]',
            '["' .  join('", "', @{$rh->{fields}{collection}}) . '"]',
            $rh->{fields}{year},
            '["' .  join('", "', @{$rh->{fields}{language}}) . '"]',
            $rh->{fields}{item_size},
            '[' .  join(', ', @{$rh->{_score}}) . ']',
        ), "\n";
}


Run the script as a Unix filter:

2024-03-23 04:30:16 dpchrist@laalaa ~/sandbox/perl/debian-users/20240322-2253-albretch-mueller
$ ./munge-json data.json
bub_gb_O2EAAAAAMAAJ|Die Wissenschaft vom subjectiven Geist|["Karl Rosenkranz", "Mr. ABC123"]|["europeanlibraries", "americana"]|1843|["German"]|797368506|[50.629513]
abc_de_12FGHIJKLMNO|My Title|["Some Body", "Somebody Else"]|["europeanlibraries", "americana"]|2024|["English"]|1234567890|[12.345678]

2024-03-23 04:30:18 dpchrist@laalaa ~/sandbox/perl/debian-users/20240322-2253-albretch-mueller
$ cat data.json | ./munge-json
bub_gb_O2EAAAAAMAAJ|Die Wissenschaft vom subjectiven Geist|["Karl Rosenkranz", "Mr. ABC123"]|["europeanlibraries", "americana"]|1843|["German"]|797368506|[50.629513]
abc_de_12FGHIJKLMNO|My Title|["Some Body", "Somebody Else"]|["europeanlibraries", "americana"]|2024|["English"]|1234567890|[12.345678]
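
If the pipe-delimited line is needed back in shell variables, something along these lines may do. It is only a sketch: the eight variable names are my own choice, and it assumes no field ever contains a literal | character:

```shell
# One line of munge-json output, copied from the first record of the run above.
line='bub_gb_O2EAAAAAMAAJ|Die Wissenschaft vom subjectiven Geist|["Karl Rosenkranz", "Mr. ABC123"]|["europeanlibraries", "americana"]|1843|["German"]|797368506|[50.629513]'

# Split on '|' only, so spaces inside a field are preserved.
# The variable names are hypothetical; pick whatever suits the caller.
IFS='|' read -r identifier title creator collection year language item_size score <<EOF
$line
EOF

printf 'identifier: %s\n' "$identifier"
printf 'title: %s\n' "$title"
printf 'year: %s\n' "$year"
```

A title that happens to contain | would shift every later field, so a different separator may be the safer choice for real data.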


David
