Here's a hint to get you started. It should be pretty easy to extend; if
anything is unclear, let me know.

For starters, pipe your "gunk" into jq like this:

$ echo '{"index":"prod-h-006","fields":{"identifier":"bub_gb_O2EAAAAAMAAJ","title":"Die Wissenschaft vom subjectiven Geist","creator":["Karl Rosenkranz", "Mr. ABC123"],"collection":["europeanlibraries", "americana"],"year":1843,"language":["German"],"item_size":797368506},"_score":[50.629513]}' | jq .
{
  "index": "prod-h-006",
  "fields": {
    "identifier": "bub_gb_O2EAAAAAMAAJ",
    "title": "Die Wissenschaft vom subjectiven Geist",
    "creator": [
      "Karl Rosenkranz",
      "Mr. ABC123"
    ],
    "collection": [
      "europeanlibraries",
      "americana"
    ],
    "year": 1843,
    "language": [
      "German"
    ],
    "item_size": 797368506
  },
  "_score": [
    50.629513
  ]
}

Then start building your output like this:

$ echo '{"index":"prod-h-006","fields":{"identifier":"bub_gb_O2EAAAAAMAAJ","title":"Die Wissenschaft vom subjectiven Geist","creator":["Karl Rosenkranz", "Mr. ABC123"],"collection":["europeanlibraries", "americana"],"year":1843,"language":["German"],"item_size":797368506},"_score":[50.629513]}' | jq '.fields.identifier + "|" + .fields.title'
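Extending that idea a bit, here is a rough sketch of how more fields could be added. It assumes you want one pipe-separated line per record; the choice of fields, the "; " separator for creators and the shell variable names are only my guesses at what you are after. The -r option makes jq print the raw string rather than a JSON-quoted one.

```shell
# Sample record from above, kept in a shell variable (name is illustrative).
record='{"index":"prod-h-006","fields":{"identifier":"bub_gb_O2EAAAAAMAAJ","title":"Die Wissenschaft vom subjectiven Geist","creator":["Karl Rosenkranz", "Mr. ABC123"],"collection":["europeanlibraries", "americana"],"year":1843,"language":["German"],"item_size":797368506},"_score":[50.629513]}'

# -r prints the raw string instead of a JSON-quoted one.
# join("; ") flattens the creator array; tostring converts the numeric year.
line=$(printf '%s\n' "$record" | jq -r '
  .fields
  | [ .identifier, .title, (.creator | join("; ")), (.year | tostring) ]
  | join("|")
')

printf '%s\n' "$line"
```

Collecting the pieces in an array and calling join("|") once tends to be easier to extend than a long chain of + "|" +.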

jq is an amazing tool; it's a full-fledged programming language. You just need
to keep concatenating the fields you want in your output. You might even find
you can do everything inside a jq script instead of what you're doing now.
Consider writing a jq script whose first line is #!/usr/bin/jq -f (the -f
tells jq to read its program from the script file).
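A minimal sketch of such a script, under a few assumptions: the file name extract.jq, the temporary directory and the selected fields are made up for illustration, and the record is trimmed down to the fields the script uses. jq treats lines starting with # as comments, so the shebang line does no harm when the file is passed with -f.

```shell
# Hypothetical script location; a temp dir keeps the example self-contained.
workdir=$(mktemp -d)
script="$workdir/extract.jq"

# Write the jq program. The quoted 'EOF' stops the shell from expanding anything.
cat > "$script" <<'EOF'
#!/usr/bin/jq -f
# One pipe-separated line per input record.
.fields
| [ .identifier, .title, (.collection | join(",")) ]
| join("|")
EOF
chmod +x "$script"

# Trimmed-down version of the sample record.
record='{"fields":{"identifier":"bub_gb_O2EAAAAAMAAJ","title":"Die Wissenschaft vom subjectiven Geist","collection":["europeanlibraries","americana"]}}'

# Invoked via "jq -f" so the example does not depend on where jq is installed.
# With jq at /usr/bin/jq, running "$script" -r directly should also work.
result=$(printf '%s\n' "$record" | jq -r -f "$script")
printf '%s\n' "$result"

rm -rf "$workdir"
```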

Hope this gets you on the right path!

Michael Grant

________________________________
From: to...@tuxteam.de
Sent: Friday, March 22, 2024 23:44
To: Albretch Mueller
Cc: debian-user
Subject: Re: trying to parse lines from an awkwardly formatted HAR file ...

On Sat, Mar 23, 2024 at 12:53:24AM -0500, Albretch Mueller wrote:
> out of a HAR file containing lots of obfuscating js cr@p and all kinds of
> nonsense I was able to extract line looking like:

It's not "js cr@p"; it is called JSON, and there's a spec for
it.

[...]

> I have tried substring substitution, sed et tr to no avail.

You might have a lot of fun trying to parse JSON with sed and
tr.

If you are serious about it, you should try a proper parser
and extractor. I'd recommend jq [1], available in Debian under
the same-named package. I have written a few shell scripts
reaching into the innards of

You'll have to wrap your brain around it, but in the time you
have implemented a parser for js in "sed and tr" (you might
need a dash of "proper programming language" around that, some
luck and a ton of elbow grease) you might have wrapped your
brain like 16 times around jq (or some other appropriate tool).

Cheers
--
tomás
