
Re: trying to parse lines from an awkwardly formatted HAR file ...

2024-03-26 Thread David Wright

On Sat 23 Mar 2024 at 11:55:04 (-0400), Greg Wooledge wrote:
> On Sat, Mar 23, 2024 at 09:54:05AM -0500, Albretch Mueller wrote:
> >  a) using a chromium-derived browser, which can be used to dump the
> > HAR file log of the network back and forth, go, e. g.:
> >   https://en.wikipedia.org/wiki/Anaxagoras
> >  b) click on the link that says: "Works by or about Anaxagoras" (at
> > Internet Archive)
> >  c) on the archive.org page, select "texts" and "always available"
> > (meaning text which is public domain, he died 25 centuries ago)
> >  d) then to produce the HAR file, go:
> >  d.1) More Tools > Developer Tools;
> >  d.2) click on "Network" tab;
> >  d.3) Filter: GET
> >  d.4) check: "Preserve Log"
> >  d.5) scroll down the page all the way to make the client-server back
> > and forth cascade
> >  d.6) save the network log as HAR file to then open and eyeball it!
> 
> This is incomprehensible to me.  What the hell is d.5 supposed to be?
> Even if I close the Shift-Ctrl-I window, and Ctrl-R to reload the page,

Some web pages don't load completely unless you scroll down them,
whereupon more of the page is loaded. Even if you press End, you may
not get the whole page loaded. One method of completion is to
repeatedly press End and PageUp until no more content appears
(or you observe some sort of bottom-of-page indication).

You'll recognise this if you shop with Kroger™/Dillons™/Fry's™
( in the US).

Ctrl-R is of no help: it can merely reload as much of the page as has
been visited so far. So there is some method in their madness (for
this one step—I don't know about the rest).

> and then reopen Shift-Ctrl-I, and click the down-arrow-in-a-dish icon
> whose tooltip says "Export HAR..." all I get in the resulting file
> is this:

Cheers,
David.

Re: trying to parse lines from an awkwardly formatted HAR file ...

2024-03-23 Thread Albretch Mueller

> Archive.org has a well-documented API at
> https://archive.org/developers/. There's even a command-line tool
> (assuming one doesn't want to use, say, the python library).

I had given their API a somewhat thorough reading some time ago,
but didn't find anything that interesting, and I was thinking of
developing a Java GraalVM API which would be more customizable and
easily usable for other text banks. I took a second look at it and they
still don't address their own problems, like repeated texts (the exact
same text/publication under different identifiers) and non-standardized
metadata definitions: fr., french, French, fr, … to specify the
language. Author names are entered as free text as well ... so what is
the point of even having an API when the metadata is not well defined
and well kept?
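
A minimal sketch of cleaning up such labels on the consumer side, assuming all you need is to fold the spelling variants listed above into one value before comparing records. The helper `normalize_lang` and its mapping table are hypothetical; archive.org provides nothing like this.

```shell
# normalize_lang: hypothetical helper that folds free-text language labels
# (e.g. "fr.", "french", "French", "fr") into one canonical name.
# The mapping below is an assumption; extend it for the languages you meet.
normalize_lang() {
    # lower-case the label and drop a single trailing period ("Fr." -> "fr")
    label=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    label=${label%.}
    case $label in
        fr|fre|fra|french) echo French ;;
        de|ger|deu|german) echo German ;;
        *) printf '%s\n' "$1" ;;  # unknown label: pass it through unchanged
    esac
}
```

For example, `for l in fr. french French fr; do normalize_lang "$l"; done` prints `French` four times.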

Re: trying to parse lines from an awkwardly formatted HAR file ...

2024-03-23 Thread Greg Wooledge

On Sat, Mar 23, 2024 at 02:05:06PM -0500, Albretch Mueller wrote:
> Actually, in order to deX-Y it in case anyone can offer any help, it
> is more like "I want an index of all the books which have ever been
> written/published" in order to read all of them ;-)

First of all, you will not achieve this goal.  It is not possible for
a human to read every book that has ever been written.  You'll die
before you can even finish a tiny fraction of them.

So, let's say you have a more realistic goal: you want a list of all the
books written by Charles Dickens.

I tried to figure out how to get this out of archive.org but it looks
like their documentation doesn't match their web page.  I started at
 which shows how to
get a list of "items" which all share a common "parent".  I figure
an author might be a reasonable parent.  So then the next question is
how to get the author ID for Charles Dickens.

Next I went to

which tells me I should perform a search on their front page, and
then on the result page, click something called "Media List".

This is where it all falls apart for me.  I can't find a "Media List"
thing to click on.

The documentation also mentions an "ABOUT" that I should be able
to click on to get an Identifier.  Well, that's not a thing I could
find either.  There's an ABOUT link in the top menu bar, which goes to
 which is clearly not what the documentation
was talking about.

All this is far too much of my time wasted trying to help some random
person with an off-topic question on debian-user, so... good luck.
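
For what it's worth, a query like the Dickens one can probably be sent straight to archive.org's advanced-search endpoint instead of clicking through the web UI. The sketch below only builds the URL. The endpoint name (`advancedsearch.php`), its `q`, `fl[]`, `rows`, `page` and `output=json` parameters, and the exact creator spelling are assumptions to check against https://archive.org/developers/; `ia_search_url` is a hypothetical helper. It uses jq's `@uri` filter (Debian package `jq`) to percent-encode the query.

```shell
# ia_search_url: hypothetical helper that builds an archive.org
# advanced-search URL (endpoint and parameter names are assumptions).
#   $1  query in archive.org's Lucene-like syntax
#   $2  rows per page (default 100)
#   $3  page number, starting at 1 (default 1)
ia_search_url() {
    # let jq percent-encode the query string
    q=$(jq -rn --arg q "$1" '$q | @uri')
    # %% in the printf format yields a literal %, so fl%5B%5D means fl[]
    printf 'https://archive.org/advancedsearch.php?q=%s&fl%%5B%%5D=identifier&fl%%5B%%5D=title&rows=%s&page=%s&output=json\n' \
        "$q" "${2:-100}" "${3:-1}"
}
```

Usage (network access required, output not shown here): `curl -s "$(ia_search_url 'creator:"Dickens, Charles" AND mediatype:texts')" | jq .`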

Re: trying to parse lines from an awkwardly formatted HAR file ...

2024-03-23 Thread Albretch Mueller

Greg Wooledge via lists.debian.org


>Furthermore, whatever method you are using to *create* this HAR file
>is questionable, since apparently you aren't even getting a properly
>formatted file in the end.

>So, putting these together, it looks like you are taking a file that
>was intended to be used for diagnosing browser/network performance
>issues, and attempting to use this in place of a downloadable index
>of documents from archive.org.


Well, the Chromium HAR log utility captured that file as a HAR-formatted
file of sorts, describing the client-server back and forth, and the
Linux file utility tells me it is "JSON text data". You may also go to:


https://archive.org/search?query=Euklid+OR+Euclid+OR+Euclides&and%5B%5D=lending%3A%22is_readable%22


to save that page and tell me what you can make of its content.
This is what I mean by hellishly obfuscated "js cr@p", and I can't
understand why archive.org would do that.
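
The search page is a JavaScript application that fetches its results as JSON, so one way to stop fighting the rendered page is to ask the HAR which requests returned JSON; those are the API calls the page itself makes. A minimal jq sketch, assuming the standard HAR 1.2 fields `log.entries[].request.url` and `response.content.mimeType`; the helper name `har_json_urls` is hypothetical, and a server might label JSON with some other MIME type.

```shell
# har_json_urls: hypothetical helper that prints the URL of every request in
# a HAR file whose response mimeType starts with "application/json".
# Reads the named HAR file(s), or stdin when no file is given.
har_json_urls() {
    jq -r '.log.entries[]
           | select((.response.content.mimeType // "") | startswith("application/json"))
           | .request.url' "$@"
}
```

For example, `har_json_urls Karl_Rosenkranz02_IA.har.txt` would list the JSON endpoints captured while scrolling.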


>Do you have one of these HAR files in a *DIRECTLY DOWNLOADABLE URL*?


the sample JSON file (the HAR file from archive.org) I am using right
now was uploaded to:


https://ergosumus.files.wordpress.com/2024/03/karl_rosenkranz02_ia.har_.odt


date
url="https://ergosumus.files.wordpress.com/2024/03/karl_rosenkranz02_ia.har_.odt"
time wget -q --spider --no-verbose --server-response "${url}"; _wgetq=$?
echo "// __ \$_wgetq: |$_wgetq|"


Sat Mar 23 01:39:17 PM CDT 2024
HTTP/1.1 200 OK
Server: nginx
Date: Sat, 23 Mar 2024 18:38:16 GMT
Content-Type: application/vnd.oasis.opendocument.text
Content-Length: 686303
Connection: keep-alive
Last-Modified: Sat, 23 Mar 2024 17:01:03 GMT
Expires: Thu, 18 Apr 2024 19:04:42 GMT
X-Orig-Src: 01_mogdir
X-nc: MISS mdw 24 np
X-Content-Type-Options: nosniff
Alt-Svc: h3=":443"; ma=86400
Accept-Ranges: bytes

real 0m0.582s
user 0m0.080s
sys 0m0.069s

// __ $_wgetq: |0|

~

$ date
Sat Mar 23 11:59:53 AM CDT 2024

$ ls -l Karl_Rosenkranz02_IA.har.*
-rw-r--r-- 1 user user 686303 Mar 23 11:59 Karl_Rosenkranz02_IA.har.odt
-rw-r--r-- 1 user user 4290474 Mar 21 19:17 Karl_Rosenkranz02_IA.har.txt
-rw-r--r-- 1 user user 686303 Mar 23 11:59 Karl_Rosenkranz02_IA.har.zip

$ file --brief Karl_Rosenkranz02_IA.har.*
Zip archive data, at least v2.0 to extract, compression method=deflate
JSON text data
Zip archive data, at least v2.0 to extract, compression method=deflate

$ file Karl_Rosenkranz02_IA.har.*
Karl_Rosenkranz02_IA.har.odt: Zip archive data, at least v2.0 to extract, compression method=deflate
Karl_Rosenkranz02_IA.har.txt: JSON text data
Karl_Rosenkranz02_IA.har.zip: Zip archive data, at least v2.0 to extract, compression method=deflate

$ sha256sum Karl_Rosenkranz02_IA.har.*
95c2bf849d67b6812193b72fc8504fcab71b49da7937ea8fd9421bee4075ac86  Karl_Rosenkranz02_IA.har.odt
79dd5a23748db1a7270927b6c16fc28cfff59eaf804ba24b2443da578903ede2  Karl_Rosenkranz02_IA.har.txt
95c2bf849d67b6812193b72fc8504fcab71b49da7937ea8fd9421bee4075ac86  Karl_Rosenkranz02_IA.har.zip

~

or you could:

a) go to: https://en.wikipedia.org/wiki/Karl_Rosenkranz
b) click on: Works by or about Karl Rosenkranz (at Internet Archive)
c) on the archive.org page, select "texts" and "always available"
(meaning text which is public domain)
d) open "More Tools" ... as I explained before (by d.5 I meant that you
may have to scroll down or use key-press combinations to "manually"
get all records). In Rosenkranz' case I got 169 texts.

~

>This tells me we're deep inside an X-Y problem. The original goal is
>possibly something like "I want an index of all the books about this
>Greek dude". Maybe start from there, and see what answers you get.


Actually, in order to deX-Y it in case anyone can offer any help, it
is more like "I want an index of all the books which have ever been
written/published" in order to read all of them ;-)


Data registries mind only their own extant entries. There is no
general, "orbis unum" registry of all texts (meant in the broad
philological, semiological sense: videos, paintings, ...), that is,
just the registry, not the extant data itself. Terribly persuasive
silly me tried to explain this idea to the archive.org folks and they
told me off.

What would that registry be good for? Well, let me use self-serving
metaphors: some time ago people didn't know how many people lived in
their countries or even their cities, where the Nile river started,
what a map of the earth would look like, ... There was a moment in the
history of humankind in which one person could actually have read all
extant literature (at least relating to one culture, say: "natural
philosophy"). Technically it is not so hard: according to Google, some
130 million books have been printed since the invention of the
printing press. Not that many, anyway. The idea of reading them all
seized me when I was little, after reading a one-liner by some Perugian
dude (as cannibalized by me):


"the greatest of all gifts and graces that God has granted us with is
the capacity of overcoming oneself".


Now,

Re: trying to parse lines from an awkwardly formatted HAR file ...

2024-03-23 Thread Darac Marjal



On 23/03/2024 16:34, Greg Wooledge wrote:

> On Sat, Mar 23, 2024 at 11:55:04AM -0400, Greg Wooledge wrote:
> > On Sat, Mar 23, 2024 at 09:54:05AM -0500, Albretch Mueller wrote:
> > >  1) That HAR file is not properly formatted. Instead of
> > > "attribute":value pairs in the standard way, they have used front
> > > slash + quote pairs (instead of just quotes) erratically all around
> > > the file. That is why you can't use jq.
> >
> > That is not what I see in the file which I pasted here.
>
> Further investigation:
>
> https://google.com/search?q=what+is+a+HAR+file
>
>   https://www.keycdn.com/support/what-is-a-har-file
>   Jan 12, 2023 - A HAR file is primarily used for identifying
>   performance issues, such as bottlenecks and slow load times, and page
>   rendering problems.
>
>   https://en.wikipedia.org/wiki/HAR_(file_format)
>   The HTTP Archive format, or HAR, is a JSON-formatted archive file
>   format for logging of a web browser's interaction with a site.
>   ...
>   This document was never published by the Web Performance Working Group
>   and has been abandoned.
>
> So, putting these together, it looks like you are taking a file that
> was intended to be used for diagnosing browser/network performance
> issues, and attempting to use this in place of a downloadable index
> of documents from archive.org.
>
> Furthermore, whatever method you are using to *create* this HAR file
> is questionable, since apparently you aren't even getting a properly
> formatted file in the end.
>
> This tells me we're deep inside an X-Y problem.  The original goal is
> possibly something like "I want an index of all the books about this
> Greek dude".  Maybe start from there, and see what answers you get.


If someone was looking to query a Web service programmatically, wouldn't 
the first place to start be seeing if the service has an API?


Archive.org has a well-documented API at 
https://archive.org/developers/. There's even a command-line tool 
(assuming one doesn't want to use, say, the python library).
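
To make the API suggestion concrete: assuming the advanced-search endpoint answers with JSON shaped like `{"response": {"numFound": N, "docs": [{"identifier": ..., "title": ...}, ...]}}` (an assumption worth checking against the developer docs), a few lines of jq turn a saved response into one `identifier<TAB>title` line per item and report how many pages of a given size are needed. Both helper names are hypothetical.

```shell
# ia_docs_tsv: hypothetical helper; prints "identifier<TAB>title" for each
# document in a saved advanced-search JSON response (files or stdin).
ia_docs_tsv() {
    # tostring guards against a title stored as an array; missing -> ""
    jq -r '.response.docs[] | [.identifier, (.title // "" | tostring)] | @tsv' "$@"
}

# ia_page_count: hypothetical helper; $1 = rows per page, remaining args =
# the response file(s) or stdin. Prints ceil(numFound / rows).
ia_page_count() {
    rows=$1; shift
    jq -r --argjson rows "$rows" \
        '.response.numFound as $n | (($n + $rows - 1) / $rows | floor)' "$@"
}
```

For example, after saving one page with something like `curl -s 'https://archive.org/advancedsearch.php?q=mediatype%3Atexts&fl%5B%5D=identifier&fl%5B%5D=title&rows=100&page=1&output=json' > page1.json`, run `ia_docs_tsv page1.json` and `ia_page_count 100 page1.json` to see how many more pages to fetch.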





Re: trying to parse lines from an awkwardly formatted HAR file ...

2024-03-23 Thread Greg Wooledge

On Sat, Mar 23, 2024 at 11:55:04AM -0400, Greg Wooledge wrote:
> On Sat, Mar 23, 2024 at 09:54:05AM -0500, Albretch Mueller wrote:
> >  1) That HAR file is not properly formatted. Instead of
> > "attribute":value pairs in the standard way, they have used front
> > slash + quote pairs (instead of just quotes) erratically all around
> > the file. That is why you can't use jq.
> 
> That is not what I see in the file which I pasted here.

Further investigation:

https://google.com/search?q=what+is+a+HAR+file

  https://www.keycdn.com/support/what-is-a-har-file
  Jan 12, 2023 — A HAR file is primarily used for identifying
  performance issues, such as bottlenecks and slow load times, and page
  rendering problems.

  https://en.wikipedia.org/wiki/HAR_(file_format)
  The HTTP Archive format, or HAR, is a JSON-formatted archive file
  format for logging of a web browser's interaction with a site.
  ...
  This document was never published by the Web Performance Working Group
  and has been abandoned.

So, putting these together, it looks like you are taking a file that
was intended to be used for diagnosing browser/network performance
issues, and attempting to use this in place of a downloadable index
of documents from archive.org.

Furthermore, whatever method you are using to *create* this HAR file
is questionable, since apparently you aren't even getting a properly
formatted file in the end.

This tells me we're deep inside an X-Y problem.  The original goal is
possibly something like "I want an index of all the books about this
Greek dude".  Maybe start from there, and see what answers you get.
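
Greg's reading matches the format: each HAR entry is one request/response pair with its total time (`time`, in milliseconds) attached, which is what performance tooling consumes. A small jq sketch that prints that view, slowest request first; `har_timings` is a hypothetical name, while the fields are standard HAR 1.2 ones.

```shell
# har_timings: hypothetical helper; prints "time_ms<TAB>status<TAB>url" for
# every entry of a HAR file (or stdin), slowest first. This is the kind of
# view HAR files are designed for.
har_timings() {
    jq -r '.log.entries
           | sort_by(.time) | reverse
           | .[]
           | [.time, .response.status, .request.url]
           | @tsv' "$@"
}
```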

Re: trying to parse lines from an awkwardly formatted HAR file ...

2024-03-23 Thread Greg Wooledge

On Sat, Mar 23, 2024 at 09:54:05AM -0500, Albretch Mueller wrote:
>  a) using a chromium-derived browser, which can be used to dump the
> HAR file log of the network back and forth, go, e. g.:
>   https://en.wikipedia.org/wiki/Anaxagoras
>  b) click on the link that says: "Works by or about Anaxagoras" (at
> Internet Archive)
>  c) on the archive.org page, select "texts" and "always available"
> (meaning text which is public domain, he died 25 centuries ago)
>  d) then to produce the HAR file, go:
>  d.1) More Tools > Developer Tools;
>  d.2) click on "Network" tab;
>  d.3) Filter: GET
>  d.4) check: "Preserve Log"
>  d.5) scroll down the page all the way to make the client-server back
> and forth cascade
>  d.6) save the network log as HAR file to then open and eyeball it!

This is incomprehensible to me.  What the hell is d.5 supposed to be?
Even if I close the Shift-Ctrl-I window, and Ctrl-R to reload the page,
and then reopen Shift-Ctrl-I, and click the down-arrow-in-a-dish icon
whose tooltip says "Export HAR..." all I get in the resulting file
is this:

hobbit:~$ cat Downloads/archive.org.har
{
  "log": {
    "version": "1.2",
    "creator": {
      "name": "WebInspector",
      "version": "537.36"
    },
    "pages": [],
    "entries": []
  }
}hobbit:~$
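
The empty `"entries": []` means DevTools recorded nothing: the Network panel only logs requests made while DevTools is open, so a reload done with it closed is not captured. A quick way to check a HAR before trying to parse it (the helper name is hypothetical):

```shell
# har_entry_count: hypothetical helper; prints how many request/response
# entries a HAR file (or stdin) contains. 0 means nothing was captured.
har_entry_count() {
    jq '.log.entries | length' "$@"
}
```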

Do you have one of these HAR files in a *DIRECTLY DOWNLOADABLE URL*?
Something that doesn't take 12 manual steps that are impossible to
perform?

Or can you *attach* one to a message to this mailing list?  Make sure
it's small.

>  1) That HAR file is not properly formatted. Instead of
> "attribute":value pairs in the standard way, they have used front
> slash + quote pairs (instead of just quotes) erratically all around
> the file. That is why you can't use jq.

That is not what I see in the file which I pasted here.
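
A likely explanation for the backslash + quote pairs (an inference, not verified against the uploaded file): a HAR stores each response body as a *string* in `.log.entries[].response.content.text`, so a JSON body shows up there with every `"` escaped as `\"`. The file as a whole is still valid JSON, and jq's `fromjson` undoes the escaping by parsing the string. A sketch with the hypothetical helper `har_bodies`; bodies marked `"encoding": "base64"` are skipped rather than decoded.

```shell
# har_bodies: hypothetical helper; for every HAR entry whose response body
# parses as JSON, print that body as one compact JSON document per line.
# Non-JSON bodies (HTML, images) and base64-encoded bodies are skipped.
har_bodies() {
    jq -c '.log.entries[]
           | .response.content
           | select(.text != null and .encoding != "base64")
           | .text
           | fromjson?' "$@"
}
```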

Re: trying to parse lines from an awkwardly formatted HAR file ...

2024-03-23 Thread Albretch Mueller

>On Sat, Mar 23, 2024 at 1:44 AM  wrote:
>> On Sat, Mar 23, 2024 at 12:53:24AM -0500, Albretch Mueller wrote:
>> out of a HAR file containing lots of obfuscating js cr@p and all kinds of
>> nonsense I was able to extract line looking like:

>It's not "js cr@p", It is called JSON. And there's a spec for
>it.

 Well, I am old enough to remember when JSON meant: "JavaScript Object
Notation" in the form of human-readable attribute:value text files.

 a) using a chromium-derived browser, which can be used to dump the
HAR file log of the network back and forth, go, e. g.:
  https://en.wikipedia.org/wiki/Anaxagoras
 b) click on the link that says: "Works by or about Anaxagoras" (at
Internet Archive)
 c) on the archive.org page, select "texts" and "always available"
(meaning text which is public domain, he died 25 centuries ago)
 d) then to produce the HAR file, go:
 d.1) More Tools > Developer Tools;
 d.2) click on "Network" tab;
 d.3) Filter: GET
 d.4) check: "Preserve Log"
 d.5) scroll down the page all the way to make the client-server back
and forth cascade
 d.6) save the network log as HAR file to then open and eyeball it!

>> I have tried substring substitution, sed et tr to no avail.
>You might have a lot of fun trying to parse JSON with sed and
>tr.

 1) That HAR file is not properly formatted. Instead of
"attribute":value pairs in the standard way, they have used front
slash + quote pairs (instead of just quotes) erratically all around
the file. That is why you can't use jq.
 2) since they (archive.org) have been changing the format they use on
their pages (to avoid HTML scrapers?), I don't try to make sense of
what they do. I would just use quick hacks and "keep moving".
 2.a) make an editing copy of the file
 2.b) using sed I would parse out the lines with the data I need:
  sed --in-place --expression 's/{\\"index\\":\\"/\n{\\"index\\":\\"/g' ""
 2.c) once you extract them, you then need to parse the fields for
post-processing.
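
Rather than inserting newlines with sed before each `{\"index\":`, the records can be pulled out structurally once the escaped body has been parsed. The sketch assumes, from the sample line in this thread, that each record is an object carrying both `fields` and `_score` keys; since the path to those objects inside archive.org's response is not shown here, it searches recursively with `..` instead of guessing it. `ia_records` is a hypothetical name.

```shell
# ia_records: hypothetical helper; reads JSON documents (e.g. parsed
# response bodies) and prints every nested object that has both "fields"
# and "_score" keys, one compact record per line.
ia_records() {
    jq -c '.. | objects | select(has("fields") and has("_score"))' "$@"
}
```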

 I have tried substring substitution, sed et tr to first replace all
front slash + quote pairs with quotes, to then be able to use jq in the
happy way you should. I haven't been successful (is that the reason
why they obfuscate their pages in that way?)
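
On a line like the `var00` sample from the original post, the backslash + quote pairs can be undone without sed: wrapped in double quotes, the text is exactly a JSON string literal, so jq's `fromjson` decodes it and parses the result in one step. Unlike a blanket `s/\\"/"/g`, this also handles escaped backslashes inside titles correctly. A minimal sketch, assuming the line contains no raw newlines or other control characters; `unescape_json_line` is a hypothetical name.

```shell
# unescape_json_line: hypothetical helper; treats $1 as the *inside* of a
# JSON string (\" for quotes, \\ for backslashes, ...), decodes it, parses
# the result as JSON and prints it compactly on one line.
unescape_json_line() {
    # printf '%s' does not interpret backslashes, so the text reaches jq intact
    printf '"%s"' "$1" | jq -c 'fromjson'
}
```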

 lbrtchx

Re: trying to parse lines from an awkwardly formatted HAR file ...

2024-03-23 Thread David Christensen


On 3/22/24 22:53, Albretch Mueller wrote:

out of a HAR file containing lots of obfuscating js cr@p and all kinds of
nonsense I was able to extract line looking like:

var00='{\"index\":\"prod-h-006\",\"fields\":{\"identifier\":\"bub_gb_O2EAMAAJ\",\"title\":\"Die Wissenschaft vom subjectiven Geist\",\"creator\":[\"Karl Rosenkranz\", \"Mr. ABC123\"],\"collection\":[\"europeanlibraries\", \"americana\"],\"year\":1843,\"language\":[\"German\"],\"item_size\":797368506},\"_score\":[50.629513]}'
echo "// __ \$var00: |$var00|"

The final result that I need would look like:
var02='bub_gb_O2EAMAAJ|Die Wissenschaft vom subjectiven Geist|["Karl Rosenkranz", "Mr. ABC123"]|["europeanlibraries", "americana"]|1843|["German"]|797368506|[50.629513]'
echo "// __ \$var02: |$var02|"

I have tried substring substitution, sed et tr to no avail.

lbrtchx



My daily driver:

2024-03-23 04:02:27 dpchrist@laalaa 
~/sandbox/perl/debian-users/20240322-2253-albretch-mueller

$ cat /etc/debian_version; uname -a; perl -v | head -n 2 | grep .
11.9
Linux laalaa 5.10.0-28-amd64 #1 SMP Debian 5.10.209-2 (2024-01-31) 
x86_64 GNU/Linux
This is perl 5, version 32, subversion 1 (v5.32.1) built for 
x86_64-linux-gnu-thread-multi



Put the JSON into a data file, one record per line (my mailer is 
line-wrapping data.json -- it contains two lines):


2024-03-23 04:22:20 dpchrist@laalaa 
~/sandbox/perl/debian-users/20240322-2253-albretch-mueller

$ cat data.json
{"index":"prod-h-006","fields":{"identifier":"bub_gb_O2EAMAAJ","title":"Die Wissenschaft vom subjectiven Geist","creator":["Karl Rosenkranz", "Mr. ABC123"],"collection":["europeanlibraries", "americana"],"year":1843,"language":["German"],"item_size":797368506},"_score":[50.629513]}
{"index":"prod-h-007","fields":{"identifier":"abc_de_12FGHIJKLMNO","title":"My Title","creator":["Some Body", "Somebody Else"],"collection":["europeanlibraries", "americana"],"year":2024,"language":["English"],"item_size":1234567890},"_score":[12.345678]}



A Perl script to read newline-delimited JSON records and pretty print each:

2024-03-23 04:28:59 dpchrist@laalaa 
~/sandbox/perl/debian-users/20240322-2253-albretch-mueller

$ cat munge-json
#!/usr/bin/perl
# $Id: munge-json,v 1.3 2024/03/23 11:28:58 dpchrist Exp $
# Refer to debian-user 3/22/24 22:53 Albretch Mueller
# "trying to parse lines from an awkwardly formatted HAR file"
# by David Paul Christensen dpchr...@holgerdanske.com
# Public Domain
use strict;
use warnings;
use Data::Dumper;
use JSON;
use Getopt::Long;
$Data::Dumper::Sortkeys = 1;
my $debug;
GetOptions('debug|d' => \$debug) or die;
while (<>) {
    my $rh = decode_json $_;
    print Data::Dumper->Dump([$rh], [qw(rh)]) if $debug;
    print
        join('|',
            $rh->{fields}{identifier},
            $rh->{fields}{title},
            '["' .  join('", "', @{$rh->{fields}{creator}}) . '"]',
            '["' .  join('", "', @{$rh->{fields}{collection}}) . '"]',
            $rh->{fields}{year},
            '["' .  join('", "', @{$rh->{fields}{language}}) . '"]',
            $rh->{fields}{item_size},
            '[' .  join(', ', @{$rh->{_score}}) . ']',
        ), "\n";
}


Run the script as a Unix filter:

2024-03-23 04:30:16 dpchrist@laalaa 
~/sandbox/perl/debian-users/20240322-2253-albretch-mueller

$ ./munge-json data.json
bub_gb_O2EAMAAJ|Die Wissenschaft vom subjectiven Geist|["Karl Rosenkranz", "Mr. ABC123"]|["europeanlibraries", "americana"]|1843|["German"]|797368506|[50.629513]
abc_de_12FGHIJKLMNO|My Title|["Some Body", "Somebody Else"]|["europeanlibraries", "americana"]|2024|["English"]|1234567890|[12.345678]


2024-03-23 04:30:18 dpchrist@laalaa 
~/sandbox/perl/debian-users/20240322-2253-albretch-mueller

$ cat data.json | ./munge-json
bub_gb_O2EAMAAJ|Die Wissenschaft vom subjectiven Geist|["Karl Rosenkranz", "Mr. ABC123"]|["europeanlibraries", "americana"]|1843|["German"]|797368506|[50.629513]
abc_de_12FGHIJKLMNO|My Title|["Some Body", "Somebody Else"]|["europeanlibraries", "americana"]|2024|["English"]|1234567890|[12.345678]



David

Re: trying to parse lines from an awkwardly formatted HAR file ...

2024-03-23 Thread mgr...@grant.org

Here's a hint at a start of what you need to do, it should be pretty easy to 
extend this, if it's unclear, let me know:

for starters, run your "gunk" into jq like this:

$ echo {\"index\":\"prod-h-006\",\"fields\":{\"identifier\":\"bub_gb_O2EAMAAJ\",\"title\":\"Die Wissenschaft vom subjectiven Geist\",\"creator\":[\"Karl Rosenkranz\", \"Mr. ABC123\"],\"collection\":[\"europeanlibraries\", \"americana\"],\"year\":1843,\"language\":[\"German\"],\"item_size\":797368506},\"_score\":[50.629513]} | jq
{
  "index": "prod-h-006",
  "fields": {
    "identifier": "bub_gb_O2EAMAAJ",
    "title": "Die Wissenschaft vom subjectiven Geist",
    "creator": [
      "Karl Rosenkranz",
      "Mr. ABC123"
    ],
    "collection": [
      "europeanlibraries",
      "americana"
    ],
    "year": 1843,
    "language": [
      "German"
    ],
    "item_size": 797368506
  },
  "_score": [
    50.629513
  ]
}

then, start building your output like this:

echo {\"index\":\"prod-h-006\",\"fields\":{\"identifier\":\"bub_gb_O2EAMAAJ\",\"title\":\"Die Wissenschaft vom subjectiven Geist\",\"creator\":[\"Karl Rosenkranz\", \"Mr. ABC123\"],\"collection\":[\"europeanlibraries\", \"americana\"],\"year\":1843,\"language\":[\"German\"],\"item_size\":797368506},\"_score\":[50.629513]} | jq -r '.fields.identifier + "|" + .fields.title'

jq is an amazing tool; it's a full-fledged programming language.  You just
need to keep concatenating your desired output.  You might even find you can
do what you want entirely inside a jq script instead of what you're doing.
Consider writing a jq script whose first line is #!/usr/bin/jq -f (the -f
makes jq read its filter from the script file).
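
Carrying that concatenation through to the end, one way to produce exactly the `var02` layout from the original post (bracketed, `", "`-separated lists, as in David Christensen's Perl version) is sketched below. `ia_row` is a hypothetical name, and the input is assumed to be one already-unescaped record per line, like the `data.json` file earlier in the thread.

```shell
# ia_row: hypothetical helper; turns each JSON record (one per line, keys as
# in the thread's sample) into the pipe-separated var02 layout.
# qlist renders ["a","b"] as the text ["a", "b"]; an empty list becomes [].
ia_row() {
    jq -r '
      def qlist: if length == 0 then "[]" else "[\"" + join("\", \"") + "\"]" end;
      .fields as $f
      | [ $f.identifier,
          $f.title,
          ($f.creator | qlist),
          ($f.collection | qlist),
          ($f.year | tostring),
          ($f.language | qlist),
          ($f.item_size | tostring),
          ("[" + (._score | map(tostring) | join(", ")) + "]") ]
      | join("|")' "$@"
}
```

For example, `ia_row data.json` should print the same two lines as `./munge-json data.json`.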

Hope this gets you on the right path!

Michael Grant

From: to...@tuxteam.de
Sent: Friday, March 22, 2024 23:44
To: Albretch Mueller
Cc: debian-user
Subject: Re: trying to parse lines from an awkwardly formatted HAR file ...

On Sat, Mar 23, 2024 at 12:53:24AM -0500, Albretch Mueller wrote:
> out of a HAR file containing lots of obfuscating js cr@p and all kinds of
> nonsense I was able to extract line looking like:

It's not "js cr@p", It is called JSON. And there's a spec for
it.

[...]

> I have tried substring substitution, sed et tr to no avail.

You might have a lot of fun trying to parse JSON with sed and
tr.

If you are serious about it, you should try a proper parser
and extractor. I'd recommend jq [1], available in Debian under
the same-named package. I have written a few shell scripts
reaching into the innards of

You'll have to wrap your brain around it, but in the time you
have implemented a parser for js in "sed and tr" (you might
need a dash of "proper programming language" around that, some
luck and a ton of elbow grease) you might have wrapped your
brain like 16 times around jq (or some other appropriate tool).

Cheers
--
tomás

Re: trying to parse lines from an awkwardly formatted HAR file ...

2024-03-23 Thread tomas

On Sat, Mar 23, 2024 at 12:53:24AM -0500, Albretch Mueller wrote:
> out of a HAR file containing lots of obfuscating js cr@p and all kinds of
> nonsense I was able to extract line looking like:

It's not "js cr@p", It is called JSON. And there's a spec for
it.

[...]

> I have tried substring substitution, sed et tr to no avail.

You might have a lot of fun trying to parse JSON with sed and
tr.

If you are serious about it, you should try a proper parser
and extractor. I'd recommend jq [1], available in Debian under
the same-named package. I have written a few shell scripts
reaching into the innards of 

You'll have to wrap your brain around it, but in the time you
have implemented a parser for js in "sed and tr" (you might
need a dash of "proper programming language" around that, some
luck and a ton of elbow grease) you might have wrapped your
brain like 16 times around jq (or some other appropriate tool).

Cheers
-- 
tomás

