A few years ago I discovered Firefox's Copy as cURL feature.  It solved a
problem I had scraping data from
https://www.trustnet.com/fund/price-performance/t/investment-trusts?norisk=true&PageSize=25

However, Copy as cURL is dangerous because you are pasting something
received over the Internet into your Command Prompt window (or the
equivalent on other operating systems).  There is a danger that curl
receives only part of the data, with the remainder being interpreted by
cmd.exe, which could run some arbitrary program.  People have been trying
to prevent this from happening for several years by adding escape
characters, but the latest Firefox release (141.0) still fails to prevent
it.  The problem seems to be fixed in a recent Firefox Nightly.

I used a program to issue a curl command for each page (changing the page
number for each).
The Trustnet curl command was sometimes too large to be pasted into my
program, so I read it
directly from the clipboard using GetClipboardData.  This meant that I had
to strip the escape
characters before calling curl.
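For anyone curious what stripping those escapes can look like: this is not
my actual program, just a minimal Python sketch.  It assumes the clipboard
holds a Windows-style Copy as cURL command, where special characters are
escaped with ^ and long lines are continued with ^ followed by a newline;
the exact escaping varies between browser versions.

```python
import re

def strip_cmd_escapes(cmd):
    """Undo cmd.exe caret-escaping in a pasted Copy-as-cURL command.
    Assumes ^X escapes a single character X and ^<newline> is a
    line continuation; other escaping schemes would need more work."""
    cmd = cmd.replace("^\n", "")          # join caret line continuations
    return re.sub(r"\^(.)", r"\1", cmd)   # ^X -> X

print(strip_cmd_escapes('curl ^"https://example.com/?a=1^&b=2^"'))
```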

My solution to this problem is in two parts:

   1. Don't use Copy as cURL.  They keep changing it, and every time they
change it I need to
      change my program.  Use Save all as HAR.  Then use a program to
convert the HAR file to
      curl options.  A Google search will find lots of them, or write your
own.

   2. Pass the curl options to curl using -K to read the options from a
      file, so that cmd.exe never sees them.
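For anyone who hasn't used -K: curl reads one option per line from the
file, written as long option names without the leading dashes.  A small
example (the URL and headers here are just placeholders):

```
# trustnet.cfg -- read with: curl -K trustnet.cfg
url = "https://www.trustnet.com/fund/price-performance/t/investment-trusts?norisk=true&PageSize=25"
header = "Accept: text/html"
header = "User-Agent: Mozilla/5.0"
compressed
```

Because the options live in a file, none of the untrusted data ever passes
through the shell's parser.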

So I've written a program to convert the HAR file to curl options and call
curl for each page
of the Trustnet site.  I was surprised how easy this was.  I would like to
thank the authors of
https://github.com/json-parser/json-parser for providing a JSON parser
which is very easy to use.
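My program is in C using that parser, but the core idea fits in a few
lines of any language.  Here is a rough Python sketch of the HAR-to-curl-
options conversion (it only handles URLs and headers; cookies, POST
bodies, and quoting edge cases are left out):

```python
def har_to_curl_config(har):
    """Convert each request in a HAR capture (a dict parsed from the
    JSON) into curl config-file lines, the format read by curl -K."""
    lines = []
    for entry in har["log"]["entries"]:
        req = entry["request"]
        lines.append('url = "%s"' % req["url"])
        for h in req["headers"]:
            # Skip HTTP/2 pseudo-headers like :authority, which some
            # browsers record in the HAR but are not real request headers.
            if not h["name"].startswith(":"):
                lines.append('header = "%s: %s"' % (h["name"], h["value"]))
    return "\n".join(lines)

har = {"log": {"entries": [{"request": {
    "method": "GET",
    "url": "https://www.trustnet.com/fund/price-performance",
    "headers": [{"name": "Accept", "value": "text/html"}]}}]}}
print(har_to_curl_config(har))
```

Looping over page numbers is then just a matter of rewriting the PageSize
and page parameters in the url line before each call to curl -K.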

I would be interested to hear about the experiences of other users of Copy
as cURL or any
alternative solutions.

Paul
-- 
Unsubscribe: https://lists.haxx.se/mailman/listinfo/curl-users
Etiquette:   https://curl.se/mail/etiquette.html
