Hi,

This may or may not be top-posted.

What you need to do when scraping/crawling is essentially replicate what
the browser and server do during a transaction.

In some cases, this means examining the handshake and traffic, which can
be inferred from the browser's developer tabs. In other cases, you're going
to have to be clever to handle browser/JavaScript functionality, which can
even be a browser extension that runs within the browser. That is beyond
simple curl.
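
As a rough sketch of what "replicating the browser" can look like with plain
curl: the function below sends browser-like headers, keeps cookies between
calls, and lets curl negotiate and decode compression. The function name,
cookies.txt, and the header values are placeholders of mine (the User-Agent
is simply the one quoted further down in this thread); take the real values
from your browser's "Copy as cURL". It will not get past a JavaScript
challenge page.

```shell
#!/bin/sh
# Sketch only: replay a browser-like request with curl.
# fetch_like_browser and cookies.txt are hypothetical names; the header
# values are placeholders to be replaced with what your browser sends.

fetch_like_browser() {
    url=$1   # page to fetch
    out=$2   # file to write the response body to
    # --compressed: ask for and decode gzip/deflate automatically
    # --cookie-jar / --cookie: save cookies and send them back on later calls
    curl --location --compressed \
         --cookie-jar cookies.txt --cookie cookies.txt \
         -A "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36" \
         -H "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8" \
         -H "Accept-Language: en-US,en;q=0.5" \
         --output "$out" \
         "$url"
}
```

Usage would be something like:
fetch_like_browser https://icedrive.net/apps/desktop-laptop eraseme.html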

good luck


On Mon, Oct 20, 2025, 5:40 AM ToddAndMargo via curl-users <
[email protected]> wrote:

>
> > On Sat, Oct 18, 2025 at 9:51 AM ToddAndMargo via curl-users
> > <[email protected]> wrote:
> >>
> >> On 10/18/25 6:44 AM, ToddAndMargo via curl-users wrote:
> >>>>> On Sat, Oct 18, 2025 at 1:22 PM ToddAndMargo via curl-users <curl-
> >>>>> [email protected] <mailto:[email protected]>> wrote:
> >>>>>
> >>>>>      On 10/18/25 3:06 AM, Daniel Stenberg wrote:
> >>>>>       > On Sat, 18 Oct 2025, ToddAndMargo via curl-users wrote:
> >>>>>       >
> >>>>>       >> How do I get around "You've been blocked" on this web site:
> >>>>>       >
> >>>>>       > Presumably you get blocked by a site if you somehow violate their
> >>>>>       > terms of use or their perception of good behavior.
> >>>>>       >
> >>>>>       > A primary way to not get blocked would be to not do that. To understand
> >>>>>       > the exact specifics and reasons, you would have to ask the admins of the
> >>>>>       > website in question.
> >>>>>       >
> >>>>>
> >>>>>      I am not doing anything different than I ever do.
> >>>>>      Do you see anything wrong with my code?
> >>>>>      --
> >>>>>      Unsubscribe: https://lists.haxx.se/mailman/listinfo/curl-users
> >>>>>      Etiquette: https://curl.se/mail/etiquette.html
> >>>>>
> >>>>>
> >>>
> >>> On 10/18/25 6:29 AM, Bastian Jesuiter via curl-users wrote:
> >>>> You didn't even post code.
> >>>
> >>>   From my original post:
> >>>
> >>> curl -L https://www.softpedia.com/get/System/Back-Up-and-Recovery/Icedrive.shtml#download -o eraseme.html
> >>>
> >>>> But regardless of that - if you get blocked, you'll need to resolve
> >>>> that with the admin of the page.
> >>>> Maybe you can even ask (them) if you may get an API Documentation for
> >>>> Developers and or explain what your use case is and how you should
> >>>> proceed.
> >>>> Some websites explicitly allow custom fetching, others - like the
> >>>> icedrive website you shared recently - do not.
> >>>>
> >>>> You are most likely being caught by an AI Crawler blocker. Presumably
> >>>> because your requests are similar to the behavior of an AI or
> >>>> otherwise automated scraper.
> >>>> Don't do that.
> >>>
> >>> Curl does that?
> >>>
> >>>> Try to find dev documentation or ask the service admin as Daniel said.
> >>>>
> >>>> ---
> >>>> Bastian
> >>>
> >>> The web page that I get states:
> >>>
> >>>         What can I do to resolve this?
> >>>
> >>>         You can email the site owner to let them know you
> >>>         were blocked. Please include what you were doing
> >>>         when this page came up and the Cloudflare Ray ID
> >>>         found at the bottom of this page.
> >>>
> >>>         Cloudflare Ray ID: 990732037c51c798 • Your IP:
> >>>         • Performance & security by Cloudflare
> >>>
> >>> And there is nothing at the bottom of the page.
> >>>
> >>
> >>
> >> I just wrote Softpedia
>
> And got ghosted.
>
> On 10/18/25 9:48 AM, bruce via curl-users wrote:
>  > Hi.
>  >
>  > Lurking/saw this thread. I have no clue as to your level of expertise.
>  > I don't know what browser you're using as a test. It does appear
>  > you've used some browser, as you can get to the target page.
>  >
>  > As a test, use the browser/url, and the "developer" tabs to find the
>  > "curl" function as defined by the browser. This would get the complete
>  > attributes used in the command.
>  >
>  > Do this, test it if you find the browser cmd, and post the command if
>  > it's not working...
>  >
>  > good luck
>  >
>
> Hi Bruce,
>
> My level of expertise with cURL is a beginner.  I know
> enough to know when I need to ask for help.
>
> Okay, in Firefox, under Web Tools, Web development kit (I could not
> find a "developer" tab), Network, I find a single "GET".
>
> Right clicking and copy as cURL, I get
>
> curl -H "user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.182 Safari/537.36" \
>       https://icedrive.net/apps/desktop-laptop
>
> This gets me the dreaded
>
>      <!DOCTYPE html><html lang="en-US"><head><title>Just a
> moment...</title> ...
>
> If I click on "Debugger", I get a tree in the left column:
>     --> Main thread
>       --> icedrive.net
>         --> <anonymous code>
>         --> desktop laptop
> And if I click on "desktop laptop" it has exactly what I need.
>
> If I right click on it and copy source URI, I get
>       https://icedrive.net/apps/desktop-laptop
> AAAHHHH!!!!
>
>
> For fun, I tried:
>
> curl -A "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:71.0) Gecko/20100101 Firefox/71.0" \
>       -H "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8" \
>       -H "Accept-Language: en-US,en;q=0.5" \
>       -H "Accept-Encoding: gzip, deflate" \
>       -H "Upgrade-Insecure-Requests: 1" \
>       -H "Connection: keep-alive" \
>       https://icedrive.net/apps/desktop-laptop
>
> and got some kind of binary file with no strings.
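
One likely reason for the binary output: this command sends
"Accept-Encoding: gzip, deflate", so the server may answer with a compressed
body, and curl saves it as-is unless you also pass --compressed. A small
sketch to check for that and avoid it; is_gzip and fetch_decoded are names
I made up for illustration, not curl features:

```shell
#!/bin/sh
# Sketch only: detect a gzip-compressed download and fetch a decoded copy.

# Succeeds if the file starts with the gzip magic bytes 1f 8b.
is_gzip() {
    magic=$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "1f8b" ]
}

# Let curl request and decode compression itself (--compressed)
# instead of hand-writing an Accept-Encoding header.
fetch_decoded() {
    curl --compressed --location --output "$2" "$1"
}
```

If is_gzip reports true for the file you saved, "gunzip -c file" should show
the HTML, and fetch_decoded with the same URL should save readable text
directly.
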
>
>
> So I tried:
>
> curl -H "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8" \
>       -H "Accept-Language: en-US,en;q=0.8,en-GB;q=0.6,es;q=0.4" \
>       -H "Referer: https://www.google.com/" \
>       -H "Cache-Control: max-age=0" \
>       -A "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36" \
>       "https://icedrive.net/apps/desktop-laptop" -o eraseme.html
>
> And that worked.  Yippee!!!  :-)
>
> And I have no idea why.  :'(
>
> Thank you for the help,
> -T
>
>
>
>
>
>
-- 
Unsubscribe: https://lists.haxx.se/mailman/listinfo/curl-users
Etiquette:   https://curl.se/mail/etiquette.html
