Hi... this may or may not be top-posted. What you need to do when scraping/crawling is essentially replicate what the browser and server do during a transaction.
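A rough sketch of what I mean, not a guaranteed fix: it reuses the header values from the command that worked in the quoted thread below and adds a cookie jar plus `--compressed`. The function name `fetch_like_browser` and the file name `cookies.txt` are my own placeholders. curl only decodes gzip when run with `--compressed`, which is probably why the attempt with a hand-written Accept-Encoding header came back as binary. None of this will get past a JavaScript challenge.

```shell
#!/bin/sh
# Hypothetical helper: fetch a page with browser-like request headers.
# Header values are copied from the command that worked in the thread.
# Usage: fetch_like_browser URL OUTPUT_FILE
fetch_like_browser() {
    url=$1
    out=$2
    # With DRY_RUN set, the command line is printed instead of executed.
    # --compressed asks for and decodes gzip/deflate responses.
    # cookies.txt (placeholder name) keeps cookies between runs.
    ${DRY_RUN:+echo} curl --silent --show-error --location --compressed \
        --cookie cookies.txt --cookie-jar cookies.txt \
        -A "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36" \
        -H "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8" \
        -H "Accept-Language: en-US,en;q=0.8,en-GB;q=0.6,es;q=0.4" \
        -H "Referer: https://www.google.com/" \
        -o "$out" "$url"
}
```

Something like `fetch_like_browser https://icedrive.net/apps/desktop-laptop eraseme.html` would run it; `DRY_RUN=1` in front just prints the curl command so you can check it first.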
In some cases, this will mean examining the handshake/traffic, which you can inspect in the browser's developer tabs. In other cases, you're going to have to be clever to handle browser/JavaScript functionality, which can mean something like a browser extension that runs within the browser. This is beyond simple curl.

good luck

On Mon, Oct 20, 2025, 5:40 AM ToddAndMargo via curl-users <[email protected]> wrote:

> > On Sat, Oct 18, 2025 at 9:51 AM ToddAndMargo via curl-users
> > <[email protected]> wrote:
> >>
> >> On 10/18/25 6:44 AM, ToddAndMargo via curl-users wrote:
> >>>>> On Sat, Oct 18, 2025 at 1:22 PM ToddAndMargo via curl-users
> >>>>> <[email protected]> wrote:
> >>>>>
> >>>>>     On 10/18/25 3:06 AM, Daniel Stenberg wrote:
> >>>>>      > On Sat, 18 Oct 2025, ToddAndMargo via curl-users wrote:
> >>>>>      >
> >>>>>      >> How do I get around "You've been blocked" on this web site:
> >>>>>      >
> >>>>>      > Presumably you get blocked by a site if you somehow violate their
> >>>>>      > terms of use or their perception of good behavior.
> >>>>>      >
> >>>>>      > A primary way to not get blocked would be to not do that. To
> >>>>>      > understand the exact specifics and reasons, you would have to ask
> >>>>>      > the admins of the website in question.
> >>>>>      >
> >>>>>
> >>>>>     I am not doing anything different than I ever do.
> >>>>>     Do you see anything wrong with my code?
> >>>>>     --
> >>>>>     Unsubscribe: https://lists.haxx.se/mailman/listinfo/curl-users
> >>>>>     Etiquette: https://curl.se/mail/etiquette.html
> >>>>>
> >>>
> >>> On 10/18/25 6:29 AM, Bastian Jesuiter via curl-users wrote:
> >>>> You didn't even post code.
> >>>
> >>> From my original post:
> >>>
> >>> curl -L https://www.softpedia.com/get/System/Back-Up-and-Recovery/Icedrive.shtml#download -o eraseme.html
> >>>
> >>>> But regardless of that - if you get blocked, you'll need to resolve
> >>>> that with the admin of the page.
> >>>> Maybe you can even ask (them) if you may get an API Documentation for
> >>>> Developers and/or explain what your use case is and how you should
> >>>> proceed.
> >>>> Some websites explicitly allow custom fetching, others - like the
> >>>> icedrive website you shared recently - do not.
> >>>>
> >>>> You are most likely being caught by an AI Crawler blocker. Presumably
> >>>> because your requests are similar to the behavior of an AI or
> >>>> otherwise automated scraper.
> >>>> Don't do that.
> >>>
> >>> Curl does that?
> >>>
> >>>> Try to find dev documentation or ask the service admin as Daniel said.
> >>>>
> >>>> ---
> >>>> Bastian
> >>>
> >>> The web page that I get states:
> >>>
> >>>     What can I do to resolve this?
> >>>
> >>>     You can email the site owner to let them know you
> >>>     were blocked. Please include what you were doing
> >>>     when this page came up and the Cloudflare Ray ID
> >>>     found at the bottom of this page.
> >>>
> >>>     Cloudflare Ray ID: 990732037c51c798 • Your IP:
> >>>     • Performance & security by Cloudflare
> >>>
> >>> And there is nothing at the bottom of the page.
> >>>
> >>
> >> I just wrote Softpedia
>
> And got ghosted.
>
> On 10/18/25 9:48 AM, bruce via curl-users wrote:
> > Hi.
> >
> > Lurking/saw this thread. I have no clue as to your level of expertise.
> > I don't know what browser you're using as a test. It does appear
> > you've used some browser, as you can get to the target page.
> >
> > As a test, use the browser/url, and the "developer" tabs to find the
> > "curl" function as defined by the browser. This would get the complete
> > attributes used in the command.
> >
> > Do this, test it if you find the browser cmd, and post the command if
> > it's not working...
> >
> > good luck
> >
>
> Hi Bruce,
>
> My level of expertise with cURL is beginner. I know
> enough to know when I need to ask for help.
>
> Okay, in Firefox, Web Tools, Web development kit (could not
> find a "developer" tab), Network, I find a single "GET".
>
> Right clicking and copy as cURL, I get
>
> curl -H "user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.182 Safari/537.36" \
>      https://icedrive.net/apps/desktop-laptop
>
> This gets me the dreaded
>
> <!DOCTYPE html><html lang="en-US"><head><title>Just a moment...</title> ...
>
> If I click on "Debugger", I get a tree in the left column:
>     --> Main thread
>     --> icedrive.net
>     --> <anonymous code>
>     --> desktop laptop
> And if I click on "desktop laptop" it has exactly what I need.
>
> If I right click on it and copy source URI, I get
>     https://icedrive.net/apps/desktop-laptop
> AAAHHHH!!!!
>
>
> For fun, I tried:
>
> curl -A "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:71.0) Gecko/20100101 Firefox/71.0" \
>      -H "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8" \
>      -H "Accept-Language: en-US,en;q=0.5" \
>      -H "Accept-Encoding: gzip, deflate" \
>      -H "Upgrade-Insecure-Requests: 1" \
>      -H "Connection: keep-alive" \
>      https://icedrive.net/apps/desktop-laptop
>
> and got some kind of binary file with no strings.
>
>
> So I tried:
>
> curl -H "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8" \
>      -H "Accept-Language: en-US,en;q=0.8,en-GB;q=0.6,es;q=0.4" \
>      -H "Referer: https://www.google.com/" \
>      -H "Cache-Control: max-age=0" \
>      -A "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36" \
>      "https://icedrive.net/apps/desktop-laptop" -o eraseme.html
>
> And that worked. Yippee!!! :-)
>
> And I have no idea why.
> :'(
>
> Thank you for the help,
> -T
>
> --
> Unsubscribe: https://lists.haxx.se/mailman/listinfo/curl-users
> Etiquette: https://curl.se/mail/etiquette.html
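
On the "Just a moment..." page quoted above: if this ends up in a script, a small check like the one below can tell you whether the saved file is the challenge/block page rather than the real content. It is only a sketch. The function name `is_challenge_page` is a placeholder of mine, and the two strings it looks for are simply the ones that showed up in this thread, so other sites or later versions of the page may need different markers.

```shell
#!/bin/sh
# Hypothetical helper: exit status 0 if FILE looks like the challenge or
# block page quoted in this thread, non-zero otherwise.
# Usage: is_challenge_page FILE
is_challenge_page() {
    # Markers taken from the thread: the "Just a moment..." title and the
    # "Cloudflare Ray ID" line on the block page. Matching is case-insensitive.
    grep -q -i -e '<title>Just a moment' -e 'Cloudflare Ray ID' "$1"
}
```

After saving a page to eraseme.html, something like `is_challenge_page eraseme.html && echo "got the challenge page, not the content"` would flag the failure instead of silently keeping the wrong file.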
