Rodrigo Arias wrote:
> On Mon, Dec 30, 2024 at 05:35:50PM +0100, 
> a1ex-j7k0xvabl0ielga04la...@public.gmane.org wrote:
>> There was an interesting post[1] on HN today about 'curl-impersonate',
>> which is a patch[2] to curl which allows it to act like various big
>> browsers, bypassing several fingerprinting techniques which would
>> otherwise prevent the client from accessing the page.
>> 
>> Looking at the patch, maybe there could be some useful ideas here for
>> Dillo to use to load more sites. The SSL library also obviously plays a
>> large role; maybe that's something we will need to consider as well.
> 
> I experienced problems with the user-agent being banned, and having to 
> impersonate Firefox to load some sites. I haven't yet found examples of
> this kind of deep fingerprinting for TLS or similar; have you?
> 
> In any case, it would be trivial to identify Dillo, as we don't support
> JS, so sites could ban it if they chose to.

I've found that sometimes I go to a webpage and see one of the
"enable Javascript to continue" pages in Dillo, then I load the
same page in Firefox with NoScript blocking all its scripts, and
it comes up fine without running any such Javascript. That could
just be down to the User-Agent header, though, because I haven't
tried faking it.

Rather than add Chrome-faking features to Dillo, maybe this could
be another application of the Rule-based content manipulation RFC:
https://github.com/dillo-browser/rfc/blob/rfc-002/rfc-002-rule-based-content-manipulation.md

Perhaps a rule for some sites (or Web server responses?) could
have Dillo call curl-impersonate to retrieve the page instead of
fetching it itself?

By the way, being a Git failure, I really can't see where that MD
document lives. I look at the "rfc" repo via the GitHub website in
Dillo and there's just a readme. I clone the repo and I just get a
readme. I had to look back to your RFC repo announcement to find
that link. I guess they're in separate branches or something but I
forget things about Git faster than I learn them and can't be
bothered learning how to use branches yet again today. I really
think it would be better to list them together somewhere obvious,
e.g. a new Developer Documentation webpage.

By mangling the URL, I can see that there are probably only two
RFCs so far:
https://github.com/dillo-browser/rfc/tree/rfc-001/ (rfc-001-dillo-rfc-documents.md)
https://github.com/dillo-browser/rfc/tree/rfc-002/ (rfc-002-rule-based-content-manipulation.md)
https://github.com/dillo-browser/rfc/tree/rfc-003/ (404)

> In my experience, it is generally not worth reading websites
> that perform this type of discrimination.

That's often my approach too, but some of the big offenders are
government websites, which one is sometimes obliged to read.
_______________________________________________
Dillo-dev mailing list -- dillo-dev@mailman3.com