As far as I understand, the project looks like HTTPS-Finder and https-everywhere-checker combined. Dominik, is that right? Writing rules is more art than science, I am afraid, so I share Numismatika's concerns. What is interesting, though, is detecting whether the HTTP and HTTPS versions of a page are essentially the same. Right now https-everywhere-checker has two different metrics, each with a more or less arbitrary threshold; please correct me if I am wrong. Some statistical data on which method is "better" and which threshold is "reasonable" would be interesting, IMHO.
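
To make the threshold question concrete, here is a rough Python sketch of what "compare two page sources with a metric and a cutoff" could look like. It is not the code of https-everywhere-checker or of Dominik's thesis; the function names, both metrics and the 0.9 default are assumptions chosen purely for illustration.

```python
import difflib
import re


def sequence_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] based on difflib's matching-blocks ratio."""
    return difflib.SequenceMatcher(None, a, b, autojunk=False).ratio()


def tag_jaccard(a: str, b: str) -> float:
    """Jaccard index over the sets of opening HTML tag names in each page."""
    pattern = r"<\s*([a-zA-Z][a-zA-Z0-9]*)"
    tags_a = {t.lower() for t in re.findall(pattern, a)}
    tags_b = {t.lower() for t in re.findall(pattern, b)}
    if not tags_a and not tags_b:
        return 1.0  # two pages without any tags count as identical here
    return len(tags_a & tags_b) / len(tags_a | tags_b)


def looks_equivalent(http_html: str, https_html: str,
                     threshold: float = 0.9,  # arbitrary illustrative value
                     metric=sequence_similarity) -> bool:
    """Treat the pages as 'the same' when the metric reaches the threshold."""
    return metric(http_html, https_html) >= threshold
```

Running both metrics over a labelled sample of HTTP/HTTPS page pairs and recording the false positives and false negatives per threshold would give exactly the kind of statistics I am asking about.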

Best regards,
Maxim Nazarenko

On 12 March 2015 at 12:54, Numismatika <[email protected]> wrote:
> I do not really trust the quality of rules that were generated without
> any human interaction.
> I think the better approach is to have something like
> https://github.com/kevinjacobs/HTTPS-Finder/.
> That is, a setting that makes the addon notify you if it detects unsecured
> content that is also available over TLS.
> If you fix up the above-mentioned addon and solve its
> shortcomings, I think we benefit more than we
> would from a few thousand rules without any quality assurance by a human
> eye.
>
> Numismatika
>
> On 12.03.2015 at 10:19, Dominik Frühwirt wrote:
>> Hi,
>>
>> I'm currently finishing my master's thesis in computer science, which
>> addresses the ruleset of HTTPS-E. Simply put, I try to generate rules
>> for a large number of websites automatically. To that end, an automated
>> browser (PhantomJS) has been used to fetch the HTML source of
>> the HTTP websites and the corresponding HTTPS websites. These are found by
>> trying to reach the most frequently used subdomains of HTTPS-secured
>> domains. The retrieved sources are compared using different
>> similarity matching algorithms and treated as a positive match when the
>> calculated similarity value exceeds a certain threshold.
>>
>> I took Alexa's top million websites as input and generated about
>> 89,000 rules from it. Since only the landing pages are compared,
>> particularities like resources that are available on a certain path via
>> HTTP but not via HTTPS are not considered (no exclusion patterns).
>> Generally, the generated rules are not as accurate as the community-written
>> ones. Hence, manually created rules should not be overruled when
>> merging the generated rules into the current ruleset.
>>
>> One problem resulting from such a large ruleset is the browser's UI.
>> When I select "Enable / Disable Rules" in the menu of the extension,
>> Firefox stops working and freezes completely, probably because it tries
>> to load the whole ruleset into the dialog's list. This problem should be
>> solved before including a large set of rules.
>>
>> Are you interested in incorporating the generated rules into the public
>> ruleset?
>>
>> Kind Regards
>> Dominik Frühwirt
>> _______________________________________________
>> HTTPS-Everywhere mailing list
>> [email protected]
>> https://lists.eff.org/mailman/listinfo/https-everywhere
