Yes, you're right, it's a kind of combination of these tools. I see your concerns about the generated rules, and since you're aiming for a perfectly working ruleset, I understand that you don't want them included.
If you'd like the results of my evaluation of different similarity matching algorithms for improving the https-everywhere-checker tool, I will gladly provide my thesis as soon as I've finished writing it.

KR,
Dominik

On 2015-03-12 19:12, Maxim Nazarenko wrote:
> As far as I understand, the project looks like HTTPS-Finder and
> https-everywhere-checker combined. Dominik, is that right? Writing
> rules is more art than science, I am afraid, and therefore I share
> Numismatika's concerns, but what is interesting is detecting whether
> the http and https versions are essentially the same. Right now
> https-everywhere-checker has two different metrics, each with a more or
> less arbitrary threshold; please correct me if I am wrong. Some
> statistical data on which method is "better" and which threshold is
> "reasonable" may be interesting, IMHO.
>
> Best regards,
> Maxim Nazarenko
>
> On 12 March 2015 at 12:54, Numismatika
> <[email protected]> wrote:
>> I do not really trust the quality of rules that were generated without
>> any human interaction.
>> I think the better approach is to have something like
>> https://github.com/kevinjacobs/HTTPS-Finder/, ideally with a setting
>> that makes the add-on notify you if it detects unsecured content that
>> is available over TLS.
>> If you fix up the above-mentioned add-on and solve its shortcomings,
>> I think we benefit more than we would from a few thousand rules
>> without any quality assurance by a human eye.
>>
>> Numismatika
>>
>> On 12.03.2015 at 10:19, Dominik Frühwirt wrote:
>>> Hi,
>>>
>>> I'm currently finishing my master's thesis in computer science, which
>>> addresses the ruleset of HTTPS-E. Simply put, I try to generate rules
>>> for a large number of websites automatically. To this end, an automated
>>> browser (PhantomJS) is used to fetch the HTML source of the HTTP
>>> websites and of the corresponding HTTPS websites. The HTTPS websites
>>> are found by trying to reach the most frequently used subdomains of
>>> HTTPS-secured domains. The retrieved sources are compared using
>>> different similarity matching algorithms, and a pair is treated as a
>>> positive match when the calculated similarity value exceeds a certain
>>> threshold.
>>>
>>> I took Alexa's top million websites as input and generated about
>>> 89,000 rules from them. Since only the landing pages are compared,
>>> particularities like resources that are available on a certain path via
>>> HTTP but not via HTTPS are not considered (no exclusion patterns).
>>> In general, the generated rules are not as accurate as the
>>> community-written ones. Hence, manually created rules should not be
>>> overridden when merging the generated rules into the current ruleset.
>>>
>>> One problem resulting from such a large ruleset is the browser's UI.
>>> When I select "Enable / Disable Rules" in the extension's menu,
>>> Firefox stops responding and freezes completely, probably because it
>>> tries to load the whole ruleset into the dialog's list. This problem
>>> should be solved before including a large set of rules.
>>>
>>> Are you interested in incorporating the generated rules into the public
>>> ruleset?
>>>
>>> Kind regards,
>>> Dominik Frühwirt
>>> _______________________________________________
>>> HTTPS-Everywhere mailing list
>>> [email protected]
>>> https://lists.eff.org/mailman/listinfo/https-everywhere
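The comparison step described in the thesis announcement (fetch the HTTP and HTTPS landing pages, score their similarity, accept the pair when the score exceeds a threshold, emit a rule without exclusion patterns) can be sketched in Python. This is a minimal sketch under stated assumptions, not the thesis implementation: the subdomain list, the 0.9 threshold and all helper names are hypothetical, and `difflib.SequenceMatcher` merely stands in for one possible similarity metric; it is not claimed to be one of the algorithms evaluated in the thesis or used by https-everywhere-checker. Page fetching (done with PhantomJS in the thesis) is left out, so the functions take already-fetched HTML.

```python
import difflib
import xml.etree.ElementTree as ET

# Hypothetical: the thesis's actual list of frequently used subdomains is not given.
COMMON_SUBDOMAINS = ["", "www", "m", "mail", "blog"]

# Hypothetical cut-off; picking a "reasonable" threshold is the open question in the thread.
THRESHOLD = 0.9


def candidate_hosts(domain):
    """Prefix the domain with each common subdomain ("" means the bare domain)."""
    return [f"{sub}.{domain}" if sub else domain for sub in COMMON_SUBDOMAINS]


def similarity(http_html, https_html):
    """Example metric only: difflib's ratio, 1.0 for identical pages, lower as they diverge."""
    return difflib.SequenceMatcher(None, http_html, https_html).ratio()


def matching_hosts(pages, threshold=THRESHOLD):
    """pages maps host -> (http_html, https_html), fetched beforehand.

    A host counts as a positive match when its similarity exceeds the threshold.
    """
    return sorted(
        host
        for host, (http_html, https_html) in pages.items()
        if similarity(http_html, https_html) > threshold
    )


def build_ruleset(name, hosts):
    """Emit an HTTPS Everywhere-style ruleset: one target per host, a blanket
    http -> https rule, and no exclusion patterns (only landing pages were compared)."""
    ruleset = ET.Element("ruleset", name=name)
    for host in hosts:
        ET.SubElement(ruleset, "target", host=host)
    # "from" is a Python keyword, so the rule's attributes are passed as a dict.
    ET.SubElement(ruleset, "rule", attrib={"from": "^http:", "to": "https:"})
    return ET.tostring(ruleset, encoding="unicode")


if __name__ == "__main__":
    pages = {
        "example.com": ("<html>hello</html>", "<html>hello</html>"),
        "www.example.com": ("<html>hello</html>", "<html>hello</html>"),
        "shop.example.com": ("<html>shop</html>", "<html>please sign in</html>"),
    }
    print(build_ruleset("Example.com", matching_hosts(pages)))
```

In a real pipeline, `candidate_hosts` would supply the hosts to fetch over both schemes before `matching_hosts` is called. Swapping `similarity` for another metric, and sweeping `threshold` against hand-checked pairs, is one way to collect the kind of statistics Maxim asks about.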
