Re: Let's launch our own blocklists...

Michael Tremer Sun, 25 Jan 2026 06:41:04 -0800

Hello Matthias,

Nice catch!


I fixed it here and added the missing “;”:

  
https://git.ipfire.org/?p=dbl.git;a=commitdiff;h=775561e322ceed43e255e5547bd76047b9f8a40b

If you go to the provider settings there is a button to force a ruleset update 
which should give you the fixed version. Please let me know if this works.
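
For anyone who wants to check a local rules file for the same problem, here is a minimal sketch in Python. It is not part of dbl or Suricata; the function name is made up, and it assumes one rule per line with all options inside a single pair of parentheses.

```python
def missing_terminator(rule: str) -> bool:
    """Return True if the last option of a rule lacks its closing ';'."""
    rule = rule.strip()
    # Blank lines and comments are not rules
    if not rule or rule.startswith("#"):
        return False
    # Only lines that end with the option block "(...)" are checked here
    if not rule.endswith(")"):
        return False
    # Whatever comes right before the final ")" has to be a ";"
    return not rule[:-1].rstrip().endswith(";")


# Shortened versions of the rule from the log (hypothetical msg and sid)
broken = 'drop dns any any -> any any (msg:"test"; sid:1; metadata:dbl ads.dbl.ipfire.org)'
fixed = 'drop dns any any -> any any (msg:"test"; sid:1; metadata:dbl ads.dbl.ipfire.org;)'
```

Here `missing_terminator(broken)` is true and `missing_terminator(fixed)` is false. A real parser would have to cope with quoted parentheses and multi-line rules, which this sketch ignores.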

Best,
-Michael

> On 24 Jan 2026, at 23:41, Matthias Fischer <[email protected]> 
> wrote:
> 
> On 23.01.2026 17:39, Michael Tremer wrote:
>> Hello Matthias,
> 
> Hi Michael,
> 
>> Thank you very much for testing IPFire DBL.
> 
> No problem - I have news:
> 
> After taking a closer look at the IPS system logs, I unfortunately found
> some parsing errors:
> 
> 'suricata' complains about missing ";".
> 
> ***SNIP***
> ...
> 00:32:40 suricata: [13343] <Info> -- Including configuration file
> /var/ipfire/suricata/suricata-used-rulesfiles.yaml.
> 00:32:40 suricata: [13343] <Error> -- no terminating ";" found
> 00:32:40 suricata: [13343] <Error> -- error parsing signature "drop
> dns any any -> any any (msg:"IPFire DBL [Advertising] Blocked DNS
> Query"; dns.query; domain; dataset:isset,ads,type string,load
> datasets/ads.txt; classtype:policy-violation; priority:3; sid:983041;
> rev:1; reference:url,https://www.ipfire.org/dbl/ads; metadata:dbl
> ads.dbl.ipfire.org)" from file /var/lib/suricata/ipfire_dnsbl-ads.rules
> at line 72
> 00:32:40 suricata: [13343] <Error> -- no terminating ";" found
> ...
> ***SNAP***
> 
> I tried, but didn't find the right place for any missing ";".
> 
> Can "anyone" confirm?
> 
> Best
> Matthias
> 
>>> On 23 Jan 2026, at 15:02, Matthias Fischer <[email protected]> 
>>> wrote:
>>> 
>>> On 22.01.2026 12:33, Michael Tremer wrote:
>>>> Hello everyone,
>>> 
>>> Hi,
>>> 
>>> short feedback from me:
>>> 
>>> - I activated both the suricata (IPFire DBL - Domain Blocklist) - and
>>> the URLfilter lists from 'dbl.ipfire.org'.
>> 
>> This is an interesting case. What I didn’t manage to test yet is what 
>> happens when Suricata blocks the connection first. If URL Filter sees a 
>> domain that is being blocked it will either send you an error page if you 
>> are using HTTP, or simply close the connection if it is HTTPS. However, when 
>> Suricata comes first in the chain (and it will), it might close the 
>> connection before URL Filter has received the request. In the case of HTTPS 
>> this does not make any difference because the connection will be closed, but 
>> in the HTTP case you won’t see an error page any more and instead have the 
>> connection closed, too. You are basically losing the explicit error 
>> notification which is a little bit annoying.
>> 
>> We could have the same when we are doing the same with Unbound and DNS 
>> filtering. Potentially we would need to whitelist the local DNS resolver 
>> then, but how is Suricata supposed to know that the same categories are 
>> activated in both places?
>> 
>>> - I even took the 'smart-tv' domains from the IPFire DBL blacklist and
>>> copied/pasted them in my fritzbox filter lists.
>> 
>> LOL Why not use IPFire to filter this as well?
>> 
>>> Everything works as expected. Besides, the download of the IPFire
>>> DBL-list loads a lot faster than the list from 'Univ. Toulouse'... ;-)
>> 
>> Yes, we don’t have much traffic on the server, yet.
>> 
>>> Functionality is good - no false positives or seen problems. Good work -
>>> thanks!
>> 
>> Nice. We need to distinguish a little between what is a technical issue and 
>> what is a false-positive/missing domain on the list. However, testing both 
>> at the same time is something we will all cope quite well with :)
>> 
>> -Michael
>> 
>>> Best
>>> Matthias
>>> 
>>>> Over the past few weeks I have made significant progress on this all, and 
>>>> I think we're getting close to something the community will be really 
>>>> happy with. I'd love to get feedback from the team before we finalise 
>>>> things.
>>>> 
>>>> So what has happened?
>>>> 
>>>> First of all, the entire project has been renamed. DNSBL is not entirely 
>>>> what this is. Although the lists can be thrown into DNS, they have so much 
>>>> more use outside of it that I thought we should simply go with DBL, short 
>>>> for Domain Blocklist. After all, we are only importing domains. The new 
>>>> home of the project therefore is https://www.ipfire.org/dbl
>>>> 
>>>> I have added a couple more lists that I thought interesting and I have 
>>>> added a couple more sources that I considered a good start. Hopefully, we 
>>>> will soon gather some more feedback on how well this is all holding up. My 
>>>> main focus has however been on the technology that will power this project.
>>>> 
>>>> One of the bigger challenges was to create Suricata rules from the lists. 
>>>> Initially I tried to create a ton of rules but since our lists are so 
>>>> large, this quickly became too complicated. I have now settled on using a 
>>>> feature that is only available in more recent versions of Suricata (I 
>>>> believe 7 and later), but since we are already on Suricata 8 in IPFire 
>>>> this won’t be a problem for us. All domains for each list are basically 
>>>> compiled into one massively large dataset and one single rule is referring 
>>>> to that dataset. This way, we won’t have the option to remove any 
>>>> false-positives, but at least Suricata and the GUI won’t die a really 
>>>> bad death when loading millions of rules.
>>>> 
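To illustrate the "one dataset, one rule" idea: as far as I know, Suricata expects a `type string` dataset file to hold one base64-encoded value per line. The sketch below builds such a file from a list of domains. It is an illustration only, not the exporter from dbl; the function name is made up, and it assumes the domains are already plain ASCII (punycode for IDNs).

```python
import base64


def encode_dataset(domains):
    """Return the content of a string dataset file, one base64 value per line."""
    # Normalise and de-duplicate so every domain appears exactly once
    unique = sorted({d.strip().lower().rstrip(".") for d in domains if d.strip()})
    lines = [base64.b64encode(d.encode("ascii")).decode("ascii") for d in unique]
    return "\n".join(lines) + "\n"
```

The single rule then only refers to the file (`dataset:isset,ads,type string,load datasets/ads.txt` in the rule quoted in the log above), no matter how many domains the file contains.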
>>>> Suricata will now be able to use our rules to block access to any listed 
>>>> domains of each of the categories over DNS, HTTP, TLS or QUIC. Although I 
>>>> don’t expect many users to use Suricata to block porn or other things, 
>>>> this is a great backstop to enforce any policy like that. For example, if 
>>>> there is a user on the network who is trying to circumvent the DNS server 
>>>> that might filter out certain domains, even after getting an IP address 
>>>> resolved through other means, they won’t be able to open a TLS/QUIC 
>>>> connection or send a HTTP request to all blocked domains. Some people have 
>>>> said they were interested in blocking DNS-over-HTTPS and this is a perfect 
>>>> way to do this and actually be sure that any server that is being blocked 
>>>> on the list will actually be completely inaccessible.
>>>> 
>>>> Those Suricata rules are already available for testing in Core Update 200: 
>>>> https://git.ipfire.org/?p=ipfire-2.x.git;a=commitdiff;h=9eb8751487d23dd354a105c28bdbbb0398fe6e85
>>>> 
>>>> I have chosen various severities for the lists. If someone was to block 
>>>> advertising using DBL, this is fine, but not a very severe alert. If 
>>>> someone chooses to block malware and there is a system on the network 
>>>> trying to access those domains, this is an alert worth being investigated 
>>>> by an admin. Our new Suricata Reporter will show those violations in 
>>>> different colours based on the severity which helps to identify the right 
>>>> alerts to further investigate.
>>>> 
>>>> Previously, I asked you to test the lists using URL Filter. Those rules 
>>>> are now available as well in Core Update 200: 
>>>> https://git.ipfire.org/?p=ipfire-2.x.git;a=commitdiff;h=db160694279a4b10378447f775dd536fdfcfb02a
>>>> 
>>>> I talked about a method to remove any dead domains from any sources which 
>>>> is a great way to keep our lists smaller. The pure size of them is a 
>>>> problem in so many ways. That check was however a little bit too ambitious 
>>>> and I had to make it a little bit less eager. Basically if we are in 
>>>> doubt, we need to still list the domain because it might be resolvable by 
>>>> a user.
>>>> 
>>>> https://git.ipfire.org/?p=dbl.git;a=commitdiff;h=bb5b6e33b731501d45dea293505f7d42a61d5ce7
>>>> 
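The "if in doubt, keep it" rule can be sketched like this. This is not the code from the commit above, only a minimal illustration; the `Lookup` enum and the function name are hypothetical, and the actual DNS lookups are left out so that only the decision is shown.

```python
from enum import Enum


class Lookup(Enum):
    RESOLVES = "resolves"
    NXDOMAIN = "nxdomain"
    ERROR = "error"  # timeout, SERVFAIL, or anything else inconclusive


def keep_domain(results):
    """Keep a domain unless every lookup attempt definitively said NXDOMAIN."""
    results = list(results)
    if not results:
        # Never checked: we are in doubt, so the domain stays listed
        return True
    return any(result is not Lookup.NXDOMAIN for result in results)
```

A single timeout or server failure is enough to keep the domain on the list; only a consistent NXDOMAIN drops it.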
>>>> So how else could we make the lists smaller without losing any actual 
>>>> data? Since we sometimes list a whole TLD (e.g. .xxx or .porn), there is 
>>>> very little point in listing any domains of this TLD. They will always be 
>>>> caught anyways. So I built a check that marks all domains that don’t need 
>>>> to be included on the exported lists because they will never be needed and 
>>>> was able to shrink the size of the lists by a lot again.
>>>> 
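A minimal sketch of how such a "subsumed" check could work, assuming that a listed name always covers everything below it. The function name is made up and this is not the implementation in dbl.

```python
def find_subsumed(domains):
    """Return the domains that are already covered by a listed parent or TLD."""
    listed = {d.strip().lower().strip(".") for d in domains if d.strip()}
    subsumed = set()
    for domain in listed:
        labels = domain.split(".")
        # Walk up the tree: a.b.example.xxx -> b.example.xxx -> example.xxx -> xxx
        for i in range(1, len(labels)):
            if ".".join(labels[i:]) in listed:
                subsumed.add(domain)
                break
    return subsumed
```

With `.xxx` on the list, `example.xxx` and anything below it is reported as subsumed and can be left out of the export without changing what gets blocked.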
>>>> The website does not show this data, but the API returns the number of 
>>>> “subsumed” domains (I didn’t have a better name):
>>>> 
>>>> curl https://api.dbl.ipfire.org/lists | jq .
>>>> 
>>>> The number shown would normally be added to the total number of domains 
>>>> and usually cuts the size of the list by 50-200%.
>>>> 
>>>> Those stats will now also be stored in a history table so that we will be 
>>>> able to track growth of all lists.
>>>> 
>>>> Furthermore, the application will now send email notifications for any 
>>>> incoming reports. This way, we will be able to stay in close touch with 
>>>> the reporters and keep them up to date on their submissions as well as 
>>>> inform moderators that there is something to have a look at.
>>>> 
>>>> The search has been refactored as well, so that we can show clearly 
>>>> whether something is blocked or not at one glance: 
>>>> https://www.ipfire.org/dbl/search?q=github.com. There is detailed 
>>>> information available on all domains and what happened to them. In case of 
>>>> GitHub.com, this seems to be blocked and unblocked by someone all of the 
>>>> time and we can see a clear audit trail of that: 
>>>> https://www.ipfire.org/dbl/lists/malware/domains/github.com
>>>> 
>>>> On the DNS front, I have added some metadata to the zones so that people 
>>>> can programmatically request some data, like when it has been last updated 
>>>> (in a human-friendly timestamp and not only the serial), license, 
>>>> description and so on:
>>>> 
>>>> # dig +short ANY _info.ads.dbl.ipfire.org @primary.dbl.ipfire.org
>>>> "total-domains=42226"
>>>> "license=CC BY-SA 4.0"
>>>> "updated-at=2026-01-20T22:17:02.409933+00:00"
>>>> "description=Blocks domains used for ads, tracking, and ad delivery"
>>>> 
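Since the records are plain key=value strings, reading them from a script is simple. A minimal sketch, assuming the output of `dig +short` is passed in line by line; the function name is made up.

```python
from datetime import datetime


def parse_info(txt_records):
    """Turn 'key=value' TXT strings (as printed by dig +short) into a dict."""
    info = {}
    for record in txt_records:
        record = record.strip().strip('"')
        key, sep, value = record.partition("=")
        if sep:  # ignore anything that is not key=value
            info[key] = value
    return info


sample = [
    '"total-domains=42226"',
    '"license=CC BY-SA 4.0"',
    '"updated-at=2026-01-20T22:17:02.409933+00:00"',
]
info = parse_info(sample)
updated = datetime.fromisoformat(info["updated-at"])
```

All values come back as strings, so `total-domains` still needs an `int()` if it is used as a number.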
>>>> Now, I would like to hear more feedback from you. I know we've all been 
>>>> stretched thin lately, so I especially appreciate anyone who has time to 
>>>> review and provide input. Share ideas, or just say if you like it or not. 
>>>> Where could this go in the future?
>>>> 
>>>> Looking ahead, I would like us to start thinking about the RPZ feature 
>>>> that has been on the wishlist. IPFire DBL has been a bigger piece of work, 
>>>> and I think it's worth having a conversation about sustainability. 
>>>> Resources for this need to be allocated and paid for. Open source is about 
>>>> freedom, not free beer — and to keep building features like this, we will 
>>>> need to explore some funding options. I would be interested to hear any 
>>>> ideas you might have that could work for IPFire.
>>>> 
>>>> Please share your thoughts on the mailing list when you can — even a quick 
>>>> 'looks good' or 'I have concerns about X' is valuable. Public discussion 
>>>> helps everyone stay in the loop and contribute.
>>>> 
>>>> I am aiming to move forward with this in a week's time, so if you have 
>>>> input, now would be a good time to share it.
>>>> 
>>>> Best,
>>>> -Michael
>>>> 
>>>>> On 6 Jan 2026, at 10:20, Michael Tremer <[email protected]> wrote:
>>>>> 
>>>>> Good Morning Adolf,
>>>>> 
>>>>> I had a look at this problem yesterday and it seems that parsing the 
>>>>> format is becoming a little bit difficult this way. Since this is only 
>>>>> affecting very few domains, I have simply whitelisted them all manually 
>>>>> and duckduckgo.com and others should now be 
>>>>> easily reachable again.
>>>>> 
>>>>> Please let me know if you have any more findings.
>>>>> 
>>>>> All the best,
>>>>> -Michael
>>>>> 
>>>>>> On 5 Jan 2026, at 11:48, Michael Tremer <[email protected]> 
>>>>>> wrote:
>>>>>> 
>>>>>> Hello Adolf,
>>>>>> 
>>>>>> This is a good find.
>>>>>> 
>>>>>> But if duckduckgo.com is blocked, we will have 
>>>>>> to have a source somewhere that blocks that domain. Not only a 
>>>>>> sub-domain of it. Otherwise we have a bug somewhere.
>>>>>> 
>>>>>> This is most likely as the domain is listed here, but with some stuff 
>>>>>> afterwards:
>>>>>> 
>>>>>> https://raw.githubusercontent.com/mtxadmin/ublock/refs/heads/master/hosts/_malware_typo
>>>>>> 
>>>>>> We strip everything after a # away because we consider it a comment. 
>>>>>> However, that leaves a line with only the domain on it, which then 
>>>>>> gets listed.
>>>>>> 
>>>>>> The # sign is used as some special character but at the same time it is 
>>>>>> being used for comments.
>>>>>> 
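One possible way to avoid this, sketched in Python: treat a # as a comment only at the start of a line or after whitespace, and skip any line where it is glued to the domain instead of guessing. This is an assumption about a reasonable rule, not the fix that went into the tool, and the `##.ad` example in the note is made up.

```python
def parse_line(line):
    """Return the domain on a list line, or None if there is none."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    # A '#' only starts a comment when it follows whitespace
    for i, char in enumerate(line):
        if char == "#" and line[i - 1].isspace():
            line = line[:i].rstrip()
            break
    # A '#' glued to the domain is some other syntax: skip the whole line
    if "#" in line:
        return None
    fields = line.split()
    # hosts format is "0.0.0.0 example.com", plain format is "example.com"
    return fields[-1].lower() if fields else None
```

With this, `example.com # comment` still yields `example.com`, while a line like `duckduckgo.com##.ad` yields nothing at all.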
>>>>>> I will fix this and then refresh the list.
>>>>>> 
>>>>>> -Michael
>>>>>> 
>>>>>>> On 5 Jan 2026, at 11:31, Adolf Belka <[email protected]> wrote:
>>>>>>> 
>>>>>>> Hi Michael,
>>>>>>> 
>>>>>>> 
>>>>>>> On 05/01/2026 12:11, Adolf Belka wrote:
>>>>>>>> Hi Michael,
>>>>>>>> 
>>>>>>>> I have found that the malware list includes duckduckgo.com
>>>>>>>> 
>>>>>>> I have checked through the various sources used for the malware list.
>>>>>>> 
>>>>>>> The ShadowWhisperer (Tracking) list has improving.duckduckgo.com in its 
>>>>>>> list. I suspect that this one is the one causing the problem.
>>>>>>> 
>>>>>>> The mtxadmin (_malware_typo) list has duckduckgo.com mentioned 3 times 
>>>>>>> but not directly as a domain name - looks more like a reference.
>>>>>>> 
>>>>>>> Regards,
>>>>>>> 
>>>>>>> Adolf.
>>>>>>> 
>>>>>>> 
>>>>>>>> Regards,
>>>>>>>> Adolf.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On 02/01/2026 14:02, Adolf Belka wrote:
>>>>>>>>> Hi,
>>>>>>>>> 
>>>>>>>>> On 02/01/2026 12:09, Michael Tremer wrote:
>>>>>>>>>> Hello,
>>>>>>>>>> 
>>>>>>>>>>> On 30 Dec 2025, at 14:05, Adolf Belka <[email protected]> 
>>>>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Hi Michael,
>>>>>>>>>>> 
>>>>>>>>>>> On 29/12/2025 13:05, Michael Tremer wrote:
>>>>>>>>>>>> Hello everyone,
>>>>>>>>>>>> 
>>>>>>>>>>>> I hope everyone had a great Christmas and a couple of quiet days 
>>>>>>>>>>>> to relax from all the stress that was the year 2025.
>>>>>>>>>>> Still relaxing.
>>>>>>>>>> 
>>>>>>>>>> Very good, so let’s have a strong start into 2026 now!
>>>>>>>>> 
>>>>>>>>> Starting next week, yes.
>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>>> Having a couple of quieter days, I have been working on a new, 
>>>>>>>>>>>> little (hopefully) side project that has probably been high up on 
>>>>>>>>>>>> our radar since the Shalla list has shut down in 2020, or maybe 
>>>>>>>>>>>> even earlier. The goal of the project is to provide good lists 
>>>>>>>>>>>> with categories of domain names which are usually used to block 
>>>>>>>>>>>> access to these domains.
>>>>>>>>>>>> 
>>>>>>>>>>>> I simply call this IPFire DNSBL which is short for IPFire DNS 
>>>>>>>>>>>> Blocklists.
>>>>>>>>>>>> 
>>>>>>>>>>>> How did we get here?
>>>>>>>>>>>> 
>>>>>>>>>>>> As stated before, the URL filter feature in IPFire has the problem 
>>>>>>>>>>>> that there are not many good blocklists available any more. There 
>>>>>>>>>>>> used to be a couple more - most famously the Shalla list - but we 
>>>>>>>>>>>> are now down to a single list from the University of Toulouse. It 
>>>>>>>>>>>> is a great list, but it is not always the best fit for all users.
>>>>>>>>>>>> 
>>>>>>>>>>>> Then there has been talk about whether we could implement more 
>>>>>>>>>>>> blocking features into IPFire that don’t involve the proxy. Most 
>>>>>>>>>>>> famously blocking over DNS. The problem here remains that the 
>>>>>>>>>>>> blocking feature is only as good as the data that is fed into it. 
>>>>>>>>>>>> Some people have been putting forward a number of lists that were 
>>>>>>>>>>>> suitable for them, but they would not have replaced the blocking 
>>>>>>>>>>>> functionality as we know it. Their aim is to provide “one list for 
>>>>>>>>>>>> everything” but that is not what people usually want. It is 
>>>>>>>>>>>> targeted at a classic home user and the only separation that is 
>>>>>>>>>>>> being made is any adult/porn/NSFW content which usually is put 
>>>>>>>>>>>> into a separate list.
>>>>>>>>>>>> 
>>>>>>>>>>>> It would have been technically possible to include these lists and 
>>>>>>>>>>>> let the users decide, but that is not the aim of IPFire. We want 
>>>>>>>>>>>> to do the job for the user so that their job is getting easier. 
>>>>>>>>>>>> Including obscure lists that don’t have a clear outline of what 
>>>>>>>>>>>> they actually want to block (“bad content” is not a category) and 
>>>>>>>>>>>> passing the burden of figuring out whether they need the “Light”, 
>>>>>>>>>>>> “Normal”, “Pro”, “Pro++”, “Ultimate” or even a “Venti” list with 
>>>>>>>>>>>> cream on top is really not going to work. It is all confusing and 
>>>>>>>>>>>> will lead to a bad user experience.
>>>>>>>>>>>> 
>>>>>>>>>>>> An even bigger problem that is however completely impossible to 
>>>>>>>>>>>> solve is bad licensing of these lists. A user has asked the 
>>>>>>>>>>>> publisher of the HaGeZi list whether they could be included in 
>>>>>>>>>>>> IPFire and under what terms. The response was that the list is 
>>>>>>>>>>>> available under the terms of the GNU General Public License v3, 
>>>>>>>>>>>> but that does not seem to be true. The list contains data from 
>>>>>>>>>>>> various sources. Many of them are licensed under incompatible 
>>>>>>>>>>>> licenses (CC BY-SA 4.0, MPL, Apache2, …) and unless there is a 
>>>>>>>>>>>> non-public agreement that this data may be redistributed, there is 
>>>>>>>>>>>> a huge legal issue here. We would expose our users to potential 
>>>>>>>>>>>> copyright infringement which we cannot do under any circumstances. 
>>>>>>>>>>>> Furthermore many lists are available under a non-commercial 
>>>>>>>>>>>> license which excludes them from being used in any kind of 
>>>>>>>>>>>> business. Plenty of IPFire systems are running in businesses, if 
>>>>>>>>>>>> not even the vast majority.
>>>>>>>>>>>> 
>>>>>>>>>>>> In short, these lists are completely unusable for us. Apart from 
>>>>>>>>>>>> HaGeZi, I consider OISD to have the same problem.
>>>>>>>>>>>> 
>>>>>>>>>>>> Enough about all the things that are bad. Let’s talk about the 
>>>>>>>>>>>> new, good things:
>>>>>>>>>>>> 
>>>>>>>>>>>> Many blacklists on the internet are an amalgamation of other 
>>>>>>>>>>>> lists. These lists vary in quality with some of them being not 
>>>>>>>>>>>> that good and without a clear focus and others being excellent 
>>>>>>>>>>>> data. Since we don’t have the manpower to start from scratch, I 
>>>>>>>>>>>> felt that we can copy the concept that HaGeZi and OISD have 
>>>>>>>>>>>> started and simply create a new list that is based on other lists 
>>>>>>>>>>>> at the beginning to have a good starting point. That way, we have 
>>>>>>>>>>>> much better control over what is going on these lists and we can 
>>>>>>>>>>>> shape and mould them as we need them. Most importantly, we don’t 
>>>>>>>>>>>> create a single list, but many lists that have a clear focus and 
>>>>>>>>>>>> allow users to choose what they want to block and what not.
>>>>>>>>>>>> 
>>>>>>>>>>>> So the current experimental stage that I am in has these lists:
>>>>>>>>>>>> 
>>>>>>>>>>>> * Ads
>>>>>>>>>>>> * Dating
>>>>>>>>>>>> * DoH
>>>>>>>>>>>> * Gambling
>>>>>>>>>>>> * Malware
>>>>>>>>>>>> * Porn
>>>>>>>>>>>> * Social
>>>>>>>>>>>> * Violence
>>>>>>>>>>>> 
>>>>>>>>>>>> The categories have been determined by what source lists we have 
>>>>>>>>>>>> available with good data and are compatible with our chosen 
>>>>>>>>>>>> license CC BY-SA 4.0. This is the same license that we are using 
>>>>>>>>>>>> for the IPFire Location database, too.
>>>>>>>>>>>> 
>>>>>>>>>>>> The main use-cases for any kind of blocking are to comply with 
>>>>>>>>>>>> legal requirements in networks with children (i.e. schools) to 
>>>>>>>>>>>> remove any kind of pornographic content, sometimes block social 
>>>>>>>>>>>> media as well. Gambling and violence are commonly blocked, too. 
>>>>>>>>>>>> Even more common would be filtering advertising and any malicious 
>>>>>>>>>>>> content.
>>>>>>>>>>>> 
>>>>>>>>>>>> The latter is especially difficult because so many source lists 
>>>>>>>>>>>> throw phishing, spyware, malvertising, tracking and other things 
>>>>>>>>>>>> into the same bucket. Here this is currently all in the malware 
>>>>>>>>>>>> list which has therefore become quite large. I am not sure whether 
>>>>>>>>>>>> this will stay like this in the future or if we will have to make 
>>>>>>>>>>>> some adjustments, but that is exactly why this is now entering 
>>>>>>>>>>>> some larger testing.
>>>>>>>>>>>> 
>>>>>>>>>>>> What has been built so far? In order to put these lists together 
>>>>>>>>>>>> properly, track any data about where it is coming from, I have 
>>>>>>>>>>>> built a tool in Python available here:
>>>>>>>>>>>> 
>>>>>>>>>>>> https://git.ipfire.org/?p=dnsbl.git;a=summary
>>>>>>>>>>>> 
>>>>>>>>>>>> This tool will automatically update all lists once an hour if 
>>>>>>>>>>>> there have been any changes and export them in various formats. 
>>>>>>>>>>>> The exported lists are available for download here:
>>>>>>>>>>>> 
>>>>>>>>>>>> https://dnsbl.ipfire.org/lists/
>>>>>>>>>>> The download using dnsbl.ipfire.org/lists/squidguard.tar.gz as the 
>>>>>>>>>>> custom url works fine.
>>>>>>>>>>> 
>>>>>>>>>>> However you need to remember not to put the https:// at the front 
>>>>>>>>>>> of the url otherwise the WUI page completes without any error 
>>>>>>>>>>> messages but leaves an error message in the system logs saying
>>>>>>>>>>> 
>>>>>>>>>>> URL filter blacklist - ERROR: Not a valid URL filter blacklist
>>>>>>>>>>> 
>>>>>>>>>>> I found this out the hard way.
>>>>>>>>>> 
>>>>>>>>>> Oh yes, I forgot that there is a field on the web UI. If that does 
>>>>>>>>>> not accept https:// as a prefix, please file a bug and we will fix 
>>>>>>>>>> it.
>>>>>>>>> 
>>>>>>>>> I will confirm it and raise a bug.
>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> The other thing I noticed is that if you already have the Toulouse 
>>>>>>>>>>> University list downloaded and you then change to the ipfire custom 
>>>>>>>>>>> url then all the existing Toulouse blocklists stay in the directory 
>>>>>>>>>>> on IPFire and so you end up with a huge number of category tick 
>>>>>>>>>>> boxes, most of which are the old Toulouse ones, which are still 
>>>>>>>>>>> available to select and it is not clear which ones are from 
>>>>>>>>>>> Toulouse and which ones from IPFire.
>>>>>>>>>> 
>>>>>>>>>> Yes, I got the same thing, too. I think this is a bug, too, because 
>>>>>>>>>> otherwise you would have a lot of unused categories lying around 
>>>>>>>>>> that will never be updated. You cannot even tell which ones are from 
>>>>>>>>>> the current list and which ones from the old list.
>>>>>>>>>> 
>>>>>>>>>> Long-term we could even consider removing the Univ. Toulouse list 
>>>>>>>>>> entirely and only have our own lists available which would make the 
>>>>>>>>>> problem go away.
>>>>>>>>>> 
>>>>>>>>>>> I think if the blocklist URL source is changed or a custom url is 
>>>>>>>>>>> provided the first step should be to remove the old ones already 
>>>>>>>>>>> existing.
>>>>>>>>>>> That might be a problem because users can also create their own 
>>>>>>>>>>> blocklists and I believe those go into the same directory.
>>>>>>>>>> 
>>>>>>>>>> Good thought. We of course cannot delete the custom lists.
>>>>>>>>>> 
>>>>>>>>>>> Without clearing out the old blocklists you end up with a huge 
>>>>>>>>>>> number of checkboxes for lists but it is not clear what happens if 
>>>>>>>>>>> there is a category that has the same name for the Toulouse list 
>>>>>>>>>>> and the IPFire list such as gambling. I will have a look at that 
>>>>>>>>>>> and see what happens.
>>>>>>>>>>> 
>>>>>>>>>>> Not sure what the best approach to this is.
>>>>>>>>>> 
>>>>>>>>>> I believe it is removing all old content.
>>>>>>>>>> 
>>>>>>>>>>> Manually deleting all contents of the urlfilter/blacklists/ 
>>>>>>>>>>> directory and then selecting the IPFire blocklist url for the 
>>>>>>>>>>> custom url I end up with only the 8 categories from the IPFire list.
>>>>>>>>>>> 
>>>>>>>>>>> I have tested some gambling sites from the IPFire list and the 
>>>>>>>>>>> block worked on some. On others the site no longer exists so there 
>>>>>>>>>>> is nothing to block or has been changed to an https site and in 
>>>>>>>>>>> that case it went straight through. Also if I chose the http 
>>>>>>>>>>> version of the link, it was automatically changed to https and went 
>>>>>>>>>>> through without being blocked.
>>>>>>>>>> 
>>>>>>>>>> The entire IPFire infrastructure always requires HTTPS. If you start 
>>>>>>>>>> using HTTP, you will be automatically redirected. It is 2026 and we 
>>>>>>>>>> don’t need to talk HTTP any more :)
>>>>>>>>> 
>>>>>>>>> Some of the domains in the gambling list (maybe quite a lot) seem to 
>>>>>>>>> only have an http access. If I tried https it came back with the fact 
>>>>>>>>> that it couldn't find it.
>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> I am glad to hear that the list is actually blocking. It would have 
>>>>>>>>>> been bad if it didn’t. Now we have the big task to check out the 
>>>>>>>>>> “quality” - however that can be determined. I think this is what 
>>>>>>>>>> needs some time…
>>>>>>>>>> 
>>>>>>>>>> In the meantime I have set up a small page on our website:
>>>>>>>>>> 
>>>>>>>>>> https://www.ipfire.org/dnsbl
>>>>>>>>>> 
>>>>>>>>>> I would like to run this as a first-class project inside IPFire like 
>>>>>>>>>> we are doing with IPFire Location. That means that we need to tell 
>>>>>>>>>> people about what we are doing. Hopefully this page is a little 
>>>>>>>>>> start.
>>>>>>>>>> 
>>>>>>>>>> Initially it has a couple of high-level bullet points about what we 
>>>>>>>>>> are trying to achieve. I don’t think the text is very good, yet, but 
>>>>>>>>>> it is the best I had in that moment. There is then also a list of 
>>>>>>>>>> the lists that we currently offer. For each list, a detailed page 
>>>>>>>>>> will tell you about the license, how many domains are listed, when 
>>>>>>>>>> the last update was, the sources, and there is even a history 
>>>>>>>>>> page that shows all the changes whenever they have happened.
>>>>>>>>>> 
>>>>>>>>>> Finally there is a section that explains “How To Use?” the list 
>>>>>>>>>> which I would love to extend to include AdGuard Plus and things like 
>>>>>>>>>> that as well as Pi-Hole and whatever else could use the list. In a 
>>>>>>>>>> later step we should go ahead and talk to any projects to include 
>>>>>>>>>> our list(s) into their dropdown so that people can enable them nice 
>>>>>>>>>> and easy.
>>>>>>>>>> 
>>>>>>>>>> Behind the web page there is an API service that is running on the 
>>>>>>>>>> host that is running the DNSBL. The frontend web app that is running 
>>>>>>>>>> www.ipfire.org is connecting to that API 
>>>>>>>>>> service to fetch the current lists, any details and so on. That way, 
>>>>>>>>>> we can split the logic and avoid creating a huge monolith of a web 
>>>>>>>>>> app. This also means that page could be down a little as I am still 
>>>>>>>>>> working on the entire thing and will frequently restart it.
>>>>>>>>>> 
>>>>>>>>>> The API documentation is available here and the API is publicly 
>>>>>>>>>> available: https://api.dnsbl.ipfire.org/docs
>>>>>>>>>> 
>>>>>>>>>> The website/API allows filing reports for anything that does not 
>>>>>>>>>> seem to be right on any of the lists. I would like to keep it as an 
>>>>>>>>>> open process, however, long-term, this cannot cost us any time. In 
>>>>>>>>>> the current stage, the reports are getting filed and that is about 
>>>>>>>>>> it. I still need to build out some way for admins or moderators (I 
>>>>>>>>>> am not sure what kind of roles I want to have here) to accept or 
>>>>>>>>>> reject those reports.
>>>>>>>>>> 
>>>>>>>>>> In case of us receiving a domain from a source list, I would rather 
>>>>>>>>>> like to submit a report to upstream for them to de-list. That way, 
>>>>>>>>>> we don’t have any admin to do and we are contributing back to other 
>>>>>>>>>> lists. That would be a very good thing to do. We cannot however throw 
>>>>>>>>>> tons of emails at some random upstream projects without 
>>>>>>>>>> co-ordinating this first. By not reporting upstream, we will 
>>>>>>>>>> probably over time create large whitelists and I am not sure if that 
>>>>>>>>>> is a good thing to do.
>>>>>>>>>> 
>>>>>>>>>> Finally, there is a search box that can be used to find out if a 
>>>>>>>>>> domain is listed on any of the lists.
>>>>>>>>>> 
>>>>>>>>>>>> If you download and open any of the files, you will see a large 
>>>>>>>>>>>> header that includes copyright information and lists all sources 
>>>>>>>>>>>> that have been used to create the individual lists. This way we 
>>>>>>>>>>>> ensure maximum transparency, comply with the terms of the 
>>>>>>>>>>>> individual licenses of the source lists and give credit to the 
>>>>>>>>>>>> people who help us to put together the most perfect list for our 
>>>>>>>>>>>> users.
>>>>>>>>>>>> 
>>>>>>>>>>>> I would like this to become a project that is not only being used 
>>>>>>>>>>>> in IPFire. We can and will be compatible with other solutions like 
>>>>>>>>>>>> AdGuard, PiHole so that people can use our lists if they would 
>>>>>>>>>>>> like to even though they are not using IPFire. Hopefully, these 
>>>>>>>>>>>> users will also feed back to us so that we can improve our lists 
>>>>>>>>>>>> over time and make them one of the best options out there.
>>>>>>>>>>>> 
>>>>>>>>>>>> All lists are available as a simple text file that lists the 
>>>>>>>>>>>> domains. Then there is a hosts file available as well as a DNS 
>>>>>>>>>>>> zone file and an RPZ file. Each list is individually available to 
>>>>>>>>>>>> be used in squidGuard and there is a larger tarball available with 
>>>>>>>>>>>> all lists that can be used in IPFire’s URL Filter. I am planning 
>>>>>>>>>>>> to add Suricata/Snort signatures whenever I have time to do so. 
>>>>>>>>>>>> Even though it is not a good idea to filter pornographic content 
>>>>>>>>>>>> this way, I suppose that catching malware and blocking DoH are 
>>>>>>>>>>>> good use-cases for an IPS. Time will tell…
>>>>>>>>>>>> 
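To make the formats a bit more concrete, here is a rough sketch of a hosts and an RPZ export in Python. The function names are made up, the use of 0.0.0.0 and of wildcard entries is an assumption, and the zone header (SOA/NS) as well as the copyright header of the real exports are left out.

```python
def export_hosts(domains):
    """hosts file: point every domain at 0.0.0.0."""
    return "".join(f"0.0.0.0 {d}\n" for d in sorted(domains))


def export_rpz(domains):
    """RPZ records: 'CNAME .' makes the resolver answer NXDOMAIN for the name."""
    lines = []
    for d in sorted(domains):
        lines.append(f"{d} CNAME .\n")
        lines.append(f"*.{d} CNAME .\n")  # also cover all sub-domains
    return "".join(lines)
```

The owner names in the RPZ output are relative to the zone origin, which is why they carry no trailing dot.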
>>>>>>>>>>>> As a start, we will make these lists available in IPFire’s URL 
>>>>>>>>>>>> Filter and collect some feedback about how we are doing. 
>>>>>>>>>>>> Afterwards, we can see where else we can take this project.
>>>>>>>>>>>> 
>>>>>>>>>>>> If you want to enable this on your system, simply add the URL to 
>>>>>>>>>>>> your autoupdate.urls file like here:
>>>>>>>>>>>> 
>>>>>>>>>>>> https://git.ipfire.org/?p=people/ms/ipfire-2.x.git;a=commitdiff;h=bf675bb937faa7617474b3cc84435af3b1f7f45f
>>>>>>>>>>> I also tested out adding the IPFire url to autoupdate.urls and that 
>>>>>>>>>>> also worked fine for me.
>>>>>>>>>> 
>>>>>>>>>> Very good. Should we include this already with Core Update 200? I 
>>>>>>>>>> don’t think we would break anything, but we might already gain a 
>>>>>>>>>> couple more people who are helping us to test this all?
>>>>>>>>> 
>>>>>>>>> I think that would be a good idea.
>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> The next step would be to build and test our DNS infrastructure. In 
>>>>>>>>>> the “How To Use?” Section on the pages of the individual lists, you 
>>>>>>>>>> can already see some instructions on how to use the lists as an RPZ. 
>>>>>>>>>> In comparison to other “providers”, I would prefer if people would 
>>>>>>>>>> be using DNS to fetch the lists. This is simply to push out updates 
>>>>>>>>>> in a cheap way for us and also do it very regularly.
>>>>>>>>>> 
>>>>>>>>>> Initially, clients will pull the entire list using AXFR. There is no 
>>>>>>>>>> way around this as they need to have the data in the first place. 
>>>>>>>>>> After that, clients will only need the changes. As you can see in 
>>>>>>>>>> the history, the lists don’t actually change that often. Sometimes 
>>>>>>>>>> only once a day and therefore downloading the entire list again 
>>>>>>>>>> would be a huge waste of data, both on the client side and for 
>>>>>>>>>> us hosting them.
>>>>>>>>>> 
>>>>>>>>>> Some other providers update their lists “every 10 minutes”, and 
>>>>>>>>>> there won't be any changes whatsoever. We don’t do that. We will 
>>>>>>>>>> only export the lists again when they have actually changed. The 
>>>>>>>>>> timestamps on the files that we offer using HTTPS can be checked by 
>>>>>>>>>> clients so that they won’t re-download the list again if it has not 
>>>>>>>>>> been changed. But using HTTPS still means that clients would have to 
>>>>>>>>>> re-download the entire list and not only the changes.
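
The timestamp check described above can be sketched as a conditional HTTP request. This is only an illustration: the URL is a placeholder, and it assumes the server answers an `If-Modified-Since` header with `304 Not Modified`, which the thread does not confirm. The function names are hypothetical.

```python
import urllib.error
import urllib.request
from email.utils import formatdate


def build_conditional_request(url, last_fetch_epoch):
    """Build a GET request that asks for the file only if it changed."""
    req = urllib.request.Request(url)
    # HTTP dates must be expressed in GMT (e.g. "Thu, 01 Jan 1970 00:00:00 GMT").
    req.add_header("If-Modified-Since", formatdate(last_fetch_epoch, usegmt=True))
    return req


def fetch_if_changed(url, last_fetch_epoch):
    """Return the new list body, or None if the server reports no change."""
    req = build_conditional_request(url, last_fetch_epoch)
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        if err.code == 304:  # Not Modified: keep the local copy
            return None
        raise
```

Even when this works, a changed list is still transferred in full, which is the limitation of HTTPS pointed out above.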
>>>>>>>>>> 
>>>>>>>>>> Using DNS and IXFR will update the lists by only transferring a few 
>>>>>>>>>> kilobytes and therefore we can have clients check once an hour if a 
>>>>>>>>>> list has actually changed and only send out the raw changes. That 
>>>>>>>>>> way, we will be able to serve millions of clients at very cheap cost 
>>>>>>>>>> and they will always have a very up to date list.
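
As a rough sketch of the hourly check described above: a secondary compares the zone's SOA serial with the one it already holds and only asks for a transfer when the remote serial is newer. The comparison below follows the serial number arithmetic of RFC 1982; the function names and the `plan_transfer` helper are hypothetical and not part of any DNS server.

```python
SERIAL_MOD = 1 << 32   # SOA serials are 32-bit unsigned integers
SERIAL_HALF = 1 << 31


def serial_is_newer(remote, local):
    """True if `remote` is ahead of `local` in RFC 1982 serial arithmetic."""
    if remote == local:
        return False
    # A forward distance of less than 2^31 means "newer", including wraparound.
    return (remote - local) % SERIAL_MOD < SERIAL_HALF


def plan_transfer(local_serial, remote_serial):
    """Decide which zone transfer (if any) a client would request."""
    if local_serial is None:
        return "AXFR"   # no data yet: pull the entire zone once
    if serial_is_newer(remote_serial, local_serial):
        return "IXFR"   # only the changes since local_serial
    return None         # unchanged: nothing to transfer
```

Resolvers that support RPZ zone transfers do this internally; the sketch only shows why an unchanged list costs a single SOA query.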
>>>>>>>>>> 
>>>>>>>>>> As far as I can see, any DNS software that supports RPZs supports 
>>>>>>>>>> AXFR/IXFR with the exception of Knot Resolver, which expects the zone to 
>>>>>>>>>> be downloaded externally. There is a ticket for AXFR/IXFR support 
>>>>>>>>>> (https://gitlab.nic.cz/knot/knot-resolver/-/issues/195).
>>>>>>>>>> 
>>>>>>>>>> Initially, some of the lists have been *huge* which is why a simple 
>>>>>>>>>> HTTP download is not feasible. The porn list was over 100 MiB. We 
>>>>>>>>>> could have spent thousands on just traffic alone which I don’t have 
>>>>>>>>>> for this kind of project. It would also be unnecessary money being 
>>>>>>>>>> spent. There are simply better solutions out there. But then I built 
>>>>>>>>>> something that basically tests the data that we are receiving from 
>>>>>>>>>> upstream by simply checking if a listed domain still exists. The 
>>>>>>>>>> result was very astonishing to me.
>>>>>>>>>> 
>>>>>>>>>> So whenever someone adds a domain to the list, we will (eventually, 
>>>>>>>>>> but not immediately) check if we can resolve the domain’s SOA 
>>>>>>>>>> record. If not, we mark the domain as non-active and will no longer 
>>>>>>>>>> include them in the exported data. This brought down the porn list 
>>>>>>>>>> from just under 5 million domains to just 421k. On the sources page 
>>>>>>>>>> (https://www.ipfire.org/dnsbl/lists/porn/sources) I am listing the 
>>>>>>>>>> percentage of dead domains from each of them and the UT1 list has 
>>>>>>>>>> 94% dead domains. Wow.
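
The liveness filter described above can be sketched as follows. The actual DNS lookup is left out on purpose: `has_soa` is a hypothetical callback that would return True when the domain's SOA record resolves, and the figures in the usage note are the approximate ones from this message ("just under 5 million" down to 421k).

```python
from typing import Callable, Iterable, List


def filter_active(domains: Iterable[str], has_soa: Callable[[str], bool]) -> List[str]:
    """Keep only the domains whose SOA record can still be resolved."""
    return [domain for domain in domains if has_soa(domain)]


def dead_ratio(total: int, active: int) -> float:
    """Fraction of listed domains that were dropped as non-active."""
    return (total - active) / total
```

With roughly 5,000,000 listed and 421,000 active domains, `dead_ratio(5_000_000, 421_000)` comes out at about 0.92, so more than nine in ten entries are dropped from the export.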
>>>>>>>>>> 
>>>>>>>>>> If we cannot resolve the domain, neither can our users. So we would 
>>>>>>>>>> otherwise fill the lists with tons of domains that simply could 
>>>>>>>>>> never be reached. And if they cannot be reached, why would we block 
>>>>>>>>>> them? We would waste bandwidth and a lot of memory on each single 
>>>>>>>>>> client.
>>>>>>>>>> 
>>>>>>>>>> The other sources have similarly high ratios of dead domains. Most 
>>>>>>>>>> of them are in the 50-80% range. Therefore I am happy that we are 
>>>>>>>>>> doing some extra work here to give our users much better data for 
>>>>>>>>>> their filtering.
>>>>>>>>> 
>>>>>>>>> Removing all dead entries sounds like an excellent step.
>>>>>>>>> 
>>>>>>>>> Regards,
>>>>>>>>> 
>>>>>>>>> Adolf.
>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> So, if you like, please go and check out the RPZ blocking with 
>>>>>>>>>> Unbound. Instructions are on the page. I would be happy to hear how 
>>>>>>>>>> this is turning out.
>>>>>>>>>> 
>>>>>>>>>> Please let me know if there are any more questions, and I would be 
>>>>>>>>>> glad to answer them.
>>>>>>>>>> 
>>>>>>>>>> Happy New Year,
>>>>>>>>>> -Michael
>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Regards,
>>>>>>>>>>> Adolf.
>>>>>>>>>>>> This email is just a brain dump from me to this list. I would be 
>>>>>>>>>>>> happy to answer any questions about implementation details, etc. 
>>>>>>>>>>>> if people are interested. Right now, this email is long enough 
>>>>>>>>>>>> already…
>>>>>>>>>>>> 
>>>>>>>>>>>> All the best,
>>>>>>>>>>>> -Michael
>>>>>>>>>>> 
>>>>>>>>>>> -- 
>>>>>>>>>>> Sent from my laptop
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>>> -- 
>>>>>>> Sent from my laptop
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>>> 
>>> 
>>> 
>> 
> 
>
