Hi,
On 02/01/2026 12:09, Michael Tremer wrote:
Hello,
On 30 Dec 2025, at 14:05, Adolf Belka <[email protected]> wrote:
Hi Michael,
On 29/12/2025 13:05, Michael Tremer wrote:
Hello everyone,
I hope everyone had a great Christmas and a couple of quiet days to relax from
all the stress that was the year 2025.
Still relaxing.
Very good, so let’s have a strong start into 2026 now!
Starting next week, yes.
Having a couple of quieter days, I have been working on a new, little
(hopefully) side project that has probably been high up on our radar since the
Shalla list shut down in 2020, or maybe even earlier. The goal of the project
is to provide good, categorised lists of domain names that can be used to
block access to those domains.
I simply call this IPFire DNSBL which is short for IPFire DNS Blocklists.
How did we get here?
As stated before, the URL filter feature in IPFire has the problem that there
are not many good blocklists available any more. There used to be a couple more
- most famously the Shalla list - but we are now down to a single list from the
University of Toulouse. It is a great list, but it is not always the best fit
for all users.
Then there has been talk about whether we could implement more blocking
features into IPFire that don’t involve the proxy, most famously blocking over
DNS. The problem here remains that a blocking feature is only as good as the
data that is fed into it. Some people have been putting forward a number of
lists that were suitable for them, but they would not have replaced the
blocking functionality as we know it. Their aim is to provide “one list for
everything” but that is not what people usually want. It is targeted at a
classic home user and the only separation that is being made is any
adult/porn/NSFW content which usually is put into a separate list.
It would have been technically possible to include these lists and let the
users decide, but that is not the aim of IPFire. We want to do the job for the
user so that their job gets easier. Including obscure lists that don’t
have a clear outline of what they actually want to block (“bad content” is not
a category) and passing the burden of figuring out whether they need the
“Light”, “Normal”, “Pro”, “Pro++”, “Ultimate” or even a “Venti” list with cream
on top is really not going to work. It is all confusing and will lead to a bad
user experience.
An even bigger problem, however, and one that is completely impossible to
solve, is the bad licensing of these lists. A user has asked the publisher of
the HaGeZi list
whether they could be included in IPFire and under what terms. The response was
that the list is available under the terms of the GNU General Public License
v3, but that does not seem to be true. The list contains data from various
sources. Many of them are licensed under incompatible licenses (CC BY-SA 4.0,
MPL, Apache2, …) and unless there is a non-public agreement that this data may
be redistributed, there is a huge legal issue here. We would expose our users
to potential copyright infringement which we cannot do under any circumstances.
Furthermore many lists are available under a non-commercial license which
excludes them from being used in any kind of business. Plenty of IPFire systems
are running in businesses, if not even the vast majority.
In short, these lists are completely unusable for us. Apart from HaGeZi, I
consider OISD to have the same problem.
Enough about all the things that are bad. Let’s talk about the new, good things:
Many blacklists on the internet are an amalgamation of other lists. These lists
vary in quality, with some of them being not that good and without a clear
focus, and others being excellent data. Since we don’t have the manpower to
start from scratch, I felt that we could copy the concept that HaGeZi and OISD
have started and simply create a new list that is based on other lists at the
beginning, to have a good starting point. That way, we have much better control
over what is going onto these lists and we can shape and mould them as we need
them. Most importantly, we don’t create a single list, but many lists that each
have a clear focus and allow users to choose what they want to block and what
not.
So the current experimental stage that I am in has these lists:
* Ads
* Dating
* DoH
* Gambling
* Malware
* Porn
* Social
* Violence
The categories have been determined by what source lists we have available with
good data that are compatible with our chosen license, CC BY-SA 4.0. This is
the same license that we are using for the IPFire Location database, too.
The main use-cases for any kind of blocking are to comply with legal
requirements in networks with children (i.e. schools) to remove any kind of
pornographic content, sometimes block social media as well. Gambling and
violence are commonly blocked, too. Even more common would be filtering
advertising and any malicious content.
The latter is especially difficult because so many source lists throw phishing,
spyware, malvertising, tracking and other things into the same bucket. Here
this is currently all in the malware list which has therefore become quite
large. I am not sure whether this will stay like this in the future or if we
will have to make some adjustments, but that is exactly why this is now
entering some larger testing.
What has been built so far? In order to put these lists together properly and
track any data about where it is coming from, I have built a tool in Python,
available here:
https://git.ipfire.org/?p=dnsbl.git;a=summary
This tool will automatically update all lists once an hour if there have been
any changes and export them in various formats. The exported lists are
available for download here:
https://dnsbl.ipfire.org/lists/
The download using dnsbl.ipfire.org/lists/squidguard.tar.gz as the custom URL
works fine.
However, you need to remember not to put https:// at the front of the URL,
otherwise the WUI page completes without any error message but leaves one in
the system logs saying
URL filter blacklist - ERROR: Not a valid URL filter blacklist
I found this out the hard way.
Oh yes, I forgot that there is a field on the web UI. If that does not accept
https:// as a prefix, please file a bug and we will fix it.
I will confirm it and raise a bug.
The other thing I noticed is that if you already have the Toulouse University
list downloaded and then change to the IPFire custom URL, all the existing
Toulouse blocklists stay in the directory on IPFire. You then end up with a
huge number of category tick boxes, most of which are the old Toulouse ones,
which are still available to select, and it is not clear which ones are from
Toulouse and which ones are from IPFire.
Yes, I got the same thing, too. I think this is a bug, too, because otherwise
you would have a lot of unused categories lying around that will never be
updated. You cannot even tell which ones are from the current list and which
ones from the old list.
Long-term we could even consider removing the Univ. Toulouse list entirely and
only having our own lists available, which would make the problem go away.
I think if the blocklist URL source is changed or a custom URL is provided, the
first step should be to remove the old ones that already exist.
That might be a problem because users can also create their own blocklists and
I believe those go into the same directory.
Good thought. We of course cannot delete the custom lists.
Without clearing out the old blocklists you end up with a huge number of
checkboxes for lists, but it is not clear what happens if there is a category
that has the same name in both the Toulouse list and the IPFire list, such as
gambling. I will have a look at that and see what happens.
Not sure what the best approach to this is.
I believe it is removing all old content.
Manually deleting all contents of the urlfilter/blacklists/ directory and then
selecting the IPFire blocklist URL as the custom URL, I end up with only the 8
categories from the IPFire list.
I have tested some gambling sites from the IPFire list and the block worked on
some. On others, the site no longer exists, so there is nothing to block, or it
has been changed to an HTTPS site, in which case it went straight through.
Also, if I chose the HTTP version of the link, it was automatically changed to
HTTPS and went through without being blocked.
The entire IPFire infrastructure always requires HTTPS. If you start using
HTTP, you will be automatically redirected. It is 2026 and we don’t need to
talk HTTP any more :)
Some of the domains in the gambling list (maybe quite a lot) seem to
only have HTTP access. If I tried HTTPS, it came back saying it
couldn't find the site.
I am glad to hear that the list is actually blocking. It would have been bad if
it didn’t. Now we have the big task to check out the “quality” - however that
can be determined. I think this is what needs some time…
In the meantime I have set up a small page on our website:
https://www.ipfire.org/dnsbl
I would like to run this as a first-class project inside IPFire like we are
doing with IPFire Location. That means that we need to tell people about what
we are doing. Hopefully this page is a little start.
Initially it has a couple of high-level bullet points about what we are trying
to achieve. I don’t think the text is very good, yet, but it is the best I had
in that moment. There is then also a list of the lists that we currently offer.
For each list, a detailed page will tell you about the license, how many
domains are listed, when the last update was, and the sources; there is even a
history page that shows all the changes whenever they happened.
Finally there is a section that explains “How To Use?” the list, which I would
love to extend to include AdGuard Plus and things like that, as well as Pi-Hole
and whatever else could use the list. In a later step we should go ahead and
talk to those projects about including our list(s) in their dropdowns so that
people can enable them nice and easy.
Behind the web page there is an API service that runs on the host that is
running the DNSBL. The frontend web app that runs www.ipfire.org connects to
that API service to fetch the current lists, any details and so on. That way,
we can split the logic and avoid creating a huge monolith of a web app. This
also means that the page could be down occasionally, as I am still working on
the entire thing and will frequently restart it.
The API documentation is available here and the API is publicly available:
https://api.dnsbl.ipfire.org/docs
The website/API allows filing reports for anything that does not seem right on
any of the lists. I would like to keep this an open process; however,
long-term, this cannot cost us any time. In the current stage, the reports are
getting filed and that is about it. I still need to build out some way for
admins or moderators (I am not sure what kind of roles I want to have here) to
accept or reject those reports.
If a reported domain came to us from a source list, I would rather submit a
report upstream for them to de-list it. That way, we don’t have any admin work
to do and we are contributing back to the other lists. That would be a very
good thing to do. We cannot, however, throw tons of emails at some random
upstream projects without co-ordinating this first. By not reporting upstream,
we will probably over time create large whitelists, and I am not sure if that
is a good thing to do.
Finally, there is a search box that can be used to find out if a domain is
listed on any of the lists.
If you download and open any of the files, you will see a large header that
includes copyright information and lists all sources that have been used to
create the individual lists. This way we ensure maximum transparency, comply
with the terms of the individual licenses of the source lists and give credit
to the people who help us to put together the most perfect list for our users.
I would like this to become a project that is not only being used in IPFire. We
can and will be compatible with other solutions like AdGuard and Pi-Hole so that
people can use our lists if they would like to even though they are not using
IPFire. Hopefully, these users will also feed back to us so that we can improve
our lists over time and make them one of the best options out there.
All lists are available as a simple text file that lists the domains. Then
there is a hosts file available as well as a DNS zone file and an RPZ file.
Each list is individually available to be used in squidGuard and there is a
larger tarball available with all lists that can be used in IPFire’s URL
Filter. I am planning to add Suricata/Snort signatures whenever I have time to
do so. Even though it is not a good idea to filter pornographic content this
way, I suppose that catching malware and blocking DoH are good use-cases for an
IPS. Time will tell…
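To illustrate the formats (with a made-up domain, not actual list contents),
the same entry would look roughly like this in the plain list, the hosts file
and the RPZ zone:

```
# plain domain list
casino.example

# hosts file (points the domain at a non-routable address)
0.0.0.0 casino.example

# RPZ zone (CNAME . returns NXDOMAIN for the domain and all subdomains)
casino.example   CNAME .
*.casino.example CNAME .
```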
As a start, we will make these lists available in IPFire’s URL Filter and
collect some feedback about how we are doing. Afterwards, we can see where else
we can take this project.
If you want to enable this on your system, simply add the URL to your
autoupdate.urls file like here:
https://git.ipfire.org/?p=people/ms/ipfire-2.x.git;a=commitdiff;h=bf675bb937faa7617474b3cc84435af3b1f7f45f
I also tested out adding the IPFire url to autoupdate.urls and that also worked
fine for me.
Very good. Should we include this already with Core Update 200? I don’t think
we would break anything, and we might already gain a couple more people to
help us test all of this.
I think that would be a good idea.
The next step would be to build and test our DNS infrastructure. In the “How To
Use?” section on the pages of the individual lists, you can already see some
instructions on how to use the lists as an RPZ. In comparison to other
“providers”, I would prefer if people would be using DNS to fetch the lists.
This is simply to push out updates in a cheap way for us and also do it very
regularly.
Initially, clients will pull the entire list using AXFR. There is no way around
this as they need to have the data in the first place. After that, clients will
only need the changes. As you can see in the history, the lists don’t actually
change that often, sometimes only once a day, and therefore downloading the
entire list again would be a huge waste of data, both on the client side and
for us hosting them.
Some other providers update their lists “every 10 minutes” even when there have
not been any changes whatsoever. We don’t do that. We will only export the
lists again
when they have actually changed. The timestamps on the files that we offer
using HTTPS can be checked by clients so that they won’t re-download the list
again if it has not been changed. But using HTTPS still means that we would
have to re-download the entire list and not only the changes.
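As a sketch of that client-side check (a hypothetical helper, not code from
the actual tool): compare the Last-Modified header of the file on the server
with the local file's mtime and only re-download when the server copy is newer.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def needs_download(last_modified_header: str, local_mtime: float) -> bool:
    """Return True if the server's copy is newer than the local file.

    last_modified_header: HTTP Last-Modified value,
        e.g. "Fri, 02 Jan 2026 12:00:00 GMT"
    local_mtime: Unix timestamp of the locally stored list
    """
    remote = parsedate_to_datetime(last_modified_header)
    local = datetime.fromtimestamp(local_mtime, tz=timezone.utc)
    return remote > local
```

In practice a client would rather send an If-Modified-Since header and let the
server answer 304 Not Modified, which skips the body entirely; the comparison
is the same either way.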
Using DNS and IXFR will update the lists by only transferring a few kilobytes
and therefore we can have clients check once an hour if a list has actually
changed and only send out the raw changes. That way, we will be able to serve
millions of clients at very cheap cost and they will always have a very up to
date list.
As far as I can see any DNS software that supports RPZs supports AXFR/IXFR with
exception of Knot Resolver which expects the zone to be downloaded externally.
There is a ticket for AXFR/IXFR support
(https://gitlab.nic.cz/knot/knot-resolver/-/issues/195).
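For Unbound, such an RPZ transfer could be configured roughly like this (the
zone name and primary server address below are placeholders, not our real
infrastructure; see the instructions on the individual list pages for the
actual values):

```
rpz:
    name: "malware.dnsbl.ipfire.org."        # placeholder zone name
    primary: 192.0.2.53                      # placeholder primary, fetched via AXFR/IXFR
    zonefile: "/var/lib/unbound/malware.rpz"
    rpz-action-override: nxdomain            # answer NXDOMAIN for listed domains
    rpz-log: yes
    rpz-log-name: "dnsbl"
```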
Initially, some of the lists have been *huge* which is why a simple HTTP
download is not feasible. The porn list was over 100 MiB. We could have spent
thousands on just traffic alone which I don’t have for this kind of project. It
would also be unnecessary money being spent. There are simply better solutions
out there. But then I built something that basically tests the data that we are
receiving from upstream by simply checking if a listed domain still exists.
The result was very astonishing to me.
So whenever someone adds a domain to the list, we will (eventually, but not
immediately) check if we can resolve the domain’s SOA record. If not, we mark
the domain as non-active and will no longer include it in the exported data.
This brought down the porn list from just under 5 million domains to just 421k.
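The idea can be sketched like this (a simplified illustration, not the actual
code from dnsbl.git; the DNS lookup is injected as a callable so the filtering
logic stays testable without network access):

```python
from typing import Callable, Iterable

def filter_active(domains: Iterable[str],
                  has_soa: Callable[[str], bool]) -> list[str]:
    """Keep only domains whose zone still answers an SOA query.

    has_soa: performs the actual DNS lookup, e.g. a small wrapper
    around a resolver library; returns False for dead domains.
    """
    return [domain for domain in domains if has_soa(domain)]
```

The real tool marks domains as non-active instead of dropping them, so that a
domain which comes back to life can be re-included later.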
On the sources page (https://www.ipfire.org/dnsbl/lists/porn/sources) I am
listing the percentage of dead domains from each of them and the UT1 list has
94% dead domains. Wow.
If we cannot resolve the domain, neither can our users. So we would otherwise
fill the lists with tons of domains that simply could never be reached. And if
they cannot be reached, why would we block them? We would waste bandwidth and a
lot of memory on each single client.
The other sources have similarly high ratios of dead domains. Most of them are
in the 50-80% range. Therefore I am happy that we are doing some extra work
here to give our users much better data for their filtering.
Removing all dead entries sounds like an excellent step.
Regards,
Adolf.
So, if you like, please go and check out the RPZ blocking with Unbound.
Instructions are on the page. I would be happy to hear how this is turning out.
Please let me know if there are any more questions, and I would be glad to
answer them.
Happy New Year,
-Michael
Regards,
Adolf.
This email is just a brain dump from me to this list. I would be happy to
answer any questions about implementation details, etc. if people are
interested. Right now, this email is long enough already…
All the best,
-Michael
--
Sent from my laptop