Of course this script is sluggish, since it reads many category files and forks 
at least 3-6 processes (grep, head and cut, once or twice) per category for 
every request.

If you *really* want to implement this with a Perl script, it should read all 
the files at startup and do the lookups with in-memory Perl data structures 
(e.g. a hash keyed by domain or URL).
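
A minimal sketch of that approach, assuming the same 
$cats_where/<category>/{domains,urls} layout used in the script quoted below 
(the path and category list here are only placeholders):

use strict;
use warnings;

my $cats_where = '/opt/blacklists';       # placeholder path
my @categories = qw(adv gamble porn);     # placeholder subset

my (%domains, %urls);                     # entry => category name

# Read every category file once, when the helper starts.
foreach my $cat (@categories) {
    open my $dfh, '<', "$cats_where/$cat/domains" or next;
    while (<$dfh>) { chomp; $domains{$_} = $cat if length }
    close $dfh;

    open my $ufh, '<', "$cats_where/$cat/urls" or next;
    while (<$ufh>) { chomp; $urls{$_} = $cat if length }
    close $ufh;
}

# Per request: two hash lookups instead of forking grep for each category.
sub lookup_category {
    my ($dst, $path) = @_;
    return $urls{$dst . $path} // $domains{$dst};
}

The trade-off is memory: the lists end up fully in RAM, but each request then 
costs only a hash lookup.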

But I suggest looking at ufdbGuard, a URL filter that is much faster and has 
all the functionality that you need.

Marcus


On 2020-10-02 10:08, Vieri wrote:
Regarding the use of an external ACL, I quickly implemented a Perl script that 
"does the job", but it seems to be somewhat sluggish.

This is how it's configured in squid.conf:
external_acl_type bllookup ttl=86400 negative_ttl=86400 children-max=80 
children-startup=10 children-idle=3 concurrency=8 %PROTO %DST %PORT %PATH 
/opt/custom/scripts/squid/ext_txt_blwl_acl.pl 
--categories=adv,aggressive,alcohol,anonvpn,automobile_bikes,automobile_boats,automobile_cars,automobile_planes,chat,costtraps,dating,drugs,dynamic,finance_insurance,finance_moneylending,finance_other,finance_realestate,finance_trading,fortunetelling,forum,gamble,hacking,hobby_cooking,hobby_games-misc,hobby_games-online,hobby_gardening,hobby_pets,homestyle,ibs,imagehosting,isp,jobsearch,military,models,movies,music,podcasts,politics,porn,radiotv,recreation_humor,recreation_martialarts,recreation_restaurants,recreation_sports,recreation_travel,recreation_wellness,redirector,religion,remotecontrol,ringtones,science_astronomy,science_chemistry,sex_education,sex_lingerie,shopping,socialnet,spyware,tracker,updatesites,urlshortener,violence,warez,weapons,webphone,webradio,webtv

I'd like to avoid using a DB if possible, but maybe someone here has an idea 
to share on flat-file text searches.

Currently the dir structure of my blacklists is:

topdir
category1 ... categoryN
domains urls

So basically one example file to search in is topdir/category8/urls, etc.

The helper Perl script contains this code to decide whether or not to block 
access:

foreach( @categories )
{
    chomp($s_urls = qx{grep -nwx '$uri_dst$uri_path' $cats_where/$_/urls | head -n 1 | cut -f1 -d:});

    if (length($s_urls) > 0) {
        if ($whitelist == 0) {
            $status = $cid." ERR message=\"URL ".$uri_dst." in BL ".$_." (line ".$s_urls.")\"";
        } else {
            $status = $cid." ERR message=\"URL ".$uri_dst." not in WL ".$_." (line ".$s_urls.")\"";
        }
        next;
    }

    chomp($s_urls = qx{grep -nwx '$uri_dst' $cats_where/$_/domains | head -n 1 | cut -f1 -d:});

    if (length($s_urls) > 0) {
        if ($whitelist == 0) {
            $status = $cid." ERR message=\"Domain ".$uri_dst." in BL ".$_." (line ".$s_urls.")\"";
        } else {
            $status = $cid." ERR message=\"Domain ".$uri_dst." not in WL ".$_." (line ".$s_urls.")\"";
        }
        next;
    }
}
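
For comparison, a fork-free variant of the same per-category check (an 
untested sketch; it still reads the files on every request, but replaces the 
grep/head/cut pipeline with a pure-Perl scan that stops at the first exact 
match, returns its line number, and avoids passing the URL through a shell):

sub find_line {
    my ($needle, $file) = @_;
    open my $fh, '<', $file or return undef;
    while (my $line = <$fh>) {
        chomp $line;
        return $. if $line eq $needle;    # $. is the current line number
    }
    return undef;
}

# e.g. instead of the qx{grep ...} calls above:
#   $s_urls = find_line("$uri_dst$uri_path", "$cats_where/$_/urls") // '';
#   $s_urls = find_line($uri_dst, "$cats_where/$_/domains") // '';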

There are currently 66 "categories" with around 50MB of text data in all.
So that's a lot to go through on every HTTP request.
Apart from placing these blacklists on a ramdisk (they are currently on an 
M.2 SSD, so I'm not sure I'd notice any difference), what else can I try?
Should I reindex the lists and group them alphabetically?
For instance, should I process the lists to generate a dir structure as 
follows?

topdir
a b c d e f ... x y z 0 1 2 3 ... 7 8 9
domains urls

An example for a client requesting https://www.google.com/ would lead to 
searching only 2 files:
topdir/w/domains
topdir/w/urls

An example for a client requesting https://01.whatever.com/x would also lead to 
searching only 2 files:
topdir/0/domains
topdir/0/urls

An example for a client requesting https://8.8.8.8/xyz would also lead to 
searching only 2 files:
topdir/8/domains
topdir/8/urls
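
A rough, untested sketch of what such a preprocessing step might look like 
(the source and destination paths are placeholders; note that it flattens the 
category names away, so a second column would be needed if the ERR messages 
should still report the matching category):

use strict;
use warnings;
use File::Path qw(make_path);

my $src = 'topdir';           # existing topdir/<category>/{domains,urls}
my $dst = 'topdir_by_char';   # new layout keyed by first character

foreach my $list ('domains', 'urls') {
    my %out;    # first character => output filehandle
    foreach my $file (glob "$src/*/$list") {
        open my $in, '<', $file or die "$file: $!";
        while (my $line = <$in>) {
            chomp $line;
            next unless length $line;
            my $c = lc substr($line, 0, 1);
            unless ($out{$c}) {
                make_path("$dst/$c");
                open $out{$c}, '>>', "$dst/$c/$list" or die "$dst/$c/$list: $!";
            }
            print { $out{$c} } "$line\n";
        }
        close $in;
    }
    close $_ for values %out;
}

Sorting and de-duplicating each resulting file afterwards (e.g. with sort -u) 
would additionally allow a binary search instead of a linear scan.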

Any ideas or links to scripts that already prepare lists for this?

Thanks,

Vieri
_______________________________________________
squid-users mailing list
squid-users@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-users