Re: [squid-users] Re: Cache Windows Updates ONLY

2014-04-13 Thread Amos Jeffries
On 13/04/2014 7:08 a.m., Nick Hill wrote:
 I have been ironing out issues with my windows updates set-up for
 Squid. I have been through my squid.conf file to de-cruft it.
 
 The following squid.conf should be self-documenting. I have found this
 works well in a multi-computer environment where you can expect a lot
 of Windows machines to perform updates. A computer shop is a good
 example. Of course, you will want to configure a DHCP server with a
 wpad.dat address so that your client machines will auto-configure to
 use your proxy.
 
 The principal difference between this and other configurations is that
 it will cache windows updates even where a query string operates on a
 cab, exe, or other non-dynamic response. I find the query string does
 not change the file contents. (I know - it is possible that it
 could...)
 
 The other feature is that Microsoft conveniently include SHA1 hashes
 in URLs for static content files. Often, these static content files
 will be found at differing locations, and will often be called with
 query strings! Web cache hell! This configuration represents the data
 internally to squid based purely on the SHA1 hash where available. If
 two content items really have a SHA1 match, then you can guarantee
 they are identical. Any successive file accesses from any of the
 windows update domains which match the general SHA1 pattern used in
 windows updates will generate a cache HIT, even where the URL is quite
 different, and irrespective of any cache-bashing query string.
 
 I will monitor the configurations over the next week. Empirically, so
 far, it all works!
 If anyone can see howlers, let me know. Thanks!
 
 #squid.conf file for Squid Cache: Version 3.4.4
 #compiled on Ubuntu with configure options:
 #  '--enable-async-io=8' '--enable-storeio=ufs,aufs,diskd'
 #  '--enable-removal-policies=lru,heap' '--enable-delay-pools'
 #  '--enable-underscores' '--enable-icap-client'
 #  '--enable-follow-x-forwarded-for' '--with-logdir=/var/log/squid3'
 #  '--with-pidfile=/var/run/squid3.pid' '--with-filedescriptors=65536'
 #  '--with-large-files' '--with-default-user=proxy'
 #  '--enable-linux-netfilter' '--enable-storeid-rewrite-helpers=file'
 
 #Recommendations: in full production, you may want to set debug options from 2 to 1 or 0.
 #You may also want to comment out strip_query_terms off for user privacy
 
 #Explicitly define logs for my compiled version
 cache_store_log /var/log/squid3/store.log
 access_log /var/log/squid3/access.log
 cache_log /var/log/squid3/cache.log
 
 #Lets have a fair bit of debugging info
 debug_options ALL,2
 #Include query strings in logs
 strip_query_terms off
 
 acl all src all
 acl windowsupdate dstdomain .windowsupdate.microsoft.com
 acl windowsupdate dstdomain .c.microsoft.com
 acl windowsupdate dstdomain .ws.microsoft.com
 acl windowsupdate dstdomain .update.microsoft.com
 acl windowsupdate dstdomain images.metaservices.microsoft.com
 acl windowsupdate dstdomain .download.windowsupdate.com
 acl windowsupdate dstdomain wustat.windows.com
 acl windowsupdate dstdomain swcdn.apple.com
 acl windowsupdate dstdomain data-cdn.mbupdates.com
 acl QUERY urlpath_regex cgi-bin \?
 
 #I'm  behind a NAT firewall, so I don't need to restrict access
 http_access allow all
 
 #Uncomment these if you have web apps on the local server which auth through local ip
 #acl to_localhost dst 127.0.0.0/8 0.0.0.0/32
 #http_access deny to_localhost
 
 visible_hostname myclient.hostname.com
 http_port 3128
 
 #Always optimise bandwidth over hits
 cache_replacement_policy heap LFUDA
 #200Mb max object if not windowsupdate
 maximum_object_size 20 KB
 #Set these according to your file system
 cache_dir ufs /home/smb/squid/squid 7 16 256
 coredump_dir /home/smb/squid/squid
 
 refresh_pattern -i microsoft.com/.*\.(cab|exe|ms[i|u|f]|asf|wm[v|a]|dat|zip|psf|appx|esd) 43200 80% 43200 override-lastmod override-expire ignore-reload ignore-must-revalidate ignore-private
 refresh_pattern -i windowsupdate.com/.*\.(cab|exe|ms[i|u|f]|asf|wm[v|a]|dat|zip|psf|appx|esd) 43200 80% 43200 override-lastmod override-expire ignore-reload ignore-must-revalidate ignore-private
 refresh_pattern -i windows.com/.*\.(cab|exe|ms[i|u|f]|asf|wm[v|a]|dat|zip|psf|appx|esd) 43200 80% 43200 override-lastmod override-expire ignore-reload ignore-must-revalidate ignore-private

Did your tests find any actual benefits in these override-lastmod
override-expire ignore-reload ignore-must-revalidate ignore-private
settings ?

My tests earlier showed the reload-into-ims option was all that was
needed to make update caching behave nicely. It is also the only one of
those options which produces RFC compliant behaviour by the proxy.


 #Default refresh patterns last if no others match
 refresh_pattern ^ftp: 1440 20% 10080
 refresh_pattern ^gopher: 1440 0% 1440
 refresh_pattern . 0 20% 4320
 
 #Directive sets I have been experimenting with
 #override-lastmod override-expire ignore-reload ignore-must-revalidate ignore-private
 

Re: [squid-users] Re: Cache Windows Updates ONLY

2014-04-13 Thread Nick Hill
Dear Amos

Thank you for reviewing the config and giving your deeply considered comments.

On 13 April 2014 09:56, Amos Jeffries squ...@treenet.co.nz wrote:
 Did your tests find any actual benefits in these override-lastmod
 override-expire ignore-reload ignore-must-revalidate ignore-private
 settings ?

 My tests earlier showed the reload-into-ims option was all that was
 needed to make update caching behave nicely. It is also the only one of
 those options which produces RFC compliant behaviour by the proxy.

Yes! Clients generate zillions of range requests. This creates loads
of revalidation.

I have adopted the assumption that exe, cab and such files on windows
update servers are static. A different file will take a different URL.

Perhaps there are border cases where this assumption would fail, and
maybe this needs more thought.  Although I think it is fair to
guarantee URLs with an embedded SHA1 checksum will always deliver the
same content.

I might rewrite this part to use reload-into-ims for URL patterns
which don't include a checksum, and use the full override and never
expire for those URLs which do embed a checksum.


 NP: Squid understands byte units whenever you see KB being used in config.

 So:
  maximum_object_size 200 MB
  maximum_object_size 6 GB

 Which is the first howler. That directive does not take an access
 list and only the last value set matters. So adding windowsupdate to the
 6GB line and setting the 200MB value are both just useless text in the
 config file.

Ok. I really would like to limit object size on ACL, but will have to
live with that!




 #My internet connection is not just used for Squid. I want to leave
 #responsive bandwidth for other services. This limits D/L speed
 delay_pools 1
 delay_class 1 1
 delay_access 1 allow all
 delay_parameters 1 120/120

 It is better to use QoS controls in the system network settings that
 limit Squid (usually by PID number) than applying a class-1 delay pool
 to everything.

I do have an iptables firewall set up and will perhaps add that to the
bottom of my to-do list, unless I find it ineffectual and problematic.


 #We use the store_id helper to convert windows update file hashes to bare URLs.
 #This way, any fetch for a given hash embedded in the URL will deliver the same data
 #You must make your own /etc/squid3/storeid_rewrite instructions at end.
 #change the helper program location from /usr/local/squid/libexec/storeid_file_rewrite to wherever yours is
 #It is written in PERL, so on most Linux systems, put it somewhere convenient, chmod 755 filename
 store_id_program /usr/local/squid/libexec/storeid_file_rewrite /etc/squid3/storeid_rewrite
 store_id_children 10 startup=5 idle=3 concurrency=0
 store_id_access allow windowsupdate
 store_id_access deny all


 concurrency=0 is bad, although I see this is due to a lack of
 concurrency in the helper. That's a bug which should get fixed.


 #We want to cache windowsupdate URLs which include queries
 #but only those queries which act on an installable file.
 #we don't want to cache queries on asp files as this is a genuine server
 #side query as opposed to a cache breaker
 acl wupdatecachablequery urlpath_regex (cab|exe|ms[i|u|f]|asf|wm[v|a]|dat|zip|psf|appx|appxbundle|esd)\?

 #Deny caching for URLs matching query but not windowsupdate
 cache deny QUERY !windowsupdate
 #Deny caching for URLs matching query and windowsupdate but not cachable updates
 cache deny QUERY windowsupdate !wupdatecachablequery

 What does this help with exactly? Current Squid versions are perfectly capable of
 caching despite query-string presence.
 In fact we recommend dropping acl QUERY entirely and adding this right
 above the '.' refresh_pattern:
  refresh_pattern -i (/cgi-bin/|\?) 0 0% 0

I have three classes.
Any URL with a query string.
Any URL to a windows update server.
Any URL to a windows update server which is specifically cache-able

To paraphrase the logic coded here:
Don't cache anything with a query string UNLESS it matches the ACL
wupdatecachablequery.

Another way to write this more succinctly might be:
cache deny QUERY
cache allow wupdatecachablequery

But I am not certain whether the deny clause will take a higher
priority than the allow clause in cases where both ACLs match. The
fandangled logic avoids this.
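
On the priority question: Squid access lists are checked top to bottom and the first line whose ACLs match decides the outcome, so in the two-line form above the deny line would win for a URL that matches both ACLs. A minimal Python sketch of that first-match ordering follows. The regexes only approximate the two ACLs from the config, and the names first_match, deny_first and allow_first, as well as the sample paths, are illustrative assumptions rather than anything from Squid itself. What Squid does when no line matches is deliberately not modelled.

```python
import re

# Approximations of the two ACLs from the thread (urlpath_regex style).
QUERY = re.compile(r"cgi-bin|\?")
WUPDATE_CACHABLE_QUERY = re.compile(
    r"(cab|exe|ms[iuf]|asf|wm[va]|dat|zip|psf|appx|appxbundle|esd)\?"
)


def first_match(rules, urlpath):
    """Return the action of the first rule whose regex matches urlpath.

    rules is an ordered list of (action, compiled_regex) pairs.
    Returns None when no rule matches; Squid's own fall-through
    behaviour is not modelled in this sketch.
    """
    for action, acl in rules:
        if acl.search(urlpath):
            return action
    return None


# Order from the message: the broad deny line is reached first.
deny_first = [("deny", QUERY), ("allow", WUPDATE_CACHABLE_QUERY)]
# Swapped order: the narrower allow line is reached first.
allow_first = [("allow", WUPDATE_CACHABLE_QUERY), ("deny", QUERY)]

if __name__ == "__main__":
    for path in ("/d/updt/file_abc.cab?P1=TOKEN", "/search.asp?q=1"):
        print(path, first_match(deny_first, path), first_match(allow_first, path))
```

Under these assumptions the succinct form only behaves as intended when the allow line comes before the deny line.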




 #Given windows update is un-cooperative towards third party
 #methods to reduce network bandwidth, it is safe to presume
 #cache-specific headers or dates significantly differing from
 #system date will be unhelpful
 reply_header_access Date deny windowsupdate
 reply_header_access Age deny windowsupdate

 The given actually is not true IME. So not a safe assumption.

 Bad behaviour in the HTTP/1.1 revalidation by clients is a common side
 effect of the override-* and ignore-* options being used on refresh_pattern.
  The overrides used above make Squid ignore the caching boundary
 conditions about when objects become stale or expire. So the client
 fetch can a) MISS 

Re: [squid-users] Re: Cache Windows Updates ONLY

2014-04-12 Thread Nick Hill
I have been ironing out issues with my windows updates set-up for
Squid. I have been through my squid.conf file to de-cruft it.

The following squid.conf should be self-documenting. I have found this
works well in a multi-computer environment where you can expect a lot
of Windows machines to perform updates. A computer shop is a good
example. Of course, you will want to configure a DHCP server with a
wpad.dat address so that your client machines will auto-configure to
use your proxy.

The principal difference between this and other configurations is that
it will cache windows updates even where a query string operates on a
cab, exe, or other non-dynamic response. I find the query string does
not change the file contents. (I know - it is possible that it
could...)

The other feature is that Microsoft conveniently include SHA1 hashes
in URLs for static content files. Often, these static content files
will be found at differing locations, and will often be called with
query strings! Web cache hell! This configuration represents the data
internally to squid based purely on the SHA1 hash where available. If
two content items really have a SHA1 match, then you can guarantee
they are identical. Any successive file accesses from any of the
windows update domains which match the general SHA1 pattern used in
windows updates will generate a cache HIT, even where the URL is quite
different, and irrespective of any cache-bashing query string.

I will monitor the configurations over the next week. Empirically, so
far, it all works!
If anyone can see howlers, let me know. Thanks!

#squid.conf file for Squid Cache: Version 3.4.4
#compiled on Ubuntu with configure options:
#  '--enable-async-io=8' '--enable-storeio=ufs,aufs,diskd'
#  '--enable-removal-policies=lru,heap' '--enable-delay-pools'
#  '--enable-underscores' '--enable-icap-client'
#  '--enable-follow-x-forwarded-for' '--with-logdir=/var/log/squid3'
#  '--with-pidfile=/var/run/squid3.pid' '--with-filedescriptors=65536'
#  '--with-large-files' '--with-default-user=proxy'
#  '--enable-linux-netfilter' '--enable-storeid-rewrite-helpers=file'

#Recommendations: in full production, you may want to set debug options from 2 to 1 or 0.
#You may also want to comment out strip_query_terms off for user privacy

#Explicitly define logs for my compiled version
cache_store_log /var/log/squid3/store.log
access_log /var/log/squid3/access.log
cache_log /var/log/squid3/cache.log

#Lets have a fair bit of debugging info
debug_options ALL,2
#Include query strings in logs
strip_query_terms off

acl all src all
acl windowsupdate dstdomain .windowsupdate.microsoft.com
acl windowsupdate dstdomain .c.microsoft.com
acl windowsupdate dstdomain .ws.microsoft.com
acl windowsupdate dstdomain .update.microsoft.com
acl windowsupdate dstdomain images.metaservices.microsoft.com
acl windowsupdate dstdomain .download.windowsupdate.com
acl windowsupdate dstdomain wustat.windows.com
acl windowsupdate dstdomain swcdn.apple.com
acl windowsupdate dstdomain data-cdn.mbupdates.com
acl QUERY urlpath_regex cgi-bin \?

#I'm  behind a NAT firewall, so I don't need to restrict access
http_access allow all

#Uncomment these if you have web apps on the local server which auth through local ip
#acl to_localhost dst 127.0.0.0/8 0.0.0.0/32
#http_access deny to_localhost

visible_hostname myclient.hostname.com
http_port 3128

#Always optimise bandwidth over hits
cache_replacement_policy heap LFUDA
#200Mb max object if not windowsupdate
maximum_object_size 20 KB
#Set these according to your file system
cache_dir ufs /home/smb/squid/squid 7 16 256
coredump_dir /home/smb/squid/squid

refresh_pattern -i microsoft.com/.*\.(cab|exe|ms[i|u|f]|asf|wm[v|a]|dat|zip|psf|appx|esd) 43200 80% 43200 override-lastmod override-expire ignore-reload ignore-must-revalidate ignore-private
refresh_pattern -i windowsupdate.com/.*\.(cab|exe|ms[i|u|f]|asf|wm[v|a]|dat|zip|psf|appx|esd) 43200 80% 43200 override-lastmod override-expire ignore-reload ignore-must-revalidate ignore-private
refresh_pattern -i windows.com/.*\.(cab|exe|ms[i|u|f]|asf|wm[v|a]|dat|zip|psf|appx|esd) 43200 80% 43200 override-lastmod override-expire ignore-reload ignore-must-revalidate ignore-private
#Default refresh patterns last if no others match
refresh_pattern ^ftp: 1440 20% 10080
refresh_pattern ^gopher: 1440 0% 1440
refresh_pattern . 0 20% 4320

#Directive sets I have been experimenting with
#override-lastmod override-expire ignore-reload ignore-must-revalidate ignore-private
#reload-into-ims

#Windows updates use a lot of range requests. The only way to deal with this
#in Squid is to fetch the whole file as soon as requested
range_offset_limit -1 windowsupdate
quick_abort_min -1 KB windowsupdate

#Windows update files are HUGE! I have set this to 6Gb.
#A recent (as of Apr 2014) windows 8 update file is 4Gb
maximum_object_size 600 KB  windowsupdate

#My internet connection is not just used for Squid. I want to leave
#responsive bandwidth for other services. This limits 

Re: [squid-users] Re: Cache Windows Updates ONLY

2014-04-11 Thread Stephen Borrill
On 10/04/2014 20:07, Eliezer Croitoru wrote:
 Hey Nick,
 
 In a case you do know the tokens meaning and if it is working properly
 you can try to use StoreID in 3.4.X
 http://wiki.squid-cache.org/Features/StoreID
 
 It is designed to allow you this specific issue you are sure it is.
 
 About the 4GB or 1GB updates it's pretty simple.
 Microsoft release an update which contains everything about the about
 even that the update for your machine is only part of the file.
 This is what the last time I verified the issue.
 
 Also there is another side that OS become more and more complex and an
 update can be really big which almost replacing half of the OS components.
 
 What ever goes for you from the options is fine and still I have not
 seen microsoft cache solution.
 How is it called?

He's probably referring to WSUS:
http://en.wikipedia.org/wiki/Windows_Server_Update_Services

This isn't an HTTP cache solution, it downloads Windows updates and then
effectively acts as your own local Windows Update service - you point
your clients at it to get updates rather than the real ones.

-- 
Stephen


Re: [squid-users] Re: Cache Windows Updates ONLY

2014-04-11 Thread Nick Hill
Dear Eliezer

Thank you for this. It appears the way forward would be to check that
the URL matches a pattern, and if it does, compute the store_id from
the checksum embedded in the URL. The same pattern might be used
across a large range of windows update objects, thereby avoiding cache
misses even when the same object is fetched with a significantly
different URL. For example, different windows update versions, update
methods and product versions.

A checksum match is a guarantee the object is identical.

I understand issues could arise from differing header information. I
suppose it is a matter of trying it and seeing.










Re: [squid-users] Re: Cache Windows Updates ONLY

2014-04-11 Thread Eliezer Croitoru

On 04/11/2014 08:37 AM, Nick Hill wrote:

 I performed a SHA1 checksum on the downloaded file. The checksum was
 6fda48f8c83be2a15f49b83b10fc3dc8c1d15774

 The file was downloaded using wget, with the tokens. This matches the
 part of the file name between the underscore and period.

 The only thing we need for Squid to match, is the part of the URL
 between the underscore and period. If the checksum matches, we know
 the content we are serving up is correct.

Do you by any chance have URLs that show this pattern in the form of
logs?


Eliezer


Re: [squid-users] Re: Cache Windows Updates ONLY

2014-04-11 Thread Nick Hill
Hi Eliezer

I have re-compiled Squid 3.4 along with the storeid_file_rewrite helper.
(Maybe largefile should be a default config directive!)

I added the following to squid.conf
store_id_program /usr/local/squid/libexec/storeid_file_rewrite /etc/squid3/storeid_rewrite
store_id_children 40 startup=10 idle=5 concurrency=0
store_id_access allow windowsupdate
store_id_access deny all

My /etc/squid3/storeid_rewrite
^http:\/\/.+?\.ws\.microsoft\.com\/.+?_([0-9a-z]{40})\.(cab|exe|ms[i|u|f]|asf|wm[v|a]|dat|zip|psf|appx)	http://wupdate.squid.local/$1
^http:\/\/.+?\.windowsupdate\.com\/.+?_([0-9a-z]{40})\.(cab|exe|ms[i|u|f]|asf|wm[v|a]|dat|zip|psf|appx)	http://wupdate.squid.local/$1

echo "http://download.windowsupdate.com/msdownload/update/common/2014/04/11935736_1ad4d6ce4701a9a52715213f48c337a1b4121dff.cab" \
  | /usr/local/squid/libexec/storeid_file_rewrite /etc/squid3/storeid_rewrite
OK store-id=http://wupdate.squid.local/1ad4d6ce4701a9a52715213f48c337a1b4121dff

Still need to check it works OK.
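
The mapping these two rules perform can be tried outside Squid. The following is a rough Python sketch, not the Perl helper itself: it applies the same two patterns and builds the same http://wupdate.squid.local/ store ID from the 40-character hash. The function name store_id and the constant names are made up for illustration, and the character class ms[i|u|f] is written as ms[iuf] here, which matches the same extensions.

```python
import re

# Extensions listed in the storeid_rewrite rules from the message.
_EXT = r"(?:cab|exe|ms[iuf]|asf|wm[va]|dat|zip|psf|appx)"

# One pattern per rule in the message; group 1 is the 40-character hash.
RULES = [
    re.compile(r"^http://.+?\.ws\.microsoft\.com/.+?_([0-9a-z]{40})\." + _EXT),
    re.compile(r"^http://.+?\.windowsupdate\.com/.+?_([0-9a-z]{40})\." + _EXT),
]

# Internal store-ID prefix used in the message (not a real host).
STORE_PREFIX = "http://wupdate.squid.local/"


def store_id(url):
    """Return the hash-based store ID for url, or None if no rule matches."""
    for pattern in RULES:
        match = pattern.match(url)
        if match:
            return STORE_PREFIX + match.group(1)
    return None


if __name__ == "__main__":
    print(store_id(
        "http://download.windowsupdate.com/msdownload/update/common/2014/04/"
        "11935736_1ad4d6ce4701a9a52715213f48c337a1b4121dff.cab"
    ))
```

Because only the hash survives in the store ID, two requests for the same file with different hosts, paths or query tokens end up with the same ID in this sketch.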





[squid-users] Re: Cache Windows Updates ONLY

2014-04-10 Thread Nick Hill
I found this discussion through a web post on Nabble, which I presume
will not feed back to this list. I located the discussion forum from
the web site, have subscribed, and hope the message will be useful. A
web interface to this mailing list could be very useful to capture
important information from those users who seldom have something to
add.

I use a similar configuration on my Squid to the one used by HilltopsGM.

Microsoft have recently released an update 4Gb in size for Windows 8,
with range request downloads. This will likely cause Squid to use
excessive bandwidth. My cache was slaughtering bandwidth until I made
some changes.

It appears Microsoft now use psf files, which appear to cache OK.

#Note: include psf files
refresh_pattern -i microsoft.com/.*\.(cab|exe|ms[i|u|f]|asf|wm[v|a]|dat|zip|psf) 4320 80% 43200 reload-into-ims
refresh_pattern -i windowsupdate.com/.*\.(cab|exe|ms[i|u|f]|asf|wm[v|a]|dat|zip|psf) 4320 80% 43200 reload-into-ims

#Having already defined the windowsupdate ACL,
range_offset_limit -1 windowsupdate
quick_abort_min -1 KB windowsupdate
maximum_object_size 500 KB  windowsupdate

#And for a cache replacement policy oriented to
#bandwidth conservation rather than latency
cache_replacement_policy heap LFUDA


--
My squid 3 configuration file now looks like:

debug_options ALL,2
acl all src all
http_access allow all
cache_store_log /var/log/squid/store.log
hierarchy_stoplist cgi-bin ?
acl QUERY urlpath_regex cgi-bin \?
no_cache deny QUERY
hosts_file /etc/hosts
refresh_pattern ^ftp: 1440 20% 10080
refresh_pattern ^gopher: 1440 0% 1440
refresh_pattern . 0 20% 4320
acl manager proto cache_object
acl localhost src 127.0.0.1/32
acl to_localhost dst 127.0.0.0/8 0.0.0.0/32
acl purge method PURGE
acl CONNECT method CONNECT
cache_mem 256 MB
http_access allow manager localhost
http_access deny manager
http_access allow purge localhost
http_access deny purge
acl lan src 10.10.10.1/24
http_access allow localhost
http_access allow lan
visible_hostname myclient.hostname.com
http_port 3128

cache_replacement_policy heap LFUDA
maximum_object_size 20 KB
cache_dir ufs /home/smb/squid/squid 7 16 256
coredump_dir /home/smb/squid/squid

refresh_pattern -i microsoft.com/.*\.(cab|exe|ms[i|u|f]|asf|wm[v|a]|dat|zip|psf) 4320 80% 43200 reload-into-ims
refresh_pattern -i windowsupdate.com/.*\.(cab|exe|ms[i|u|f]|asf|wm[v|a]|dat|zip|psf) 4320 80% 43200 reload-into-ims

acl windowsupdate dstdomain windowsupdate.microsoft.com
acl windowsupdate dstdomain .update.microsoft.com
acl windowsupdate dstdomain download.windowsupdate.com
acl windowsupdate dstdomain redir.metaservices.microsoft.com
acl windowsupdate dstdomain images.metaservices.microsoft.com
acl windowsupdate dstdomain c.microsoft.com
acl windowsupdate dstdomain www.download.windowsupdate.com
acl windowsupdate dstdomain wustat.windows.com
acl windowsupdate dstdomain crl.microsoft.com
acl windowsupdate dstdomain swcdn.apple.com
acl windowsupdate dstdomain data-cdn.mbupdates.com

#header_access Pragma deny windowsupdate unrecognised in squid 3
#directives mentioned
#http://www1.us.squid-cache.org/mail-archive/squid-users/200506/0684.html
#- nick 16 Feb 09
range_offset_limit -1 windowsupdate
quick_abort_min -1 KB windowsupdate
maximum_object_size 500 KB  windowsupdate


##9 April 2014
##From http://wiki.squid-cache.org/Features/DelayPools
##limit squid to 1.2Mbit/second, reduce contention for updates
delay_pools 1
delay_class 1 1
delay_access 1 allow all
delay_parameters 1 120/120





--

On Tue, 20 Aug 2013 17:49:19 -0700 (PDT) HillTopsGM Wrote
Does this make sense then:

(START OF CODE FOR SQUID.CONF FILE)

#==
#Below is what I'd copy and paste from the FAQ for windows updates:
#==

range_offset_limit -1
maximum_object_size 200 MB
quick_abort_min -1

# Add one of these lines for each of the websites you want to cache.

refresh_pattern -i microsoft.com/.*\.(cab|exe|ms[i|u|f]|asf|wm[v|a]|dat|zip) 4320 80% 43200 reload-into-ims

refresh_pattern -i windowsupdate.com/.*\.(cab|exe|ms[i|u|f]|asf|wm[v|a]|dat|zip) 4320 80% 43200 reload-into-ims

refresh_pattern -i my.windowsupdate.website.com/.*\.(cab|exe|ms[i|u|f]|asf|wm[v|a]|dat|zip) 4320 80% 43200 reload-into-ims

# DONT MODIFY THESE LINES
refresh_pattern ^ftp:             1440  20%  10080
refresh_pattern ^gopher:          1440  0%   1440
refresh_pattern -i (/cgi-bin/|\?) 0     0%   0
refresh_pattern .                 0     20%  4320

acl windowsupdate dstdomain windowsupdate.microsoft.com
acl windowsupdate dstdomain .update.microsoft.com
acl windowsupdate dstdomain download.windowsupdate.com
acl windowsupdate dstdomain redir.metaservices.microsoft.com
acl windowsupdate dstdomain images.metaservices.microsoft.com
acl windowsupdate dstdomain c.microsoft.com
acl windowsupdate dstdomain www.download.windowsupdate.com
acl windowsupdate dstdomain wustat.windows.com
acl windowsupdate dstdomain crl.microsoft.com
acl windowsupdate dstdomain sls.microsoft.com

[squid-users] Re: Cache Windows Updates ONLY

2014-04-10 Thread Nick Hill
I notice Microsoft update for windows 8 is adding query strings to
URLs as a token. This makes it hard for open source caches to work
effectively with Microsoft Windows 8 updates. Maybe this is a method
to force users to use Microsoft's proprietary windows update caching
software. The recent KB2919355 update is a whopping 4Gb download.
Bigger than an ISO for windows 8. I don't understand why Microsoft
make the updates so large, and why they make them difficult to cache.
It is almost as if they wish to maximise the bandwidth windows update
consumes.

The form is:
http://bg.v4.a.dl.ws.microsoft.com/dl/content/d/updt/2013/09/f4d26fdb-d520-48da-add6-6a3c0832d14a_6fda48f8c83be2a15f49b83b10fc3dc8c1d15774.appx?P1=TOKEN&P2=TOKEN&P3=TOKEN&P4=TOKEN

The server won't deliver the file unless the tokens are in place.
Whenever a file is fetched, it appears to be the same irrespective of
the tokens. I will carry out more research based on checksums of
multiple files to make sure.
These same files are typically fetched using range requests. The file
example above is over 1Gb. Well worth caching.

I'm looking for a way to configure squid3 so that if the domain is
ws.microsoft.com and if the URL includes .appx?P1= then the URL is
fetched with query string from the source and stored without query
string. Any future request should match and deliver the stored file
irrespective of any query string.

Is there a convenient way to configure Squid to do this?

Thanks.


Re: [squid-users] Re: Cache Windows Updates ONLY

2014-04-10 Thread Eliezer Croitoru

Hey Nick,

If you do know what the tokens mean, and it is working properly,
you can try to use StoreID in 3.4.X:

http://wiki.squid-cache.org/Features/StoreID

It is designed to handle this specific issue, provided you are sure that is
what it is.

About the 4GB or 1GB updates, it's pretty simple.
Microsoft releases one update file which contains everything, even though
the update for your machine is only part of that file.

That is what I found the last time I verified the issue.

There is also another side: the OS becomes more and more complex, and an
update can be really big, replacing almost half of the OS components.


Whichever of the options works for you is fine. I still have not
seen the Microsoft cache solution.

What is it called?

Eliezer

On 04/10/2014 08:50 PM, Nick Hill wrote:

Is there a convenient way to configure Squid to do this?

Thanks.




[squid-users] Re: Cache Windows Updates ONLY

2014-04-10 Thread babajaga
Should I change the
cache allow mywindowsupdates
always_direct allow all
... to
cache allow mywindowsupdates
cache deny all 

To ONLY cache the windows updates, 

cache allow mywindowsupdates
cache deny all

would be correct.

#
#always_direct allow all #This is NOT related to caching.




--
View this message in context: 
http://squid-web-proxy-cache.1019090.n4.nabble.com/Re-Cache-Windows-Updates-ONLY-tp4665520p4665524.html
Sent from the Squid - Users mailing list archive at Nabble.com.


[squid-users] Re: Cache Windows Updates ONLY

2014-04-10 Thread babajaga
The server won't deliver the file unless the tokens are in place.
Whenever a file is fetched, it appears to be the same irrespective of
the tokens. I will carry out more research based on checksums of
multiple files to make sure. 

I very much doubt they are the same, because this would not make sense.
YouTube does something similar for its videos, and there the tokens
contain additional info such as the resolution of the movie, as it is
distributed in different resolutions, depending upon actual connection
speed, for instance.

So, the only reason to have random tokens in your case would be to confuse
the caches, which I doubt. Or it might signal some info regarding the size
of the range requests. Then it would be safe to ignore the tokens, as you
are considering, as the complete file will be cached within Squid, and the
different ranges serviced from there. (Note: this is something YouTube did
some time ago.)
So you might test with different connection speeds, too.







Re: [squid-users] Re: Cache Windows Updates ONLY

2014-04-10 Thread Amos Jeffries
On 11/04/2014 12:28 a.m., Nick Hill wrote:
 I found the discussion on the web post. On Nabble, which I presume
 will not feed back to this list. I located the discussion forum from
 the web site, have subscribed, and hope the message will be useful. A
 web interface to this mailing list could be very useful to capture
 important information from those users who seldom have something to
 add.
 
 I use a similar configuration on my Squid to the one used by HilltopsGM.
 
 Microsoft have recently released an update 4Gb in size for Windows 8,
 with range request downloads. This will likely cause Squid to use
 excessive bandwidth. My cache was slaughtering bandwidth until I made
 some changes.
 
 it appears  Microsoft now use psf files, which appear to cache OK.
 
 #Note: include psf files
 refresh_pattern -i microsoft.com/.*\.(cab|exe|ms[i|u|f]|asf|wm[v|a]|dat|zip|psf) 4320 80% 43200 reload-into-ims
 refresh_pattern -i windowsupdate.com/.*\.(cab|exe|ms[i|u|f]|asf|wm[v|a]|dat|zip|psf) 4320 80% 43200 reload-into-ims
 
 #Having already defined the windowsupdate ACL,
 range_offset_limit -1 windowsupdate
 quick_abort_min -1 KB windowsupdate
 maximum_object_size 500 KB  windowsupdate
 

Thank you for the details. I have updated the wiki patterns and notes:
http://wiki.squid-cache.org/SquidFaq/WindowsUpdate

Amos



Re: [squid-users] Re: Cache Windows Updates ONLY

2014-04-10 Thread Nick Hill
On 11 April 2014 05:15, babajaga augustus_me...@yahoo.de wrote:

 I very doubt  to be the same ... . Because this would not make sense.
 youtube does something similar for their videos, and there the tokens
 contain add info like resolution of the movie, as it is distributed in
 different ones. Depending upon actual connection speed, for instance.

I performed a SHA1 checksum on the downloaded file. The checksum was
6fda48f8c83be2a15f49b83b10fc3dc8c1d15774

The file was downloaded using wget, with the tokens. This matches the
part of the file name between the underscore and period.

The only thing we need for Squid to match, is the part of the URL
between the underscore and period. If the checksum matches, we know
the content we are serving up is correct.
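
The check described here (hash the downloaded file, then compare it with the part of the file name between the underscore and the period) can be scripted. A minimal Python sketch follows, assuming the hash is always 40 lowercase hex characters directly before the extension; the function names embedded_sha1, sha1_of_chunks, sha1_of_file and matches_embedded_hash are invented for this example.

```python
import hashlib
import re

# 40 hex characters between an underscore and the file extension,
# followed by either a query string or the end of the URL.
HASH_IN_NAME = re.compile(r"_([0-9a-f]{40})\.[A-Za-z0-9]+(?:\?|$)")


def embedded_sha1(url):
    """Return the SHA1 hex string embedded in the URL's file name, or None."""
    match = HASH_IN_NAME.search(url)
    return match.group(1) if match else None


def sha1_of_chunks(chunks):
    """Return the SHA1 hex digest of an iterable of bytes objects."""
    digest = hashlib.sha1()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def sha1_of_file(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB pieces so multi-gigabyte updates fit in memory."""
    with open(path, "rb") as handle:
        return sha1_of_chunks(iter(lambda: handle.read(chunk_size), b""))


def matches_embedded_hash(url, path):
    """True if the file at path has the SHA1 that the URL's file name claims."""
    expected = embedded_sha1(url)
    return expected is not None and sha1_of_file(path) == expected
```

Running matches_embedded_hash over several files fetched with different tokens would be one way to carry out the wider test mentioned earlier in the thread.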


[squid-users] Re: Cache Windows Updates ONLY

2013-08-20 Thread HillTopsGM
Amos Jeffries-2 wrote
 On 20/08/2013 4:31 a.m., HillTopsGM wrote:
 Hi All.
 I've been doing lots of reading and I believe I am understanding the
 basic concept of how to use Squid.
 /I've posted the hardware that I am using at the bottom of the post/.

 I have about 12 windows machines running at any one time and I was hoping
 to start using Squid to speed up the Windows updates in this environment -
 *NOTHING ELSE FOR NOW*, as I don't want the cache to potentially
 interfere with the other work going on.
 I will consider adding complexity as I go and continue to learn how to
 use it.

 For practice I installed Squid using apt-get on a Linux Mint 15
 installation - install went smoothly.

 I found the *How do I make Windows Updates cache?* wiki here:
 http://wiki.squid-cache.org/SquidFaq/WindowsUpdate

 It looks straightforward - just copy and paste the appropriate settings
 from that page into the squid.conf file.

 What I am not clear on is how to NOT CACHE anything else?
 Would someone be able to help me with that?
 
 The cache directive is what controls whether the response to a request
 will be stored.
 http://www.squid-cache.org/Doc/config/cache/
 
 Amos

Thanks for jumping in Amos.

Does this make sense then:

(START OF CODE FOR SQUID.CONF FILE)

#==
#/Below is what I'd copy and paste from the FAQ for windows updates:/
#==

range_offset_limit -1
maximum_object_size 200 MB
quick_abort_min -1

refresh_pattern -i microsoft.com/.*\.(cab|exe|ms[i|u|f]|asf|wm[v|a]|dat|zip) 4320 80% 43200 reload-into-ims
refresh_pattern -i windowsupdate.com/.*\.(cab|exe|ms[i|u|f]|asf|wm[v|a]|dat|zip) 4320 80% 43200 reload-into-ims

# Add one of these lines for each of the websites you want to cache.

refresh_pattern -i microsoft.com/.*\.(cab|exe|ms[i|u|f]|asf|wm[v|a]|dat|zip) 4320 80% 43200 reload-into-ims

refresh_pattern -i windowsupdate.com/.*\.(cab|exe|ms[i|u|f]|asf|wm[v|a]|dat|zip) 4320 80% 43200 reload-into-ims

refresh_pattern -i my.windowsupdate.website.com/.*\.(cab|exe|ms[i|u|f]|asf|wm[v|a]|dat|zip) 4320 80% 43200 reload-into-ims

# DONT MODIFY THESE LINES
refresh_pattern ^ftp:             1440  20%  10080
refresh_pattern ^gopher:          1440  0%   1440
refresh_pattern -i (/cgi-bin/|\?) 0     0%   0
refresh_pattern .                 0     20%  4320

#==
#Then I'd add this to ONLY cache the windows updates:
#==

acl mywindowsupdates dstdomain .my.windowsupdate.website.com .windowsupdate.com .microsoft.com
cache allow mywindowsupdates
always_direct allow all

(END OF CODE FOR SQUID.CONF FILE)

Should I change the
cache allow mywindowsupdates
always_direct allow all
... to
cache allow mywindowsupdates
cache deny all

Thanks for the help




--
View this message in context: 
http://squid-web-proxy-cache.1019090.n4.nabble.com/Cache-Windows-Updates-ONLY-tp4661643p4661679.html
Sent from the Squid - Users mailing list archive at Nabble.com.