[squid-users] Pass the username to the cache parent

2014-03-14 Thread David Touzeau


Dear list,

I'm using Squid connected to an Active Directory server in front of users.

We have a central Squid server that acts as the parent.
This parent Squid server only does caching and has no authentication
method configured.


I would like the child Squid to send usernames to the parent Squid so that
user IDs are logged in the parent's access.log.


Has anybody successfully set up this kind of architecture?
Is it possible?

best regards



[squid-users] Re: Automatic StoreID ?

2014-03-14 Thread Omid Kosari
Amos Jeffries-2 wrote
 You just described how Store-ID feature works today.
 
 The map of urlA == urlB == urlC is inside the helper. You can make it a 
 static list of regex patterns like the original Squid-2 helpers, a DB 
 text file of patterns like the bundled Squid-3 helper, or anything else 
 you like inside the helper.
   Squid learns the mappings by asking the helper about each URL. There is
 a helper response cache on these lookups, the same as for other helpers,
 which prevents complex/slow mappings from having much impact on hot objects.
 
 Amos

Really? Squid has its own learning mechanism, with no need for a human hand?
Can it also guess new URLs that it was not aware of until now?

One more question: will Squid delete the current duplicate objects?





Re: [squid-users] Re: Automatic StoreID ?

2014-03-14 Thread Eliezer Croitoru

On 13/03/2014 22:21, Amos Jeffries wrote:

Adding a domain or ACL test to the internal Squid StoreID feature would
allow it to run faster, but requires a patch to the sources.


I was thinking about adding the code to the StoreID reply section for the
ERR case, with another flag used to enable this option, and noting
that it will not work when using an external helper.


What do we think about an idea along these lines?
What exists inside the Squid code that can help me work with regex
extraction and matching?

Maybe use ACL-like code?

How about reading the Perl DB into Squid internals?
Pointers are welcome.

Eliezer


You just described how Store-ID feature works today.

The map of urlA == urlB == urlC is inside the helper. You can make it a
static list of regex patterns like the original Squid-2 helpers, a DB
text file of patterns like the bundled Squid-3 helper, or anything else
you like inside the helper.
  Squid learns the mappings by asking the helper about each URL. There
is a helper response cache on these lookups, the same as for other helpers,
which prevents complex/slow mappings from having much impact on hot objects.

Amos




Re: [squid-users] Pass the username to the cache parent

2014-03-14 Thread Amos Jeffries
On 14/03/2014 9:18 p.m., David Touzeau wrote:
 
 Dear list,
 
 I'm using Squid connected to an Active Directory server in front of users.
 
 We have a central Squid server that acts as the parent.
 This parent Squid server only does caching and has no authentication
 method configured.
 
 I would like the child Squid to send usernames to the parent Squid so that
 user IDs are logged in the parent's access.log.
 
 Has anybody successfully set up this kind of architecture?
 Is it possible?

Supported since Squid-3.2:
   cache_peer ... login=PASSTHRU

Amos


Re: [squid-users] Re: Automatic StoreID ?

2014-03-14 Thread Amos Jeffries
On 14/03/2014 9:20 p.m., Omid Kosari wrote:
 Amos Jeffries-2 wrote
 You just described how Store-ID feature works today.

 The map of urlA == urlB == urlC is inside the helper. You can make it a 
 static list of regex patterns like the original Squid-2 helpers, a DB 
 text file of patterns like the bundled Squid-3 helper, or anything else 
 you like inside the helper.
   Squid learns the mappings by asking the helper about each URL. There is
 a helper response cache on these lookups, the same as for other helpers,
 which prevents complex/slow mappings from having much impact on hot objects.

 Amos
 
 Really? Squid has its own learning mechanism, with no need for a human hand?
 Can it also guess new URLs that it was not aware of until now?

You did not describe any learning mechanism. You just stated the parts Squid
already does: when three URLs are known by the helper to be identical, the
first fetch for any one of them causes that object to be cached, then
later requests for any of them use that cached version. urlB/urlC need
not have been fetched at all.

Squid just asks the helper for information about each URL; the helper
could be made to contain any learning mechanism you want.
The ones bundled with Squid leverage human knowledge in the form of
a database list of patterns.

In short: it is ready for you to figure out how that learning should be
done and to make a helper to do it.
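
For illustration, a minimal sketch of such a helper in Python. It assumes the
simple store_id_program line protocol without concurrency (one request per
line with the URL as the first token, answered with "OK store-id=..." or
"ERR"). The pattern list, the example.com hostnames and the function names are
hypothetical, not taken from any bundled helper.

```python
import re

# Hypothetical pattern list: (compiled regex, Store-ID template).
PATTERNS = [
    (re.compile(r"^http://(?:mirror\d+|cdn-[a-z]+)\.example\.com/(.+)$"),
     r"http://mirrors.example.com.squid.internal/\1"),
]


def store_id_for(url):
    """Return the Store-ID for url, or None if no pattern matches."""
    for regex, template in PATTERNS:
        match = regex.match(url)
        if match:
            return match.expand(template)
    return None


def reply_for(line):
    """Build the helper reply for one request line."""
    fields = line.split()
    if not fields:
        return "ERR"
    store_id = store_id_for(fields[0])
    if store_id is None:
        return "ERR"  # no mapping known: Squid keeps using the URL itself
    return "OK store-id=" + store_id


def serve(requests, replies):
    """Answer every request line; flush so the caller sees each reply at once."""
    for line in requests:
        replies.write(reply_for(line) + "\n")
        replies.flush()
```

Run as a helper, the script would finish with serve(sys.stdin, sys.stdout)
after an "import sys". Any learning mechanism would replace the static
PATTERNS list.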

 
 One more question: will Squid delete the current duplicate objects?
 

Squid does lazy deletion. It deletes just before re-using the cache
position for another file, or when needing to free up space.

Amos



Re: [squid-users] Pass the username to the cache parent

2014-03-14 Thread David Touzeau



On 14/03/2014 9:18 p.m., David Touzeau wrote:


Dear list,

I'm using Squid connected to an Active Directory server in front of users.

We have a central Squid server that acts as the parent.
This parent Squid server only does caching and has no authentication
method configured.

I would like the child Squid to send usernames to the parent Squid so that
user IDs are logged in the parent's access.log.

Has anybody successfully set up this kind of architecture?
Is it possible?


Supported since Squid-3.2:
  cache_peer ... login=PASSTHRU

Amos



Thanks Amos
I will try this feature...





Re: [squid-users] Re: Automatic StoreID ?

2014-03-14 Thread csn233
On Fri, Mar 14, 2014 at 4:20 PM, Omid Kosari omidkos...@yahoo.com wrote:

 Really? Squid has its own learning mechanism, with no need for a human hand?
 Can it also guess new URLs that it was not aware of until now?

Squid doesn't have its own learning mechanism; it simply does what
the helper *you* wrote tells it to do.

However, theoretically, it should be possible to compute the
checksum of the object *after* it has been fetched, and store that in a
URL/checksum database. But this requires more than just a StoreID
helper, as the helper's role is before the fetch. Something else needs
to do the checksum after the fetch.

And the database needs to build up its entries first before it becomes
useful. The question is whether the additional percentage of hits you
get is worth the effort.
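
A rough sketch of such a URL/checksum database in Python, assuming some
component outside the StoreID helper hands over each response body after the
fetch. The class and method names are made up for illustration, and SHA-256 is
just one possible checksum.

```python
import hashlib


class ChecksumIndex:
    """Remembers which URLs produced byte-identical bodies."""

    def __init__(self):
        self.url_to_digest = {}
        self.digest_to_urls = {}

    def record(self, url, body):
        """Store the checksum of a body that was fetched for url."""
        digest = hashlib.sha256(body).hexdigest()
        self.url_to_digest[url] = digest
        self.digest_to_urls.setdefault(digest, set()).add(url)
        return digest

    def duplicates_of(self, url):
        """Return the other URLs already seen with the same content."""
        digest = self.url_to_digest.get(url)
        if digest is None:
            return set()  # nothing fetched for this URL yet
        return self.digest_to_urls[digest] - {url}
```

The index only knows about URLs that have already been fetched at least once,
which is the build-up cost mentioned above.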


Re: [squid-users] Automatic StoreID ?

2014-03-14 Thread Nikolai Gorchilov
On Tue, Mar 11, 2014 at 9:43 PM, Alex Rousskov
rouss...@measurement-factory.com wrote:
 On 03/11/2014 01:18 PM, Nikolai Gorchilov wrote:
 On Tue, Mar 11, 2014 at 6:10 PM, Alex Rousskov wrote:
 On 03/11/2014 08:05 AM, Omid Kosari wrote:
 Is it possible for Squid to automatically find every similar object based on
 something like an MD5 of the objects and serve them to clients without needing
 a custom DB?


 No, because clients do not tell Squid what checksum they are looking
 for.

 It is possible to avoid caching duplicate content, but that allows you
 to handle cache hits more efficiently. It does not help with cache
 misses (when the URL requested by the client has not been seen before).


 Actually, two commercial vendors - PeerApp and ThunderCache - claim
 their products don't use URLs to identify the objects, thus they
 don't have to maintain a StoreID-like de-duplication database manually.

 Any ideas how they do it?

 Most likely they do not, and you are simply being misled by their
 marketing claims. In general, it is not possible to ignore the request

I also suspected it was just marketing, but wanted to check whether I was
missing something :)

 URL and still produce the right response (think about it!). They
 probably do not store duplicate cache objects, but, as discussed above,
 that is far from the automatic StoreID functionality that the original
 poster is asking about.

 In other words, there are at least two de-duplication layers:

 * The higher-level one is based on URLs and essentially requires manual
 URL mapping. It helps turn cache misses into hits.

 * The lower-level one is based on checksums and can be automated. It
 helps spend less cache space to serve cache hits. Some commercial
 products have implemented this lower-level optimization.

I was thinking about this second option some time back. It doesn't
seem very complicated and I see clear benefits if it is implemented in
Squid, giving us the best of both worlds.

Lower-level checksum-based deduplication, combined with some form of
feedback mechanism (logging, helper, etc.), could be used by
either humans or heuristic algorithms to create/update StoreID
patterns.

Best,
Niki


[squid-users] logrotate only instead (all) squid rotate

2014-03-14 Thread Alfredo Rezinovsky
Using "squid -k rotate", Squid rotates the logs but also closes and reopens
the cache_dirs and url_rewrite_programs.


Is there a way to signal only the (logfile-daemon) processes to rotate
the logs, and only the logs?


--
Alfrenovsky


Re: [squid-users] Re: ICP and HTCP and StoreID

2014-03-14 Thread Nikolai Gorchilov
On Thu, Mar 13, 2014 at 5:44 PM, Alex Rousskov
rouss...@measurement-factory.com wrote:
 On 03/13/2014 07:24 AM, Nikolai Gorchilov wrote:
 On Wed, Mar 12, 2014 at 1:27 AM, Alex Rousskov wrote:
 Just to make sure we are on the same page, here is a list of options I
 recall being discussed:

 1. Using ICP reqnum field as a cache key.

 I don't understand how this option is going to work. AFAIK reqnum
 is just 4 octets long - how is it supposed to accommodate the StoreID?

 By using StoreIDs that are 31 bits long. Recall that you control the
 StoreID map and, in most cases, there are fewer than 2^31 mapped/altered
 URLs in the cache, so one could use positive reqnums as regular
 reqnums and negative reqnums as "this is my special StoreID" reqnums.
 There are other caveats or optimizations that may make sense with this
 scheme. And, as I said earlier, this is a hack (that may work well in
 some environments).

I can't think of a reliable checksum algorithm that can fit in 31 bits
:) This means some form of DB-based StoreID-to-URL mapping that has
to be shared between cache peers. It adds too much complexity and
reduces reliability in the helpers...

Using MD5 as StoreID can do the job, but this is option 2.

 2. Adding StoreID to ICP/HTCP requests as an optional field.
 3. Computing StoreID upon receiving a regular ICP/HTCP request.

 Out of those three, do you prefer #3? Note that #1 is a little hackish,
 but may be easier to implement (and is a lot cheaper CPU-wise) than
 #3. Neither #1 nor #3 make the ICP packets bigger, unlike #2.

 Option 3 is the only universal solution that works in all scenarios.
 Sharing a StoreID string or a derivative of it
 (checksum/hash/digest/whatever) will do only for peers using same
 StoreID rewriting logic.

 Yes, of course. And with a StoreID cache or, in the worst case, a
 loaded module computing Store IDs, it will be fast enough too.

To sum it up, here is the above list ordered by preference:
Option 3 with StoreID helper and StoreID caching
Option 2 (using MD5 to minimize the packet size)
Option 3 with StoreID helper, but without StoreID caching
Option 1
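
For reference, the numbering hack behind Option 1 could look roughly like the
Python sketch below. It assumes a table that hands out a number below 2^31 to
each StoreID and marks such numbers by setting the top bit of the 32-bit
reqnum (the "negative" range). The class and function names are hypothetical,
ICP packet handling is left out, and the table would still have to be shared
between peers, which is the complexity objected to above.

```python
STORE_ID_FLAG = 0x80000000  # top bit of the 32-bit reqnum field
NUMBER_MASK = 0x7FFFFFFF    # the remaining 31 bits


class StoreIdRegistry:
    """Assigns each StoreID a number that fits into 31 bits."""

    def __init__(self):
        self._numbers = {}
        self._store_ids = []

    def number_for(self, store_id):
        """Return the number of store_id, allocating a new one if needed."""
        number = self._numbers.get(store_id)
        if number is None:
            if len(self._store_ids) > NUMBER_MASK:
                raise OverflowError("more than 2**31 StoreIDs")
            number = len(self._store_ids)
            self._numbers[store_id] = number
            self._store_ids.append(store_id)
        return number

    def store_id_for(self, number):
        """Return the StoreID that was given this number."""
        return self._store_ids[number]


def to_reqnum(number):
    """Encode a StoreID number as a special reqnum with the top bit set."""
    return STORE_ID_FLAG | number


def from_reqnum(reqnum):
    """Return the StoreID number, or None for a regular reqnum."""
    if reqnum & STORE_ID_FLAG:
        return reqnum & NUMBER_MASK
    return None
```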


[squid-users] Re: Automatic StoreID ?

2014-03-14 Thread babajaga
Actually, two commercial vendors - PeerApp and ThunderCache - claim
their products don't use URLs to identify the objects, thus they
don't have to maintain a StoreID-like de-duplication database manually.

Any ideas how they do it?

Instead of first mapping the URL to a memory-resident table that keeps
pointers (file-id, bucket no.) to the real location of the object on disk, a
hash value derived from the URL could be used directly to designate the
storage location on disk, avoiding the translation table that Squid uses.
This is the principle of every hashed table in a fast database system.
The drawback is that you have to deal with collisions on the disk and with
overflows: hashes for different URLs point to the same storage location on
disk. Different solutions for this problem are available, though (chaining,
sequential storage, secondary storage area, etc.). And you have to manage
variable-sized buckets, the storage locations that the hashes point to.

Positive consequence: no rebuild of the in-memory table is necessary, as there
is none. This avoids the time-consuming rebuild of the rock storage table from
disk.

I can imagine that, for historical reasons (much simpler to
implement), Squid uses the translation table instead of direct hashing,
whereas ThunderCache etc. can rely on some low-level DB system that has
direct hashing ready to be used.
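
A toy illustration of the direct-hashing idea in Python, with a list standing
in for fixed-size disk slots and collisions resolved by moving on to the next
slot (the "sequential storage" variant). This is only a sketch under those
assumptions; it does not describe how ThunderCache, PeerApp or Squid actually
lay out their storage, and the class name is made up.

```python
import hashlib


class DirectHashStore:
    """The hash of the URL picks the slot directly; no lookup table is kept."""

    def __init__(self, slot_count):
        self.slots = [None] * slot_count  # each slot holds (url, body) or None

    def _home_slot(self, url):
        digest = hashlib.sha256(url.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.slots)

    def put(self, url, body):
        """Store body in the home slot of url or the next free slot after it."""
        start = self._home_slot(url)
        for step in range(len(self.slots)):
            position = (start + step) % len(self.slots)
            entry = self.slots[position]
            if entry is None or entry[0] == url:
                self.slots[position] = (url, body)
                return position
        raise RuntimeError("store is full")

    def get(self, url):
        """Probe from the home slot until the URL or an empty slot is found."""
        start = self._home_slot(url)
        for step in range(len(self.slots)):
            entry = self.slots[(start + step) % len(self.slots)]
            if entry is None:
                return None
            if entry[0] == url:
                return entry[1]
        return None
```

Every probe in get() stands for a disk read, and each slot has to carry the
URL so that a collision can be told apart from a hit.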

 






Re: [squid-users] Re: SquidGuard redirect to parent proxy (Off-Topic)

2014-03-14 Thread Christian Scholz

On 2014-03-13 21:32, Amos Jeffries wrote:

On 2014-03-14 05:21, Christian Scholz wrote:

Hi,

I know that my question is a little bit off-topic, but nevertheless I
hope that someone can help me :-)
I've configured squid3 with squidguard and one parent proxy. In the
case of an access violation, squidguard redirects the user to a customized
block page hosted by the proxy itself.

Unfortunately, the proxy tries to access the local block page through its
parent proxy. Does anyone have an idea why?



1) this is a re-write, not a redirect.
 HTTP redirects have a 3xx status code prefixing the URL in the squidguard
 config.

   redirect 302:http://example.com/   # redirect client to example.com
   redirect http://example.com/       # re-write URL to http://example.com and fetch

2) you probably also have no cache_peer_access rules preventing the
parent from being a source for these
http://proxyname.localsuffix/... URLs.


Amos


Okay, I've fixed it with the following lines
acl local-domain dstdomain proxyname.localsuffix
always_direct allow local-domain

Thanks!


Re: [squid-users] Re: Automatic StoreID ?

2014-03-14 Thread Alex Rousskov
On 03/14/2014 06:34 AM, babajaga wrote:

 Instead of first mapping the URL to a memory-resident table that keeps
 pointers (file-id, bucket no.) to the real location of the object on disk, a
 hash value derived from the URL could be used directly to designate the
 storage location on disk, avoiding the translation table that Squid uses.

This is how Rock store does it, essentially: Rock store index does not
store the real location of the object on disk but computes it based on
the hash value.


 Positive consequence: no rebuild of the in-memory table is necessary, as there
 is none. This avoids the time-consuming rebuild of the rock storage table from
 disk.

While Rock store can avoid building the memory-resident index, you
actually want that table in most cases: If you do not build the index,
you have to do a disk I/O to fetch the first slot of the candidate
object on _every_ request. Without that disk I/O, Squid would not know
whether it has a hit or a miss because _every_ URL corresponds to a
valid location on disk. You have an infinite number of URLs pointing to
the same location and, without a memory-resident table, you do not know
what is actually stored there (if anything at all) until you do that
disk I/O.
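
The trade-off can be shown with a small Python toy that counts simulated disk
reads. It assumes one object per slot and a plain set of URLs as the
memory-resident index; the real Rock index and slot layout are different, and
all names here are made up.

```python
import zlib


class SlotStore:
    """Toy store where a hash of the URL picks the slot on disk."""

    def __init__(self, slot_count, use_index):
        self.slots = [None] * slot_count           # simulated disk slots
        self.index = set() if use_index else None  # optional in-memory index
        self.disk_reads = 0

    def _slot_for(self, url):
        return zlib.crc32(url.encode("utf-8")) % len(self.slots)

    def put(self, url, body):
        """Write body to the slot of url, evicting whatever was there."""
        position = self._slot_for(url)
        previous = self.slots[position]
        if self.index is not None:
            if previous is not None:
                self.index.discard(previous[0])
            self.index.add(url)
        self.slots[position] = (url, body)

    def lookup(self, url):
        """Return the cached body or None; count every simulated disk read."""
        if self.index is not None and url not in self.index:
            return None          # miss answered from memory, no disk I/O
        self.disk_reads += 1     # the slot must be read to see what it holds
        entry = self.slots[self._slot_for(url)]
        if entry is not None and entry[0] == url:
            return entry[1]
        return None
```

With use_index=False every lookup costs a read, hit or miss; with the index
only lookups for indexed URLs touch the disk.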

For reverse proxy caches with very high hit ratios, avoiding the rebuild
may indeed be a good optimization where warmup time is more important
than speed. Most such proxies have small caches that do not require a
long rebuild, making the need for that optimization moot though.

The building of the index needs to be optimized, but that is a different
story.

Note that Rock store can cache new objects while building the index
(because the index does not store the object location).


Cheers,

Alex.



Re: [squid-users] Re: Automatic StoreID ?

2014-03-14 Thread Alex Rousskov
On 03/14/2014 02:36 AM, Eliezer Croitoru wrote:
 Adding a domain or ACL test to the internal Squid StoreID feature would
 allow it to run faster, but requires a patch to the sources.
 
 I was thinking about adding the code to the StoreID reply section for the
 ERR case, with another flag used to enable this option, and noting
 that it will not work when using an external helper.

You can add a new store_id_map directive. I do not think it should
depend on store_id_program actions. The two options do not even have to
be mutually exclusive: if store_id_map does not match, check
store_id_access.

  store_id_map <filename with a regex map> acl1 acl2 ...


 How about reading the Perl DB into Squid internals?

If you do something like that, I urge you to revise the current regex
map file syntax used by a popular StoreID script to allow for comments
(if not already allowed) and to contain complete substitution patterns
instead of space-separated from/to tokens:

# comment
s/replace this/with that/flags
s@can use custom delimiters and other regex features@as needed@g
^this line is an invalid line example$

To minimize confusion, you can even require that the map file starts
with some well-defined prefix. For example:

  #Store ID Map
  #Version: 1.0

This approach will allow this feature to evolve as needed.
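
To make the proposed syntax concrete, here is a small Python sketch of a
reader for such a map. It assumes only the "g" and "i" flags, does not handle
an escaped delimiter inside a pattern, and uses Python regular expressions
rather than Squid's regex library; the function names are hypothetical.

```python
import re


def parse_rule(line):
    """Parse one 's<d>pattern<d>replacement<d>flags' line.

    Returns None for blank lines and comments, raises ValueError otherwise
    if the line is not a complete substitution.
    """
    text = line.strip()
    if not text or text.startswith("#"):
        return None
    if len(text) < 2 or text[0] != "s":
        raise ValueError("not a substitution rule: " + text)
    parts = text[2:].split(text[1])  # text[1] is the delimiter
    if len(parts) != 3:
        raise ValueError("expected pattern, replacement and flags: " + text)
    pattern, replacement, flags = parts
    if set(flags) - {"g", "i"}:
        raise ValueError("unsupported flags: " + flags)
    regex = re.compile(pattern, re.IGNORECASE if "i" in flags else 0)
    count = 0 if "g" in flags else 1  # count=0 means replace every match
    return regex, replacement, count


def load_rules(lines):
    """Collect all rules from the lines of a map file."""
    rules = []
    for line in lines:
        rule = parse_rule(line)
        if rule is not None:
            rules.append(rule)
    return rules


def apply_rules(rules, url):
    """Return the rewritten URL from the first matching rule, else None."""
    for regex, replacement, count in rules:
        rewritten, hits = regex.subn(replacement, url, count=count)
        if hits:
            return rewritten
    return None
```

A line such as the "^this line is an invalid line example$" one above is
rejected with ValueError instead of being silently skipped.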


 What exists inside the Squid code that can help me work with regex
 extraction and matching?
 Maybe use ACL-like code?
 
 Pointers are welcome.

  $ fgrep -RI regcomp src


HTH,

Alex.



[squid-users] Cygwin SSL bumping

2014-03-14 Thread Derek Jones
Hi,

I am trying to run Squid on Windows Server 2008 R2 Standard as a
"Squid in the middle". I need to do SSL bumping, and I need to
block access to certain websites (e.g. sites with the word "games" in
the URL).

I've installed Cygwin on the server, and included squid in the
installation. Where do I go from here? From my understanding I am not
able to run ./configure, and then make / make install to enable
features such as ssl-crtd.

Any help would be greatly appreciated.

Thanks!
Derek


[squid-users] Is it possible to mark tcp_outgoing_mark (server side) with SAME MARK as incoming packet (client side)?

2014-03-14 Thread Amm
Hello,

I would like to mark outgoing packets (on the server side) with the SAME MARK
as on the incoming (NATed or CONNECTed) packets.

There is the tcp_outgoing_mark option, with which I can mark packets.

But there is no ACL option to check the incoming mark.


If there is already a way to do this, then please guide me.


Otherwise I would like to suggest:

Option 1)
---


Syntax: tcp_outgoing_mark SAMEMARK [!]aclname

where SAMEMARK is a special (literal) word meaning that packets matching the
ACL get the same mark as the incoming packet.

For example, I can do:

tcp_outgoing_mark SAMEMARK all

And all packets will get the same mark as the incoming packet.


Option 2)
---


Have an acl:

Syntax: acl aclname nfmark mark-value


Then I can do something like this:

acl mark101 nfmark 0x101
tcp_outgoing_mark 0x101 mark101


If both of the above options could be combined, it would be even better.

Thanks in advance,

Amm.