Well, I think there are actually working solutions. If you Google "EZProxy
squid" you find tons of people who have cobbled together solutions; you just
need to put the best pieces together to make it work! Who wants to start the
GitHub repo?

> On Jan 29, 2014, at 8:58 PM, "Scott Prater" <pra...@wisc.edu> wrote:
>
> I second Stuart's kudos. Replacing EZProxy with an Apache proxy sounds just 
> crazy enough to be brilliant. I could see an open source recipe book taking 
> shape: how to accomplish EZProxy functions using Apache modules and their 
> directives. I think that might end up being more useful than yet another 
> standalone proxy application.
>
> -- Scott
>> On 01/29/14, stuart yeates wrote:
>> Thank you Andrew, that is insanely useful.
>>
>> cheers
>> stuart
>>
>>> On 30/01/14 12:00, Andrew Anderson wrote:
>>> A few years ago, when OCLC first announced their purchase of EZproxy, we 
>>> started a low-priority research project to see what the alternatives were 
>>> and what it would take to bring them into a production-ready state. 
>>> The two open source solutions we evaluated were Squid and Apache HTTPd. We 
>>> considered other options (e.g. Apache Traffic Server), but limited the 
>>> research to these two pieces of software since they are already widely used 
>>> and familiar to most system administrators.
>>>
>>> Long story short, Squid did not support URL rewriting in a way that we felt 
>>> could be supported well: it required either patches to the core C++ server 
>>> code, an external rewriting process, or an ICAP server implementation. Some 
>>> of that has improved a bit since the original evaluation, but the built-in 
>>> support for URL rewriting may still need some time to mature. Another 
>>> aspect of Squid that did not seem to be a good fit was that it is somewhat 
>>> limited in its authentication mechanisms compared with Apache HTTPd.
>>>
>>> So we moved on to evaluating Apache HTTPd with the mod_proxy family of 
>>> modules. While Apache HTTPd does not support the same advanced cache 
>>> federation features as Squid, it has grown to be a robust proxy solution in 
>>> its own right, and the 2.4 release appears to have all of the required 
>>> pieces out of the box, including the mod_proxy_html module. In addition to 
>>> basic URL rewriting support, you get full HTTP protocol support, mature 
>>> IPv6 support, GZIP support, just about any authentication mechanism you 
>>> need, a server that you can self-host content with easily, as well as a 
>>> built-in HTTP object cache.
>>>
>>> How would it work?
>>>
>>> Here’s the current EZproxy stanza for ProQuest:
>>>
>>> HTTPHeader X-Requested-With
>>> HTTPHeader Accept-Encoding
>>> Title ProQuest
>>> URL http://search.proquest.com/ip
>>> DJ proquest.com
>>> HJ gateway.proquest.com
>>> DJ umi.com
>>> HJ fedsearch.proquest.com
>>> HJ literature.proquest.com
>>> DJ conquest-leg-insight.com
>>> DJ conquestsystems.com
>>> DJ m.search.proquest.com
>>> DJ media.proquest.com
>>> NeverProxy order.proquest.com
>>> NeverProxy rss.proquest.com
>>>
>>> Here’s an Apache HTTPd configuration for ProQuest that accomplishes much of 
>>> the same functionality for the main search.proquest.com interface:
>>>
>>> <VirtualHost _default_:80>
>>> ServerName search.proquest.com.fqdn
>>>
>>> ProxyRequests Off
>>> ProxyVia On
>>>
>>> RewriteEngine On
>>> RewriteRule ^/(.*) http://search.proquest.com/$1 [P]
>>>
>>> <Location "/">
>>> AllowMethods GET POST OPTIONS
>>> ProxyPassReverse http://search.proquest.com/
>>> ProxyPassReverseCookieDomain search.proquest.com search.proquest.com.fqdn
>>> CacheEnable disk
>>> SetOutputFilter INFLATE;DEFLATE
>>> Header Append Vary User-Agent env=!dont-vary
>>> # Put Authentication directives here
>>> ErrorDocument 401 /path/to/login
>>> Require valid-user
>>> </Location>
>>> </VirtualHost>
>>>
>>> A few notes on this:
>>>
>>> - There is no need for NeverProxy: if you do not define a VirtualHost for 
>>> the hostname, it is not proxied. So instead of HJ and DJ lines, you add a 
>>> new VirtualHost block for each hostname that needs to be proxied. The 
>>> astute will ask “what about services that have dozens or hundreds of host 
>>> entries, like Sage?” Those can be handled by the mod_proxy_express module 
>>> (the ProxyExpress* directives) in Apache HTTPd.
>>>
>>> - There is no need for HTTPHeader: since Apache HTTPd is a full HTTP 
>>> proxy/server, it supports all HTTP headers natively.
>>>
>>> - Some of the hostnames that are in EZproxy stanzas are not needed, and 
>>> some are legacy hostnames that are no longer used by the vendor.
>>>
>>> - Some of the hostnames that are in EZproxy stanzas are for CDN hosted 
>>> content that requires no special access (e.g. JavaScript/CSS/graphics 
>>> assets that make up the vendor’s user interface). Another example: how many 
>>> of you have “DJ google.com” in one of your stanzas? Now how many of you 
>>> registered your IP addresses with Google in any way? Outside of Google 
>>> Scholar, I suspect the answers to those questions are “nearly everyone” and 
>>> “nearly no one”, respectively.
>>>
>>> - Some of the hostnames are for things that no sane person would do: How 
>>> many people run their discovery services through their EZproxy server vs. 
>>> authenticating their discovery platform by IP address with vendors directly?
>>>
>>> - Something that this configuration does that EZproxy does not do is enable 
>>> object caching. This can easily save 30-50% of your upstream bandwidth 
>>> usage (Proxy/ProxySSL in EZproxy can achieve the same result with an 
>>> external caching proxy server).
>>>
>>> - More complex vendor platforms (e.g. Gale Cengage) need ProxyHTML 
>>> directives and ProxyHTMLURLMap configured, and multiple VirtualHost 
>>> sections to get them fully working. These can be a little fun to get 
>>> working initially.
>>>
>>> - Some services need redirects edited to work correctly, and not break out 
>>> of the proxy:
>>>
>>>    Header edit Location http://vendor/ http://vendor.fqdn/
>>>
>>> - Some vendors send wrong HTTP headers for the MIME type, and 
>>> mod_proxy_html exposes this in some cases as it rewrites the page. There 
>>> may be a better way to do this, but this is what I threw together for 
>>> testing:
>>>
>>>    <Location "/badpath">
>>> ProxyHTMLEnable Off
>>> SetOutputFilter INFLATE;dummy-html-to-plain
>>> ExtFilterOptions LogStdErr Onfail=remove
>>>    </Location>
>>>    ExtFilterDefine dummy-html-to-plain mode=output intype=text/html \
>>> outtype=text/plain cmd="/bin/cat -"
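>>>
>>> As a rough illustration of two of the notes above (many hostnames via 
>>> mod_proxy_express, and link rewriting via ProxyHTMLURLMap), something along 
>>> these lines might be a starting point. This is an untested sketch, not a 
>>> working vendor configuration: every hostname ("vendor", "static.vendor", 
>>> "journal1.vendor"), the ".fqdn" suffix, and the file paths are placeholders 
>>> made up for the example.
>>>
>>> ```apache
>>> # Sketch 1: one VirtualHost for a vendor with many hostnames.
>>> # Assumes mod_proxy, mod_proxy_http and mod_proxy_express are loaded.
>>> <VirtualHost _default_:80>
>>>     ServerName vendor.fqdn
>>>     # Placeholder wildcard: every proxied name for this vendor lands here
>>>     ServerAlias *.vendor.fqdn
>>>
>>>     ProxyRequests Off
>>>     ProxyVia On
>>>
>>>     # Look up the incoming Host header in a DBM map to find the backend
>>>     ProxyExpressEnable On
>>>     ProxyExpressDBMFile /etc/httpd/conf/express-map
>>>
>>>     <Location "/">
>>>         # Put Authentication directives here
>>>         Require valid-user
>>>     </Location>
>>> </VirtualHost>
>>>
>>> # express-map.txt is the plain-text source for the DBM file, one host per
>>> # line (hypothetical entries):
>>> #   journal1.vendor.fqdn    http://journal1.vendor
>>> #   journal2.vendor.fqdn    http://journal2.vendor
>>> # Build the DBM file with: httxt2dbm -i express-map.txt -o express-map
>>>
>>> # Sketch 2: a platform whose HTML contains absolute links that must be
>>> # rewritten. Assumes mod_proxy_html and mod_headers are loaded and the
>>> # stock ProxyHTMLLinks definitions (proxy-html.conf) are included.
>>> <VirtualHost _default_:80>
>>>     ServerName platform.fqdn
>>>
>>>     ProxyRequests Off
>>>     RewriteEngine On
>>>     RewriteRule ^/(.*) http://vendor/$1 [P]
>>>
>>>     # Ask the backend for uncompressed pages so the HTML filter can read them
>>>     RequestHeader unset Accept-Encoding
>>>
>>>     <Location "/">
>>>         ProxyPassReverse http://vendor/
>>>         ProxyHTMLEnable On
>>>         # Keep links to the vendor's hosts inside the proxy
>>>         ProxyHTMLURLMap http://vendor/ http://platform.fqdn/
>>>         ProxyHTMLURLMap http://static.vendor/ http://static.vendor.fqdn/
>>>         # Put Authentication directives here
>>>         Require valid-user
>>>     </Location>
>>> </VirtualHost>
>>> ```
>>>
>>> In the second sketch each mapped hostname (static.vendor.fqdn here) would 
>>> still need its own VirtualHost or map entry, and real vendor platforms will 
>>> likely need more ProxyHTMLURLMap lines than shown.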
>>>
>>> So what’s currently missing in the Apache HTTPd solution?
>>>
>>> - Services that use an authentication token (predominantly ebook vendors) 
>>> need special support written. I have been entertaining using mod_lua for 
>>> this to make this support relatively easy for someone who is not hard-core 
>>> technical to maintain.
>>>
>>> - Services that are not IP authenticated, but use one of the form-based 
>>> authentication variants. I suspect that an approach that injects a script 
>>> tag into the page pointing to JavaScript that handles the form 
>>> fill/submission might be a sane approach here. This should also cleanly 
>>> deal with the ASP.NET abominations that use __VIEWSTATE to store sessions 
>>> client-side instead of server-side.
>>>
>>> - EZproxy’s built-in DNS server (enabled with the “DNS” directive) would 
>>> need to be handled using a separate DNS server (there are several options 
>>> to choose from).
>>>
>>> - In this setup, standard systems-level management and reporting tools 
>>> would be used instead of the /admin interface in EZproxy.
>>>
>>> - In this setup, the functionality of the EZproxy /menu URL would need to 
>>> be handled externally. This may not be a real issue, as many academic sites 
>>> already use LMS or portal systems instead of EZproxy to direct students 
>>> to resources, so this feature may not be as critical to replicate.
>>>
>>> - And of course, extensive testing. While the above ProQuest stanza works 
>>> for the main ProQuest search interface, it won’t work for everyone, 
>>> everywhere just yet.
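>>>
>>> For the reporting point above, a minimal sketch of what "standard tools" 
>>> could mean on the HTTPd side is a per-user access log plus mod_status. The 
>>> log path, the format nickname "proxyusage", the admin hostname, and the 
>>> admin network range are all assumptions for illustration.
>>>
>>> ```apache
>>> # %u is the authenticated user, %{Host}i the proxied hostname requested,
>>> # %b the response size in bytes, %D the time taken in microseconds.
>>> LogFormat "%h %u %t \"%{Host}i\" \"%r\" %>s %b %D" proxyusage
>>> CustomLog "logs/proxy_usage_log" proxyusage
>>>
>>> # Live status page (requires mod_status), served from a separate,
>>> # non-proxied VirtualHost and limited to a placeholder admin network.
>>> <VirtualHost _default_:80>
>>>     ServerName admin.fqdn
>>>
>>>     <Location "/server-status">
>>>         SetHandler server-status
>>>         Require ip 192.0.2.0/24
>>>     </Location>
>>> </VirtualHost>
>>> ```
>>>
>>> A log in this shape can be fed to whatever log analysis tooling is already 
>>> in use to produce per-vendor and per-user usage reports.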
>>>
>>> Bottom line: Yes, Apache HTTPd is a viable EZproxy alternative if you have 
>>> a system administrator who knows their way around Apache HTTPd, and are 
>>> willing to spend some time getting to know your vendor services intimately.
>>>
>>> All of this testing was done on Fedora 19 for the 2.4 version of HTTPd, 
>>> which should be available in RHEL7/CentOS7 soon, so about the time that 
>>> hard decisions are to be made regarding EZproxy vs something else, that 
>>> something else may very well be Apache HTTPd with vendor-specific 
>>> configuration files.
>>
>>
>> --
>> Stuart Yeates
>> Library Technology Services http://www.victoria.ac.nz/library/
>
> --
> Scott Prater
> Shared Development Group
> General Library System
> University of Wisconsin - Madison
> pra...@wisc.edu
