I'm currently working on a project to transform web content. It's basically a proxy server that has some mod_perl filters loaded to perform the transformations.
Below is a diagram that shows how the requests are being passed from client to web server.
CLIENT <-----> |   TRANSFORMING PROXY    | <-------> CORPORATE PROXY <------> WEB SERVER
               |                         |
               | --> REQUEST FILTER  --> |
               | <-- RESPONSE FILTER <-- |
               |                         |
Here's the problem:
I need to give web users an option to turn off the transformations, effectively giving them the original untransformed web page. The filter must be controllable on a per-page and per-user basis, i.e. every user can choose which pages they would like transformed (by default, all pages are transformed). It does not matter if they have to turn the filter off every time they visit the page.
Initially I tried appending "&iProxy=OFF" to the query string. That way, when the filter received the response, it could see from the URI that the page should not be transformed. However, adding this to the query string is causing problems with some web servers.
What I would now like to do is strip the "&iProxy=OFF" from the URI before forwarding the request on to the corporate proxy. However, I then have no way of detecting whether the content should be filtered when I get the response. Is there any way I can set a flag in the request filter that can be read by the response filter, i.e. is there any way for the two filters to pass a flag? From my limited knowledge of mod_perl filters, I think that the request and response filters have different filter contexts, which would make it impossible for the two filters to share data using a filter context.
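To make the stripping step concrete, here is a minimal sketch in plain Perl. It is not the actual filter code: the helper name strip_proxy_flag is made up for illustration, and it assumes the flag always appears as an exact "iProxy=OFF" pair in the query string. It returns the cleaned URI together with a true/false value, which is the flag I would somehow need to hand over to the response filter.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of mod_perl): removes the iProxy=OFF
# parameter from a URI and reports whether it was present.
sub strip_proxy_flag {
    my ($uri) = @_;

    # Separate the query string from the rest of the URI.
    my ($base, $query) = split /\?/, $uri, 2;
    return ($uri, 0) unless defined $query;

    my $off = 0;
    my @keep;
    for my $pair (split /&/, $query) {
        if ($pair eq 'iProxy=OFF') {
            $off = 1;               # remember that the user opted out
        }
        else {
            push @keep, $pair;      # keep every other parameter as-is
        }
    }

    # Rebuild the URI, dropping the "?" if no parameters remain.
    my $clean = @keep ? $base . '?' . join('&', @keep) : $base;
    return ($clean, $off);
}

my ($clean, $off) = strip_proxy_flag('http://aserver.com/index.php?id=7&iProxy=OFF');
print "$clean $off\n";    # prints "http://aserver.com/index.php?id=7 1"
```

The open question remains where to keep the returned flag so that the response filter can see it.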
If there is no way of passing a flag between the two, does anyone know of an HTTP header that is copied from a request and placed into a response when the request is processed? For example, if I set a header in the request, I would want the same header (unmodified) to be returned in the response. This way, I can recognise content that mustn't be transformed when I get the response back.
E.g.
REQUEST:
GET http://aserver.com/index.php HTTP/1.1\n
Host: aserver.com\n
MyHeader: iProxyOFF\n
\n
RESPONSE:
HTTP/1.1 200 OK\n
Server: Apache/1.3.23 (Unix)\n
Connection: close\n
MyHeader: iProxyOFF\n
\n
365
Bla bla bla bla
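If such an echoed header existed, the check on the response side could be as simple as the sketch below. This is plain Perl working on a raw header block, not real filter code; the function name transformation_disabled is hypothetical, and MyHeader / iProxyOFF are just the placeholder names from my example above.

```perl
use strict;
use warnings;

# Hypothetical check: returns 1 if the raw response carries the marker
# header "MyHeader: iProxyOFF", and 0 otherwise.
sub transformation_disabled {
    my ($raw_response) = @_;

    for my $line (split /\r?\n/, $raw_response) {
        last if $line eq '';    # a blank line ends the header block

        # Skip anything that is not a "Name: value" line (e.g. the status line).
        my ($name, $value) = $line =~ /^([^:\s]+):\s*(.*?)\s*$/ or next;

        # Header names are case-insensitive; the value is compared exactly.
        return 1 if lc($name) eq 'myheader' && $value eq 'iProxyOFF';
    }
    return 0;
}

my $response = "HTTP/1.1 200 OK\nServer: Apache/1.3.23 (Unix)\n"
             . "Connection: close\nMyHeader: iProxyOFF\n\n365\nBla bla bla bla\n";
print transformation_disabled($response) ? "leave untouched\n" : "transform\n";
```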
Any other ideas for recognising pages that should not be transformed would be appreciated!
Sorry this is a bit long and possibly confusing!
---
Regards,
Chris Pringle
UK PSG
Hewlett-Packard, Bristol
Tel: 0117 31 29664
Mob: 07752 307063