Domains in log being omitted.

Myron Szymanskyj Fri, 23 Nov 2001 14:02:44 -0500 (EST)

This is a bit disjointed, but I hope it's clear enough to follow.  Can 
anyone on this list comment on my findings?  The issue is that the domains 
of web sites visited are not being logged in the GnatBOX's logs.  For 
instance, a log entry would read:
   http:///index.htm


Also, if the GnatBOX relies on the `Host:` header in the HTTP request for 
CyberNOT to function (I don't use it, but am considering it), and that 
header is stripped or lost, then how can CyberNOT function?

Yes, I'm also aware of the implication if a web server is running multiple 
sites on a single IP.  That server will not know which site to serve and 
will serve the default site for that IP address.

Still, that's not the issue here.  It's the incomplete log entries where 
the domains are missing.  An idea would be for the GnatBOX to perform a 
reverse DNS lookup against the IP should the `Host:` header be missing.

All comments and ideas entertained, apart from junking the proxy 
server.  (Quite a possibility.)

The secondary issue is that the web sites visited can be masked in a small 
way, requiring extra administrative work to determine which sites have been 
visited, and that web content filtering can be bypassed entirely.

   --!----------------------------------------------------!--

The Proxy server is not sending all the headers and I'm assuming the 
GnatBOX is using the `Host:` header to log the web site visited.

I conducted these tests against `www.gta.com`.

Using the proxy server.  This is the first GET request sent by the browser.
(Actually, the proxy server sends the request to the web server at IP 
`199.120.225.2` on behalf of the browser.)

00000030                    47 45 54 20 2F 20 48 54 54 50       GET./.HTTP
00000040  2F 31 2E 30 0D 0A                               /1.0..

Now, bypassing the Proxy server, `www.gta.com` is visited.

00000030                    47 45 54 20 2F 20 48 54 54 50       GET./.HTTP
00000040  2F 31 2E 31 0D 0A 41 63 63 65 70 74 3A 20 2A 2F /1.1..Accept:.*/
00000050  2A 0D 0A 41 63 63 65 70 74 2D 4C 61 6E 67 75 61 *..Accept-Langua
00000060  67 65 3A 20 65 6E 2D 67 62 0D 0A 41 63 63 65 70 ge:.en-gb..Accep
00000070  74 2D 45 6E 63 6F 64 69 6E 67 3A 20 67 7A 69 70 t-Encoding:.gzip
00000080  2C 20 64 65 66 6C 61 74 65 0D 0A 55 73 65 72 2D ,.deflate..User-
00000090  41 67 65 6E 74 3A 20 4D 6F 7A 69 6C 6C 61 2F 34 Agent:.Mozilla/4
000000A0  2E 30 20 28 63 6F 6D 70 61 74 69 62 6C 65 3B 20 .0.(compatible;.
000000B0  4D 53 49 45 20 35 2E 35 3B 20 57 69 6E 64 6F 77 MSIE.5.5;.Window
000000C0  73 20 4E 54 20 35 2E 30 29 0D 0A 48 6F 73 74 3A s.NT.5.0)..Host:
000000D0  20 77 77 77 2E 67 74 61 2E 63 6F 6D 0D 0A 43 6F .www.gta.com..Co
000000E0  6E 6E 65 63 74 69 6F 6E 3A 20 4B 65 65 70 2D 41 nnection:.Keep-A
000000F0  6C 69 76 65 0D 0A 43 61 63 68 65 2D 43 6F 6E 74 live..Cache-Cont
00000100  72 6F 6C 3A 20 6E 6F 2D 63 61 63 68 65 0D 0A 0D rol:.no-cache...
00000110  0A                                              .
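To make the difference between the two captures concrete, here is a minimal Python sketch that pulls the `Host:` header out of a raw request.  The function name `extract_host` is my own invention, and the byte strings are typed in from the dumps above, so treat this as an illustration only, not as what the GnatBOX actually does.

```python
def extract_host(raw_request: bytes):
    """Return the Host header value of a raw HTTP request, or None if absent."""
    # Only look at the header section (everything before the blank line).
    head = raw_request.split(b"\r\n\r\n", 1)[0]
    lines = head.split(b"\r\n")
    for line in lines[1:]:  # lines[0] is the request line, e.g. "GET / HTTP/1.0"
        name, sep, value = line.partition(b":")
        if sep and name.strip().lower() == b"host":
            return value.strip().decode("ascii", "replace")
    return None


# Request as captured via the proxy server (request line only).
via_proxy = b"GET / HTTP/1.0\r\n"

# Request as captured when bypassing the proxy server.
direct = (
    b"GET / HTTP/1.1\r\n"
    b"Accept: */*\r\n"
    b"Accept-Language: en-gb\r\n"
    b"Accept-Encoding: gzip, deflate\r\n"
    b"User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)\r\n"
    b"Host: www.gta.com\r\n"
    b"Connection: Keep-Alive\r\n"
    b"Cache-Control: no-cache\r\n"
    b"\r\n"
)

print(extract_host(via_proxy))  # nothing to log a domain from
print(extract_host(direct))
```

With the proxied request there is simply no header for a logger or filter to read, which would match the `http:///index.htm` entries.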

What if content filtering is enabled, but the user on their workstation has 
somehow managed to lose all the HTTP request headers apart from the GET 
request line?  If the `Host:` header is relied upon by the GnatBOX for 
filtering and logging then that just nullifies both features.

If there is no `Host:` header in the HTTP request headers, could the 
GnatBOX then do a reverse DNS lookup to determine what the domain should be?
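As a rough sketch of the fallback I have in mind (in Python, purely for illustration; `log_url` and its parameters are hypothetical names, and I have no idea how the GnatBOX builds its log lines internally): use the `Host:` header when present, otherwise try a reverse lookup on the destination IP, and if that fails too, log the bare IP so the entry is at least not empty.

```python
import socket


def log_url(host_header, dest_ip, path, reverse_lookup=socket.gethostbyaddr):
    """Build the URL to log, falling back when the Host header is missing."""
    if host_header:
        host = host_header
    else:
        try:
            # gethostbyaddr returns (hostname, aliaslist, ipaddrlist).
            host = reverse_lookup(dest_ip)[0]
        except OSError:
            # No PTR record or resolver failure: log the IP itself.
            host = dest_ip
    return "http://" + host + path


# Header present: logged as sent by the browser.
print(log_url("www.gta.com", "199.120.225.2", "/index.htm"))
```

The `reverse_lookup` parameter is only there so the resolver can be swapped out for testing; by default it does a real DNS query.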

Yes, I know a lookup will not always return the expected name, as the 
following lookup demonstrates.  The BBC advertise their web site as 
`www.bbc.co.uk`, but in reality it's `www.bbc.net.uk`.

  Non-authoritative answer:
  Name:    www.bbc.net.uk
  Address:  212.58.224.36
  Aliases:  www.bbc.co.uk

Non-authoritative because our internal DNS has cached the name and IP details.
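One way to soften the naming mismatch, sketched in Python under my own assumptions: treat the canonical name and all aliases as equivalent when matching a looked-up name against a log or filter list.  The tuple below has the same shape as the result of Python's `socket.gethostbyname_ex` and is filled in by hand from the lookup above; `known_names` is a made-up helper.

```python
def known_names(lookup_result):
    """Return every name (canonical plus aliases) from a lookup result."""
    canonical, aliases, _addresses = lookup_result
    return {canonical.lower(), *(alias.lower() for alias in aliases)}


# Values copied from the lookup above: (name, aliases, addresses).
bbc = ("www.bbc.net.uk", ["www.bbc.co.uk"], ["212.58.224.36"])

# Both the advertised and the canonical name are recognised.
print("www.bbc.co.uk" in known_names(bbc))
print("www.bbc.net.uk" in known_names(bbc))
```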

   --!----------------------------------------------------!--
