On Wednesday 09 January 2002 14:32, you wrote:
> On Wed, Jan 09, 2002 at 09:57:05AM -0500, Brian Wheeler wrote:
> > On Wed, 2002-01-09 at 09:35, Robin Berjon wrote:
> > > On Wednesday 09 January 2002 15:24, Sebastian Rahtz wrote:
> > > > On Wed, Jan 09, 2002 at 03:22:22PM +0100, Robin Berjon wrote:
> > > > > On Tuesday 08 January 2002 22:38, Brian Wheeler wrote:
> > > > > > Our site isn't showing up on various search engines (google, alta
> > > > > > vista) properly.  Is anyone else seeing this on axkit-driven
> > > > > > sites? We're running axkit 1.4.
> > > > >
> > > > > Pretty much all of our sites that are running axkit show up in
> > > > > Google, with proper descriptions and all.
> > > >
> > > > As one would expect; how can Google possibly know it's an axkit site?
> > > > The result is, after all, just HTML...
> > >
> > > Yes but there have been cases of dynamic generation systems or
> > > publishing systems that caused search engines to run away (or downgrade
> > > the content). I think this is what Brian was concerned with.
> >
> > *sniff* I just want it to work :)
> >
> > Seriously, though, the search
> >
> > site:icpac.indiana.edu xml
> >
> > on google returns hits, but no summaries or titles...but not nearly all
> > the pages on our site (somewhere near 39,000)
>
> Could it be your /robots.txt?
>
> Wendell
>

Search engines can be very picky, and they often use some very arcane logic
to determine how to index things. I doubt a PICS header is the problem,
though anything is possible. The CRs are more likely to foul something up.
Either way, I would NOT expect you to end up with 39,000 indexed pages.
That's a HUGE amount of content to expect to have indexed on one site. I'm
not sure exactly how Google decides how deep to go.

It's possible that Google is smart enough to determine that the content is
dynamically generated and to downgrade it for that reason as well. One
possible cause is the cache control directives that Apache or AxKit include
in the response to prevent browsers from caching dynamic content. Apache,
for instance, is normally configured to expire generated content
immediately. Google may simply not bother to index data that it figures is
going to be stale immediately anyhow.
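
If you want to check whether your pages are being sent out this way, here is
a rough sketch in Python. The function name is made up, and the checks are
only my guess at what "immediately stale" would look like in the response
headers (an Expires no later than Date, or a no-cache style directive). It
says nothing about what Google actually does with them.

```python
from email.utils import parsedate_to_datetime


def looks_immediately_stale(headers):
    """Return True if the response headers tell caches not to keep the page.

    `headers` is a plain dict of HTTP response header names to values.
    """
    # Header names are case-insensitive, so normalise them first.
    h = {k.lower(): v for k, v in headers.items()}

    # Cache-Control directives that forbid or immediately expire caching.
    cache_control = h.get("cache-control", "").lower()
    directives = [d.strip() for d in cache_control.split(",") if d.strip()]
    if any(d in ("no-cache", "no-store", "max-age=0") for d in directives):
        return True

    # The older HTTP/1.0 way of saying the same thing.
    if "no-cache" in h.get("pragma", "").lower():
        return True

    # An Expires that is not later than Date means "already expired".
    expires = h.get("expires")
    date = h.get("date")
    if expires and date:
        try:
            return parsedate_to_datetime(expires) <= parsedate_to_datetime(date)
        except (TypeError, ValueError):
            # Assumption of this sketch: dates that cannot be parsed or
            # compared (for example "Expires: 0") count as already expired.
            return True

    return False
```

Feed it the headers from a HEAD request against one of your generated pages.
If it comes back True, the expiry settings in your Apache or AxKit
configuration would be the first thing I'd look at.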

> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
