On 12/31/2010 04:23 AM, Fabrizio Giudici wrote:
As far as I understand, sitemap.xml is possibly only part of the
solution. I mean, I can list two URLs in the sitemap, such as:
http://acme.com/foo/bar/#1
http://acme.com/foo/bar/#2
(and, BTW, I'm not sure whether the crawler drops the #hash), but
in the end the crawler will always download the very same page, whose
contents are dynamically adapted by JavaScript. AFAIK the Google
crawler can run some JavaScript (the best reference I know of is here:
http://blogs.forbes.com/velocity/2010/06/25/google-isnt-just-reading-your-links-its-now-running-your-code/),
but I doubt it would run my slideshow code, which runs forever unless
it is stopped by pressing a specific button (the cited blog post is all
about the problem of detecting the termination of a script). I suppose
I could detect that the page has been grabbed by the Google crawler,
and in this case just run a very simple script that adds a text
section to the HTML with the description of the photo and then stops.
But how would I know whether it works? It could take days before the
crawler fetches my page after any change, and if it doesn't work I would
never be sure what's broken. I hope Google provides some more
documentation about this problem.
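
On the #hash question: in plain HTTP the fragment is never part of the request, so the two sitemap entries above would resolve to the same resource on the server side. Here is a minimal Python sketch of that, using only the standard library. The helper name `requested_resource` is made up for illustration, and the sketch says nothing about what a particular crawler does with the fragment afterwards.

```python
from urllib.parse import urldefrag


def requested_resource(url: str) -> str:
    """Return the URL without its fragment.

    The fragment (the part after '#') stays on the client side;
    only the rest of the URL is used to build the HTTP request.
    """
    return urldefrag(url).url


# The two sitemap entries from the example above.
urls = [
    "http://acme.com/foo/bar/#1",
    "http://acme.com/foo/bar/#2",
]

# A set keeps only distinct values, so duplicates collapse.
resources = {requested_resource(u) for u in urls}
print(resources)  # both URLs map to the same fragment-free address
```

So whatever distinguishes photo #1 from photo #2 has to come from the script running in the page, not from the download itself.
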
BTW, this raises another question. If I'm not wrong, Google stopped
using the keywords that people once put in the <meta> section
of the <head> of an HTML page. In fact, people abused the feature by
putting in fake keywords with a high search frequency (such as those
related to sex) to "fool" the computation of the page score. So, if I'm
not wrong, Google decided to use only the actual contents of the page,
to provide more accurate results. Now, if JavaScript were run to index a
page, one could easily detect the crawler (as I said above) and, instead
of honestly generating the proper dynamic content, generate fake stuff
again to inflate the score. So I suppose Google won't do that... or will it?
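
The crawler detection mentioned above usually comes down to looking at the User-Agent request header. Below is a minimal Python sketch, assuming the crawler announces itself with a token such as "Googlebot" in that header. The function names, the `slideshow.js` file name and the page layout are all hypothetical; this only illustrates the idea of serving a static description, not a tested recipe.

```python
from html import escape

# Assumption: the crawler puts this token in its User-Agent header.
CRAWLER_TOKEN = "googlebot"


def is_crawler(user_agent: str) -> bool:
    """Naive, case-insensitive check of the User-Agent string."""
    return CRAWLER_TOKEN in user_agent.lower()


def render_page(user_agent: str, description: str) -> str:
    """Return static text for the crawler, the slideshow shell otherwise."""
    if is_crawler(user_agent):
        # Plain text section describing the photo; no script to run.
        body = f"<p>{escape(description)}</p>"
    else:
        # Hypothetical script name standing in for the real slideshow code.
        body = '<div id="slideshow"></div><script src="slideshow.js"></script>'
    return f"<html><body>{body}</body></html>"
```

Nothing in such a switch forces the crawler branch to match what visitors actually see, which is exactly the abuse scenario described above.
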
--
Fabrizio Giudici - Java Architect, Project Manager
Tidalwave s.a.s. - "We make Java work. Everywhere."
java.net/blog/fabriziogiudici - www.tidalwave.it/people