On 12/31/2010 04:23 AM, Fabrizio Giudici wrote:

As far as I understand, sitemap.xml is possibly only part of the solution. I mean, I can list two URLs in the sitemap, such as:

http://acme.com/foo/bar/#1
http://acme.com/foo/bar/#2

(and I'm not sure, BTW, whether the #hash is swallowed by the crawler), but in the end the crawler will always download the very same page, whose contents are adapted dynamically by JavaScript. AFAIK the Google crawler can run some JavaScript (the best information I have is here: http://blogs.forbes.com/velocity/2010/06/25/google-isnt-just-reading-your-links-its-now-running-your-code/), but I doubt it would run my slideshow code, which runs forever unless it is stopped by pressing a specific button (the cited post is all about the problem of detecting when a script terminates). I suppose I could detect that the page has been fetched by the Google crawler and, in that case, just run a very simple script that adds a text section with the description of the photo to the HTML and then stops. But how would I know whether it works? It could take days before the crawler fetches my page after any change, and if it doesn't work I would never be sure what's broken. I hope Google provides some more documentation about this problem.
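The two technical points above can be illustrated in a few lines of Java. This is a minimal sketch, not anything Google documents: `requestTarget` and `looksLikeGooglebot` are hypothetical helper names, and matching "Googlebot" in the User-Agent header is an assumed, naive heuristic (the header is trivially spoofed). The first helper shows that the #1/#2 fragments are resolved by the browser and never reach the server, so both sitemap entries fetch the same resource.

```java
import java.net.URI;
import java.util.Locale;

public class CrawlerSketch {

    // The fragment (#1, #2) is handled client-side and is not part of the
    // HTTP request target: only the path and query are sent to the server.
    static String requestTarget(String url) {
        URI uri = URI.create(url);
        String rawPath = uri.getRawPath();
        String path = (rawPath == null || rawPath.isEmpty()) ? "/" : rawPath;
        return uri.getRawQuery() == null ? path : path + "?" + uri.getRawQuery();
    }

    // Hypothetical, naive check (an assumption, not a Google API): looks for
    // "googlebot" in the User-Agent header, case-insensitively.
    static boolean looksLikeGooglebot(String userAgent) {
        return userAgent != null
                && userAgent.toLowerCase(Locale.ROOT).contains("googlebot");
    }

    public static void main(String[] args) {
        System.out.println(requestTarget("http://acme.com/foo/bar/#1"));
        System.out.println(requestTarget("http://acme.com/foo/bar/#2"));
        System.out.println(looksLikeGooglebot(
                "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"));
    }
}
```

Running `main` prints `/foo/bar/` twice and then `true`: the server sees one page for both sitemap entries, which is why the fragment alone cannot make the slideshow states indexable as separate pages.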

BTW, this raises another question. If I'm not wrong, Google dropped the use of the keywords that people once put in the <meta> section of the <head> of an HTML page. In fact, people abused the feature by inserting fake keywords with a high search frequency (such as those related to sex) to "fool" the computation of the page score. So, if I'm not wrong, Google decided to use only the actual contents of the page, to provide more accurate results. Now, if JavaScript were used to index a page, one could easily detect the crawler (as I said above) and, instead of honestly generating the proper dynamic content, once again generate some fake stuff to increase the score. So I suppose Google won't do that... or will it?

--
Fabrizio Giudici - Java Architect, Project Manager
Tidalwave s.a.s. - "We make Java work. Everywhere."
java.net/blog/fabriziogiudici - www.tidalwave.it/people

--
You received this message because you are subscribed to the Google Groups "The Java Posse" group.
For more options, visit this group at http://groups.google.com/group/javaposse?hl=en.