Lofesa commented on issue #1622: Disable auto-adding PageSpeed query parameter 
to location header
URL: 
https://github.com/apache/incubator-pagespeed-mod/issues/1622#issuecomment-540671117
 
 
   Hi @zevnikrok 
   About the link rel="canonical": this link tells Google which page is the 
"original". Think about a page, say a list of products and their prices. The 
page has the URL `https://mydomain.com/products`, but it also accepts a 
parameter that shows the product list ordered by price, 
`https://mydomain.com/products/?order=price`. Googlebot now sees 2 URLs with 
the same content, so it can penalize the page with a duplicate content issue. 
If you put a canonical link 
( `<link rel="canonical" href="https://mydomain.com/products" />` ) in the HTML 
of the page, Google indexes only the "canonical" page 
( `https://mydomain.com/products` ) and there is no duplicate content issue.
   I can see you don't have this in your page.
   
   Well, now you have figured out where the link with `?PageSpeed=noscript` 
comes from and have set `pagespeed SupportNoScriptEnabled false;` to disable 
it, but Google already has these URLs and tries again and again to fetch them. 
The pagespeed module can't solve this; it has nothing to do with how Google 
caches and indexes URLs.
   But you have options:
   1.- In the new Search Console, the left panel has an option "Legacy tools 
and reports", and under it "Removals". With this tool you can hide the URL 
from search results for 6 months.
   2.- You can't set a robots noindex meta tag in the page, because that 
"deletes" the page itself from the index and you don't want that. Instead, 
configure nginx to return a 404 or 410 error when a URL has the 
`?PageSpeed=noscript` parameter. These errors tell Google that the page is 
gone, so it deletes the URL from its index and cache. 410 is better than 404, 
because 410 tells Google that the page was removed on purpose, so deleting it 
takes less time. It is not a fast process: Googlebot has to try to fetch the 
URL, get the 404 or 410 error, and then delete the URL from the index/cache.
   In the server block of the nginx conf you can set something like this:
   `if ($args ~* "noscript") { return 410; }`
   and then wait....
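
   For context, here is a minimal sketch of how that rule might sit inside a 
server block. The `listen` and `server_name` values are placeholders I made up 
for illustration, and the stricter pattern (matching the `PageSpeed=noscript` 
pair instead of any query string containing "noscript") is my own suggestion, 
not something tested on your setup:

```nginx
server {
    listen 80;                  # placeholder, use your real listen directive
    server_name mydomain.com;   # placeholder host name

    # Already set in your config: stop PageSpeed from emitting the
    # ?PageSpeed=noscript redirect link.
    pagespeed SupportNoScriptEnabled false;

    # Answer 410 Gone for any request whose query string carries
    # PageSpeed=noscript, alone or next to other parameters.
    if ($args ~* "(^|&)PageSpeed=noscript(&|$)") {
        return 410;
    }
}
```

   Anchoring the match on `(^|&)` and `(&|$)` should avoid returning 410 for an 
unrelated parameter that merely contains the word "noscript". After reloading 
nginx, a request for one of the old `?PageSpeed=noscript` URLs should come 
back with status 410, while the same URL without the parameter is served as 
usual.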
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
