On Sunday, October 28, 2012 8:50:27 AM UTC-7, Joseph Lust wrote:
> I see you're using places and the url tokens, are you using GWTP? There is
> some built in crawler support there that you could use, or at least
> investigate if you're not using GWTP.
>
> http://code.google.com/p/gwt-platform/wiki/CrawlerSupport
>
> Sincerely,
> Joseph
>
Thanks for responding!
Yes, you are correct. I am using places and activities, but not GWTP. I use
straight GWT throughout the entire app (including all the editor +
validation stuff).
GWTP does include a canned filter for handling crawlability. I have a very
similar one, albeit slightly optimized: it caches rendered snapshots in
memcache and reuses one headless browser per thread.
import java.io.IOException;
import java.io.PrintWriter;
import java.net.URLDecoder;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.regex.Matcher;

import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

/**
 * Special filter that adds support for Google crawling as outlined here
 * ({@link https://developers.google.com/webmasters/ajax-crawling/docs/getting-started}).
 *
 * @author Benjamin Possolo
 */
public class GoogleCrawlerFilter implements Filter {

    private static final Logger log = Logger.getLogger(GoogleCrawlerFilter.class.getName());

    //one headless browser per request thread, created lazily on first use
    private static final ThreadLocal<WebClient> webClient = new ThreadLocal<WebClient>() {
        @Override
        protected WebClient initialValue() {
            log.info("Instantiating headless browser");
            WebClient wc = new WebClient(BrowserVersion.FIREFOX_3_6);
            wc.setThrowExceptionOnScriptError(false);
            wc.setThrowExceptionOnFailingStatusCode(false);
            wc.setCssEnabled(false);
            return wc;
        }
    };

    @Override
    public void init(FilterConfig config) throws ServletException {}

    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException {

        HttpServletRequest req = (HttpServletRequest) request;
        HttpServletResponse resp = (HttpServletResponse) response;
        String queryString = req.getQueryString();

        if (queryString != null && queryString.contains("_escaped_fragment_")) {
            log.info("Detected request from Google Crawler");

            //google requests the URL with the place fragment as a query parameter.
            //they do this because URL fragments (the portion after the hash #) are
            //not sent with an HTTP request.
            //convert the ugly URL to the real url that uses the hashbang
            queryString = queryString.replaceFirst("&?_escaped_fragment_=", "#!");
            queryString = URLDecoder.decode(queryString, "UTF-8");

            StringBuilder pageToCrawlSb =
                    new StringBuilder(req.getScheme()).append("://").append(req.getServerName());
            if (req.getServerPort() > 0)
                pageToCrawlSb.append(':').append(req.getServerPort());
            pageToCrawlSb.append(req.getRequestURI());
            //other query parameters, if any, still precede the hashbang
            if (!queryString.startsWith("#!"))
                pageToCrawlSb.append('?');
            pageToCrawlSb.append(queryString);
            String pageToCrawl = pageToCrawlSb.toString();
            log.log(Level.INFO, "Page being crawled: {0}", pageToCrawl);

            //check if a snapshot of the requested page already exists
            //(MemcacheUtil is an application helper class that is not shown in this post)
            String htmlSnapshot = MemcacheUtil.getHtmlSnapshot(pageToCrawl);
            if (htmlSnapshot == null) {
                try {
                    //use HtmlUnit to render the requested page
                    long start = System.currentTimeMillis();
                    log.info("Using headless browser to fetch page");
                    HtmlPage page = webClient.get().getPage(pageToCrawl);
                    log.info("Pumping javascript event loop for 8 seconds");
                    //execute javascript for 8 seconds
                    webClient.get().getJavaScriptEngine().pumpEventLoop(8000);
                    long end = System.currentTimeMillis();
                    log.log(Level.INFO, "Time to generate page snapshot: {0} seconds",
                            ((end - start) / 1000L));

                    //we add a special message to the top of the page so that anyone
                    //seeing the snapshot will know it is meant for Google crawling
                    String snapshotMsg = new StringBuilder("<body>\n\n")
                            .append("<hr />\n")
                            .append("<center>\n")
                            .append(" <h3>\n")
                            .append(" You are viewing a non-interactive page that is intended for the crawler.<br/>\n")
                            .append(" You probably want to see this page: <a href=\"" + pageToCrawl
                                    + "\">" + pageToCrawl + "</a>\n")
                            .append(" </h3>\n")
                            .append("</center>\n")
                            .append("<hr />\n")
                            .toString();

                    htmlSnapshot = page.asXml();
                    //quote the replacement so that '$' or '\' in the URL is not
                    //interpreted as a regex group reference
                    htmlSnapshot = htmlSnapshot.replaceFirst("<body[^>]*>",
                            Matcher.quoteReplacement(snapshotMsg));

                    //store the rendered page in memcache
                    MemcacheUtil.putHtmlSnapshot(pageToCrawl, htmlSnapshot);
                }
                finally {
                    webClient.get().closeAllWindows();
                }
            }

            //send the html snapshot back to the crawler
            resp.setContentType("text/html; charset=UTF-8");
            PrintWriter writer = resp.getWriter();
            writer.print(htmlSnapshot);
        }
        else {
            chain.doFilter(request, response);
        }
    }

    @Override
    public void destroy() {
        //this is never called on Google App Engine
    }
}
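
The URL rewriting step inside doFilter is easy to get subtly wrong, so it can help to look at it in isolation, outside the servlet container. The following is only a minimal sketch of that step under stated assumptions: the class name EscapedFragmentUrls, its method names, and the example.com host are made up for illustration and are not part of the filter above. It applies the same replaceFirst + URLDecoder logic to plain strings.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical helper (not part of the original filter): isolates the
// "_escaped_fragment_" to "#!" conversion so it can be tried without a servlet container.
final class EscapedFragmentUrls {

    private EscapedFragmentUrls() {}

    // Converts the crawler's query string into the query + hashbang form.
    static String toHashbangQuery(String queryString) {
        // "&?" also swallows the separator when other parameters precede the fragment
        String rewritten = queryString.replaceFirst("&?_escaped_fragment_=", "#!");
        try {
            return URLDecoder.decode(rewritten, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM
            throw new IllegalStateException(e);
        }
    }

    // Rebuilds the full URL the headless browser should load.
    static String toPageUrl(String scheme, String host, int port,
                            String requestUri, String queryString) {
        String rewritten = toHashbangQuery(queryString);
        StringBuilder sb = new StringBuilder(scheme).append("://").append(host);
        if (port > 0) {
            sb.append(':').append(port);
        }
        sb.append(requestUri);
        // regular query parameters, if any, come before the hashbang
        if (!rewritten.startsWith("#!")) {
            sb.append('?');
        }
        return sb.append(rewritten).toString();
    }

    public static void main(String[] args) {
        // prints http://example.com:8080/app.html#!product:42
        System.out.println(toPageUrl("http", "example.com", 8080, "/app.html",
                "_escaped_fragment_=product%3A42"));
        // prints http://example.com:8080/app.html?locale=fr#!product:42
        System.out.println(toPageUrl("http", "example.com", 8080, "/app.html",
                "locale=fr&_escaped_fragment_=product%3A42"));
    }
}
```

Keeping the conversion in a pure function like this would also make it straightforward to unit test the place-token handling separately from HtmlUnit and memcache.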