Some additional observations: with regard to Google searches, having the data "out of page" is pretty much a non-starter. Even if we can convince Google to index the data page, the result will be searchers directed to the data page rather than to the exhibit, which is clearly not the goal. The only way around that would be to serve spiders something different from what we serve humans, which is exactly what spammers do, and it's likely to get exhibits that do it banned from Google.
So, the only real option I see is for the data to be in the page. There are three obvious options here.

Two are in-page embeddings by the editor, as JSON or as HTML tables. These have several downsides. First, for the result to be truly legitimate HTML, certain characters have to be escaped, which makes the data harder to hand-edit. It also prevents someone who wants to stay organized from splitting the data across multiple distinct JSON files. Finally, it doesn't work for data in a Google spreadsheet or other external sources. It does have the distinct advantage of making small exhibits even more portable---now there are only half as many files to copy and save :) (A sketch of the JSON embedding appears below, after the quoted message.)

So, the alternative I prefer is getting a "save as" feature to work properly. After Exhibit renders the page, you have a document that contains all the data nicely formatted as HTML, so it is completely aboveboard as a page for Google to index. It also doesn't matter where the data comes from---internal, files, Google spreadsheets---it all ends up in the page, ready to be indexed. Of course, if someone visits the page, the pre-embedded data is immediately discarded---it plays no role in the construction of the exhibit. But what Exhibit constructs will be almost the same page, since it is filling in the same data, so I have no qualms about Google blacklisting us for misbehavior. (A sketch of scripting such a snapshot is also below.)

All of these options have another advantage: they make the exhibit accessible to someone with Javascript disabled. Again, the last (saving post-Exhibit HTML) makes the nicest presentation.

-David

David Huynh wrote:
> Hi all,
>
> Exhibit suffers from the same Achilles heel as other Ajax applications:
> the dynamic content that gets inserted on the fly is totally invisible
> to Google. My whole web site is now invisible to Google :-) Perhaps this
> is the biggest impediment to adoption.
>
> Johan has added some code that allows Exhibit to load data from HTML
> tables. This lets your data be shown even if Javascript is disabled and
> lets your data be visible to Google. However, HTML tables are a clunky
> way to store data.
>
> There is another alternative: inserting your data encoded as JSON
> between <pre>...</pre> and then getting Exhibit to grab that text out
> and eval(...) it. If Javascript is disabled, the data is displayed as
> raw JSON--not so pretty.
>
> However, if the data is fed from another source, such as Google
> Spreadsheets, then neither of these approaches can be used.
>
> We've also entertained the idea of using the browser's Save Page As...
> feature to snapshot a rendered exhibit and then using that as the public
> page. Exhibit still gets loaded into that page, but it initially
> wouldn't change the DOM until some user action requires it to. However,
> the browser's Save Page As... feature doesn't do a very good job of
> saving the generated DOM.
>
> So, I think anything we do would look pretty much like a hack and work
> for only some cases. We also risk getting blacklisted by Google's
> crawler. So, what do we do? Is it possible to ask Google to scrape those
> exhibit-data links in the heads of the pages? And how do we do that?
>
> David
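To make the embedding option concrete, here is roughly what the JSON-between-<pre>-tags approach from David's message could look like. The element id and the loader code are mine and purely illustrative, not Exhibit's actual API; the escaped ampersand is exactly the kind of thing that makes hand-editing harder:

  <pre id="exhibit-data" style="display: none">
  { "items": [
      { "label": "Tea &amp; Crumpets Cafe",
        "type":  "Venue" }
  ] }
  </pre>

  <script type="text/javascript">
    // Sketch of the grab-and-eval step described above; not
    // Exhibit's real loader.
    var pre = document.getElementById("exhibit-data");
    // The browser decodes the escaped entities (&amp; -> &) when we
    // read the element's text, so this is plain JSON again.
    var text = pre.textContent || pre.innerText;
    // Wrap in parentheses so eval parses an object literal rather
    // than a block statement.
    var data = eval("(" + text + ")");
  </script>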

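And a minimal sketch of the "save as" idea done in script, since the browser's built-in Save Page As... doesn't capture the generated DOM. The function name and the new-window trick are hypothetical, just to show the shape of it:

  // Hypothetical sketch: after Exhibit has rendered, serialize the
  // generated DOM and open it in a new window so the author can save
  // that copy and publish it as the Google-indexable page.
  function snapshotRenderedExhibit() {
    var html = document.documentElement.innerHTML;
    var w = window.open("about:blank");
    w.document.open();
    w.document.write("<html>" + html + "</html>");
    w.document.close();
  }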