Re: Exhibits invisible to Google

David Karger Wed, 31 Jan 2007 20:42:37 -0800

I would posit the following pessimistic principle: that the only way a 
google search for exhibit data will point back to the html page wrapping 
the exhibit is if the exhibit data is actually in the html page.  I 
think the question to focus on is, what is the easiest way to get 
exhibit data into the html page.  There are some actual advantages to 
this approach---people without javascript can still get a look at the 
data.  Options we've discussed include:

embedding the json file inside <PRE> tags
embedding the json data inside an html table
running the exhibit, and saving a snapshot of the resulting page

I would of course love to see all 3 supported.  The first is probably 
easiest from the standpoint of the exhibit author, but it does require 
some thought about what sort of escaping of special html characters will 
be needed in the json---this will have the annoying feature that the 
embedded data is no longer the original json if you copy it out and 
paste it somewhere else without unescaping it first.
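
To make the escaping concern concrete, here is a minimal sketch in Python 
(standing in for whatever tool an author would really use).  The function 
names and the "exhibit-data" id are made up for illustration and are not 
part of exhibit.  It assumes only &, < and > need escaping inside <PRE>, 
and it undoes that escaping before parsing.

```python
import html
import json
import re


def embed_json_in_pre(data, element_id="exhibit-data"):
    """Serialize data as JSON and wrap it in a <pre>, escaping &, < and >."""
    text = json.dumps(data, indent=2)
    # quote=False: quotes are harmless in element content, so leave them alone.
    return '<pre id="%s">%s</pre>' % (element_id, html.escape(text, quote=False))


def extract_json_from_pre(fragment):
    """Recover the data: take the <pre> body, unescape it, then parse it."""
    match = re.search(r"<pre[^>]*>(.*?)</pre>", fragment, re.DOTALL)
    return json.loads(html.unescape(match.group(1)))


# Hypothetical item with characters that are special in html.
items = {"items": [{"label": "Tom & Jerry", "note": "uses <b> tags"}]}
fragment = embed_json_in_pre(items)
```

The text between the tags now carries &amp; and &lt; in place of the 
original characters, so it only matches the source data again after the 
unescape step in extract_json_from_pre.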

The second will also require escaping of html characters, but that will 
seem normal since it is real html.  On the downside, manually editing an 
html table, especially if you want to do bulk manipulations, will be 
very painful.
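
One way around the manual editing would be to generate the table from the 
json rather than maintain it by hand.  A rough sketch, again in Python, 
assuming the items are flat dictionaries; items_to_html_table and the 
column names are hypothetical, not anything exhibit provides:

```python
import html


def items_to_html_table(items, columns):
    """Render a list of flat dicts as an html table, one row per item."""
    header = "".join("<th>%s</th>" % html.escape(c) for c in columns)
    rows = ["<table>", "  <tr>" + header + "</tr>"]
    for item in items:
        # Missing properties become empty cells; every value is html-escaped.
        cells = "".join(
            "<td>%s</td>" % html.escape(str(item.get(c, ""))) for c in columns
        )
        rows.append("  <tr>" + cells + "</tr>")
    rows.append("</table>")
    return "\n".join(rows)
```

Calling items_to_html_table(items, ["label", "year"]) on the parsed json 
yields a table that can be pasted into the page, so the bulk edits still 
happen in the json file.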

The third option will certainly produce the "prettiest" snapshot, and 
allows the data to continue to live, unescaped, in a separate json 
file.  It will make it a little unpleasant to edit the html, since there 
will be this big blob of data hanging off the page, but if we put that 
data on the very bottom, it can mainly be ignored.  The funny thing is 
that whatever data is there is actually irrelevant to anyone visiting 
the exhibit, since the first thing the exhibit will do is zap that data 
and replace it with the data incorporated from the exhibit.

Note that this snapshotting _almost_ works now---if you visit an exhibit 
and choose file-save, you do get a copy of the page with all the 
interesting data embedded.  The problem is that if you load that page, 
the exhibit doesn't show---the page has changed in a way that prevents 
the exhibit from "restarting" and loading its data.  I think it will 
not take much work to modify exhibit to be "idempotent"---resetting 
itself when you load a saved page.
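
To illustrate what I mean by idempotent, here is a toy sketch in Python, 
not exhibit's actual code.  It assumes the snapshot is delimited by a pair 
of marker comments (the marker strings and the render function are 
invented for this example) and that the page has a closing body tag.  
Rendering a page that already holds a snapshot replaces the old one 
instead of piling a second one on top.

```python
import re

# Hypothetical markers delimiting the saved snapshot at the bottom of the page.
BEGIN = "<!-- snapshot:begin -->"
END = "<!-- snapshot:end -->"
_SNAPSHOT = re.compile(re.escape(BEGIN) + r".*?" + re.escape(END), re.DOTALL)


def render(page, rendered_html):
    """Insert the rendered snapshot, replacing any snapshot already present."""
    block = BEGIN + rendered_html + END
    if _SNAPSHOT.search(page):
        # A saved page: zap the stale snapshot and put the fresh one in its place.
        return _SNAPSHOT.sub(lambda _match: block, page, count=1)
    # A fresh page: hang the snapshot off the very bottom of the body.
    return page.replace("</body>", block + "\n</body>", 1)
```

With this shape, render(render(page, x), x) gives the same result as 
render(page, x), which is the property a saved-and-reloaded exhibit would 
need.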

Some day, when exhibit takes over the world, google will spider exhibits 
properly.  But we need a way to take over the world first and I think 
embedded exhibits are the way to do that.

-D

David Huynh wrote:
> David Huynh wrote:
>> Johan Sundström wrote:
>>> On 1/25/07, David Huynh <[EMAIL PROTECTED]> wrote:
>>>> Oh that might work! So, maybe something like this?
>>>>
>>>>   <head>
>>>>     <link rel="exhibit/data" type="application/json"
>>>>       href="my-data.json" />
>>>>
>>>>     <link rel="exhibit/google-spreadsheets-data" type="application/jsonp"
>>>>       href="http://spreadsheets.google.com/feeds/list/o08841867754116283182.6102151849127695926/od6/public/basic?alt=json-in-javscript" />
>>>>
>>>>     <!-- Just for you, Google! -->
>>>>     <link rel="alternate" type="application/rss+xml" title="RSS 2.0"
>>>>       href="http://www.foo.com/convert-exhibit-json-to-rss?url=http://people.csail.mit.edu/dfhuynh/my-data.json" />
>>>>
>>>>     <link rel="alternate" type="application/rss+xml" title="RSS 2.0"
>>>>       href="http://spreadsheets.google.com/feeds/list/o08841867754116283182.6102151849127695926/od6/public/basic" />
>>>>   </head>
>>>>
>>>> How confident are we that this will work?
>>> At the very least, it is worth a shot and doing some field testing.
>> OK, so Babel has been extended to support converting from Exhibit JSON 
>> files and Exhibit-embedding web pages to RSS feeds. The conversion is 
>> pretty dumb for now.
>>
>> My own web site now has these feeds. The publications page points to a 
>> .rss file converted through Babel and saved statically on my own site, 
>> while the other pages (projects, books, quotations) point through Babel 
>> for on-the-fly conversion.
>>
>> I guess I'll sit and wait for Google to crawl :-)
> The Great Google Crawler came, passed by, ... and didn't see the 
> feeds... :-(
>
> Plan B?!
>
> David
>
> _______________________________________________
> General mailing list
> [email protected]
> http://simile.mit.edu/mailman/listinfo/general