Glenn, I agree with you that it will be a long time before we can expect 
Google to look at anything besides the HTML document.

However, your server model has the drawback that it requires a server.  
While this is something we want to _allow_ in the future to improve 
Exhibit's scalability, _requiring_ it would eliminate one of the major 
rapid-deployment benefits of Exhibit.

While "save the exhibit-rendered html" is indeed a hassle (which some 
authors may tolerate in order to make a nice page for readers without 
javascript), I don't think the same applies to some of the other 
proposed solutions, such as "embed the exhibit json in the main html 
document".  That last is unpleasant to those of us who like nice modular 
systems (but such people may be willing to save exhibit-rendered html), 
but may be just fine for simple quick-and-dirty exhibitors.
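
For concreteness, here's roughly what the two flavors would look like.  The 
<link> form is how Exhibit normally pulls in an external data file; the 
inline <script> data island below it is only a sketch of what "embed the 
JSON in the page" could mean (the data-ex-role attribute is made up for 
illustration, not something Exhibit supports today):

   <!-- modular: the data lives in its own file -->
   <link rel="exhibit/data" type="application/json" href="my-data.json" />

   <!-- quick-and-dirty: the same JSON pasted straight into the page, so a
        crawler that fetches only the HTML document still sees the raw data -->
   <script type="application/json" data-ex-role="data">
     { "items": [
         { "label": "Example item", "type": "Item" }
       ] }
   </script>

Either way Exhibit still renders client-side; the inline version just trades 
modularity for having everything in one fetchable document.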



glenn mcdonald wrote:
> I don't think this really even counts as irony. The difference in  
> scalability between indexing raw HTTP response-strings, as they do  
> now, and running virtual browsers to execute arbitrary client-side  
> code on any page, as they'd have to do to get the Exhibit-rendered  
> contents, is at this stage of web development pretty much  
> prohibitive. In the long term, it's a great idea to push for a  
> standard approach to providing a structured dataset alongside the  
> human-readable page and the change-trackable RSS/Atom feed (or maybe  
> eventually in place of the latter...).
>
> In the short term, though, you're going to have to feed Google data  
> it can understand. And I think any approach that involves trying to  
> somehow get the Exhibit-rendered HTML saved into a static form and  
> manually retrofitted back into the source page is going to be too  
> hard for most of the users you *want*, and probably too annoying even  
> where it's not too hard. You'd have to *redo* it every time you  
> change the data! Yuck.
>
> So I think you have to go in the other direction.
>
> 1. Let people keep making their "default" pages exactly as they're  
> making them now.
> 2. Get 'em to use Piggy Bank to scrape their *existing* pages into  
> RDF/JSON.
> 3. Have Exhibit run off of the scraper results, either by giving JSON  
> files back to the user or, even better, hosting them on an Exhibit  
> server that can also monitor the source pages (or their feeds) and  
> automatically rescrape and update the JSON files when the user  
> changes the source data.
>
> What about *that* idea?
>
> glenn
>
_______________________________________________
General mailing list
[email protected]
http://simile.mit.edu/mailman/listinfo/general
