I don't think this really even counts as irony. The difference in  
scalability between indexing raw HTTP response-strings, as they do  
now, and running virtual browsers to execute arbitrary client-side  
code on any page, as they'd have to do to get the Exhibit-rendered
contents, is, at this stage of web development, pretty much
prohibitive. In the long term, it's a great idea to push for a  
standard approach to providing a structured dataset alongside the  
human-readable page and the change-trackable RSS/Atom feed (or maybe  
eventually in place of the latter...).

In the short term, though, you're going to have to feed Google data  
it can understand. And I think any approach that involves trying to  
somehow get the Exhibit-rendered HTML saved into a static form and  
manually retrofitted back into the source page is going to be too  
hard for most of the users you *want*, and probably too annoying even  
where it's not too hard. You'd have to *redo* it every time you  
change the data! Yuck.

So I think you have to go in the other direction.

1. Let people keep making their "default" pages exactly as they're  
making them now.
2. Get 'em to use Piggy Bank to scrape their *existing* pages into  
RDF/JSON.
3. Have Exhibit run off of the scraper results, either by giving JSON
files back to the user or, even better, hosting them on an Exhibit
server that can also monitor the source pages (or their feeds) and
automatically rescrape and update the JSON files when the user
changes the source data (rough sketch of that monitoring loop below).
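
To make #3 concrete, here's a very rough sketch in Python of what the
monitor-and-rescrape piece of that Exhibit server might look like. Every
name in it is hypothetical: scrape_to_exhibit_json() just stands in for
whatever Piggy Bank scraper the user already wrote for their page, and I'm
assuming the output is a plain Exhibit-style {"items": [...]} JSON file.

# Rough sketch only. scrape_to_exhibit_json() is a placeholder for
# whatever Piggy Bank scraper the user already has for this page.
import hashlib
import json
import time
import urllib.request

SOURCE_URL = "http://example.org/my-page.html"  # the user's existing page or feed
OUTPUT_JSON = "exhibit-data.json"               # the file the Exhibit page points at
POLL_SECONDS = 3600                             # check for changes hourly

def scrape_to_exhibit_json(html):
    """Placeholder: turn the source page's HTML into a list of Exhibit items."""
    raise NotImplementedError

def main():
    last_digest = None
    while True:
        with urllib.request.urlopen(SOURCE_URL) as resp:
            page = resp.read()
        digest = hashlib.sha1(page).hexdigest()
        if digest != last_digest:
            # The source changed: rescrape and rewrite the JSON Exhibit reads.
            items = scrape_to_exhibit_json(page.decode("utf-8", "replace"))
            with open(OUTPUT_JSON, "w") as f:
                json.dump({"items": items}, f, indent=2)
            last_digest = digest
        time.sleep(POLL_SECONDS)

if __name__ == "__main__":
    main()

In practice you'd probably watch the feed rather than diff the whole page,
but the shape is the same: watch, rescrape, rewrite the JSON, and the user
never has to touch it.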

What about *that* idea?

glenn
