Re: Exhibits invisible to Google

Johan Sundström Thu, 01 Feb 2007 09:00:02 -0800

On 2/1/07, David Karger <[EMAIL PROTECTED]> wrote:
> I think the question to focus on is, what is the easiest way to get
> exhibit data into the html page.  There are some actual advantages
> to this approach---people without javascript can still get a look at
> the data.  Options we've discussed include:
>
> 1) embedding the json file inside <PRE> tags
> 2) embedding the json data inside an html table
> 3) running the exhibit, and saving a snapshot of the resulting page


I actually have a variant on 2) in mind, which I find even prettier,
but I have not yet had much time to play with it. (My Exhibit
hacking has been down a few notches recently.) It might not be quite
for everyone, though, or it would at the very least require some rather
ingenious tooling to become so:

4) Lay out the information as you would any kind of
template-structured HTML, in some appetizing, appealing way, for
Google and other non-interactive or non-JavaScript-capable browsers.
Then feed Exhibit a short web-scraping recipe describing how to process
the template, carving out all the data it holds, as implied by
structure, markup and text content.

This is very close to the experiments with injecting Exhibit pan-web
via Greasemonkey that I mentioned some time ago (the visualization
layer is not in a working state, the way it was then, but I've worked
a bit on the scrape recipe serialization end since). Again picking the
same example, a page not made for Exhibit,

   http://pike.ida.liu.se/development/pikefarm/7.7.xml

and a scrape template full of relative XPath references, again
harvesting just the status of the most recent build:

{ scrape: {
  root:"//[EMAIL PROTECTED]"xf\"]",
  rows:"tbody/tr",
  fields:[
    {name:"label", path:"td[last()]/text()"},
    {name:"host", path:"td/text()"},
    {name:"status", path:"td[last()-1]/a/img/@alt"}
  ]}
}

...we initialize Exhibit with the data set scraped from the page. For
now that code resides in my user script rather than in Exhibit, and is
probably only functional in web-standards-conformant browsers, though.
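A minimal sketch of what applying such a recipe might look like, assuming the `{ root, rows, fields }` shape shown above. To keep it testable outside a browser, the XPath lookup is passed in as a `select(context, path)` function returning an array of matches; in a browser, the `xpathSelect` helper builds one from the standard DOM `document.evaluate` API. The function names, the `items` wrapper and the first-match-per-field rule are assumptions for illustration, not how the user script actually does it.

```javascript
// Sketch only: applies a { root, rows, fields } scrape recipe like the one
// above. `select(context, path)` is a hypothetical contract: it returns an
// array of matches, which may be DOM nodes or plain strings.

// Browser-only helper built on the standard DOM XPath API.
function xpathSelect(context, path) {
  const doc = context.ownerDocument || context;
  const snapshot = doc.evaluate(
    path, context, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
  const nodes = [];
  for (let i = 0; i < snapshot.snapshotLength; i++) {
    nodes.push(snapshot.snapshotItem(i));
  }
  return nodes;
}

// Text of one match: text and attribute nodes carry it in nodeValue,
// elements in textContent; plain strings pass through.
function textOf(match) {
  if (typeof match === "string") return match.trim();
  return (match.nodeValue ?? match.textContent ?? "").trim();
}

function scrape(recipe, doc, select) {
  const { root, rows, fields } = recipe.scrape;
  const items = [];
  for (const container of select(doc, root)) {
    for (const row of select(container, rows)) {
      const item = {};
      for (const field of fields) {
        // Field paths are relative to the row; keep the first match only.
        const matches = select(row, field.path);
        if (matches.length > 0) item[field.name] = textOf(matches[0]);
      }
      items.push(item);
    }
  }
  // Exhibit's JSON data format lists records under an "items" array.
  return { items };
}
```

In a browser the call would be something like `scrape(recipe, document, xpathSelect)`, with the result then loaded into Exhibit as JSON data. The `root` expression quoted above appears to have been mangled by the archive's address obfuscation and would need restoring before it could run.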

The particulars of this scrape method do not cover every conceivable
kind of markup, but they do cover a very large part of what common
template engines spit out today, and they make moving from whatever you
do today to having it in Exhibit close to no work at all, assuming you
already have a presentation form (and the competence, or a tool, to
make a scrape template like the one above).

As I make progress with this, it will most likely grow even more
dynamic capabilities, to cater for cases that need code to derive data
implied by page structure, perhaps like this choir practice material
page I crafted some years ago (the mp3 links unfortunately are not
freely distributable and have thus been filtered out),
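What such code might do can be sketched under loudly hypothetical assumptions: suppose a page where a heading applies to every link after it, until the next heading. No single row-relative path captures that, but one pass over the nodes in document order can carry the latest heading into each item. The `{ tag, text }` node shape, the `itemsUnderHeadings` name and the `section` property are invented for this sketch; it is neither how the user script works nor a claim about how the page below is marked up.

```javascript
// Hypothetical sketch: derive a per-item "section" that the markup only
// implies through document order (a heading governs the links after it).
// `nodes` is assumed to be in document order, as { tag, text } objects.
function itemsUnderHeadings(nodes, headingTag = "h2", itemTag = "a") {
  const items = [];
  let section = null; // most recent heading seen so far; null before any
  for (const node of nodes) {
    if (node.tag === headingTag) {
      section = node.text;
    } else if (node.tag === itemTag) {
      // The section is never stated per link; it comes from position.
      items.push({ label: node.text, section });
    }
    // Any other node is ignored.
  }
  return { items };
}
```

In a browser, the input could be built with `Array.from(document.querySelectorAll("h2, a"), el => ({ tag: el.localName, text: el.textContent.trim() }))`, since `querySelectorAll` returns matches in document order.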

  http://ecmanaut.googlepages.com/choir-practice-material.html

With some quality mind work, I'm hoping it might even be possible to
derive, from something like that, a lens that follows the original
markup layout, without having to give Exhibit quite as many cues as it
needs today; but that is even more blue-sky at the moment.

-- 
 / Johan Sundström, http://ecmanaut.blogspot.com/

_______________________________________________
General mailing list
[email protected]
http://simile.mit.edu/mailman/listinfo/general
