Re: scaling up Exhibit - an early experiment

David Huynh Thu, 07 Feb 2008 16:48:39 -0800

Please let me try to address all the questions that came up in one shot 
here. First, as I said, this is a very, very early experiment, so there 
are still many technical unknowns and few design decisions totally 
committed [waving hands] :-) I'm glad to see there's so much interest in 
this scalability issue.


The goal of Backstage from my perspective is to support larger data 
sets, compared to Exhibit, while retaining the same ease of authoring as 
Exhibit. This translates to "no server-side software installation, no 
database to set up/configure, only a bit of HTML, etc." The demo you see 
is technological progress on this front--the conclusion to draw is that 
this goal is achievable. :-)

Regarding the transition from small to large data sets, when everything
is ready, for the common case, hopefully all you have to do is append
?backstage=true to the URL of exhibit-api.js in your page.

Regarding the relationship with Longwell and Longwell-CSI, that's 
harder/too early to say. Well, each project was created for a different 
purpose with different criteria, etc.

Regarding using Backstage on licensed/private data, it is possible to 
install Backstage on your own server and tell Exhibit to use that 
instance rather than a public Backstage service. The experiment right 
now does not include that option, but there is no technical challenge 
there. You actually might even want to run Backstage yourself and 
connect it directly to a local Sesame store if your data is too large to 
transfer as a JSON file.

David Karger suggests building Backstage into a Firefox extension...

-----

Now onto the technical details...

When backstage-demo.html is loaded in your browser, the JavaScript
code of Exhibit and Backstage is loaded and executed.

Client-side Backstage generates a random "interactive session ID" and
requests an interactive session with server-side Backstage at
    http://dfhuynh.csail.mit.edu:8181/
through a JSONP transport. That just means that to call the server
portion of Backstage, the client portion of Backstage appends <script>
elements to the <head> of the DOM. If you have Firebug installed, open
it up, switch to the HTML tab, expand <head> and look at the last few
<script>s. JSONP is used so that the client portion (executed within the
domain of the web page, i.e., people.csail.mit.edu) can call the
server portion sitting on a different domain (i.e.,
dfhuynh.csail.mit.edu:8181).
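
To make the JSONP mechanism concrete, here is a minimal sketch of such a
call. It is not Backstage's actual client code: the function name
jsonpCall, the "callback" query parameter, and the generated callback
names are all assumptions for illustration. The document and global
objects are passed in explicitly so the mechanics are visible.

```javascript
// Hypothetical sketch of a JSONP call; not Backstage's actual client code.
let jsonpCounter = 0;

function jsonpCall(doc, globalScope, baseUrl, params, onResult) {
  // Register a uniquely named global callback for the server's response to invoke.
  const callbackName = "jsonpCallback" + (jsonpCounter++);
  const script = doc.createElement("script");

  globalScope[callbackName] = function (result) {
    // Clean up the callback and the <script> element once the response arrives.
    delete globalScope[callbackName];
    if (script.parentNode) {
      script.parentNode.removeChild(script);
    }
    onResult(result);
  };

  // Encode the parameters plus the callback name into the query string.
  const query = Object.keys(params)
    .map((k) => encodeURIComponent(k) + "=" + encodeURIComponent(params[k]))
    .concat("callback=" + encodeURIComponent(callbackName))
    .join("&");

  // Appending the <script> to <head> makes the browser fetch the URL,
  // even when it lives on a different domain than the page.
  script.src = baseUrl + "?" + query;
  doc.getElementsByTagName("head")[0].appendChild(script);
  return callbackName;
}
```

In a browser this would be called as jsonpCall(document, window, ...),
and the server would be expected to answer with a script body of the
form jsonpCallback0({...}).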

The "interactive session" is different from the normal server session.
If you open two browser tabs or two browser windows pointing to 2
different backstaged exhibits, you have only 1 server session but 2
"interactive sessions". Without the interactive session concept, your
interactions with those 2 exhibits would get mixed up. This is a
technical challenge not too often encountered in web applications.
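
The relationship between the two kinds of session can be sketched
roughly as follows. This is only an illustration: the server side is
modeled in JavaScript here for consistency, and the names
newInteractiveSessionId, ServerSession, and the "selections" state are
made up for the example.

```javascript
// Hypothetical sketch: names and data shapes are illustrative, not Backstage's.

// Client side: each page load picks its own random interactive session ID.
function newInteractiveSessionId() {
  return "is" + Math.floor(Math.random() * 1e9).toString(36) +
    Date.now().toString(36);
}

// Server side (modeled in JavaScript): one server session holds
// many interactive sessions, one per tab or window.
class ServerSession {
  constructor() {
    this.interactiveSessions = new Map();
  }

  // Look up (or create) the state belonging to one tab or window.
  getInteractiveSession(id) {
    if (!this.interactiveSessions.has(id)) {
      this.interactiveSessions.set(id, { selections: [] });
    }
    return this.interactiveSessions.get(id);
  }
}

// Two tabs in the same browser share one server session but keep separate state.
const serverSession = new ServerSession();
const tabA = newInteractiveSessionId();
const tabB = newInteractiveSessionId();
serverSession.getInteractiveSession(tabA).selections.push("facet1=red");
serverSession.getInteractiveSession(tabB).selections.push("facet1=blue");
```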

Once the interactive session is set up, client-side Backstage walks 
through the DOM (just like Exhibit alone would) and reads off the data 
<link>s in the <head>. Client-side Backstage then sends those data links 
to server-side Backstage, which instantiates a Sesame 2.0 rc2 memory 
triple store and loads the data from those links into that triple store. 
This step requires that those links be publicly accessible (e.g., you
cannot point to data at file:/// URLs).

Note that server-side Backstage uses the URL of the exhibit as well as 
the set of data link URLs as the key to cache the triple store. This is 
so that when several users view the same exhibit, only one triple store 
is instantiated (to save space and time). There are technical challenges 
here to address the case where the data at any one of those URLs is 
changed after the triple store is instantiated and loaded.
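
As a rough sketch of that caching idea (the real server-side code is
Java and may well differ), the key could be built like this. The
function names and the createStore callback are hypothetical, and
treating the data links as an order-independent set is my assumption.

```javascript
// Hypothetical sketch of the cache key; the real server-side code is Java.
function tripleStoreCacheKey(exhibitUrl, dataLinkUrls) {
  // Treat the data links as a set: drop duplicates and sort so order does not matter.
  const links = Array.from(new Set(dataLinkUrls)).sort();
  // JSON encoding keeps the boundaries between URLs unambiguous.
  return JSON.stringify([exhibitUrl, links]);
}

const tripleStoreCache = new Map();

function getOrCreateTripleStore(exhibitUrl, dataLinkUrls, createStore) {
  const key = tripleStoreCacheKey(exhibitUrl, dataLinkUrls);
  if (!tripleStoreCache.has(key)) {
    // Only the first viewer of an exhibit pays the cost of loading the data.
    tripleStoreCache.set(key, createStore(dataLinkUrls));
  }
  return tripleStoreCache.get(key);
}
```

Note that a key like this says nothing about whether the data behind
those URLs has changed, which is exactly the open problem mentioned
above.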

Next, also through JSONP, client-side Backstage sends over to 
server-side Backstage the configurations of the collections, views, and 
facets, and server-side Backstage mirrors those client-side
JavaScript components with server-side Java objects.

From then on, UI interactions on the web page cause client-side
Backstage to make more JSONP calls to server-side Backstage. JSONP 
results cause the client-side components to update themselves.

There is another technical challenge here: if the server session 
expires, all the interactive sessions get thrown away on the server. The 
triple store might even get thrown away if no other user is looking at 
the same exhibit. But you might still have that exhibit shown on your 
browser, and it's entirely reasonable to want to resume your interaction 
with it after leaving it alone for a long time. At this point, your UI
action (e.g., clicking in a facet) will cause client-side Backstage to
call server-side Backstage, which has lost all of its state about your
interactive session. Server-side Backstage returns a particular error,
which causes client-side Backstage to send over its whole state so that
server-side Backstage can reinitialize itself and pick up where it
left off.
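
The recovery handshake can be sketched like this. Everything here is a
stand-in for illustration: the error code, the FakeServer and Client
classes, and the shape of the state are assumptions, and the real calls
go over JSONP rather than direct method calls.

```javascript
// Hypothetical sketch: the error code, method names, and state shape are assumptions.
const SESSION_LOST = "interactive-session-lost";

// Stand-in for server-side Backstage, keeping per-interactive-session state in memory.
class FakeServer {
  constructor() {
    this.sessions = new Map();
  }
  // Simulates the server session expiring: all interactive sessions are dropped.
  expireAll() {
    this.sessions.clear();
  }
  initialize(sessionId, state) {
    this.sessions.set(sessionId, { selections: state.selections.slice() });
  }
  call(sessionId, action) {
    const session = this.sessions.get(sessionId);
    if (!session) {
      return { error: SESSION_LOST };
    }
    session.selections.push(action);
    return { selections: session.selections.slice() };
  }
}

class Client {
  constructor(server, sessionId) {
    this.server = server;
    this.sessionId = sessionId;
    // The client keeps enough state to rebuild the server side from scratch.
    this.state = { selections: [] };
    server.initialize(sessionId, this.state);
  }
  act(action) {
    let response = this.server.call(this.sessionId, action);
    if (response.error === SESSION_LOST) {
      // The server forgot us: resend the whole client state, then retry the action.
      this.server.initialize(this.sessionId, this.state);
      response = this.server.call(this.sessionId, action);
    }
    this.state.selections = response.selections;
    return response;
  }
}
```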

The JSONP protocol will be pretty specific to Backstage. If there's a
desire to load data through SPARQL queries, the right place to hook in
would be between the server-side code of Backstage and the SPARQL
endpoint. Right now server-side Backstage formulates its queries to the
triple store by putting together Sesame "query algebra trees". If SPARQL
is as expressive as Sesame's query algebra (supporting GROUP, COUNT,
MIN, MAX), then it shouldn't be hard to swap in a SPARQL endpoint.
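
For a sense of what a facet would need from such an endpoint, here is a
sketch of a query builder for per-value item counts. It assumes the
endpoint accepts aggregate syntax of the kind SPARQL 1.1 defines
(GROUP BY and COUNT); the helper name facetCountQuery and the variable
names are hypothetical, and this is not how Backstage currently builds
its queries.

```javascript
// Hypothetical helper: builds a facet-count query, assuming SPARQL 1.1 aggregates.
function facetCountQuery(propertyIri) {
  // Reject characters that are not allowed inside an IRI reference.
  if (/[<>"{}|^`\\\s]/.test(propertyIri)) {
    throw new Error("Invalid property IRI: " + propertyIri);
  }
  return [
    "SELECT ?value (COUNT(DISTINCT ?item) AS ?count)",
    "WHERE { ?item <" + propertyIri + "> ?value . }",
    "GROUP BY ?value",
    "ORDER BY DESC(?count)"
  ].join("\n");
}
```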

There is still one big technical unknown: lens templates. As you see in 
the demo, the view only shows the items' labels. Lens templates are 
tricky because they encapsulate computations on the data (think ex:if 
and ex:content) but they also involve DOM constructions. The former 
needs to be done on the server and the latter is better done on the client.

And that's one long email... :-)

David
