Let me try to address all the questions that came up, in one shot, here.
First, as I said, this is a very, very early experiment, so there are
still many technical unknowns and few design decisions are firmly
committed [waving hands] :-) I'm glad to see there's so much interest in
this scalability issue.
From my perspective, the goal of Backstage is to support larger data
sets than Exhibit can handle, while retaining Exhibit's ease of
authoring. That translates to "no server-side software installation, no
database to set up or configure, only a bit of HTML, etc." The demo you
see shows technical progress on this front; the conclusion to draw is
that this goal is achievable. :-)
Regarding the transition from small to large data sets: when everything
is ready, in the common case, hopefully all you will have to do is
append ?backstage=true to the exhibit-api.js URL.
Regarding the relationship with Longwell and Longwell-CSI, that's harder
to say, and too early. Each project was created for a different purpose,
with different criteria.
Regarding using Backstage on licensed or private data: it is possible to
install Backstage on your own server and tell Exhibit to use that
instance rather than a public Backstage service. The current experiment
doesn't offer that option yet, but it poses no technical challenge. You
might even want to run Backstage yourself and connect it directly to a
local Sesame store if your data is too large to transfer as a JSON file.
David Karger suggests building Backstage into a Firefox extension...
-----
Now onto the technical details...
When backstage-demo.html is loaded into your browser, the JavaScript
code of Exhibit and Backstage is loaded and executed.
Client-side Backstage generates a random "interactive session ID" and
requests an interactive session from server-side Backstage at
http://dfhuynh.csail.mit.edu:8181/
through a JSONP transport. That just means that to call the server
portion of Backstage, the client portion appends <script> elements to
the <head> of the DOM. If you have Firebug installed, open it up, switch
to the HTML tab, expand <head>, and look at the last few <script>s.
JSONP is used so that the client portion (executing within the domain of
the web page, i.e., people.csail.mit.edu) can call the server portion
sitting on a different domain (i.e., dfhuynh.csail.mit.edu:8181): the
browser's same-origin policy blocks XMLHttpRequest calls across domains,
but a <script> element may load code from any domain.
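For readers unfamiliar with JSONP, here is a minimal TypeScript sketch of the mechanism. It is an illustration, not Backstage's actual client code: the names jsonpCall, callbackRegistry and newInteractiveSessionId and the "callback" query parameter are assumptions, and the DOM step of appending a <script> element is passed in as a function so the sketch runs outside a browser. The server is assumed to reply with a script of the form jsonpCallback1({...}), which calls back into the page.

```typescript
// Minimal JSONP sketch (an illustration, not Backstage's actual code).

type JsonpCallback = (result: unknown) => void;

// Stands in for `window`: callbacks must be reachable by a global name,
// because the server's reply is a script that calls that name.
const callbackRegistry: Record<string, JsonpCallback> = {};
let callCounter = 0;

// Stands in for "append a <script src=...> element to <head>".
type ScriptInjector = (src: string) => void;

// Random ID for one interactive session (one exhibit in one tab).
function newInteractiveSessionId(): string {
  return Math.random().toString(36).slice(2) + Date.now().toString(36);
}

function jsonpCall(
  serverUrl: string,
  params: Record<string, string>,
  inject: ScriptInjector,
  onResult: JsonpCallback,
): string {
  callCounter += 1;
  const callbackName = "jsonpCallback" + callCounter;
  callbackRegistry[callbackName] = (result) => {
    delete callbackRegistry[callbackName]; // each callback fires once
    onResult(result);
  };
  const query = Object.keys(params)
    .map((k) => encodeURIComponent(k) + "=" + encodeURIComponent(params[k]))
    .concat("callback=" + callbackName)
    .join("&");
  const separator = serverUrl.indexOf("?") >= 0 ? "&" : "?";
  inject(serverUrl + separator + query);
  return callbackName;
}
```

In a browser, inject would create a <script> element with the given src and append it to document.head, and the callbacks would live on window so that the returned script can find them.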
The "interactive session" is different from the normal server session.
If you open two browser tabs or windows pointing to two different
backstaged exhibits, you have only one server session (both tabs share
it) but two "interactive sessions". Without the interactive session
concept, your interactions with those two exhibits would get mixed up.
This is a technical challenge not often encountered in web applications.
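A hypothetical sketch of the server-side bookkeeping: one server session owns several interactive sessions, keyed by the random IDs the clients generated, so state from two tabs never mixes. The class and field names, and the choice of facet selections as the per-tab state, are illustrative assumptions.

```typescript
// Hypothetical server-side bookkeeping for interactive sessions.

interface InteractiveSession {
  exhibitUrl: string;
  // facet id -> values currently selected in that facet, for this tab only
  facetSelections: Map<string, Set<string>>;
}

// One per browser (server session); holds one entry per open exhibit tab.
class ServerSession {
  private readonly interactive = new Map<string, InteractiveSession>();

  getOrCreate(interactiveId: string, exhibitUrl: string): InteractiveSession {
    let session = this.interactive.get(interactiveId);
    if (session === undefined) {
      session = { exhibitUrl, facetSelections: new Map<string, Set<string>>() };
      this.interactive.set(interactiveId, session);
    }
    return session;
  }

  count(): number {
    return this.interactive.size;
  }
}

// Selecting or deselecting a facet value touches only the given tab's state.
function toggleFacetValue(session: InteractiveSession, facetId: string, value: string): void {
  let selected = session.facetSelections.get(facetId);
  if (selected === undefined) {
    selected = new Set<string>();
    session.facetSelections.set(facetId, selected);
  }
  if (selected.has(value)) {
    selected.delete(value);
  } else {
    selected.add(value);
  }
}
```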
Once the interactive session is set up, client-side Backstage walks
through the DOM (just as Exhibit alone would) and reads off the data
<link>s in the <head>. Client-side Backstage then sends those data links
to server-side Backstage, which instantiates an in-memory Sesame 2.0 RC2
triple store and loads the data from those links into it. Because the
server fetches the data itself, this step requires those links to be
publicly accessible (e.g., you cannot point to data at file:/// URLs).
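A sketch of collecting the data links and rejecting URLs the server cannot fetch. It assumes the links carry rel="exhibit/data" (the convention Exhibit pages use for data links) and that hrefs have already been resolved to absolute URLs; the function names are hypothetical. The scheme check is only a first filter: an http:// URL on localhost or a private intranet would pass it but still be unreachable from a public Backstage server.

```typescript
// Sketch: collect Exhibit data links and check that the server can fetch them.

interface LinkInfo {
  rel: string;
  href: string; // assumed already resolved to an absolute URL
}

function collectDataLinks(links: LinkInfo[]): string[] {
  return links
    .filter((link) => link.rel.toLowerCase() === "exhibit/data")
    .map((link) => link.href);
}

// The server fetches each URL itself, so file:/// URLs (which point at the
// author's own disk) and other non-HTTP schemes cannot work.
function isFetchableByServer(url: string): boolean {
  return /^https?:\/\//i.test(url);
}

function partitionDataLinks(links: LinkInfo[]): { ok: string[]; rejected: string[] } {
  const ok: string[] = [];
  const rejected: string[] = [];
  for (const url of collectDataLinks(links)) {
    if (isFetchableByServer(url)) {
      ok.push(url);
    } else {
      rejected.push(url);
    }
  }
  return { ok, rejected };
}
```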
Note that server-side Backstage uses the URL of the exhibit together
with the set of data link URLs as the key for caching the triple store.
That way, when several users view the same exhibit, only one triple
store is instantiated (to save space and time). There are technical
challenges here in handling the case where the data at any of those URLs
changes after the triple store has been instantiated and loaded.
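A sketch of the caching idea, under stated assumptions: TripleStore is a placeholder for the in-memory store, the load function stands in for fetching and parsing the data, and invalidateUrl shows just one crude, hypothetical way to handle data that changes after loading (not something the current code is known to do). De-duplicating and sorting the data URLs makes the key depend on the set of links rather than their order in the page.

```typescript
// Sketch of a triple-store cache keyed by (exhibit URL, set of data URLs).

// Placeholder for the in-memory triple store.
interface TripleStore {
  sourceUrls: string[];
}

function cacheKey(exhibitUrl: string, dataUrls: string[]): string {
  // De-duplicate and sort: the key describes a set of links, not their order.
  const unique = dataUrls.filter((url, i) => dataUrls.indexOf(url) === i);
  return JSON.stringify([exhibitUrl, unique.sort()]);
}

class TripleStoreCache {
  private readonly stores = new Map<string, TripleStore>();
  // Stands in for fetching the data and filling a new store.
  private readonly load: (dataUrls: string[]) => TripleStore;

  constructor(load: (dataUrls: string[]) => TripleStore) {
    this.load = load;
  }

  getStore(exhibitUrl: string, dataUrls: string[]): TripleStore {
    const key = cacheKey(exhibitUrl, dataUrls);
    let store = this.stores.get(key);
    if (store === undefined) {
      store = this.load(dataUrls); // only the first viewer pays for loading
      this.stores.set(key, store);
    }
    return store;
  }

  // Hypothetical staleness handling: drop every store built from a URL whose
  // data changed, so the next request reloads it. Returns how many were dropped.
  invalidateUrl(dataUrl: string): number {
    const staleKeys: string[] = [];
    this.stores.forEach((store, key) => {
      if (store.sourceUrls.indexOf(dataUrl) >= 0) {
        staleKeys.push(key);
      }
    });
    staleKeys.forEach((key) => this.stores.delete(key));
    return staleKeys.length;
  }
}
```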
Next, also through JSONP, client-side Backstage sends server-side
Backstage the configurations of the collections, views, and facets, and
server-side Backstage mirrors those client-side JavaScript components
with server-side Java objects.
From then on, UI interactions on the web page cause client-side
Backstage to make more JSONP calls to server-side Backstage, and the
JSONP results cause the client-side components to update themselves.
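To make the mirroring concrete, here is a hypothetical server-side mirror of a list facet and the computation a relayed UI click might trigger: the server recomputes each facet's value counts, counting only items that pass all the other facets (the usual faceted-browsing behavior). Items are plain records here rather than a triple store, and all names are illustrative assumptions.

```typescript
// Hypothetical server-side mirror of a list facet and the recomputation a
// relayed UI click might trigger. Items are plain records, not triples.

type Item = Record<string, string[]>; // property name -> values

interface FacetConfig {
  id: string;       // the client-side facet's id
  property: string; // the property the facet filters on, e.g. "type"
}

interface FacetState {
  id: string;
  counts: Record<string, number>; // value -> number of matching items
}

class MirroredFacet {
  readonly config: FacetConfig;
  readonly selected = new Set<string>();

  constructor(config: FacetConfig) {
    this.config = config;
  }

  // An item passes if nothing is selected or it has a selected value.
  accepts(item: Item): boolean {
    if (this.selected.size === 0) {
      return true;
    }
    const values = item[this.config.property] || [];
    return values.some((v) => this.selected.has(v));
  }
}

// For each facet, count values over the items that pass all *other* facets.
function computeFacetStates(items: Item[], facets: MirroredFacet[]): FacetState[] {
  return facets.map((facet) => {
    const counts: Record<string, number> = {};
    for (const item of items) {
      if (!facets.every((f) => f === facet || f.accepts(item))) {
        continue;
      }
      for (const value of item[facet.config.property] || []) {
        counts[value] = (counts[value] || 0) + 1;
      }
    }
    return { id: facet.config.id, counts };
  });
}
```

The resulting FacetState list is the kind of payload a JSONP reply could carry back so that the client-side facets redraw themselves.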
There is another technical challenge here: if the server session
expires, all of its interactive sessions are thrown away on the server.
The triple store might even be thrown away if no other user is looking
at the same exhibit. But you might still have that exhibit open in your
browser, and it's entirely reasonable to want to resume interacting with
it after leaving it alone for a long time. At that point, your UI action
(e.g., clicking in a facet) causes client-side Backstage to call
server-side Backstage, which has lost all its state about your
interactive session. Server-side Backstage returns a particular error,
which causes client-side Backstage to send over its whole state so that
server-side Backstage can reinitialize itself and pick up where it left
off.
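The recovery handshake can be sketched as follows. The error code, the message shapes and the function names are assumptions, and the transport is shown as a synchronous function for brevity, whereas real JSONP calls complete asynchronously through callbacks.

```typescript
// Sketch of the session-recovery handshake (names and shapes hypothetical).

interface ClientState {
  interactiveSessionId: string;
  dataLinks: string[];
  facetSelections: Record<string, string[]>;
}

type ClientMessage =
  | { kind: "action"; sessionId: string; action: string }
  | { kind: "restore"; state: ClientState };

interface ServerReply {
  error?: "UNKNOWN_INTERACTIVE_SESSION";
  result?: unknown;
}

// Synchronous for brevity; real JSONP calls complete asynchronously.
type Transport = (message: ClientMessage) => ServerReply;

function callWithRecovery(send: Transport, state: ClientState, action: string): unknown {
  const actionMessage: ClientMessage = {
    kind: "action",
    sessionId: state.interactiveSessionId,
    action,
  };
  let reply = send(actionMessage);
  if (reply.error === "UNKNOWN_INTERACTIVE_SESSION") {
    // The server forgot this interactive session (and perhaps the triple
    // store): send the whole client state so it can rebuild, then retry once.
    send({ kind: "restore", state });
    reply = send(actionMessage);
  }
  if (reply.error !== undefined) {
    throw new Error("server error: " + reply.error);
  }
  return reply.result;
}
```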
The JSONP protocol will be quite specific to Backstage. If there's a
desire to load data through SPARQL queries, the right place to hook in
would be between the server-side code of Backstage and the SPARQL
endpoint. Right now server-side Backstage formulates its queries to the
triple store by assembling Sesame "query algebra trees". If SPARQL is as
expressive as Sesame's query algebra (supporting GROUP, COUNT, MIN, MAX),
then it shouldn't be hard to swap in a SPARQL endpoint.
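For comparison, SPARQL 1.1 (a later W3C standard) does include GROUP BY, COUNT, MIN and MAX. The facet-count and range-bound queries Backstage needs might be generated roughly like this; the property IRIs are placeholders, the function names are hypothetical, and matching selected values with IN assumes the facet values are plain string literals.

```typescript
// Sketch: generate SPARQL 1.1 facet queries (IRIs are placeholders).

interface Selection {
  propertyIri: string;
  values: string[]; // selected values, assumed to be plain string literals
}

function escapeLiteral(s: string): string {
  return s
    .replace(/\\/g, "\\\\")
    .replace(/"/g, '\\"')
    .replace(/\n/g, "\\n")
    .replace(/\r/g, "\\r");
}

// Counts items per value of one facet property, restricted by the values
// already selected in the other facets.
function facetCountQuery(facetPropertyIri: string, others: Selection[]): string {
  const patterns: string[] = [`?item <${facetPropertyIri}> ?value .`];
  others.forEach((sel, i) => {
    const variable = `?sel${i}`;
    const list = sel.values.map((v) => `"${escapeLiteral(v)}"`).join(", ");
    patterns.push(`?item <${sel.propertyIri}> ${variable} .`);
    patterns.push(`FILTER (${variable} IN (${list}))`);
  });
  return [
    "SELECT ?value (COUNT(DISTINCT ?item) AS ?count)",
    "WHERE {",
    ...patterns.map((p) => "  " + p),
    "}",
    "GROUP BY ?value",
    "ORDER BY DESC(?count)",
  ].join("\n");
}

// Bounds for a numeric range facet, using MIN and MAX.
function rangeBoundsQuery(propertyIri: string): string {
  return `SELECT (MIN(?v) AS ?min) (MAX(?v) AS ?max) WHERE { ?item <${propertyIri}> ?v }`;
}
```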
There is still one big technical unknown: lens templates. As you can see
in the demo, the view only shows the items' labels. Lens templates are
tricky because they encapsulate computations on the data (think ex:if
and ex:content) but also involve DOM construction. The former needs to
be done on the server, while the latter is better done on the client.
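One possible division of labor, sketched under heavy assumptions (the question is still open, so this is not how Backstage works): the server evaluates the data-dependent parts of a lens and returns a plain tree, and the client only builds structure from it. The node shapes are simplified, hypothetical stand-ins for ex:if and ex:content, and render produces an HTML string in place of real DOM construction so the sketch runs outside a browser.

```typescript
// Hypothetical lens split: server resolves data-dependent nodes, client
// builds structure. Not Backstage's design; the question is still open.

type Item = Record<string, string[]>;

// Simplified lens template nodes (stand-ins for Exhibit lens markup).
type LensNode =
  | { tag: string; children: LensNode[] }
  | { content: string }                                      // roughly ex:content
  | { ifHas: string; show: LensNode; otherwise?: LensNode }; // an ex:if-style test

// What the server would send back: only structure and text, no expressions.
type ResolvedNode = { tag: string; children: ResolvedNode[] } | { text: string };

// Server side: every computation on the item's data happens here.
function resolve(node: LensNode, item: Item): ResolvedNode[] {
  if ("content" in node) {
    return (item[node.content] || []).map((v) => ({ text: v }));
  }
  if ("ifHas" in node) {
    const has = (item[node.ifHas] || []).length > 0;
    const branch = has ? node.show : node.otherwise;
    return branch === undefined ? [] : resolve(branch, item);
  }
  const children: ResolvedNode[] = [];
  for (const child of node.children) {
    children.push(...resolve(child, item));
  }
  return [{ tag: node.tag, children }];
}

// Client side: turns the resolved tree into markup (a string here, standing
// in for DOM construction), escaping text but doing no data logic.
function render(node: ResolvedNode): string {
  if ("text" in node) {
    return node.text.replace(/&/g, "&amp;").replace(/</g, "&lt;");
  }
  return "<" + node.tag + ">" + node.children.map(render).join("") + "</" + node.tag + ">";
}
```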
And that's one long email... :-)
David
_______________________________________________
General mailing list
[email protected]
http://simile.mit.edu/mailman/listinfo/general