Re: Data Used in Exhibit

Stefano Mazzocchi Tue, 27 Feb 2007 14:18:53 -0800

David Huynh wrote:

[snip]


> However, if your web page links to the script on our server, then our 
> server automatically logs your domain name / IP address (as a referrer). 
> This is a common behavior. The consequence of this behavior is 
> interesting--perhaps Stefano will jump in and discuss it here.

Thanks for the mike, David. :-)

We spend a lot of cycles monitoring the use of our technology: first and
foremost out of personal curiosity; second for inspiration, suggestions
and criticism (including criticism that people don't feel comfortable
telling us directly, for whatever reason, even though we strongly
welcome and treasure it); and third to show our funders that when we say
we want our research to be applied, we're not bluffing.

For Timeline and Exhibit, we serve the files directly from our servers
and we incur the cost of network, power, hardware and system
administration, in order to reduce to a minimum the barrier to entry and
use of the software we produce. We have been surprised and very pleased
by the adoption that this decision has generated.

As a positive side effect, when a browser loads a file that is
referenced from a web page, it sends the URL of that page (the referrer)
along with the request. Our web servers are configured to store that
information in their logs.

Moreover, we mine those server logs (with tools like Referee and with
custom-made scripts) to generate reports that are useful for us and for
others to understand the evolution of the project.
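
To make the log mining step concrete, here is a minimal sketch of how
referrers could be counted from an access log. It is not the actual
script we run: it assumes the common "combined" log layout, and the
function name and the path fragment used to select requests are
illustrative only.

```python
import re
from collections import Counter

# Assumed "combined" access log layout; the real server configuration
# is not described in this message.
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def count_referrers(lines, path_fragment):
    """Count referring pages for requests whose request line contains path_fragment."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that do not follow the assumed layout
        if path_fragment not in match.group("request"):
            continue  # request was for some other file
        referrer = match.group("referrer")
        if referrer and referrer != "-":  # "-" means no referrer was sent
            counts[referrer] += 1
    return counts
```

Passing the lines of a log file and a fragment such as "/exhibit/api/"
(a hypothetical choice) gives a Counter that maps each referring page to
the number of requests it triggered.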

In aggregate form, these reports respect the privacy of the individuals
contributing to the logs, and we therefore make the results public:

 http://simile.mit.edu/history/

For Exhibit referrers, however, we do more: we crawl back the referrers
to understand what views and what data are used, and we create an
exhibit of all the exhibits (called 'metaexhibit'). We also fetch the
data, RDFize it with Babel, store it in a local triple store and
generate an Atom feed of the new usages of Exhibit.

One could say that since the exhibit users did not protect their pages
with HTTP authentication, we should not treat these pages differently
from any other web page. But since this data might be considered
private because it's not linked from the general web, we feel that
doing so would be abusive, and for that reason we do not show the
metaexhibit to the public.

Unfortunately, while some exhibits are made to be private, others are
not and we are sure that such a 'metaexhibit' would be of great use for
others for inspiration, example, curiosity or data integration.

There are several ways we thought about enabling this:

 1) ask the major search engines if they have the referring URL in their
databases. If so, it means that this page has been linked from the
public web, and it is reasonably safe to assume that it is meant to be
public. The pro of this approach is that it can easily be fully
automated; the con is that one could link a page by mistake and then it
would end up public (but it would be in the Google cache anyway).

 2) suggest that people embed machine-readable licensing information in
their pages so that we can understand what kind of activity we are
allowed to do. This is the Exhibit equivalent of a robots.txt file for
web spiders, but it also has the advantage of adding licensing
information to the data, so that mixing could be done legally.

 3) let people who want to show their exhibits add them to a list on our
wiki, and write some scripts that automatically extract that data and
generate exhibits out of it. Our use of Semantic MediaWiki helps a lot
in this regard.

Note how the three approaches are not mutually exclusive, and some are
more privacy-protecting than others.

We have already started to implement #3 by adding pages and plumbing on
the wiki, and #1 could easily be added to the existing metaexhibit as a
new facet, giving us hints about which exhibit authors to contact for
permission to include their exhibits in our wiki list.

#2 is the most complex and requires us to agree on how to model the
licensing information in machine-processable form. Of course, Creative
Commons comes to mind, but that doesn't contain the notion of 'private
use', so at the very least it would have to be extended.
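
As a rough illustration of what #2 could look like on the crawler side,
here is a minimal sketch that reads license declarations from a page
using the rel="license" link convention. Nothing here is an agreed
format: the 'private use' URI, the function name and the rule of
treating pages without any declaration as not listable are all
assumptions made for the example.

```python
from html.parser import HTMLParser

# Hypothetical marker for "private use"; no such URI is defined by
# Creative Commons or anywhere in this message.
PRIVATE_USE = "http://example.org/licenses/private-use"


class LicenseLinkParser(HTMLParser):
    """Collects href values of <link> and <a> elements carrying rel="license"."""

    def __init__(self):
        super().__init__()
        self.licenses = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("link", "a"):
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values, or be missing
        rel_values = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href")
        if "license" in rel_values and href:
            self.licenses.append(href)


def may_list_publicly(html_text):
    """Return True only if the page declares a license and none is 'private use'."""
    parser = LicenseLinkParser()
    parser.feed(html_text)
    parser.close()
    if not parser.licenses:
        return False  # assumed conservative default: no declaration, no listing
    return PRIVATE_USE not in parser.licenses
```

The conservative default mirrors the concern above: a page that says
nothing about licensing is treated as if it could be private.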

Comments?

-- 
Stefano Mazzocchi
Digital Libraries Research Group                 Research Scientist
Massachusetts Institute of Technology
E25-131, 77 Massachusetts Ave               skype: stefanomazzocchi
Cambridge, MA  02139-4307, USA         email: stefanom at mit . edu
-------------------------------------------------------------------

_______________________________________________
General mailing list
[email protected]
http://simile.mit.edu/mailman/listinfo/general
