Re: Data Used in Exhibit

Johan Sundström Tue, 27 Feb 2007 17:55:07 -0800

On 2/27/07, Stefano Mazzocchi <[EMAIL PROTECTED]> wrote:
> Moreover, we mine those server logs (with tools like Referee and with
> custom-made scripts) to generate reports that are useful for us and for
> others to understand the evolution of the project.
>
> In aggregated, they respect the privacy of the individuals contributing
> to the logs and we therefore make the results public
>
>  http://simile.mit.edu/history/
>
> but for exhibit referrers we do more: we crawl back the referrers,
> understand what views are used, what data is used, create an exhibit of
> all the exhibits (called 'metaexhibit'), we also fetch the data, RDFize
> it with Babel, store in a local triple store and generate an atom feed
> of the new usages of exhibit.


Might be obvious, but "data" above means the data points mentioned.
Your exhibit *content* stays with you and your visitors.

> One could say that since the exhibit users did not protect their pages
> with HTTP authentication, we should not treat this page different from
> any other web page. But since it might be that this data is considered
> private because it's not linked to the general web, we feel that doing
> so would be abusive and for that reason we do not show the metaexhibit
> to the public.

Agreed. Some hosting facilities don't even support setting up HTTP auth.

> Unfortunately, while some exhibits are made to be private, others are
> not and we are sure that such a 'metaexhibit' would be of great use for
> others for inspiration, example, curiosity or data integration.
>
> There are several ways we thought about enabling this:
>
>  1) ask the major search engines if they have the referring URL in their
> databases. If so, it means that this page has been linked from the
> public web and for that reason it is reasonably safe to assume this is
> the case. The pro of this approach is that it can be fully automated and
> with ease, the con is that one could link a page by mistake and then it
> would end up public (but it would be in the google cache anyway).

-1. Spidering URLs that people ask about but that are not yet in the
database would be one of the first things I'd do as a search engine
host; if we ask, we leak pages that were dark web matter until we did.

>  2) suggest people to embed machine-readable licensing information in
> their pages so that we can understand what kind of activity we are
> allowed to do. This is the exhibit-equivalent of a robots.txt file for
> web spiders but also has the advantage of adding licensing information
> to the data, so that mixing could be done legally.

+1. This is my favourite idea, and one I'd make use of for my own
exhibits, once we have it.

The way I see it, an exhibit can be shared and reused by others, in at
least two ways: data and presentation. The two are typically not
related much, at least when I create Exhibits. I mostly work with data
sets that are not my own, but I would usually gladly share my Exhibit
page template with someone else that wanted to borrow ideas, code,
layout or inspiration from how I made it the way it looks and works. I
occasionally use graphics shared under a different license, making me
unable to place that too in the public domain.

It might sound a bit hairy, but I would find it useful if we devised a
method for tagging those three properties (data, layout and graphics)
with a license name, and perhaps a single tag for the basic case where
all are the same. To me it makes most sense sticking those tags in the
URL we load Exhibit with, for instance:

  <script src="http://simile.mit.edu/exhibit/api/exhibit-api.js?license=data:proprietary,layout=bsd,gfx=cc-by-nc/2.0"></script>

for my exhibit that visualizes the live readings of 192 outdoor
temperature measuring stations spread across Sweden, data c/o
www.temperatur.nu, presentation by me (based on the Simile Presidents
layout, which I hope is BSD or public domain?), background photo by
Weston Renoud under the CC attribution-noncommercial license:

  http://creativecommons.org/licenses/by-nc/2.0/

The exhibit itself is available here (and demonstrates our lack of a
numeric range facet ;-):

  http://exhibit.ecmanaut.googlepages.com/temperatur.nu.html

A perhaps more typical case might be "license=pd" for a completely
free-for-all exhibit.
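
To make the idea a bit more concrete, here is a rough Python sketch of
how a crawler might read such a tag out of the script URL. The function
name parse_license_tag and the part names are made up for illustration.
It assumes that a bare shorthand such as "pd" covers all three parts,
and that either ":" or "=" may separate a part from its license (the
example above mixes the two). None of this is settled syntax.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Hypothetical part names, taken from the example tag above.
PARTS = ("data", "layout", "gfx")


def parse_license_tag(script_url):
    """Return {part: license} for the license tag in script_url, or None if untagged."""
    query = parse_qs(urlsplit(script_url).query)
    values = query.get("license")
    if not values:
        return None  # no tag at all: the crawler learns nothing about permissions
    licenses = {}
    for item in values[0].split(","):
        item = item.strip()
        if not item:
            continue
        # Accept "part:license" as well as "part=license" (an assumption).
        pieces = re.split(r"[:=]", item, maxsplit=1)
        if len(pieces) == 1:
            # Bare shorthand such as "pd": assume it covers every untagged part.
            for part in PARTS:
                licenses.setdefault(part, pieces[0])
        else:
            licenses[pieces[0]] = pieces[1]
    return licenses
```

Returning None for an untagged page keeps "no tag" distinct from any
actual license, so a crawler can treat silence as "do not publish".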

Given all of the above, we could easily set up an automated exhibit of
exhibits available for reuse and inspiration, under terms you are
comfortable with. I'd love that, and am certain that it would boost
Exhibit adoption significantly. Copying and modifying the code of
others is a lot easier than reading partial docs in a wiki, and while
the example crop on the Simile site is a good start, it is very small
and the examples don't say much about how you are allowed to reuse
them.
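
As a sketch of what that automation might look like, the Python below
turns a list of crawled pages into a JSON document with a top-level
"items" list, which is the shape I understand Exhibit data files to
have. The function name build_listing, the "Exhibit" type and the
"-license" property names are hypothetical, and the rule that untagged
pages are left out is my own assumption about what "terms you are
comfortable with" should mean in practice.

```python
import json


def build_listing(exhibits):
    """Turn (url, licenses) pairs into an Exhibit-style JSON document.

    `licenses` is a dict such as {"data": "proprietary", "layout": "bsd"},
    or None when the page carries no license tag.
    """
    items = []
    for url, licenses in exhibits:
        if not licenses:
            continue  # untagged pages are assumed private and left out
        item = {"type": "Exhibit", "label": url, "url": url}
        for part, name in sorted(licenses.items()):
            item[part + "-license"] = name  # hypothetical property names
        items.append(item)
    return json.dumps({"items": items}, indent=2)
```

With one license property per part, the listing could then be browsed
by, say, layout license to find templates that are free to borrow.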

>  3) let people that want to show their exhibits write a list on our wiki
> and write some script that automatically extract that data and generate
> exhibits out of it. our use of semantic mediawiki helps a lot in this
> regard.

+0; won't hurt, but too much work (for exhibit authors) to gather much
momentum, by my guess. (I haven't added any of mine, anyway.)

> #2 is the more complex and requires us to agree on modeling the
> licensing information in machine-processable form. Of course, Creative
> Commons comes to mind, but that doesn't contain the notion of 'private
> use', so at least it should be extended.
>
> Comments?

A set of human-writable license shorthands and a table somewhere of
what they mean (the exhibit suggested above would be a good place to
find and learn more about them) is my preference. I'm sure
"proprietary" might not be a very good word, but I'd hate to see
long-winded URLs go there, unless made optional.
"data=proprietary;http://www.temperatur.nu/temperatur-1-99_1.html"
might be another legal syntax, I guess, or data=proprietary(url), to
get it nicely contained.
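
A minimal Python sketch of the shorthand table and of both optional-URL
forms might look like this. The names SHORTHANDS and
parse_license_value are hypothetical, and the table entries are
placeholders except for the Creative Commons URL, which is the one
given earlier in this mail.

```python
import re

# Hypothetical shorthand table; only the Creative Commons URL comes from
# this thread, the other descriptions are placeholders.
SHORTHANDS = {
    "pd": "public domain",
    "bsd": "BSD license",
    "cc-by-nc/2.0": "http://creativecommons.org/licenses/by-nc/2.0/",
    "proprietary": "placeholder: rights stay with the source named in the tag",
}

# name, optionally followed by ";url" or "(url)"
_VALUE = re.compile(r"^(?P<name>[^;()]+)(?:;(?P<semi>.+)|\((?P<paren>.+)\))?$")


def parse_license_value(value):
    """Split 'name', 'name;url' or 'name(url)' into (name, url or None)."""
    match = _VALUE.match(value.strip())
    if match is None:
        raise ValueError("unrecognised license value: %r" % value)
    return match.group("name"), match.group("semi") or match.group("paren")
```

The parenthesised form is the easier one to embed in a comma-separated
list, since the URL has a clear end.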

Tossing up ideas,

-- 
 / Johan Sundström, http://ecmanaut.blogspot.com/

