David Huynh wrote: [snip]
> However, if your web page links to the script on our server, then our
> server automatically logs your domain name / IP address (as a referrer).
> This is a common behavior. The consequence of this behavior is
> interesting--perhaps Stefano will jump in and discuss it here.

Thanks for the mike, David. :-)

We spend a lot of cycles monitoring the use of our technology: first and foremost out of personal curiosity; second for inspiration and suggestions (and for criticism that people don't feel comfortable telling us directly, for whatever reason, even though we strongly welcome and treasure it); and third to show our funders that when we say we want our research to be applied, we're not bluffing.

For Timeline and Exhibit, we serve the files directly from our servers and we incur the cost of network, power, hardware and system administration in order to reduce to a minimum the barrier to entry and use of the software we produce. We have been surprised and very pleased by the adoption that this decision has generated.

As a positive side effect, when a browser loads a resource that is referenced from a page, it sends that page's URL along with the request (the HTTP Referer header). Our web servers are configured to store that information in the log.

Moreover, we mine those server logs (with tools like Referee and with custom-made scripts) to generate reports that are useful for us and for others to understand the evolution of the project. In aggregate, they respect the privacy of the individuals contributing to the logs, and we therefore make the results public

http://simile.mit.edu/history/

but for exhibit referrers we do more: we crawl back the referrers, understand what views are used and what data is used, and create an exhibit of all the exhibits (called 'metaexhibit'). We also fetch the data, RDFize it with Babel, store it in a local triple store and generate an Atom feed of the new usages of Exhibit.
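To make the log-mining step concrete, here is a minimal sketch of how referring pages could be pulled out of a web server log. It is not the script we actually run: it assumes the server writes the Apache "combined" log format, and the `/exhibit/` path prefix, the sample log lines and the function names are all made up for illustration.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Apache "combined" log format (an assumption about the server setup):
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

def referring_pages(lines, path_prefix):
    """Count referring page URLs for requests whose path starts with path_prefix."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # not in the expected format (e.g. escaped quotes)
        parts = match.group("request").split()
        if len(parts) < 2 or not parts[1].startswith(path_prefix):
            continue  # malformed request line, or a file we don't care about
        referer = match.group("referer")
        if referer in ("", "-"):
            continue  # the browser sent no Referer header
        counts[referer] += 1
    return counts

def referring_hosts(page_counts):
    """Aggregate per-page counts into per-host counts."""
    hosts = Counter()
    for url, n in page_counts.items():
        hosts[urlsplit(url).hostname or "unknown"] += n
    return hosts

# Hypothetical log lines, using documentation-only addresses and domains.
sample = [
    '192.0.2.1 - - [10/Mar/2007:10:00:00 -0500] "GET /exhibit/api/exhibit-api.js HTTP/1.1" 200 1234 "http://example.org/my-exhibit.html" "Mozilla/5.0"',
    '192.0.2.2 - - [10/Mar/2007:10:01:00 -0500] "GET /exhibit/api/exhibit-api.js HTTP/1.1" 200 1234 "http://example.org/my-exhibit.html" "Mozilla/5.0"',
    '192.0.2.3 - - [10/Mar/2007:10:02:00 -0500] "GET /exhibit/api/exhibit-api.js HTTP/1.1" 200 1234 "-" "Mozilla/5.0"',
    '192.0.2.4 - - [10/Mar/2007:10:03:00 -0500] "GET /other.css HTTP/1.1" 200 99 "http://example.com/x.html" "Mozilla/5.0"',
    '192.0.2.5 - - [10/Mar/2007:10:04:00 -0500] "GET /exhibit/api/exhibit-api.js HTTP/1.1" 200 1234 "http://example.net/data/page.html" "Mozilla/5.0"',
]

pages = referring_pages(sample, "/exhibit/")
hosts = referring_hosts(pages)
```

The keys of `pages` are the URLs one would crawl back; `hosts` is the kind of aggregate that can be published without pointing at any individual page.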
One could say that since the exhibit users did not protect their pages with HTTP authentication, we should not treat these pages differently from any other web page. But since this data might be considered private because it's not linked from the general web, we feel that publishing it would be abusive, and for that reason we do not show the metaexhibit to the public.

Unfortunately, while some exhibits are meant to be private, others are not, and we are sure that such a 'metaexhibit' would be of great use to others for inspiration, example, curiosity or data integration.

There are several ways we thought about enabling this:

1) ask the major search engines if they have the referring URL in their databases. If so, it means that this page has been linked from the public web, and it is reasonably safe to assume that it is meant to be public. The pro of this approach is that it can be fully automated with ease; the con is that one could link a page by mistake and it would then end up public (but it would be in the Google cache anyway).

2) suggest that people embed machine-readable licensing information in their pages so that we can understand what kind of activity we are allowed to do. This is the Exhibit equivalent of a robots.txt file for web spiders, but it also has the advantage of adding licensing information to the data, so that mixing could be done legally.

3) let people who want to show their exhibits add them to a list on our wiki, and write a script that automatically extracts that data and generates exhibits out of it. Our use of Semantic MediaWiki helps a lot in this regard.

Note that the three approaches are not mutually exclusive, and some are more privacy-protecting than others.

We have already started to implement #3 by adding pages and plumbing on the wiki, and #1 could easily be added to the existing metaexhibit as a new facet, giving us hints about which exhibit authors to contact for permission to include their exhibits in our wiki list.
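As a rough illustration of approach #2, a crawler could look for a license declaration before doing anything with a page. The sketch below assumes authors would use the existing rel="license" link convention; nothing has been agreed on yet, and the "no declared license means private" policy, the function names and the sample pages are only assumptions made for the example.

```python
from html.parser import HTMLParser

class LicenseLinkParser(HTMLParser):
    """Collect href values of <link> and <a> elements marked rel="license"."""

    def __init__(self):
        super().__init__()
        self.licenses = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("link", "a"):
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values, or be missing entirely
        rel_values = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href")
        if "license" in rel_values and href:
            self.licenses.append(href)

def find_licenses(html_text):
    """Return the license URLs declared in an HTML page, in document order."""
    parser = LicenseLinkParser()
    parser.feed(html_text)
    parser.close()
    return parser.licenses

def may_list_publicly(html_text):
    """Hypothetical policy: treat a page with no declared license as private."""
    return bool(find_licenses(html_text))

# Hypothetical pages used only to exercise the sketch.
page_with_license = (
    '<html><head>'
    '<link rel="license" href="https://creativecommons.org/licenses/by/4.0/">'
    '</head><body>An exhibit</body></html>'
)
page_without_license = '<html><head><title>An exhibit</title></head></html>'

declared = find_licenses(page_with_license)
```

A real policy would also have to look at which license was declared, not just whether one is present, which is where the modeling question comes in.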
#2 is the most complex and requires us to agree on how to model the licensing information in machine-processable form. Of course, Creative Commons comes to mind, but it doesn't contain the notion of 'private use', so it would at least need to be extended.

Comments?

-- 
Stefano Mazzocchi                 Digital Libraries Research Group
Research Scientist                Massachusetts Institute of Technology
E25-131, 77 Massachusetts Ave     skype: stefanomazzocchi
Cambridge, MA 02139-4307, USA     email: stefanom at mit . edu
-------------------------------------------------------------------
_______________________________________________
General mailing list
http://simile.mit.edu/mailman/listinfo/general
