On 01/03/07, Ben Hyde <[EMAIL PROTECTED]> wrote:
> On Mar 1, 2007, at 4:10 AM, Danny Ayers wrote:
>
> Am I right that the distinction you're making is not
> > > between scraping and parsing.
>
> but between client vs. server taking the responsibility for mapping
> the info into RDF?

In a sense, but possibly not in the sense you mean. I obviously wasn't
clear: I am using the word "parsing" loosely. It's a bit shorter than
"deterministic interpretation of publisher's intent according to
standard specifications".

Ok, take the document:

http://www.w3.org/2001/sw/grddl-wg/td/card

It's an HTML page with embedded data, using the hCard microformat. This
can be determined from:

<html xmlns="http://www.w3.org/1999/xhtml">
<head profile="http://www.w3.org/2006/03/hcard">

The source document is conformant with both the "Meta Data Profiles"
section of the HTML spec and the GRDDL spec. Together this means that
it's possible to automatically extract these statements (amongst
others):

[ a v:Organization;
  v:organization-name "Data Access Technologies" ]

Ok, these statements could be extracted without using the GRDDL
mechanism and/or without the profile URI. The specific processing
mechanism isn't particularly relevant in this context, but the
existence and recognition of the profile URI very much *is*.

Say we know the doc was published by Joe. Because of the profile and
the markup in the document body, we can state categorically that:

Joe says:
[ a v:Organization;
  v:organization-name "Data Access Technologies" ]

If we don't consider the profile, the best we can state is:

Our scraping heuristics determined:
Joe says:
[ a v:Organization;
  v:organization-name "Data Access Technologies" ]

*If* the server provides the profile URI, the data is exactly
equivalent to publishing it as RDF/XML (or Turtle or whatever, assuming
appropriate mime types). By using the profile, the publisher has
licensed that interpretation.

For many purposes this won't matter, but if we are going to republish
(and republish, and...) data it's important to recognise the
difference. An artificial example: someone shows a sample of a liberal
license in their document, the document itself being issued under a
strict license.
An over-eager scraper might see the sample and interpret it as the
license for the document. But with a profile declaration, the ambiguity
is removed.

> Doing that mapping is work. It's key work in getting the RDF
> bonfires burning bright.

Absolutely.

> A personal opinion here: Demanding that only one of the three
> possible sources of labor do this work is not an effective strategy
> to getting a big fire.
>
> I tend to think of work, like this, as wanting done. The puzzle is
> where's the volunteer to turn the spade? You get a damn sight more
> possible volunteers if you urge the population of clients and
> intermediaries to consider it. You're also closer to the population
> that will benefit so the motivation is easier.

I don't disagree, although hopefully everyone will benefit...

> Chicken and egg problems are rarely resolved by focusing on one or
> the other.

Agreed 100%

Cheers,
Danny.

--
http://dannyayers.com
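
P.S. A minimal sketch of the distinction drawn above, using only the Python standard library. It does not run a GRDDL transformation or extract any hCard data. It only checks whether the publisher declared the hCard profile URI on the head element, which is the point where "Joe says" and "our scraping heuristics determined" part ways. The names ProfileFinder and interpretation_status, and the two status labels, are made up for illustration and come from no spec or library.

```python
from html.parser import HTMLParser

# Profile URI taken from the example document discussed in this message.
HCARD_PROFILE = "http://www.w3.org/2006/03/hcard"


class ProfileFinder(HTMLParser):
    """Collects profile URIs declared on the document's <head> element."""

    def __init__(self):
        super().__init__()
        self.profiles = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            # The HTML 4 profile attribute is a space-separated list of URIs.
            value = dict(attrs).get("profile") or ""
            self.profiles.extend(value.split())


def interpretation_status(html_text, profile_uri=HCARD_PROFILE):
    """Hypothetical helper: say how extracted statements may be attributed."""
    finder = ProfileFinder()
    finder.feed(html_text)
    if profile_uri in finder.profiles:
        # The publisher declared the profile, so the reading is licensed.
        return "licensed by publisher"
    # No declaration: any extraction is only our own guess.
    return "scraping heuristic"


if __name__ == "__main__":
    sample = (
        '<html xmlns="http://www.w3.org/1999/xhtml">'
        '<head profile="http://www.w3.org/2006/03/hcard">'
        "<title>card</title></head><body></body></html>"
    )
    print(interpretation_status(sample))
```

A republishing tool could carry that status along with each extracted statement, so that downstream consumers can tell a publisher-licensed statement from a heuristic one.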
