On 01/03/07, Ben Hyde <[EMAIL PROTECTED]> wrote:
> On Mar 1, 2007, at 4:10 AM, Danny Ayers wrote:
>
> Am I right that the distinction you're making is not
>
> > between scraping and parsing.
>
> but between client vs. server taking the responsibility for mapping
> the info into RDF?

In a sense, but possibly not in the sense you mean. I obviously wasn't
clear: I am using the word "parsing" loosely. It's a bit shorter than
"deterministic interpretation of publisher's intent according to
standard specifications".

Ok, take the document:

http://www.w3.org/2001/sw/grddl-wg/td/card

It's an HTML page with embedded data, using the hCard microformat. This
can be determined from:

<html xmlns="http://www.w3.org/1999/xhtml">
   <head profile="http://www.w3.org/2006/03/hcard">

The source document is conformant with both the "Meta Data Profiles"
section of the HTML spec and the GRDDL spec. Together this means that
it's possible to automatically extract these statements (amongst
others):

[    a v:Organization ;
        v:organization-name "Data Access Technologies" ]

Ok, these statements could be extracted without using the GRDDL
mechanism and/or without the profile URI. The specific processing
mechanism isn't particularly relevant in this context, but the
existence and recognition of the profile URI very much *is*.
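
A minimal sketch of recognising the profile declaration, in Python
using only the standard library. The names ProfileFinder and
declared_profiles are made up for illustration, and the sample markup
is a cut-down stand-in for the card document, not a copy of it. This
only checks for the profile URI; it is not a GRDDL processor.

```python
from html.parser import HTMLParser

# Profile URI taken from the <head> element quoted above.
HCARD_PROFILE = "http://www.w3.org/2006/03/hcard"


class ProfileFinder(HTMLParser):
    """Collect the URIs listed in the profile attribute of <head>."""

    def __init__(self):
        super().__init__()
        self.profiles = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "head":
            for name, value in attrs:
                if name == "profile" and value:
                    # The attribute may hold several space-separated URIs.
                    self.profiles.extend(value.split())


def declared_profiles(html_text):
    """Return the profile URIs the publisher declared, if any."""
    finder = ProfileFinder()
    finder.feed(html_text)
    return finder.profiles


# Hypothetical sample document, assumed for illustration only.
sample = """<html xmlns="http://www.w3.org/1999/xhtml">
   <head profile="http://www.w3.org/2006/03/hcard">
      <title>card</title>
   </head>
   <body></body>
</html>"""

print(HCARD_PROFILE in declared_profiles(sample))
```

If the profile URI is in the returned list, the publisher has told us
how to read the markup; if the list is empty, anything we extract is
our own guess.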

Say we know the doc was published by Joe. Because of the profile and
the markup in the document body, we can state categorically that:

Joe says:
   [    a v:Organization ;
           v:organization-name "Data Access Technologies" ]

If we don't consider the profile, the best we can state is:

Our scraping heuristics determined:
      Joe says:
         [    a v:Organization ;
                 v:organization-name "Data Access Technologies" ]

*If* the server provides the profile URI, publishing the data this way
is exactly equivalent to publishing it as RDF/XML (or Turtle or
whatever, assuming appropriate MIME types). By using the profile, the
publisher has licensed that interpretation.
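
The two cases can be sketched as a provenance label carried along with
each extracted statement. This is only an illustration in Python: the
Claim class, the attribute and describe functions, and the
"asserted"/"heuristic" labels are all my own assumptions, not anything
defined by GRDDL or the hCard profile.

```python
from dataclasses import dataclass

HCARD_PROFILE = "http://www.w3.org/2006/03/hcard"


@dataclass(frozen=True)
class Claim:
    statement: str   # the extracted statement, kept as plain text here
    publisher: str   # who published the source document
    basis: str       # "asserted" (profile declared) or "heuristic"


def attribute(statement, publisher, profiles):
    """Label a statement according to whether the profile was declared."""
    basis = "asserted" if HCARD_PROFILE in profiles else "heuristic"
    return Claim(statement, publisher, basis)


def describe(claim):
    """Render the claim the way the two cases are written out above."""
    said = f"{claim.publisher} says: {claim.statement}"
    if claim.basis == "asserted":
        return said
    return f"Our scraping heuristics determined: {said}"


stmt = '[ a v:Organization ; v:organization-name "Data Access Technologies" ]'

with_profile = attribute(stmt, "Joe", [HCARD_PROFILE])
without_profile = attribute(stmt, "Joe", [])

print(describe(with_profile))
print(describe(without_profile))
```

Keeping the label attached means that whoever republishes the data
further down the line can still tell the publisher's own statements
from our guesses.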

For many purposes this won't matter, but if we are going to republish
(and republish, and...) data, it's important to recognise the
difference. An artificial example: someone shows a sample of a liberal
license in their document, the document itself being issued under a
strict license. An over-eager scraper might see the sample and
interpret it as the license for the document. But with a profile
declaration, the ambiguity is removed.

> Doing that mapping is work.  It's key work in getting the RDF
> bonfires burning bright.

Absolutely.

> A personal opinion here: Demanding that only one of the three
> possible sources of labor do this work is not an effective strategy
> to getting a big fire.
>
> I tend to think of work, like this, as wanting done.  The puzzle is
> where's the volunteer to turn the spade?  You get a damn sight more
> possible volunteers if you urge the population of clients and
> intermediaries to consider it.  You're also closer to the population
> that will benefit so the motivation is easier.

I don't disagree, although hopefully everyone will benefit...

> Chicken and egg problems are rarely resolved by focusing on one or
> the other.

Agreed 100%

Cheers,
Danny.

-- 

http://dannyayers.com