Re: RDF

Stefano Mazzocchi Sat, 16 Apr 2005 15:53:36 -0700

Leo Simons wrote:

[snip]

So, ehm, no, I don't actually think it'll be a tremendous win. It'll bring
some huge benefits, but it'll incur a big cost as well. Simplicity loss.

Or maybe not. I'm not exactly an expert here. We do have one of those around
I think. Hence: "Show me!"

The way you deal with statements is a little different from the way you deal with objects. Objects have explicit semantics, just as statements do, but the relationships between objects are not typed.

For example, if you have a Module object and a Project object, you have to decide which way the link goes, and "Module.projects" implicitly means "this is the list of projects this module contains".

The problem is that this implicit modeling forces you to decide the direction of the link. If you want both directions, you have to model both explicitly, and on every update you need to know which places to change.
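
A minimal Python sketch of that bookkeeping, assuming hypothetical Module and Project classes (these are illustrative names, not Gump's actual object model). The link has to be stored twice, and every writer has to remember to update both ends:

```python
class Project:
    def __init__(self, name):
        self.name = name
        self.module = None  # back-link: must be kept in sync by hand


class Module:
    def __init__(self, name):
        self.name = name
        self.projects = []  # forward link: projects this module contains

    def add_project(self, project):
        # Both directions are updated together; forgetting either one
        # leaves the object graph inconsistent.
        self.projects.append(project)
        project.module = self


module_a = Module("ModuleA")
project_a = Project("Cocoon")
module_a.add_project(project_a)

print(project_a.module.name)                 # navigate Project -> Module
print([p.name for p in module_a.projects])   # navigate Module -> Project
```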

In RDF, you don't have to do all that. You just have a bunch of statements:

 ModuleA -(is_a)-> Module
 ProjectA -(is_a)-> Project
 ModuleA -(contains)-> ProjectA
 ProjectA -(has_name)-> "Cocoon"@en^string
 Build-20050415-343 -(is_a)-> Build
 Build-20050415-343 -(built)-> ProjectA
 Build-20050415-343 -(status)-> "failed"@en^string
 Build-20050415-343 -(depends)-> Build-20050415-234
 ...

and so on. It's basically a log of the things you come to know about stuff, and this log becomes your knowledge base. There is no structure because you don't need it: you just need to be careful about how you model things, and this becomes natural and grows with you. There is no need to define the objects or the schema before you know how complex your data is.
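
A minimal sketch of that idea in Python, assuming nothing more than an in-memory set of (subject, predicate, object) tuples rather than a real triple store. The `add` helper and the `maintained_by` predicate are made up for the illustration, and the language/datatype annotations on literals are dropped for brevity:

```python
triples = set()


def add(subject, predicate, obj):
    """Append one statement to the knowledge base."""
    triples.add((subject, predicate, obj))


add("ModuleA", "is_a", "Module")
add("ProjectA", "is_a", "Project")
add("ModuleA", "contains", "ProjectA")
add("ProjectA", "has_name", "Cocoon")
add("Build-20050415-343", "is_a", "Build")
add("Build-20050415-343", "built", "ProjectA")
add("Build-20050415-343", "status", "failed")

# The single "contains" statement can be read in either direction,
# so there is no second link to keep in sync.
projects_in_module_a = {
    o for s, p, o in triples if s == "ModuleA" and p == "contains"
}
modules_containing_project_a = {
    s for s, p, o in triples if p == "contains" and o == "ProjectA"
}

# A new kind of fact can be logged later without touching any schema.
# ("maintained_by" is a hypothetical predicate.)
add("ProjectA", "maintained_by", "someone")

print(projects_in_module_a)
print(modules_containing_project_a)
```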

Very incremental, very XP. It fits nicely both with the laziness mode and with the separation between data production and data consumption that we want to enforce in Gump3.

Now, what about the data consumption side?

Well, the data is in the triple store, so you need to query it. There are many different ways to do this, but they fall into two main categories:

 1) via an API
 2) via a query language

Depending on the triple store you use, you get a different API and/or query language. The API feels more natural, but the triple store has less room to optimize it.

For example (pseudocode):

Get all modules:
 modules = model.getSubjects("is_a", "Module")

Get all builds that failed:
 failed_builds = []
 builds = model.getSubjects("is_a", "Build")
 foreach (build in builds):
         statuses = model.getObjects(build, "status")
         if ("failed" in statuses):
                 failed_builds.add(build)

You get the idea.
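
The pseudocode above can be turned into something runnable. This is only a sketch, assuming a hypothetical `Model` class over an in-memory set of triples. The method names mirror the pseudocode and are not the API of any particular triple store, and the "success" status of the second build is sample data invented for the example:

```python
class Model:
    """Toy triple store: a set of (subject, predicate, object) tuples."""

    def __init__(self, triples):
        self.triples = set(triples)

    def get_subjects(self, predicate, obj):
        # All subjects that have the given predicate/object pair.
        return {s for s, p, o in self.triples if p == predicate and o == obj}

    def get_objects(self, subject, predicate):
        # All objects attached to the subject through the predicate.
        return {o for s, p, o in self.triples if s == subject and p == predicate}


model = Model([
    ("ModuleA", "is_a", "Module"),
    ("ProjectA", "is_a", "Project"),
    ("Build-20050415-343", "is_a", "Build"),
    ("Build-20050415-343", "status", "failed"),
    ("Build-20050415-234", "is_a", "Build"),
    ("Build-20050415-234", "status", "success"),  # invented sample data
])

# Get all modules.
modules = model.get_subjects("is_a", "Module")

# Get all builds that failed: the filtering happens on the client side,
# which is why the store has little chance to optimize it.
builds = model.get_subjects("is_a", "Build")
failed_builds = {
    build for build in builds
    if "failed" in model.get_objects(build, "status")
}

print(modules)
print(failed_builds)
```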

But you could also do something like

 failed_builds = model.get("?x is_a Build where ?x status 'failed'")
        
which is not that hard to get.
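
To show what such a query does underneath, here is a naive sketch of pattern matching over triples. It assumes the query has already been parsed into a list of triple patterns where terms starting with "?" are variables. The `match` function and the tuple form of the patterns are assumptions made for the example, not the syntax or engine of any real triple store, and the second build's status is invented sample data:

```python
def match(triples, patterns):
    """Return every variable binding that satisfies all patterns."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            for triple in triples:
                candidate = dict(binding)
                for term, value in zip(pattern, triple):
                    if term.startswith("?"):
                        # Variable: bind it, or check it against the
                        # value bound by an earlier pattern.
                        if candidate.setdefault(term, value) != value:
                            break
                    elif term != value:
                        # Constant that does not match this triple.
                        break
                else:
                    next_solutions.append(candidate)
        solutions = next_solutions
    return solutions


triples = [
    ("Build-20050415-343", "is_a", "Build"),
    ("Build-20050415-343", "status", "failed"),
    ("Build-20050415-234", "is_a", "Build"),
    ("Build-20050415-234", "status", "success"),  # invented sample data
]

# "?x is_a Build where ?x status 'failed'" as two triple patterns.
results = match(triples, [
    ("?x", "is_a", "Build"),
    ("?x", "status", "failed"),
])
failed_builds = [binding["?x"] for binding in results]
print(failed_builds)
```

Because the whole query is handed over at once, a real store is free to reorder the patterns or use indexes, which is the optimization the API style gives up.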

Objects are just syntactic sugar around SQL statements: you have to model your data first, then add it in. In RDF it is the other way around: you pile up your data and the database follows you.

Sure, the argument that objects are better than dealing with JDBC resultsets by hand stands, but making this a general rule could turn out to be a mistake.

The vision of RDF is data first, metadata later. The vision of relational databases is metadata first, data later.

And the funny thing is that there is nothing in the relational model that suggests this (in fact, RDF is nothing but an explicit relational model with globally unique identifiers). The idea of building a database by first creating a schema was driven by the vision that static typing is good for you even if it locks you in. It certainly is good for the query indexers, and performance is clearly not the best feature of a triple store nowadays.

I find it somewhat ironic that you now code in a dynamically typed language (and, AFAIK, with good feelings about it) and yet you advocate that static typing of your data (object or SQL doesn't really matter) is better for you.

I think RDF offers a better model, especially for something integrating data and metadata from different independent domains like Gump.

But of course, I'm biased.

--
Stefano.

