Hi all, I'm Ben Lavender, author of Spira [1], an ORM for RDF.rb [2]. I recently wrote a blog post [3] about Spira in which I briefly mention why I didn't go with a DataMapper adapter. Not long after that, I mused on Twitter about DataMapper again [4], and afterwards dkubb messaged me asking for more information (it seems a lot of support happens there). I said I'd email a response, and that response turned into the following novella.
I'm not sure if the tweet was offering help, hunting for feedback, trying to start a discussion, or what, but I decided it didn't matter. I'm very grateful for having had DataMapper as a sane source of inspiration several times now, and if nobody's really interested in this, then consider it some detailed, hopefully useful feedback on adapting DataMapper to another data model.

Anyway, my ORM, Spira, is based on RDF. I won't get too detailed about what RDF is, but briefly, it's a W3C standard data model which is roughly the same as an entity-attribute-value system, or a directed graph with named edges, or a set of triples (subject, predicate, object). The cool part is that every identifier (every subject, every predicate, and every object that is not a data literal) is a globally unique URI. RDF really took off this last year, and lots of websites you use every day are publishing it. A decent intro is at [5]. RDF is bound up with Linked Data, the Semantic Web, and tons of other useless buzzwords, and still other buzzwords, like RDFa, are serializations of RDF.

Spira is an attempt at an ORM which embraces the semantics of this data model, which are well captured by RDF.rb, the RDF library for Ruby that Arto Bendiken and I wrote. Spira is currently built on top of RDF.rb with minimal other requirements. Spira's DSL looks an awful lot like DataMapper's when it's being used, so it would seem pretty boneheaded not to just use DataMapper (and indeed, Arto made a sorta-working dm-rdf adapter a few months ago [6]). But there are problems with the DataMapper API that prevent this.

So, as was requested on Twitter, here are my pros and cons of DataMapper for RDF. I must warn that I have learned what I know of DataMapper in bits and pieces over the last year, so some of it may be out of date.

DataMapper wins:

* Tons and tons and tons of 'paperwork'. Dirty tracking, a useful validation set, lots of field types, eager/lazy loading semantics, and sane semantics for identity maps, just to name a few.
This is the biggest win for me: any decent ORM will have to recreate most of this, and this stuff is all well written and, importantly, incredibly well tested. I can't express enough how undelightful it feels to consider redoing most of this when it all exists and is done so well. And where it only mostly works, it's sane enough that I can imagine making it work. For example, I can see a path to map existing RDF data literal types to the base set of DataMapper types without doing anything insane. This is actually a number of smaller wins, lumped into one.

* Query filtering! DataMapper's filter_records is awesome, providing tons of useful filter methods entirely at the Ruby level if one so chooses. RDF has some standard query languages, but the data model itself does not have one, and I would love to be able to hand this off to DataMapper initially and implement this huge feature set incrementally.

* Some useful plugins. dm-remixable, for example, is an interesting take on module inclusion, and having small-but-useful things like dm-trimmer is excellent.

However, DataMapper has a lot of things that make mapping it to RDF a rough fit at present:

* Primary key assumptions. RDF operates under an open-world assumption: if you don't have a resource, you can't definitively say it doesn't exist, only that you don't know whether it exists, because its primary key is globally unique and statements about it may exist elsewhere. This disconnect is pervasive in DataMapper. For example, the storage adapter shared specs assume an adapter that can use an integer primary key, but in RDF, data literals explicitly *cannot* be keys; only URIs can. This has affected the semantics in Spira quite a bit. RDF resources only exist in terms of their known, asserted, non-nil properties, so to 'create' a resource without setting any property values isn't a meaningful operation. Similarly, while querying for resources matching certain property values makes sense, a 'find' for a resource by its primary key cannot meaningfully return 'false'.
Further, any valid URI can be used as an identifier for any set of properties (represented, hypothetically, by a DataMapper resource). In Spira, I opted to replace 'find(key)' and 'create(key)' with 'for(uri)' and 'RDF::URI.as(klass)'. Making this change would be exceedingly difficult in DataMapper. While it can probably be done at the user level, it would be very tough to get away from internally, since DataMapper seems to conflate finding, identity, and existence.

* Collection semantics. In RDF, every property is an n-ary set of values. Spira defines a way to ignore all but one value of a property for simplicity, but I don't see how to represent a multiply-valued property in DataMapper at all. Making it a property whose value is a Set object is a problem: dirty checking is done by object identity, so modifying that set in place won't register as a change. And the 'has x' DSL is only for relations, not normal properties.

* Relation semantics. DataMapper's relation semantics are an exact mirror of relational database semantics and are not the same as the semantics used in graph databases. For example, 'has n' looks for a key on the child object that refers to the parent, and many-to-many requires a linking table/class. In the scope of RDF, all of this is relational database voodoo, and unnecessary: a relation is just a property pointing to the URI of another resource, and there can be a set of them. Spira handles relations with one extra case in the case statement where it checks property type. It really couldn't be simpler.

* Many plugins won't work, often because of the above problems with the idea of a 'primary key'. For example, dm-taggable lets you define a table name for tags via a symbol, then attach them to a model. But the RDF equivalent of an RDB table would be a globally unique RDF type URI, and symbols won't cut it.
This plugin and many others are simply not going to work, and none of them have test cases relevant to RDF, so it will be very difficult to tell what works and what doesn't.

* The query and filter modeling is sweet, but I am having trouble finding documentation on how to implement an adapter that knows what to do with these. This is perhaps just me not digging deep enough yet, but I suspect it was a contributing factor in the demise of the apparently defunct CouchDB adapter, which had a completely separate interface for searching when it needed to use Couch's JavaScript map/reduce functionality (an overview of the problems of adapting a non-SQL query format to DataMapper is at [7]).

* Because RDF is a consistent data model with multiple implementations, it's natural to define an interface to an RDF data store, which is just what we've done in RDF.rb (RDF is much more consistent than SQL, so this is a much easier problem than the one projects like DataObjects have taken on). This has some unfortunate overlap with DataMapper repositories. Since RDF repositories are truly semantically equivalent, one can do things like create a union of repositories that acts like a normal repository, and then pass that on to whatever ORM lives above the data model layer. DataMapper has a sizable chunk of semantics dealing with repositories, and meeting them all would probably mean re-treading ground we already covered in the core RDF.rb library.

* DataMapper has me writing a plugin instead of an entire framework. This severely limits how freely I can redefine semantics: if significant chunks of DataMapper need retrofitting to support something I want to try, it's much harder to experiment as I go.

* The DataMapper project is rather large and imposing. I have submitted only one patch, a pending test for an open suggestion ticket, and nothing happened.
Now, for a project as large and complicated as this, I don't imagine that every random guy who has a patch or problem will get a response. But if I want to make an awesome RDF ORM, and I need to make patches, and there's a hill to climb before patches I produce will be accepted, that worries me. I want to be solving the RDF ORM problem, not incidental ones.

The last two reasons were my deciding factors. I've been working on this RDF stuff for a few years now, and I really wanted to focus on the questions raised by an RDF ORM. DataMapper has some things that RDF just does not need; the relation semantics described above are a good example. I think it's key that an RDF ORM not make compromises on semantics: unlike most NoSQL stores, RDF is a cross-vendor, cross-implementation data model [8]. If a workable solution comes about, it's not a one-off adapter, but a link to a whole ecosystem.

That being said, I've come to appreciate just how well done a lot of the 'baseline' stuff in DataMapper is. And I've played with the semantics for a few months now and come across some of the bigger questions on the RDF semantics front. So if someone in the DataMapper community with an interest in RDF, who knows their way around better than I do, wanted to help me with a roadmap for approaching some of these issues, it would be much appreciated.

In any case, I hope these comments are useful, and please accept my delving into DataMapper to this degree as a compliment to what you guys have done.

[1]: http://spira.rubyforge.org
[2]: http://rdf.rubyforge.org
[3]: http://blog.datagraph.org/2010/05/spira
[4]: http://twitter.com/dkubb/status/16095960423
[5]: http://www.rdfabout.com/quickintro.xpd
[6]: http://github.com/bendiken/dm-rdf
[7]: http://holmwood.id.au/~lindsay//2009/02/15/everything-old-is-new-again/
[8]: http://blog.datagraph.org/2010/04/rdf-nosql-diff

--
You received this message because you are subscribed to the Google Groups "DataMapper" group.
To post to this group, send email to [email protected]. To unsubscribe from this group, send email to [email protected]. For more options, visit this group at http://groups.google.com/group/datamapper?hl=en.
