Hi Everyone,

I've been speccing the underlying objects that make up Query
generation a bit better and have found some interesting improvements,
and one potential problem that I wanted to bring up to the list.

DataMapper has a special Collection object that gives it the
interesting ability to chain collections without loading anything
from the datastore until the results are actually needed.  So you can
do things like this:

  class User
    # ...

    def self.active
      all(:active => true)
    end

    def self.confirmed
      all(:confirmed => true)
    end
  end

  User.active.confirmed
  # => SELECT * FROM users WHERE active = true AND confirmed = true

(I'm using SQL because it's a nice compact way to represent a query
that everyone understands. Everything described in this post should
work for all datastores, albeit sometimes not as efficiently as with
an RDBMS that supports subselects)

This allows you to do things like ActiveRecord's named scope, but
without any special DSL or complex code.  It's all plain Ruby which is
fantastic.

I've been thinking about the Query and Collection object while
speccing them, and I realized that a Query is just a representation of
a *potential* Set of resources and that we could apply Set operations
like union, intersection and difference on them. This would allow us
to do things like:

  User.all(:active => true) | User.all(:confirmed => true)
  User.all(:active => true) + User.all(:confirmed => true)
  # => SELECT * FROM users WHERE active = true OR confirmed = true

  User.all(:active => true) & User.all(:confirmed => true)
  # => SELECT * FROM users WHERE active = true AND confirmed = true

  User.all(:active => true) - User.all(:confirmed => true)
  # => SELECT * FROM users WHERE active = true AND NOT(confirmed = true)

The first one makes me most excited because while the SQL generation
code has supported OR conditions for a long time, we've not had a way
to do OR in the DataMapper API without resorting to dropping to raw
SQL.  This is a bit wordy still, but providing primitives to do these
things will allow us to build even better APIs on top later on.
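
To make the mapping from set operations to conditions concrete, here
is a rough sketch using a toy class.  ToyQuery and its string
conditions are made up for illustration; this is not DataMapper's
Query class or the code in the spike, just the shape of the idea:

```ruby
# Toy model, NOT DataMapper's Query: a query is just a condition
# string, and each set operation wraps two conditions in a new one.
class ToyQuery
  attr_reader :condition

  def initialize(condition)
    @condition = condition
  end

  # union: rows matching either query
  def |(other)
    ToyQuery.new("(#{condition} OR #{other.condition})")
  end
  alias_method :+, :|

  # intersection: rows matching both queries
  def &(other)
    ToyQuery.new("(#{condition} AND #{other.condition})")
  end

  # difference: rows matching this query but not the other
  def -(other)
    ToyQuery.new("(#{condition} AND NOT(#{other.condition}))")
  end

  def to_sql
    "SELECT * FROM users WHERE #{condition}"
  end
end

active    = ToyQuery.new('active = true')
confirmed = ToyQuery.new('confirmed = true')

puts((active | confirmed).to_sql)
# => SELECT * FROM users WHERE (active = true OR confirmed = true)
puts((active - confirmed).to_sql)
# => SELECT * FROM users WHERE (active = true AND NOT(confirmed = true))
```

Because every operation returns another ToyQuery, the results can be
combined again without anything being executed.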

One cool thing about this is that it could be combined with query
chaining to provide some amazing possibilities, e.g.:

  (User.active & User.confirmed) + (User.customers - User.overdue)

This would return a Collection that when lazy-loaded would run a
*single query* on the datastore.  That's pretty *huge* IMHO.
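
The "single query" behaviour can be sketched in plain Ruby.  FakeStore
and LazyCollection below are hypothetical stand-ins (they are not
DataMapper classes), and the rows are invented sample data; the point
is only that the operators compose predicates and the store is read
once, when the result is finally loaded:

```ruby
# Hypothetical stand-in for a datastore that counts how often it is read.
class FakeStore
  attr_reader :reads

  def initialize(rows)
    @rows  = rows
    @reads = 0
  end

  def select(&predicate)
    @reads += 1
    @rows.select(&predicate)
  end
end

# Hypothetical lazy collection (not DataMapper::Collection): it only
# holds a predicate until #to_a is called.
class LazyCollection
  attr_reader :predicate

  def initialize(store, predicate)
    @store     = store
    @predicate = predicate
  end

  def |(other)
    derive { |row| predicate.call(row) || other.predicate.call(row) }
  end
  alias_method :+, :|

  def &(other)
    derive { |row| predicate.call(row) && other.predicate.call(row) }
  end

  def -(other)
    derive { |row| predicate.call(row) && !other.predicate.call(row) }
  end

  # The only place the store is touched; the result is memoized.
  def to_a
    @rows ||= @store.select(&predicate)
  end

  private

  # Build a new lazy collection on the same store from a combined predicate.
  def derive(&combined)
    LazyCollection.new(@store, combined)
  end
end

store = FakeStore.new([
  { :name => 'ann', :active => true,  :confirmed => true,  :customer => false, :overdue => false },
  { :name => 'bob', :active => true,  :confirmed => false, :customer => true,  :overdue => true  },
  { :name => 'cat', :active => false, :confirmed => false, :customer => true,  :overdue => false },
  { :name => 'dan', :active => false, :confirmed => true,  :customer => false, :overdue => false }
])

active    = LazyCollection.new(store, lambda { |row| row[:active] })
confirmed = LazyCollection.new(store, lambda { |row| row[:confirmed] })
customers = LazyCollection.new(store, lambda { |row| row[:customer] })
overdue   = LazyCollection.new(store, lambda { |row| row[:overdue] })

combined     = (active & confirmed) + (customers - overdue)
reads_before = store.reads                       # nothing loaded yet
names        = combined.to_a.map { |row| row[:name] }

puts names.inspect  # => ["ann", "cat"]
puts store.reads    # => 1
```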

The best thing about this is that the implementation is almost
trivial.  Here's a quick code spike I did the other day showing how
easy it was to add to the DataMapper API:

  http://gist.github.com/221039

This doesn't handle every example, but it does demonstrate most of the
behavior with just a few lines of code.

In thinking about this though, I did discover one flaw in our current
approach to query chaining.  Consider the following contrived example
where we want to find the top 10 users out of the 50 most recent
signups:

  class User
    # ...

    def self.recent
      all(:order => [ :created_at.desc ], :limit => 50)
    end

    def self.top10
      all(:order => [ :points.desc ], :limit => 10)
    end
  end

  User.recent.top10
  # => SELECT * FROM users ORDER BY points DESC LIMIT 10

Notice how the query for the top10 method clobbers the recent
query?  What we'd want is something equivalent to:

  # => SELECT * FROM users WHERE id IN(SELECT id FROM users ORDER BY
  #    created_at DESC LIMIT 50) ORDER BY points DESC LIMIT 10

So in the above case we select the 50 most recent users, and then
from that group we get the top 10 users sorted from highest to lowest
point score.

The approach we use to get the intersection of the queries is a bit
flawed when you include :limit, :offset or :links, in that it clobbers
the values from the other query when it really should be doing a
subselect-type query.  It should work perfectly when you want to merge
conditions, which is what I suspect most people use it for, but I
wonder how many people have been hit by this?
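
The difference between clobbering and a subselect can be shown with
plain hashes and arrays.  This is only an illustration: the apply
helper, the sample rows and the small limits (3 and 2 instead of 50
and 10) are all invented, and Hash#merge stands in for however the
options really get combined:

```ruby
# Hypothetical helper: sort rows by a numeric field, highest first,
# then keep the first :limit rows.
def apply(rows, options)
  field = options[:order]
  rows.sort_by { |row| -row[field] }.first(options[:limit])
end

users = [
  { :name => 'old_star', :created_at => 1, :points => 99 },
  { :name => 'old_mid',  :created_at => 2, :points => 50 },
  { :name => 'new_low',  :created_at => 3, :points => 5  },
  { :name => 'new_mid',  :created_at => 4, :points => 40 },
  { :name => 'new_high', :created_at => 5, :points => 70 }
]

recent = { :order => :created_at, :limit => 3 }
top2   = { :order => :points,     :limit => 2 }

# Merging the options lets top2's :order and :limit overwrite recent's,
# so the "recent" restriction disappears entirely.
clobbered = apply(users, recent.merge(top2)).map { |row| row[:name] }

# Applying one query to the result of the other (the subselect idea)
# keeps both restrictions.
nested = apply(apply(users, recent), top2).map { |row| row[:name] }

puts clobbered.inspect  # => ["old_star", "new_high"]
puts nested.inspect     # => ["new_high", "new_mid"]
```

In the clobbered result old_star shows up even though it is not one
of the three most recent signups.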

My plan to fix this is to keep the query chaining as-is for now, and
add union, intersection and difference to the underlying Query and
Collection objects.  I'll spec all the behavior so that it handles the
above cases perfectly, and then I'll swap out the query chaining with
the intersection code and make sure all the specs pass.

I'm not sure if this will be in the next release or not (which is
coming next week), but it should definitely be available in the
release after that, probably in 2-3 weeks from now at the latest.

--

Dan
(dkubb)