Re: [elephant-devel] Is it possible to query by 2 index ranges at the same time

Ian Eslick Thu, 31 May 2007 23:10:29 -0700

That's very insightful and quite correct! I was going to use this mechanism as part of the query system I want to get working for 1.0. This is pretty much the mechanism that relational databases use to do join queries, and Elephant will be no different.

As a matter of discipline, I didn't want to encourage users to mess around with oids because of the issues outlined in an e-mail thread a few days ago. There will be an internal Elephant API to access this functionality for advanced users. That function will essentially be a copy of map-inverted-index, but using cursor-get to get only the oid instead of cursor-pget to get the oid and primary value. I'm happy to be argued out of this position, but my experience is that people will start to depend on this interface and then complain when features added later lead to bugs in user code.

The query system will perform these kinds of optimizations for you. The interface will be something like:


(map-query
  (lambda (obj)
     (render obj screen))
  (select-objects (geometry)
    where
    (between geometry.x 10 20)
    (between geometry.y -5 15)))

This is for a class named geometry with slots x and y. If x and y are indexed, then the query will be very efficient. If not, then it will be a linear walk. I aim to add informative statements and statistics to select-objects to help people optimize queries. This is similar to the select clause in SQL, but will have features oriented towards path queries (object references) and graph queries.

The select-objects statement effectively returns a generator, which will probably be a list of oids, either in memory or maintained on disk for large query sets.

I'm open to suggestions on syntax. I'm not happy with my current sketches here.

Ian

PS - If you really want to get OIDs as the user, you can use cursors over the inverted index to get the ranges yourself, and then do exactly the intersection operation you mentioned before. Of course, you need the classes of the oids so selected to recover the objects.
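
To make the intersection step concrete, here is a minimal sketch in plain Common Lisp. It does not touch Elephant at all: the "index" is a hypothetical in-memory list of (key . oid) pairs standing in for an inverted index, and the names oids-in-range and intersect-oids are made up for illustration. With real cursors you would collect the oids from each range and then intersect them in the same way.

```lisp
;; Hypothetical stand-in for an inverted index: a list of (key . oid)
;; pairs. A real Elephant index is a persistent btree read via cursors.
(defun oids-in-range (index low high)
  "Return the oids whose key satisfies LOW <= key <= HIGH."
  (loop for (key . oid) in index
        when (<= low key high)
          collect oid))

(defun intersect-oids (oids-a oids-b)
  "Return the oids present in both lists. A hash table keeps the cost
linear in the combined length instead of quadratic."
  (let ((seen (make-hash-table)))
    (dolist (oid oids-a)
      (setf (gethash oid seen) t))
    (remove-duplicates
     (loop for oid in oids-b
           when (gethash oid seen)
             collect oid))))

;; Made-up sample data: five objects with oids 1 to 5.
(defparameter *x-index* '((5 . 1) (12 . 2) (15 . 3) (18 . 4) (25 . 5)))
(defparameter *y-index* '((-9 . 4) (-2 . 2) (0 . 5) (7 . 3) (30 . 1)))

;; Oids of objects with x in [10, 20] and y in [-5, 15].
(intersect-oids (oids-in-range *x-index* 10 20)
                (oids-in-range *y-index* -5 15))
```

Only the oids that survive the intersection would then need to be turned back into objects, which is the saving this approach is after.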


On May 31, 2007, at 11:43 PM, [EMAIL PROTECTED] wrote:

Please excuse me if I don't make a direct reference to Elephant solving this in my comment below. However, I remembered reading something just like this in AllegroCache's Reference Manual, in which it said, and I quote:

"In a database every object has a unique object identifier (oid). This value can be retrieved using db-object-oid. An oid is an integer. There is no way to determine the class of an object given its oid.

Usually a program need not concern itself with oids. However in certain circumstances it may be convenient to work with oids. One such case is when combining the results of multiple indexes over the same class. You may want to ask for the set of objects whose X slot is greater than 10 and whose Y slot is greater than 20. It consumes fewer resources to ask for the oids of objects whose X slot is greater than 10 than to ask for the objects themselves. In the later case the objects retrieved have to be instantiated in the object cache and there's no point in doing that if you don't need all those objects. In this case you don't need all those objects since you only need those objects whose Y slot is also greater then 20. Thus the optimal way to do this query is to find the intersection of the oids corresponding to "X > 10" and those oids with "Y > 20" and then from that intersection find the objects corresponding to the oids."


They later document the following function:

"retrieve-from-index-range (class slot-name
                            initial-value end-value
                            &key (db *allegrocache*) oid)

returns all objects of the given class (a persistent class object or a symbol) whose slot slot-name has a value in the range beginning with initial-value up to but not including end-value. If oid is true then the object id values are returned instead of the objects."

It's interesting to note the "oid" key argument. I'm not sure if Elephant's get-instances-by-range supports something like that. But, a similar solution to the problem presented is:

"We could find the oids of all the employees whose first name begins with “Jo” and whose last name begins with “F” using


(intersection (retrieve-from-index-range
               'Employee 'first-name "Jo" "Jp" :oid t)
              (retrieve-from-index-range
               'Employee 'last-name "F" "G" :oid t))"

given a persistent class Employee with indices on slots first-name and last-name.

If Elephant supports something similar to the "oid" key argument, it would be very useful for these kinds of queries, where you can get an initial result set and only create the instances of those objects if/when needed.


Just my $0.02 :)

- Daniel

On Thu, May 31, 2007 11:07 pm, Ian Eslick <[EMAIL PROTECTED]> said:

Ignas,

The easiest way to do this is to follow Robert's suggestion, and
declare slots x and y to be indexed (assuming your parameters are
slot values of persistent objects) and then say:

;; Keep only the objects that appear in both ranges (x AND y).
(intersection (get-instances-by-range 'my-class 'x 10 20)
              (get-instances-by-range 'my-class 'y -5 15))

Of course you might do a little better, performance-wise, if you
filter the y value as you traverse the x range (or vice-versa).
Since get-instances-by-range is just a wrapper around map-inverted-
index that collects visited objects, this could be up to twice as
fast, although in practice I'd expect the benefit to be marginal:

(let ((matches nil))
   (map-inverted-index (lambda (obj)
                         ;; Read the y slot of each object in the x range.
                         (let ((y (slot-value obj 'y)))
                           (when (and (> y -5) (< y 15))
                             (push obj matches))))
                       'my-class
                       'x
                       10
                       20)
   matches)

This version avoids a second map-inverted-index operation (called by
get-instances-by-range) by only collecting objects if their y
coordinate is in the range.  You can wrap all this in a function like
(get-objects-in-region x1 y1 x2 y2).  Map over the x or y based on
which query is smaller, or which parameter is sparser.  If you don't
know, then it doesn't really matter which you choose!
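
As a rough sketch of such a wrapper, here is a plain Common Lisp version that runs without Elephant. Everything in it is an assumption for illustration: the point struct, the two sorted vectors standing in for the x and y indexes, and the names range-scan and points-in-region. The caller picks the axis to traverse with :scan-axis, and the other coordinate is only used as a filter. Bounds are inclusive here.

```lisp
(defstruct point x y)

(defun range-scan (sorted-points key low high)
  "Collect points from SORTED-POINTS (a vector ascending by KEY) with
LOW <= key <= HIGH, stopping once the key exceeds HIGH, roughly as a
cursor walking an index would."
  (loop for p across sorted-points
        for k = (funcall key p)
        while (<= k high)
        when (>= k low)
          collect p))

(defun points-in-region (by-x by-y x1 y1 x2 y2 &key (scan-axis :x))
  "Return points with X1 <= x <= X2 and Y1 <= y <= Y2.
SCAN-AXIS names the index to traverse; the other axis is only filtered."
  (ecase scan-axis
    (:x (remove-if-not (lambda (p) (<= y1 (point-y p) y2))
                       (range-scan by-x #'point-x x1 x2)))
    (:y (remove-if-not (lambda (p) (<= x1 (point-x p) x2))
                       (range-scan by-y #'point-y y1 y2)))))

;; Made-up sample data, sorted once per axis to mimic two indexes.
(defparameter *points*
  (list (make-point :x 5 :y 0) (make-point :x 12 :y -2)
        (make-point :x 15 :y 7) (make-point :x 18 :y -9)
        (make-point :x 25 :y 3)))
(defparameter *by-x* (sort (coerce *points* 'vector) #'< :key #'point-x))
(defparameter *by-y* (sort (coerce *points* 'vector) #'< :key #'point-y))

;; Same answer either way; only the amount of scanning differs.
(points-in-region *by-x* *by-y* 10 -5 20 15)
(points-in-region *by-x* *by-y* 10 -5 20 15 :scan-axis :y)
```

With this data the x range visits three points and the y range visits four, so scanning x does slightly less work, which is the kind of difference worth measuring on real data.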

If you get some performance comparisons of these two scenarios,
please post them to the list!

Regards,
Ian

PS - It would be interesting to see an R-tree (or one of the well-
known variants such as R* or Priority R-Trees) in Elephant.  Instead
of using indexed classes, you would directly implement the R-tree
nodes as persistent instances and write the construction and
retrieval algorithms as if they were operating on in-memory objects.
Graphs are a very natural structure to implement in Elephant and the
only special provision I can think of might be maintaining a
persistent set of free nodes for the dynamic case.

On May 31, 2007, at 10:17 PM, Robert L. Read wrote:

I'm assuming x and y are properties of a data object, which has
some other component z which you wish to retrieve, and your query
is that you want to find all the records (x,y,z) such that
(10 < x < 20) and ( -5 < y < 15).

There is a spectrum of solutions to this problem.  However, in the
general case Elephant will
not compute this with the best possible asymptotic complexity,
although it may be better than
a relational database at doing so.

Here is the most naive solution:

Examine every object in the database, and report those that meet
the above condition.

Although this may seem silly, don't knock it till you try it....it
may be perfectly reasonable.
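
For reference, the naive scan amounts to something like the following plain Common Lisp sketch. The records here are hypothetical in-memory lists of the form (x y z) and scan-records is a made-up name; in Elephant you would instead walk every instance of the class and apply the same test.

```lisp
;; Hypothetical records of the form (x y z), held in an ordinary list.
(defun scan-records (records x-low x-high y-low y-high)
  "Return the z component of every record with X-LOW < x < X-HIGH
and Y-LOW < y < Y-HIGH, by examining each record once."
  (loop for (x y z) in records
        when (and (< x-low x x-high)
                  (< y-low y y-high))
          collect z))

;; Made-up sample data.
(scan-records '((5 0 :a) (12 -2 :b) (15 7 :c) (18 -9 :d) (10 3 :e))
              10 20 -5 15)
```

The cost is one test per record no matter how selective the query is, which is fine for small collections.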

A second solution is to make sure that you have specified :index on
the x component (or, without loss of generality, the y component),
then use get-instances-by-range to get all of the instances within
the x range, and use a simple Lisp function to discard those for
which the y component does not match.

This will be relatively fast if the x range excludes a lot of
objects and the y range doesn't.

It's not obvious to me that one can do better than that without
some significant coding. (For example, a "grid file" is a data
structure designed to answer such queries efficiently. An R-Tree is
another geometric structure.)
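
To give a feel for the idea, here is a sketch of a fixed uniform grid in plain Common Lisp. This is a much simpler relative of a grid file, not a grid file proper (which adapts its cell boundaries to the data), and it is in-memory only. The cell size and the names cell-of, grid-insert, and grid-query are assumptions made for illustration.

```lisp
(defparameter *cell-size* 8)  ; assumed bucket width, chosen arbitrarily

(defun cell-of (coordinate)
  "Return the index of the grid cell containing COORDINATE."
  (floor coordinate *cell-size*))

(defun grid-insert (grid x y value)
  "Store VALUE at point (X, Y) in GRID, an EQUAL hash table keyed by
(cell-x . cell-y)."
  (push (list x y value)
        (gethash (cons (cell-of x) (cell-of y)) grid))
  grid)

(defun grid-query (grid x-low x-high y-low y-high)
  "Return values at points with X-LOW < x < X-HIGH and
Y-LOW < y < Y-HIGH. Only buckets overlapping the rectangle are visited."
  (loop for cx from (cell-of x-low) to (cell-of x-high)
        nconc (loop for cy from (cell-of y-low) to (cell-of y-high)
                    nconc (loop for (x y value) in (gethash (cons cx cy) grid)
                                when (and (< x-low x x-high)
                                          (< y-low y y-high))
                                  collect value))))

;; Made-up sample data.
(defparameter *grid* (make-hash-table :test #'equal))
(loop for (x y value) in '((5 0 :a) (12 -2 :b) (15 7 :c) (18 -9 :d) (25 3 :e))
      do (grid-insert *grid* x y value))

(grid-query *grid* 10 20 -5 15)
```

The query cost depends on the number of cells the rectangle overlaps plus the points in those cells, rather than on the total number of points.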

A relational database will not do better in the worst case. A
relational database will do a better job at statistical sampling:
a query optimizer should, in a huge database, be able to decide
whether it should use the "x" index or the "y" index first, based
on the selectivity of those clauses in the boolean expression.
Fundamentally, however, it can't do any better than to pick the
best index, use it to examine only some of the objects, and then
check the other value on each of those.

On Fri, 2007-06-01 at 03:48 +0300, Ignas Mikalajunas wrote:

Hi, I have an interesting use case and I would like to ask whether
I can solve it by using Elephant. I would like to index objects
placed on a discrete 2D grid of an arbitrary size, which would
require queries like (10 < x < 20) AND (-5 < y < 15). Is it
possible to perform such a query with Elephant? I am not sure I
have looked in the right place, but I could not find such an example
in the manual.

Ignas Mikalajūnas
_______________________________________________
elephant-devel site list
elephant-devel@common-lisp.net
http://common-lisp.net/mailman/listinfo/elephant-devel
