Re: [elephant-devel] Is it possible to query by 2 index ranges at the same time

lists Fri, 01 Jun 2007 06:36:48 -0700

Hi Ian,

I read the thread from a few days ago regarding OIDs. While I agree with your 
position, I think it's beneficial to be able to get a list of OIDs to be able 
to perform these kinds of operations. I think the premise from the other thread 
was that of "relying" on OIDs for relational purposes.


I also like the idea of where the query system may be going. But, independently 
of that, being able to specify something like :oid t in the 
get-instances-by-range function and then be able to "walk" through a set of 
oids, specifying the base class and the "walk" mechanism will instantiate those 
objects. Think of it as a primitive to the query system.

Thanks,
Daniel

On Fri, June 1, 2007 2:08 am, Ian Eslick <[EMAIL PROTECTED]> said:

> That's very insightful and quite correct!  I was going to use this
> mechanism as part of the query system I want to get working for 1.0.
> This is pretty much the mechanism that relational databases use to do
> join queries and Elephant will be no different.
> 
> As a matter of discipline, I didn't want to encourage users to mess
> around with oids because of the issues outlined in an e-mail thread a
> few days ago.  There will be an internal elephant api to access this
> functionality for advanced users.  That function will essentially be
> a copy of map-inverted-index, but using cursor-get to get only the
> oid instead of cursor-pget to get the oid and primary value.  I'm
> happy to be argued out of this position, but my experience is that
> people will start to depend on this interface and then complain when
> features are added later that leads to bugs in user code.
> 
> The query system will perform these kinds of optimizations for you.
> The interface will be something like:
> 
> (map-query
>    (lambda (obj)
>       (render obj screen))
>    (select-objects (geometry)
>      where
>      (between geometry.x 10 20)
>      (between geometry.y -5 15)))
> 
> For a class named geometry with indexed slots x and y.  If x and y
> are indexed, then the query will be very efficient.  If not, then it
> will be a linear walk.  I aim to add informative statements and
> statistics to select-objects to help people optimize queries.  This
> is similar to the select clause in SQL, but will have features
> oriented towards path queries (object references) and graph queries.
> 
> The select-objects statement effectively returns a generator, which
> will probably be a list of oids, either in memory or maintained on
> disk for large query sets.
> 
> I'm open to suggestions on syntax.  I'm not happy with my current
> sketches here.
> 
> Ian
> 
> PS - If you really want to get OIDs as the user, you can using
> cursors over the inverted index to get the ranges yourself, then you
> can do exactly the intersection operation you mentioned before.  Of
> course you need the classes of the oids so selected to recover the
> objects.
> 
> On May 31, 2007, at 11:43 PM, [EMAIL PROTECTED] wrote:
> 
>> Please excuse if I don't make a direct reference to Elephant
>> solving this in my comment below. However, I remembered reading
>> something just like this in AllegroCache's Reference Manual, in
>> which it said, and I quote:
>>
>> "In a database every object has a unique object identifier (oid).
>> This value can be retrieved using db- object- oid. An oid is an
>> integer. There is no way to determine the class of an object given
>> its oid.
>>
>> Usually a program need not concern itself with oids. However in
>> certain circumstances it may be convenient to work with oids. One
>> such case is when combining the results of multiple indexes over
>> the same class. You may want to ask for the set of objects whose X
>> slot is greater than 10 and whose Y slot is greater than 20. It
>> consumes fewer resources to ask for the oids of objects whose X
>> slot is greater than 10 than to ask for the objects themselves. In
>> the later case the objects retrieved have to be instantiated in the
>> object cache and there's no point in doing that if you don't need
>> all those objects. In this case you don't need all those objects
>> since you only need those objects  whose Y slot is also greater
>> then 20. Thus the optimal way to do this query is to find the
>> intersection of the oids corresponding to "X > 10" and those oids
>> with "Y > 20" and then from that intersection find the objects
>> corresponding to the oids."
>>
>> They later document the following function:
>>
>> "retrieve- from- index- range (class slot name
>>                             initial value end value
>>                            &key (db *allegrocache*) oid)
>>
>> returns all objects of the given class (a persistent class object
>> or a symbol) whose slot slot name has a value in the range
>> beginning with initial value up to but not including end value. If
>> oid is true then the object id values are returned instead of the
>> objects."
>>
>> It's interesting to note the "oid" key argument. I'm not sure if
>> Elephant's get-instances-by-range supports something like that.
>> But, a similar solution to the problem presented is:
>>
>> "We could find the oids of all the employees whose first name
>> begins with “Jo” and whose last name begins with “F”
>> using
>>
>> (intersection (retrieve-from-index-range
>>                     ‘Employee ‘first-name “Jo”
>> “Jp” :oid
>> t)
>>               (retrieve-from-index-range
>>                     ‘Employee ‘last-name “F”
>> “G”   :oid
>> t))"
>>
>> given a persistent class Employee with indices on slots first-name
>> and last-name.
>>
>> If Elephant supports something similar to the "oid" key argument,
>> it would be very useful for these kind of queries, where you can
>> get an initial result set and only create the instances of those
>> objects if/when needed.
>>
>> Just my $0.02 :)
>>
>> - Daniel
>>
>> On Thu, May 31, 2007 11:07 pm, Ian Eslick <[EMAIL PROTECTED]> said:
>>
>>> Ignas,
>>>
>>> The easiest way to do this is to follow Robert's suggestion, and
>>> declare slots x and y to be indexed (assuming your parameters are
>>> slot values of a persistent objects) and then say:
>>>
>>> (remove-duplicates
>>>    (nconc (get-instances-by-range 'my-class 'x 10 20)
>>>           (get-instances-by-range 'my-class 'y -5 15)))
>>>
>>> Of course you might do a little better, performance-wise, if you
>>> filter the y value as you traverse the x range (or vice-versa).
>>> Since get-instances-by-range is just a wrapper around map-inverted-
>>> index that collects visited objects, this could be up to twice as
>>> fast, although in practice I'd expect the benefit to be marginal:
>>>
>>> (let ((matches nil))
>>>    (map-inverted-index (lambda (obj)
>>>                          (when (and (> y -5) (< y 15))
>>>                            (push obj matches)))
>>>                   'my-class
>>>                   'x
>>>                   10
>>>                        20)
>>>    matches)
>>>
>>> This version avoids a second map-inverted-index operation (called by
>>> get-instances-by-range) by only collecting objects if their y
>>> coordinate is in the range.  You can wrap all this in a function like
>>> (get-objects-in-region x1 y1 x2 y2).  Map over the x or y based on
>>> which query is smaller, or which parameter is sparser.  If you don't
>>> know, then it doesn't really matter which you choose!
>>>
>>> If you get some performance comparisons of these two scenarios,
>>> please post them to the list!
>>>
>>> Regards,
>>> Ian
>>>
>>> PS - It would be interesting to see an R-tree (or one of the well-
>>> known variants such as R* or Priority R-Trees) in Elephant.  Instead
>>> of using indexed classes, you would directly implement the R-tree
>>> nodes as persistent instances and write the construction and
>>> retrieval algorithms as if they were operating on in-memory objects.
>>> Graphs are a very natural structure to implement in Elephant and the
>>> only special provision I can think of might be maintaining a
>>> persistent set of free nodes for the dynamic case.
>>>
>>> On May 31, 2007, at 10:17 PM, Robert L. Read wrote:
>>>
>>>> I'm assuming x and y are properites of a data object, which has
>>>> some other component z which
>>>> you with to retrieve, and you query is that you want to find all
>>>> the records (x,y,z) such that
>>>> (10 < x < 20) and ( -5 < y < 15).
>>>>
>>>> There is a spectrum of solutions to this problem.  However, in the
>>>> general case Elephant will
>>>> not compute this with the best possible asymptotic complexity,
>>>> although it may be better than
>>>> a relational database at doing so.
>>>>
>>>> Here is the most naive solution:
>>>>
>>>> Examine every object in the database, and report those that meet
>>>> that above condition.
>>>>
>>>> Although this may seem silly, don't knock it till you try it....it
>>>> may be perfectly reasonable.
>>>>
>>>> A second solution is to make sure that you have specified :index on
>>>> the x component
>>>> (or, without loss of generality, the y component), and then using
>>>> the "get-instance-by-range"
>>>> feature to get all of the instances within the x range, and use a
>>>> simple lisp function to discard
>>>> those for which the y component does not match.
>>>>
>>>> This will be relatively fast if the x range excludes a lot of
>>>> objects and the y range doesn't.
>>>>
>>>> It's not obvious to me that one can do better than that without
>>>> some significant coding
>>>> (For example, a "grid file" is a datastructure designed to answer
>>>> such queries efficiently.
>>>> An R-Tree is another geometric structure.)
>>>>
>>>> A relational database will not do better, in the worst case.  A
>>>> relational database will do
>>>> a better job at statistical sampling---that is, a query optimizer
>>>> should, in a huge database,
>>>> be able to decide whether it should use the "x" index or the "y"
>>>> index first based on the
>>>> selectivity of those clauses in the boolean expression.  However,
>>>> fundamentally it can't
>>>> do any better than to pick the best index, use that only examine
>>>> some of the objects, and then
>>>> look at the values of the other ones.
>>>>
>>>> On Fri, 2007-06-01 at 03:48 +0300, Ignas Mikalajunas wrote:
>>>>> Hi, i have an interesting usecase and i would like to ask whether
>>>>> i can solve it by using Elephant. I would like to index objects
>>>>> placed on a discreet 2d grid of an arbitarry size, which would
>>>>> require queries like (10 < x < 20) AND (-5 < y < 15). Is it
>>>>> possible to perform such a query with elephant? I am not sure i
>>>>> have looked in the right place, but i could not find such example
>>>>> in the manual. Ignas Mikalajūnas
>>>>> _______________________________________________ elephant-devel
>>>>> site list elephant-devel@common-lisp.net http://common-lisp.net/
>>>>> mailman/listinfo/elephant-devel
>>>> _______________________________________________
>>>> elephant-devel site list
>>>> elephant-devel@common-lisp.net
>>>> http://common-lisp.net/mailman/listinfo/elephant-devel
>>>
>>> _______________________________________________
>>> elephant-devel site list
>>> elephant-devel@common-lisp.net
>>> http://common-lisp.net/mailman/listinfo/elephant-devel
>>>
>>
>> _______________________________________________
>> elephant-devel site list
>> elephant-devel@common-lisp.net
>> http://common-lisp.net/mailman/listinfo/elephant-devel
> 
> 

_______________________________________________
elephant-devel site list
elephant-devel@common-lisp.net
http://common-lisp.net/mailman/listinfo/elephant-devel

Re: [elephant-devel] Is it possible to query by 2 index ranges at the same time

Reply via email to