[elephant-devel] Query System

[EMAIL PROTECTED] Fri, 09 May 2008 15:03:07 -0700

Hello everyone,

I apologize for being disconnected for so long. I had volunteered to help with the query system and should have made more progress by now. Unfortunately, like some (most or all) of you, putting food on the table for my family has a higher priority, and my current job has demanded 110% of my time lately.

Enough excuses! I have been passively reading several of your email threads. I am convinced that a query system will bring a lot of value to Elephant. The question that still arises is whether people want a SQL-like syntax or a Lisp-like syntax.

As Ian has suggested, publicly and/or privately, we should start designing the query system in a very basic form. The most critical part would be query optimization, which I'd rather work on after we have the basic query system in place. But there are a lot of decisions to make before we get there, and coming to a consensus on how it should look and how it should work is of critical importance.

From a simplistic point of view, a SQL-like syntax should allow for the execution of the basic relational algebraic operations (union, difference, cartesian product, projection, and selection). For the most part, these would not be difficult to implement. However, IMHO, there is an intrinsic "contradiction" in applying a SQL-like syntax on top of Elephant.
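To make the five operations concrete, here is a minimal sketch over in-memory data. It assumes relations are plain lists of property lists rather than Elephant objects, and every function name in it (rel-select, rel-project, and so on) is hypothetical, not part of Elephant or any proposed API.

```lisp
;; Illustrative only: a "relation" is a list of plists, one plist per row.
;; None of these names exist in Elephant; they are made up for this sketch.

(defun rel-select (pred relation)
  "Selection: keep the rows for which PRED returns true."
  (remove-if-not pred relation))

(defun rel-project (keys relation)
  "Projection: keep only the properties named in KEYS (duplicates are kept)."
  (mapcar (lambda (row)
            (loop for k in keys
                  append (list k (getf row k))))
          relation))

(defun rel-union (r1 r2)
  "Union: rows in R1 or R2, with EQUAL duplicates removed."
  (remove-duplicates (append r1 r2) :test #'equal))

(defun rel-difference (r1 r2)
  "Difference: rows in R1 that are not in R2."
  (set-difference r1 r2 :test #'equal))

(defun rel-product (r1 r2)
  "Cartesian product: every row of R1 concatenated with every row of R2."
  (loop for a in r1
        append (loop for b in r2
                     collect (append a b))))

;; Sample data, assumed for illustration.
(defparameter *books*
  (list (list :book-id 1 :title "1984" :author "George Orwell")
        (list :book-id 2 :title "Brave New World" :author "Aldous Huxley")))
```

For example, (rel-project '(:title) (rel-select (lambda (row) (= (getf row :book-id) 1)) *books*)) selects one row and then keeps only its title. Note that rows compare with EQUAL here, so two plists holding the same properties in a different order count as different rows.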


Assume you have the following Tables (relations) in a SQL world:

Books (
  book_id,
  title,
  author
)

Publishers (
  publisher_id,
  name
)

BooksPublishers (
  book_id,
  publisher_id,
  year
)

Suppose you wanted the cartesian product of all the books published in 2008 and their publishers. You could run a SQL query like:

SELECT Books.*, Publishers.*
FROM Books, Publishers, BooksPublishers
WHERE Books.book_id = BooksPublishers.book_id
  AND Publishers.publisher_id = BooksPublishers.publisher_id
  AND BooksPublishers.year = 2008

The result will be a concatenation of all the columns from the Books and Publishers tables. In a SQL world, you would access these results in a key-value pair type mode (e.g. Books.book_id = 1, Books.title = "1984", etc). However, when you think in terms of Elephant (at least my understanding of it), you're dealing with objects and not key-value pairs from multiple tables. So, instead of getting a concatenation of all the columns, you "should" be getting just a list of Book objects (or Publisher objects) that met your query criteria, such that when you iterate through them, you could "query" their Publishers (or the Books). So, if we had something like (please keep in mind this is no suggestion as to syntax or correctness, but just for illustrative purposes):


(defpclass book ()
  ((title :accessor book-title :index t)
   (author :accessor book-author :index t)
   (published_copies :accessor book-copies :initform (make-pset))))

(defpclass publisher ()
  ((name :accessor publisher-name :index t)))

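(defmethod add-published-copy ((bk book) (pb publisher) year)
  ;; (list pb year) stores the actual publisher and year; the quoted
  ;; form '(pb year) would store the two symbols PB and YEAR instead.
  (insert-item (list pb year) (book-copies bk)))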

(defmethod map-published-copies (fn (bk book))
  (map-pset fn (book-copies bk)))

(setq objs (select book
                   :where ((map-published-copies
                            (lambda (item year) (= (second item) year))
                            $bk 2008))))

From then on, you could just iterate through the book objects in the result set for their respective published copies. The problem with this is that you get all the books that met your criteria, but if you then wanted a list of all the published copies, you would need to apply the filter criteria again. The reason I think it "should behave" this way is that Elephant deals with sets of objects, and you use Lisp to navigate through the object space, whereas in a SQL world you are not dealing with objects but with a result set that contains all the columns you asked for. If we were to emulate the same behavior in the query system, that would sort of defeat the purpose of Elephant. For that matter, you might as well use one of the other libraries (e.g. CL-SQL, cl-perec, cl-rdbms, etc).
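The double filtering described above can be seen in a small in-memory sketch. It uses ordinary CLOS classes as stand-ins for the persistent ones, involves no Elephant store, and all names in it (mem-book, books-published-in, copies-published-in) are hypothetical.

```lisp
;; Plain CLOS stand-in for the persistent book class; purely illustrative.
(defclass mem-book ()
  ((title  :initarg :title :accessor mem-book-title)
   ;; Each copy is a list of the form (publisher-name year).
   (copies :initarg :copies :initform '() :accessor mem-book-copies)))

(defun published-in-p (book year)
  "True if BOOK has at least one copy published in YEAR."
  (some (lambda (copy) (= (second copy) year))
        (mem-book-copies book)))

(defun books-published-in (books year)
  "Object-style query: returns book objects, not joined rows."
  (remove-if-not (lambda (book) (published-in-p book year)) books))

(defun copies-published-in (books year)
  "Collect the matching copies; the year filter has to run a second time."
  (loop for book in (books-published-in books year)
        append (remove-if-not (lambda (copy) (= (second copy) year))
                              (mem-book-copies book))))
```

Here books-published-in hands back whole book objects, so copies-published-in has to repeat the year test on each book's copies, which is the redundancy a SQL result set avoids by returning the joined columns directly.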

The above example is a very simple one. We haven't looked at SORTING, LIMIT, OFFSET, etc., which will only make this whole dilemma more difficult.
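As a rough idea of what sorting, OFFSET, and LIMIT mean over a list of results, here is a naive sketch on an in-memory list. The function name and its keyword arguments are assumptions made up for illustration and are not an Elephant API.

```lisp
;; Hypothetical helper: sort a result list, then slice it.
(defun apply-sort-offset-limit (items &key key (test #'<) (offset 0) limit)
  "Return a fresh list: ITEMS sorted by KEY under TEST, then sliced."
  (let* ((sorted (if key
                     ;; SORT is destructive, so work on a copy.
                     (sort (copy-list items) test :key key)
                     (copy-list items)))
         (start (min offset (length sorted)))
         (end (if limit
                  (min (+ start limit) (length sorted))
                  (length sorted))))
    (subseq sorted start end)))
```

For instance, (apply-sort-offset-limit '(5 3 1 4 2) :key #'identity :offset 1 :limit 2) sorts the numbers, skips the first one, and keeps the next two. This version materializes and sorts the entire result before slicing it; a store-backed implementation would presumably want to avoid that, which is part of why these features complicate the design.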

I haven't looked into Ian's association mechanism yet. Maybe the query system could/should be an extension to that with some specialized features to apply filter criteria instead (and possibly evolve into something similar to Ruby's ActiveRecord). I know the association mechanism is still being developed and I haven't really seen anyone comment much on it other than what Ian has mentioned. In one of Ian's comments, he said:

"A more general query language is probably the right solution for this interface. The query language would know about associations, derived indices, etc and perform query planning via introspection over the class objects."


At the same time, Robert said on another thread:

"One might philosophically prefer SQL. I personally vastly prefer to work in a powerful programming language to accomplish these things. Obviously, whether two classes that refer to each other stand in a "parent-child" relationship or not depends entirely on the circumstances. I prefer to write simple functions such as "delete-order" below, which both utilize and (in a sense) expand the power of LISP applied to persistent objects."


Leslie said on yet another thread:

"While I'm at it: OFFSET and LIMIT (a real limit which lets you specify an arbitrary Lisp expression) are things we definitely want to aim for in 1.0. They are not difficult to implement at all, but they don't work with GET-INSTANCES-BY-* and, worse, MAP-BTREE. This means everyone has to write their own version of these functions that take appropriate arguments and move the cursor around themselves instead of relying on a simple high-level API.

I'd have implemented these extensions myself, but I thought it better to wait for the integration of the query language to add it."


And Alex said:

"I think main problem is not how it looks, but that query language actually makes programming a lot easier."

All those comments make sense. There seems to be group agreement that something is needed, but everyone has their own ideas of how it should work. Both the query language and the associations are still being developed, so if we reach consensus on how these should work, it may give a better direction to both feature sets. If anyone has comments or suggestions on whether a query system would be of real interest or necessity and, if so, which query syntax and behavior would be preferred, that would really help.

I'm willing to work on this as much as possible with my limited knowledge of Lisp and Elephant. However, given a clear direction of where this should go, I will be able to focus better and learn faster what I haven't learned so far.

Again, your feedback is much appreciated. I hope to be able to work more on this over the weekend, assuming I get some feedback from you guys.


Thanks
Daniel
_______________________________________________
elephant-devel site list
elephant-devel@common-lisp.net
http://common-lisp.net/mailman/listinfo/elephant-devel
