Re: [Dbix-class] abstract syntax (extract from a conversation)

Darren Duncan Sat, 21 Jul 2007 03:00:24 -0700

At 4:01 AM +0100 7/21/07, Matt S Trout wrote:

This is the most coherent message to show the lines along which we were
thinking (hdp is confound on #dbix-class).


Those of you who don't have context for this, poke the list archives.

Those of you who do, please bear in mind when responding that anything
involving the creation of assloads of objects will be laughed at and/or
ignored since the intention is SQL::Abstract-level performance.

<snippage>

Since it's already nearly 2am here, I'll just put out a few of my ideas and leave the rest for later.

1. I'm assuming your comment about objects means that you don't want to have, eg, an object for each scalar value or entity name or expression tree node or whatever. Sure, that's fine, and should be good for speed, and perhaps brevity, but taking your examples as a point of departure, we'll probably want to add at least one more element to each of the many array refs defining expression nodes so as to provide meta-data about the node, such as a replacement for the meta-data that the name of the class an object is blessed into provides.
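
To make point 1 concrete, here is a rough sketch, written in Python purely for illustration (a Python list stands in for a Perl array ref). The node layout and the kind tags 'op', 'col' and 'lit' are my own assumptions, not anything agreed in this thread. The first element of each node carries the meta-data that a blessed class name would otherwise provide:

```python
# Hypothetical node layouts (assumptions for illustration only):
#   ["col", name]
#   ["lit", type_name, value]
#   ["op",  operator_name, [operand_node, ...]]
# The leading element is the "kind" tag, replacing the class name
# that an object-per-node design would get from bless().

def describe(node):
    """Dispatch on the kind tag instead of on an object's class."""
    kind = node[0]
    if kind == "col":
        return "column " + node[1]
    if kind == "lit":
        return f"{node[1]} literal {node[2]!r}"
    if kind == "op":
        args = ", ".join(describe(arg) for arg in node[2])
        return f"{node[1]}({args})"
    raise ValueError(f"unknown node kind: {kind!r}")

expr = ["op", "equal", [["col", "age"], ["lit", "Int", 42]]]
print(describe(expr))
```

No objects are created per node; the cost is one extra element per array.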

2. As far as I recall from a discussion we had on IRC, the new AST we are defining here is supposed to work not just with SQL databases but also databases accessed via some other language, such as LDAP, or my Muldis D. Partly for this reason, and partly just because SQL DBMSs differ among themselves enough that talking to them is like talking multiple languages, I believe that our AST should not conceptually be limited by some SQL lowest common denominator, and it should not simply try to mirror the structure of a simple select query.

3. Don't go lowest common denominator. If our AST is good, glue code that talks to a less capable back-end should be able to break down what the AST says into smaller chunks that the back-end understands, and feed them appropriately so that the back-end still does the right thing, and appropriately gather the results and return them as if the back-end was able to do that natively.

4. As should be the nature of ASTs, the focus of ours should be on accurately representing the *semantics* or meaning of what the user wants. The AST should provide the means to explicitly say what the desired behaviour is for particular constructs, any time there is a reasonable chance that either different backends have different defaults in that regard, or users are likely to have different default expectations. Of course, our AST can have various default behaviours defined for it such that users don't have to be explicit about some details if their desires match the defaults, but we still need to specify it in the design docs of our AST itself, and not just leave a lot of things to be back end implementation defined.

5. Our AST should be strongly typed from end to end, which assists in semantics. Any piece of data that it carries should know whether it is text or a number or whatever. That way, if we have '0124', we know whether round-tripping it through a database would retain the leading zero or not. Matters of case-sensitivity need to be defined and not left to back-end defaults. That's not to say that we can't have generic types, as per Perl scalars, but these should be defined over stronger types, such that eg every value is of a certain stronger type, but a particular variable is allowed to hold values of any of several types.
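
A small sketch of the '0124' case from point 5, in Python for illustration only. The type names 'Text' and 'Int' and the helper names are hypothetical; the point is merely that the type travels with the value, so the round-trip behaviour is decided by the AST and not by a back-end default:

```python
# Hypothetical typed literal: (kind, type_name, raw_value).
def literal(type_name, value):
    return ("lit", type_name, value)

def round_trip(lit):
    """Return the text form a value of this type would come back as."""
    _, type_name, value = lit
    if type_name == "Text":
        return value            # character string: kept exactly as given
    if type_name == "Int":
        return str(int(value))  # integer: only the numeric value survives
    raise ValueError(f"unknown type: {type_name!r}")

print(round_trip(literal("Text", "0124")))  # leading zero retained
print(round_trip(literal("Int", "0124")))   # leading zero dropped
```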

6. We need to define our own full set of system-defined types and operators, which users of our AST invoke, and which back-ends gluing our AST then convert into native equivalents or emulate. Moreover, I recommend that our names for all such things are spelled with just letters, eg use 'equal' and 'not_equal' rather than '==' or '!=' etc.

7. It is essential to have the distinct concept of a logical boolean data type, and values, and operators. This is the result type of equality tests or and|or etc.

8. The most important distinct simple data types are: boolean, integer, bit string (blob), character string (text); then other numerics, then temporal types, then whatever such as spatial types if we want them.

9. The AST should support the concept of having collection-typed values, so that eg we can have table field values that are themselves eg tables, rows, arrays, etc. Never mind whether the back-end DBMS can do this; some can, some can't, and where they can't, we can fake it by splitting tables behind the scenes. If we have native support like this, it should be easy to, say, formulate a query over eg a one-to-many table relationship that returns both a parent record and its child records, in a single result set, without duplication; the result set eg has one row per parent record, and one field of that row is table-typed and contains the child records.
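
The shape of result described in point 9 can be sketched as follows (Python for illustration; rows are plain dicts, and the function name, table names and column names are all made up for the example). This is the kind of thing glue code could do in memory when the back-end has no table-typed fields:

```python
def nest_children(parents, children, key, field):
    """Return one row per parent, with the matching child rows held
    in a table-typed field; the join key is not repeated in the children."""
    result = []
    for parent in parents:
        kids = [
            {col: val for col, val in child.items() if col != key}
            for child in children
            if child[key] == parent[key]
        ]
        result.append({**parent, field: kids})
    return result

artists = [{"artist_id": 1, "name": "A"}, {"artist_id": 2, "name": "B"}]
cds = [{"artist_id": 1, "title": "X"}, {"artist_id": 1, "title": "Y"}]

nested = nest_children(artists, cds, key="artist_id", field="cds")
for row in nested:
    print(row)
```

Each parent appears exactly once, and a parent with no children simply gets an empty table in that field.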

10. Tables should always contain, and queries should always return, no duplicate rows, if not always then at least by default. Users should have to explicitly say if they want duplicates, and if not then every row will be distinct. This is what most people want anyway, and doing it by default will significantly reduce bugs in user code that crop up due to duplicates being present.

11. Any operator that is conceptually N-ary should simply be defined to take N similar operands, supplied as a single argument such as an array ref. Similarly, 'and' and 'or' should be ordinary N-ary boolean operators, as are string concatenation, and numerical addition and multiplication. For that matter, relational union, intersection, and natural join are all N-ary as well. All the examples are commutative, save concatenation, and all are associative. Put another way, any N-ary operator is a "reduce operator", iterating over a list to produce one result.
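
The "reduce operator" reading of point 11 can be sketched like this (Python for illustration). The operator names and the identity values used for an empty operand list are my assumptions, not part of any agreed design:

```python
from functools import reduce
import operator

# Hypothetical operator table: name -> (binary function, identity value).
NARY_OPS = {
    "and": (operator.and_, True),
    "or": (operator.or_, False),
    "add": (operator.add, 0),
    "multiply": (operator.mul, 1),
    "concat": (operator.add, ""),
}

def apply_nary(name, operands):
    """Fold one N-ary operator over a list of similar operands."""
    func, identity = NARY_OPS[name]
    return reduce(func, operands, identity)

print(apply_nary("and", [True, True, False]))
print(apply_nary("add", [1, 2, 3]))
print(apply_nary("concat", ["a", "b", "c"]))
```

Because every operator here is associative, the grouping of the fold does not matter; only 'concat' cares about operand order.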

12. Our AST should be set up so that column names in a rowset must be distinct. If eg 2 tables are joined that have common column names, then if those 2 columns represent the same data and are redundant following the join, then eliminate one, or else if they don't represent the same data, rename one (SQL has 'as' for a reason).

13. Columns should be referred to by name only, not by any ordinal position. When specifying a relational union, the column names of both operands need to be the same, and columns will match up on common names.
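
A sketch of union by column name as in point 13, also applying the no-duplicates default of point 10 (Python for illustration; rows are dicts, and the function name is hypothetical):

```python
def union(*relations):
    """Union of rowsets whose rows are dicts keyed by column name.
    Columns match up by name, never by position; duplicates are dropped."""
    headings = {frozenset(row) for rel in relations for row in rel}
    if len(headings) > 1:
        raise ValueError("all operands must have the same column names")
    result = []
    for rel in relations:
        for row in rel:
            if row not in result:
                result.append(row)
    return result

first = [{"id": 1, "name": "a"}]
second = [{"name": "a", "id": 1}, {"name": "b", "id": 2}]  # other column order
print(union(first, second))
```

The row that appears in both operands comes back once, even though its columns were written in a different order.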

14. Relational joins should all be natural joins, such that given 2 rowset/table operands, the join should simply match them up on columns of the same name (if necessary, columns of the operands can be renamed first to either be the same or different as needed). Doing it this way lets an N-table join be commutative, and the result won't have any duplicate columns. It's also easier to specify since, aside from possible column renaming of the operands, you don't need to specify join conditions to do a join. And if the db schema is well designed in the first place, you won't often have to rename columns when joining either.
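
The natural join of point 14, as a naive nested-loop sketch (Python for illustration; rows are dicts and the sample tables are made up). No join condition is given; rows pair up wherever all same-named columns agree:

```python
def natural_join(left, right):
    """Join two rowsets on every column name they have in common."""
    joined = []
    for l_row in left:
        for r_row in right:
            common = l_row.keys() & r_row.keys()
            if all(l_row[col] == r_row[col] for col in common):
                row = {**l_row, **r_row}  # shared columns appear once
                if row not in joined:
                    joined.append(row)
    return joined

artists = [{"artist_id": 1, "artist": "A"}, {"artist_id": 2, "artist": "B"}]
cds = [{"artist_id": 1, "title": "X"}, {"artist_id": 1, "title": "Y"}]

print(natural_join(artists, cds))
```

Swapping the operands yields the same set of rows, which is what lets an N-table join be written as a single N-ary operator.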

15. There should be distinct entity name spaces for system-defined types and operators, and user-defined ones. Eg, have either a tag or a name prefix on eg every operator call to specify which is meant. So then, our AST can specify invocation of stored procedures or functions et al the same way it specifies using other operators. Eg, 'sys.Int.add' versus 'user.bar_schema.foo_proc'. Doing it this way, there's no concern about reserved words.
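
One way the name-prefix scheme of point 15 could look, sketched in Python for illustration. The two names 'sys.Int.add' and 'user.bar_schema.foo_proc' are the ones from the text; the dispatch tables, the function names and the body of the user routine are hypothetical:

```python
# Separate name spaces: system-defined operators are fixed,
# user-defined routines are registered at run time.
SYSTEM_OPS = {"sys.Int.add": lambda *args: sum(args)}
USER_OPS = {}

def register_user_op(name, func):
    if not name.startswith("user."):
        raise ValueError("user-defined names must start with 'user.'")
    USER_OPS[name] = func

def call(name, *args):
    """Invoke system and user routines through the same mechanism."""
    namespace = name.split(".", 1)[0]
    table = {"sys": SYSTEM_OPS, "user": USER_OPS}[namespace]
    return table[name](*args)

# Stand-in body for a stored procedure; purely an example.
register_user_op("user.bar_schema.foo_proc", lambda x: x * 2)

print(call("sys.Int.add", 1, 2))
print(call("user.bar_schema.foo_proc", 21))
```

A user routine named 'add' can never collide with the system one, since the full names differ.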

16. The AST should treat a query as an arbitrary depth self-similar expression tree, where both scalar and relational operators can be called in any place. In SQL terms, the AST should embrace derived tables or subqueries or whatever. Don't leave these out just because some backends don't have them; we can fake it there if we have to.

17. The AST should just use the various simpler relational algebra or calculus operators rather than a monolithic 'select'. For example, each of these is done using a separate operator, calls to which can be chained: selecting a subset of columns, filtering rows, joining rowsets, unioning rowsets, attaching new columns, grouping rows, summarizing rows, sorting rows, etc.
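
To illustrate the chaining in point 17, here are three of those operators as separate small functions (Python for illustration; the function names 'restrict', 'project' and 'order_by' and the sample data are my own choices, and only in-memory evaluation is shown):

```python
def restrict(rows, predicate):
    """Filter rows."""
    return [row for row in rows if predicate(row)]

def project(rows, columns):
    """Select a subset of columns; the result has no duplicate rows."""
    result = []
    for row in rows:
        narrowed = {col: row[col] for col in columns}
        if narrowed not in result:
            result.append(narrowed)
    return result

def order_by(rows, column):
    """Sort rows on one column."""
    return sorted(rows, key=lambda row: row[column])

people = [
    {"name": "Zoe", "age": 30, "city": "Oslo"},
    {"name": "Adam", "age": 17, "city": "Oslo"},
    {"name": "Bob", "age": 45, "city": "Rome"},
]

# Each call consumes a rowset and returns a rowset, so calls nest freely.
adults = order_by(
    project(restrict(people, lambda row: row["age"] >= 18), ["name"]),
    "name",
)
print(adults)
```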

So that's probably a good start. I can suggest specific alterations to the example syntax if that is useful or people can't get what I'm saying above without such examples.

Meanwhile, I highly recommend looking at http://search.cpan.org/dist/Muldis-DB/lib/Muldis/DB/Language/Core.pod, which is the currently defined list of core system-defined operators (and data types) of Muldis D. This may give you some good ideas for what specific operators you want to have built in to our AST definition. Note that if you don't understand some of my terminology, or want some context, you may want to read http://search.cpan.org/dist/Muldis-DB/lib/Muldis/DB/Language.pod first, at least the NOTES ON TERMINOLOGY section. For example, I say 'relation value' rather than 'rowset' and 'relation variable' rather than 'table'.


P.S.  And now it's 3am here.

-- Darren Duncan

